From hashcollision at gmail.com  Wed Aug  1 06:55:43 2007
From: hashcollision at gmail.com (hashcollision)
Date: Wed, 1 Aug 2007 00:55:43 -0400
Subject: [Python-3000] renaming suggestion
Message-ID: <37f76d50707312155h2bc297eci5591c777c8331f2d@mail.gmail.com>

I think that WeakKeyDictionary should be renamed to WeakKeyDict (same
with WeakValueDictionary). This will make it consistent with dict and
collections.defaultdict.

Sincerely

From fdrake at acm.org  Wed Aug  1 13:54:56 2007
From: fdrake at acm.org (Fred Drake)
Date: Wed, 1 Aug 2007 07:54:56 -0400
Subject: [Python-3000] renaming suggestion
In-Reply-To: <37f76d50707312155h2bc297eci5591c777c8331f2d@mail.gmail.com>
References: <37f76d50707312155h2bc297eci5591c777c8331f2d@mail.gmail.com>
Message-ID: <60A87254-7A7B-4D5E-B219-02D5DE30846D@acm.org>

On Aug 1, 2007, at 12:55 AM, hashcollision wrote:
> I think that WeakKeyDictionary should be renamed to WeakKeyDict
> (same with WeakValueDictionary). This will make it consistent with
> dict and collections.defaultdict.

Hmm. I'm not opposed to providing new names for these classes if
that really helps, though I'm not convinced that it does. The old
names should be preserved for backward compatibility.

If we're looking for some sort of consistency, it seems that having
both CamelCase and righteouscase doesn't help. There's precedent for
both, but the general trend seems to be toward CamelCase for classes
that aren't built-in. (The use of a C implementation for defaultdict
isn't a consideration, IMO.)

Would you consider weakkeydict and weakvaluedict better than
WeakKeyDict and WeakValueDict? If not, I suspect that consistency
isn't the underlying motivation.

-Fred -- Fred Drake From greg.ewing at canterbury.ac.nz Thu Aug 2 02:31:40 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 02 Aug 2007 12:31:40 +1200 Subject: [Python-3000] renaming suggestion In-Reply-To: <60A87254-7A7B-4D5E-B219-02D5DE30846D@acm.org> References: <37f76d50707312155h2bc297eci5591c777c8331f2d@mail.gmail.com> <60A87254-7A7B-4D5E-B219-02D5DE30846D@acm.org> Message-ID: <46B125EC.6070706@canterbury.ac.nz> Fred Drake wrote: > Would you consider weakkeydict and weakvaluedict better than > WeakKeyDict and WeakValueDict? I like WeakKeyDict and WeakValDict because the existing names seem excessively long-winded, given that 'dict' or 'Dict' is a well-established abbreviation. But I actually think that at least some of the weakref stuff should be builtin, seeing as support for it is wired directly into the core implementation. In that case, all-lowercase names would be more appropriate. -- Greg From talin at acm.org Thu Aug 2 04:01:02 2007 From: talin at acm.org (Talin) Date: Wed, 01 Aug 2007 19:01:02 -0700 Subject: [Python-3000] More PEP 3101 changes incoming Message-ID: <46B13ADE.7080901@acm.org> I had a long discussion with Guido today, where he pointed out numerous flaws and inconsistencies in my PEP that I had overlooked. I won't go into all of the details of what he found, but I'd like to focus on what came out of the discussion. I'm going to be updating the PEP to incorporate the latest thinking, but I thought I would post it on Py3K first to see what people think. The first change is to divide the conversion specifiers into two parts, which we will call "alignment specifiers" and "format specifiers". So the new syntax for a format field will be: valueSpec [,alignmentSpec] [:formatSpec] In other words, alignmentSpec is introduced by a comma, and conversion spec is introduced by a colon. This use of comma and colon is taken directly from .Net. 
although our alignment and conversion specifiers themselves look nothing like the ones in .Net. Alignment specifiers now includes the former 'fill', 'align' and 'width' properties. So for example, to indicate a field width of 8: "Property count {0,8}".format(propertyCount) The 'formatSpec' now includes the former 'sign' and 'type' parameters: "Number of errors: {0:+d}".format(errCount) In the preceding example, this would indicate an integer field preceded by a sign for both positive and negative numbers. There are still some things to be worked out. For example, there are currently 3 different meanings of 'width': Minimum width, maximum width, and number of digits of decimal precision. The previous version of the PEP followed the 2.x convention, which was 'n.n' - 'min.prec' for floats, and 'min.max' for everything else. However, that seems confusing. (I'm actually still working out the details - and in fact a little bit of a bikeshed discussion would be welcome at this point, as I could use some help ironing out these kinds of little inconsistencies.) In general, you can think of the difference between format specifier and alignment specifier as: Format Specifier: Controls how the value is converted to a string. Alignment Specifier: Controls how the string is placed on the line. Another change in the behavior is that the __format__ special method can only be used to override the format specifier - it can't be used to override the alignment specifier. The reason is simple: __format__ is used to control how your object is string-ified. It shouldn't get involved in things like left/right alignment or field width, which are really properties of the field, not the object being printed. The __format__ special method can basically completely change how the format specifier is interpreted. So for example for Date objects you can have a format specifier that looks like the input to strftime(). However, there are times when you want to override the __format__ hook. 
The primary use case is the 'r' conversion specifier, which is used to
get the repr() of an object.

At the moment I'm leaning towards using the exclamation mark ('!') to
indicate this, in a way that's analogous to the CSS "!important" flag -
it basically means "No, I really mean it!" Two possible syntax
alternatives are:

    "The repr is {0!r}".format(obj)
    "The repr is {0:r!}".format(obj)

In the first option, we use '!' in place of the colon. In the second
case, we use '!' as a suffix.

Another change suggested by Guido is explicit support for the Decimal
type. Under the current proposal, a format specifier of 'f' will cause
the Decimal object to be coerced to float before printing. That's not
what we want, because it will cause a loss of precision. Instead, the
rule should be that Decimal can use all of the same formatting types as
float, but it won't try to convert the Decimal to float as an
intermediate step.

Here's some pseudo code outlining how the new formatting algorithm for
fields will work:

    def format_field(value, alignmentSpec, formatSpec):
       if value has a __format__ attribute, and no '!' flag:
          s = value.__format__(formatSpec)
       else:
          if the formatSpec is 'r':
             s = repr(value)
          else if the formatSpec is 'd' or one of the integer types:
             # Coerce to int
             s = formatInteger(int(value), formatSpec)
          else if the formatSpec is 'f' or one of the float types:
             if value is a Decimal:
                s = formatDecimal(value, formatSpec)
             else:
                # Coerce to float
                s = formatFloat(float(value), formatSpec)
          else:
             s = str(value)

       # Now that we have 's', apply the alignment options
       return applyAlignment(s, alignmentSpec)

My goal is to get a C implementation of just this function working
some time in the next several weeks. Most of the complexity of the PEP
implementation is right here IMHO.

Before I edit the PEP I'm going to let this marinate for a week and see
what the discussion brings up.

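
For anyone who wants to poke at the dispatch order, here is a rough,
runnable Python sketch of the pseudo code. It is only an illustration
under several assumptions, none of which come from the PEP: the
alignment spec is taken to be an optional '<', '>' or '^' followed by
a width, defaulting to right alignment; the '!' flag is modelled as a
hypothetical `bypass_hook` argument; and the built-in format() stands
in for formatInteger / formatFloat / formatDecimal. Since current
Python objects all inherit a __format__ method, the helper
`has_custom_format` (also hypothetical) only counts a hook on
non-numeric, non-string types.

```python
from datetime import date
from decimal import Decimal

INTEGER_TYPES = set("bcdoxX")
FLOAT_TYPES = set("eEfFgGn%")


def has_custom_format(value):
    """True if value's type defines its own __format__ (assumed rule)."""
    cls = type(value)
    if issubclass(cls, (int, float, str, Decimal)):
        return False  # these go through the built-in branches below
    return cls.__format__ is not object.__format__


def apply_alignment(s, alignment_spec):
    """Pad s using an assumed '[<|>|^]width' spec, e.g. '8' or '<8'."""
    if not alignment_spec:
        return s
    align = ">"  # assumed default: right-align
    if alignment_spec[0] in "<>^":
        align, alignment_spec = alignment_spec[0], alignment_spec[1:]
    width = int(alignment_spec)
    if align == "<":
        return s.ljust(width)
    if align == "^":
        return s.center(width)
    return s.rjust(width)


def format_field(value, alignment_spec="", format_spec="", bypass_hook=False):
    type_code = format_spec[-1:]  # last character selects the conversion
    if has_custom_format(value) and not bypass_hook:
        s = value.__format__(format_spec)
    elif type_code == "r":
        s = repr(value)
    elif type_code in INTEGER_TYPES:
        s = format(int(value), format_spec)        # coerce to int
    elif type_code in FLOAT_TYPES:
        if isinstance(value, Decimal):
            s = format(value, format_spec)         # no float round trip
        else:
            s = format(float(value), format_spec)  # coerce to float
    else:
        s = str(value)
    # Alignment is applied to the finished string, outside any hook.
    return apply_alignment(s, alignment_spec)


print(repr(format_field(42, "8", "+d")))             # '     +42'
print(format_field(Decimal("0.1"), "", ".20f"))      # 0.10000000000000000000
print(format_field(date(2007, 8, 1), "", "%Y-%m-%d"))  # 2007-08-01
```

The date example goes through the type's own hook, which is the
strftime-style case mentioned above; passing bypass_hook=True with a
format spec of 'r' would return the repr() instead.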
-- Talin From rrr at ronadam.com Thu Aug 2 06:58:40 2007 From: rrr at ronadam.com (Ron Adam) Date: Wed, 01 Aug 2007 23:58:40 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B13ADE.7080901@acm.org> References: <46B13ADE.7080901@acm.org> Message-ID: <46B16480.7050502@ronadam.com> Talin wrote: > I had a long discussion with Guido today, where he pointed out numerous > flaws and inconsistencies in my PEP that I had overlooked. I won't go > into all of the details of what he found, but I'd like to focus on what > came out of the discussion. I'm going to be updating the PEP to > incorporate the latest thinking, but I thought I would post it on Py3K > first to see what people think. > > The first change is to divide the conversion specifiers into two parts, > which we will call "alignment specifiers" and "format specifiers". So > the new syntax for a format field will be: > > valueSpec [,alignmentSpec] [:formatSpec] > > In other words, alignmentSpec is introduced by a comma, and conversion > spec is introduced by a colon. This use of comma and colon is taken > directly from .Net. although our alignment and conversion specifiers > themselves look nothing like the ones in .Net. > > Alignment specifiers now includes the former 'fill', 'align' and 'width' > properties. So for example, to indicate a field width of 8: > > "Property count {0,8}".format(propertyCount) How would I specify align right, and width 42? > The 'formatSpec' now includes the former 'sign' and 'type' parameters: > > "Number of errors: {0:+d}".format(errCount) > > In the preceding example, this would indicate an integer field preceded > by a sign for both positive and negative numbers. How can I get negative numbers to print in red? Just kidding. ;-) (I recently was frustrated by not being able to have text of two different colors in a single tkinter button.) > There are still some things to be worked out. 
For example, there are > currently 3 different meanings of 'width': Minimum width, maximum width, > and number of digits of decimal precision. The previous version of the > PEP followed the 2.x convention, which was 'n.n' - 'min.prec' for > floats, and 'min.max' for everything else. However, that seems confusing. Yep, enough so that I need to look it up more often than I like. > (I'm actually still working out the details - and in fact a little bit > of a bikeshed discussion would be welcome at this point, as I could use > some help ironing out these kinds of little inconsistencies.) > > In general, you can think of the difference between format specifier and > alignment specifier as: > > Format Specifier: Controls how the value is converted to a string. > Alignment Specifier: Controls how the string is placed on the line. Keeping related terms together is important I think. The order] {item, alignment: format} Splits the item and it's formatter. The alignment is more of a container property or feild property as you pointed out further down. So maybe if you group the related values together... {valuespec:format, alignment:width} This has a nice dictionary feel and maybe that may be useful as well. Reusing things I'm familiar with does make it easier. So it would be possible to create a dictionary and use it's repr() function as a formatter. Nice little bonus. ;-) string_format = {0:'i', 'R':12} string_format.repr().format(number) Well almost, there is the tiny problem of the quotes inside it. :/ So in order to right justify an integer... "Property count {0:i, R:8}".format(propertyCount) The precision for floats is not part of the field width, so it belongs in formatter term. "Total cost {0:f2, R:12}".format(totalcost) I'm not sure if it should be 'f.2' or just "f2". > Another change in the behavior is that the __format__ special method can > only be used to override the format specifier - it can't be used to > override the alignment specifier. 
The reason is simple: __format__ is > used to control how your object is string-ified. It shouldn't get > involved in things like left/right alignment or field width, which are > really properties of the field, not the object being printed. Right, so maybe there should be both a __format__, and an __alignment__ method? > The __format__ special method can basically completely change how the > format specifier is interpreted. So for example for Date objects you can > have a format specifier that looks like the input to strftime(). > > However, there are times when you want to override the __format__ hook. > The primary use case is the 'r' conversion specifier, which is used to > get the repr() of an object. > > At the moment I'm leaning towards using the exclamation mark ('!') to > indicate this, in a way that's analogous to the CSS "! important" flag - > it basically means "No, I really mean it!" Two possible syntax > alternatives are: > > "The repr is {0!r}".format(obj) > "The repr is {0:r!}".format(obj) > > In the first option, we use '!' in place of the colon. In the second > case, we use '!' as a suffix. -1 This doesn't feel right to me. It seems to me if you do this, then we will see all sorts of weird __repr__ methods that return things completely different than we get now. As an alternative ... Some sort of indirect formatting should be possible, but maybe it can be part of the data passed to the format method. "{0:s, L:10} = {1:!, L:12}".format(obj.name, (obj, rept)) Another possibility is to chain them some how. "{0:!, L:10} = {1:!, L:12}".formatfn(str, repr).format(obj, obj) Same could be done with the alignments if there is a use case for it. > Another change suggested by Guido is explicit support for the Decimal > type. Under the current proposal, a format specifier of 'f' will cause > the Decimal object to be coerced to float before printing. That's not > what we want, because it will cause a loss of precision. 
Instead, the > rule should be that Decimal can use all of the same formatting types as > float, but it won't try to convert the Decimal to float as an > intermediate step. +1 I'm hoping at some point (in the not too far future) I will be able to tell python to use Decimal in place of floats and not have to change nearly every function that produces a float literal. This is a step in that direction. > Here's some pseudo code outlining how the new formatting algorithm for > fields will work: > > def format_field(value, alignmentSpec, formatSpec): > if value has a __format__ attribute, and no '!' flag: > s = value.__format__(value, formatSpec) > else: > if the formatSpec is 'r': > s = repr(value) > else if the formatSpec is 'd' or one of the integer types: > # Coerce to int > s = formatInteger(int(value), formatSpec) > else if the formatSpec is 'f' or one of the float types: > if value is a Decimal: > s = formatDecimal(value, formatSpec) > else: > # Coerce to float > s = formatFloat(float(value), formatSpec) > else: > s = str(value) > > # Now that we have 's', apply the alignment options > return applyAlignment(s, alignmentSpec) > > My goal is that some time in the next several weeks I would like to get > working a C implementation of just this function. Most of the complexity > of the PEP implementation is right here IMHO. > > Before I edit the PEP I'm going to let this marinate for a week and see > what the discussion brings up. > > -- Talin Great work Talin, I cheer your efforts at keeping this moving given the many directions and turns it has taken! 
Cheers, Ron From talin at acm.org Thu Aug 2 07:32:55 2007 From: talin at acm.org (Talin) Date: Wed, 01 Aug 2007 22:32:55 -0700 Subject: [Python-3000] PEP 3115 chaining rules (was Re: pep 3124 plans) In-Reply-To: <20070731162912.3E2C53A40A7@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2AE31.2080105@canterbury.ac.nz> <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> <46A3ECB7.9070504@canterbury.ac.nz> <20070723010750.E27693A40A9@sparrow.telecommunity.com> <46A453C7.9070407@acm.org> <20070723153031.D00273A403D@sparrow.telecommunity.com> <5d44f72f0707262227o6fcf8471ja6654910c7ee07e0@mail.gmail.com> <20070727162212.60E2F3A40E6@sparrow.telecommunity.com> <20070730201511.C14ED3A406B@sparrow.telecommunity.com> <46AF037C.9050902@gmail.com> <20070731162912.3E2C53A40A7@sparrow.telecommunity.com> Message-ID: <46B16C87.7060807@acm.org> Phillip J. Eby wrote: > At 07:40 PM 7/31/2007 +1000, Nick Coghlan wrote: >> Phillip J. Eby wrote: >>> In other words, a class' metaclass has to be a derivative of all >>> its bases' metaclasses; ISTM that a __prepare__ namespace needs to >>> be a derivative in some sense of all its bases' __prepare__ >>> results. This probably isn't enforceable, but the pattern should >>> be documented such that e.g. the overloading metaclass' __prepare__ >>> would return a mapping that delegates operations to the mapping >>> returned by its super()'s __prepare__, and the actual class >>> creation would be similarly chained. PEP 3115 probably needs a >>> section to explain these issues and recommend best practices for >>> implementing __prepare__ and class creation on that basis. I'll >>> write something up after I've thought this through some more. 
>> A variant of the metaclass rule specific to __prepare__ might look >> something like: >> A class's metaclass providing the __prepare__ method must be a >> subclass of all of the class's base classes providing __prepare__ methods. > > That doesn't really work; among other things, it would require > everything to be a dict subclass, since type.__prepare__() will > presumably return a dict. Therefore, it really does need to be > delegation instead of inheritance, or it becomes very difficult to > provide any "interesting" properties. I think you are on to something here. I think that in order to 'mix' metaclasses, each metaclass needs to get a crack at the members as they are defined. The 'dict' object really isn't important - what's important is to be able to overload the creation of a class member. I can think of a couple ways to accomplish this. 1) The first, and most brute force idea is to pass to a metaclass's __prepare__ statement an extra parameter containing the result of the previous metaclass's __prepare__ method. This would allow the __prepare__ statement to *wrap* the earlier metaclass's dict, intercepting the insertion operations or passing them through as needed. In fact, you could even make it so that the first __prepare__ in the chain gets passed in a regular dict. So __prepare__ always gets a dict which it is supposed to wrap, although it can choose to ignore it. 2) The second idea is to recognize the fact that we were never all that interested in creating a special dict subclass; The reason we chose that is because it seemed like an easy way to hook in to the addition of new members, by overridding the dict's insertion function. In other words, what the metaclass wants is the ability to override *insertion*. So you could change the metaclass interface to make it so that insertion is overridable, but in an "event chain" way, so that each metaclass gets a shot at the insertion event as it occurs. 
The problem here is that we need to support more than just insertion, but lookup (and possibly deletion) as well. This also leads to the third idea, which I am sure that you - of all people - have already thought of, which is: 3) Use something like your generic function 'next method' pattern. In fact, go the whole way and say that"add_class_member(cls:Metaclass, name, member, next:next_method)" is a generic function, and then call next_method to inform the next metaclass in the chain. There are two obvious problems here: First, we can't dispatch on 'cls' since it's not been created yet. Second, the metaclass machinery is deep down inside the interpreter and operates at the very heart of Python. Which means that in order to use generic functions, they would have to be built-in to the heart of Python as well. While I would love to see that happen some day, I am not comfortable giving an untried, brand new module such 'blessed' status. -- Talin From talin at acm.org Thu Aug 2 07:40:32 2007 From: talin at acm.org (Talin) Date: Wed, 01 Aug 2007 22:40:32 -0700 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B16480.7050502@ronadam.com> References: <46B13ADE.7080901@acm.org> <46B16480.7050502@ronadam.com> Message-ID: <46B16E50.9050308@acm.org> Ron Adam wrote: > Splits the item and it's formatter. The alignment is more of a > container property or feild property as you pointed out further down. > > So maybe if you group the related values together... > > {valuespec:format, alignment:width} > > This has a nice dictionary feel and maybe that may be useful as well. > Reusing things I'm familiar with does make it easier. I'm certainly open to switching the order of things around. Remember, though, that we have *5* fields (6 depending on how you count) of formatting options to deal with (see the PEP for details): -- alignment -- padding char -- minwidth -- maxwidth -- fractional digits -- type ...And we need to be able to represent these succinctly. 
That last is important and here's why: None of these formatting codes are necessary at all, because in the final analysis you can get the same effect by wrapping the arguments to format() with the appropriate padding, alignment, and type conversion function calls. In other words, the whole point of these format codes is that they are convenient shortcuts. And shortcuts by definition need to be *short*. So we need to strike a balance between convenience and readability. -- Talin From rrr at ronadam.com Thu Aug 2 12:31:54 2007 From: rrr at ronadam.com (Ron Adam) Date: Thu, 02 Aug 2007 05:31:54 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B16E50.9050308@acm.org> References: <46B13ADE.7080901@acm.org> <46B16480.7050502@ronadam.com> <46B16E50.9050308@acm.org> Message-ID: <46B1B29A.3020209@ronadam.com> Talin wrote: > Ron Adam wrote: > >> Splits the item and it's formatter. The alignment is more of a >> container property or feild property as you pointed out further down. >> >> So maybe if you group the related values together... >> >> {valuespec:format, alignment:width} >> >> This has a nice dictionary feel and maybe that may be useful as well. >> Reusing things I'm familiar with does make it easier. > > I'm certainly open to switching the order of things around. Remember, > though, that we have *5* fields (6 depending on how you count) of > formatting options to deal with (see the PEP for details): > > -- alignment > -- padding char > -- minwidth > -- maxwidth > -- fractional digits > -- type > > ...And we need to be able to represent these succinctly. That last is > important and here's why: None of these formatting codes are necessary > at all, because in the final analysis you can get the same effect by > wrapping the arguments to format() with the appropriate padding, > alignment, and type conversion function calls. > > In other words, the whole point of these format codes is that they are > convenient shortcuts. 
And shortcuts by definition need to be *short*.
>
> So we need to strike a balance between convenience and readability.
>
> -- Talin

I wasn't thinking we would treat each option separately, just type, and
alignment groups. Within those, we would have pretty much what you have
proposed.

Maybe it will help more to go a little slower instead of jumping in and
offering a bunch of new alternatives.

What code specifies Decimals, "D"?

Any chance of thousands separators? And what about exchanging commas
and decimals?

Does the following look complete, or does it need anything added or
removed?

Format Specifiers:

    As string:
        type          # [s|r]

    As integer:
        sign          # [-|+|()]
        fill_char     # character   \__ +07d -> +0000123
        fill_width    # number      /   fills with zeros
        type          # [b|c|d|o|x|X]   (*)

    As fixed point:
        sign          # [-|+|()]
        fill_char     # character
        fill_width    # number
        precision     # number (fractional digits)
        type          # [e|E|f|F|g|G|n|%]

Alignment Specifiers:

    If formatted value is shorter than min_width:
        align         # [<|>|^]
        min_width     # number
        padding_char  # character

    If formatted value is longer than max_width:
        max_width     # number

Cheers,
Ron

From guido at python.org  Thu Aug  2 16:46:32 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 2 Aug 2007 07:46:32 -0700
Subject: [Python-3000] PEP 3115 chaining rules (was Re: pep 3124 plans)
In-Reply-To: <46B16C87.7060807@acm.org>
References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com>
	<46A453C7.9070407@acm.org>
	<20070723153031.D00273A403D@sparrow.telecommunity.com>
	<5d44f72f0707262227o6fcf8471ja6654910c7ee07e0@mail.gmail.com>
	<20070727162212.60E2F3A40E6@sparrow.telecommunity.com>
	<20070730201511.C14ED3A406B@sparrow.telecommunity.com>
	<46AF037C.9050902@gmail.com>
	<20070731162912.3E2C53A40A7@sparrow.telecommunity.com>
	<46B16C87.7060807@acm.org>
Message-ID:

On 8/1/07, Talin wrote:
> I think that in order to 'mix' metaclasses, each metaclass needs to get
> a crack at the members as they are defined.
The 'dict' object really > isn't important - what's important is to be able to overload the > creation of a class member. > > I can think of a couple ways to accomplish this. > > 1) The first, and most brute force idea is to pass to a metaclass's > __prepare__ statement an extra parameter containing the result of the > previous metaclass's __prepare__ method. This would allow the > __prepare__ statement to *wrap* the earlier metaclass's dict, > intercepting the insertion operations or passing them through as needed. I'm confused. The only way to mix metaclasses is by explicitly multiply inheriting them. So the normal "super" machinery should work, shouldn't it? (Except for 'type' not defining __prepare__(), but that can be fixed.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From stargaming at gmail.com Thu Aug 2 17:03:42 2007 From: stargaming at gmail.com (Stargaming) Date: Thu, 2 Aug 2007 15:03:42 +0000 (UTC) Subject: [Python-3000] optimizing [x]range References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> Message-ID: On Sat, 28 Jul 2007 17:06:50 +0200, tomer filiba wrote: > currently, testing for "x in xrange(y)" is an O(n) operation. > > since xrange objects (which would become range in py3k) are not real > lists, there's no reason that __contains__ be an O(n). it can easily be > made into an O(1) operation. 
here's a demo code (it should be trivial to
> implement this in CPython)
>
>
> class xxrange(object):
>     def __init__(self, *args):
>         if len(args) == 1:
>             self.start, self.stop, self.step = (0, args[0], 1)
>         elif len(args) == 2:
>             self.start, self.stop, self.step = (args[0], args[1], 1)
>         elif len(args) == 3:
>             self.start, self.stop, self.step = args
>         else:
>             raise TypeError("invalid number of args")
>
>     def __iter__(self):
>         i = self.start
>         while i < self.stop:
>             yield i
>             i += self.step
>
>     def __contains__(self, num):
>         # stop is exclusive, so num == stop is out of range
>         if num < self.start or num >= self.stop:
>             return False
>         return (num - self.start) % self.step == 0
>
>
> print list(xxrange(7))            # [0, 1, 2, 3, 4, 5, 6]
> print list(xxrange(0, 7, 2))      # [0, 2, 4, 6]
> print list(xxrange(1, 7, 2))      # [1, 3, 5]
> print 98 in xxrange(100)          # True
> print 98 in xxrange(0, 100, 2)    # True
> print 99 in xxrange(0, 100, 2)    # False
> print 98 in xxrange(1, 100, 2)    # False
> print 99 in xxrange(1, 100, 2)    # True
>
>
>
> -tomer

I gave the implementation a try. I cannot guarantee that it follows
every guideline for the CPython core but it works, it's fast and passes
all tests::

    >>> 98 in xrange(0, 100, 2)
    True
    >>> 99 in xrange(0, 100, 2)
    False
    >>> 98 in xrange(1, 100, 2)
    False
    >>> 99 in xrange(1, 100, 2)
    True

Note: test_xrange wasn't really helpful with validating xrange's
functionality. No tests for negative steps, at least it didn't warn me
while it didn't work. ;)

It is basically the algorithm you provided, with a fix for negative
steps.

The patch is based on the latest trunk/ checkout, Python 2.6. I don't
think this is a problem if nobody else made any effort towards making
xrange more sequence-like in the Python 3000 branch. The C source might
require some tab/space cleanup.

Speed comparison
================

$ ./python -V
Python 2.6a0
$ python -V
Python 2.5.1

./python is my local build, with the patch applied.

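
To read alongside the timings that follow, the constant-time check
with the negative-step fix can be sketched in Python roughly like
this. It is a minimal sketch, not the patch itself: `range_contains`
as a free function taking start/stop/step is an assumption made for
illustration, and it uses the exclusive stop directly rather than the
start + len * step that the C code computes.

```python
def range_contains(start, stop, step, num):
    """O(1) membership test for the progression range(start, stop, step)."""
    if step == 0:
        raise ValueError("step must not be zero")
    if step > 0:
        in_bounds = start <= num < stop
    else:
        # Counting down: start is the largest value, stop is exclusive.
        in_bounds = stop < num <= start
    # Inside the bounds, num is a member only if it sits on the step grid.
    return in_bounds and (num - start) % step == 0


# Compare against the O(n) scan for a few ranges, negative steps included.
for args in [(0, 100, 2), (1, 100, 2), (10, 0, -3), (5, 5, 1), (-7, 9, 4)]:
    for n in range(-20, 120):
        assert range_contains(*args, n) == (n in list(range(*args)))
```

The cost is a couple of comparisons and one modulo regardless of the
length of the range, which is why membership at the far end need not
be slower than membership at the start.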
$ ./python -mtimeit "0 in xrange(100)"
1000000 loops, best of 3: 0.641 usec per loop
$ python -mtimeit "0 in xrange(100)"
1000000 loops, best of 3: 0.717 usec per loop

Well, with a lot of ignorance, this is still the same.

$ ./python -mtimeit "99 in xrange(100)"
1000000 loops, best of 3: 0.638 usec per loop
$ python -mtimeit "99 in xrange(100)"
100000 loops, best of 3: 6.17 usec per loop

Notice the difference in the magnitude of loops!

$ ./python -mtimeit -s "from sys import maxint" "maxint in xrange(maxint)"
1000000 loops, best of 3: 0.622 usec per loop
$ python -mtimeit -s "from sys import maxint" "maxint in xrange(maxint)"

Still waiting.. ;)

Index: Objects/rangeobject.c
===================================================================
--- Objects/rangeobject.c	(revision 56666)
+++ Objects/rangeobject.c	(working copy)
@@ -129,12 +129,31 @@
 	return rtn;
 }

+static int
+range_contains(rangeobject *r, PyIntObject *key) {
+    if (PyInt_Check(key)) {
+        int keyval = key->ob_ival;
+        int start = r->start;
+        int step = r->step;
+        int end = start + (r->len * step);
+
+        if ((step < 0 && keyval <= start && keyval > end) \
+           || (step > 0 && keyval >= start && keyval < end)) {
+            return ((keyval - start) % step) == 0;
+        }
+    }
+    return 0;
+}
+
 static PySequenceMethods range_as_sequence = {
 	(lenfunc)range_length,	/* sq_length */
 	0,			/* sq_concat */
 	0,			/* sq_repeat */
 	(ssizeargfunc)range_item, /* sq_item */
 	0,			/* sq_slice */
+	0, /* sq_ass_item */
+	0, /* sq_ass_slice */
+	(objobjproc)range_contains, /* sq_contains */
 };

 static PyObject * range_iter(PyObject *seq);

Test suite
==========

OK
288 tests OK.
CAUTION:  stdout isn't compared in verbose mode:
a test that passes in verbose mode may fail without it.
1 test failed:
    test_nis # due to verbosity, I guess
38 tests skipped:
    test_aepack test_al test_applesingle test_bsddb test_bsddb185
    test_bsddb3 test_bz2 test_cd test_cl test_codecmaps_cn
    test_codecmaps_hk test_codecmaps_jp test_codecmaps_kr
    test_codecmaps_tw test_curses test_dbm test_gdbm test_gl
    test_imageop test_imgfile test_linuxaudiodev test_macostools
    test_normalization test_ossaudiodev test_pep277 test_plistlib
    test_scriptpackages test_socketserver test_sqlite test_startfile
    test_sunaudiodev test_tcl test_timeout test_urllib2net
    test_urllibnet test_winreg test_winsound test_zipfile64
5 skips unexpected on linux2:
    test_tcl test_dbm test_bz2 test_gdbm test_bsddb

Should I submit the patch to the SF patch manager as well?

Regards,
Stargaming

From stargaming at gmail.com  Thu Aug  2 18:19:29 2007
From: stargaming at gmail.com (Stargaming)
Date: Thu, 2 Aug 2007 16:19:29 +0000 (UTC)
Subject: [Python-3000] optimizing [x]range
References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com>
Message-ID:

On Thu, 02 Aug 2007 15:03:42 +0000, Stargaming wrote:

> On Sat, 28 Jul 2007 17:06:50 +0200, tomer filiba wrote:
>
>> currently, testing for "x in xrange(y)" is an O(n) operation.
>>
>> since xrange objects (which would become range in py3k) are not real
>> lists, there's no reason that __contains__ be an O(n). it can easily be
>> made into an O(1) operation. here's a demo code (it should be trivial
>> to implement this in CPython)
[snipped algorithm]
>
> I gave the implementation a try.
[snipped patch details]
>
> Should I submit the patch to the SF patch manager as well?

Guido> Yes, please submit to SF.

Submitted to the SF patch manager as patch #1766304. It is marked as a
Python 2.6 item.
http://sourceforge.net/tracker/index.php?func=detail&aid=1766304&group_id=5470&atid=305470

Regards,
Stargaming

From pje at telecommunity.com  Thu Aug  2 18:24:04 2007
From: pje at telecommunity.com (Phillip J.
Eby) Date: Thu, 02 Aug 2007 12:24:04 -0400 Subject: [Python-3000] PEP 3115 chaining rules (was Re: pep 3124 plans) In-Reply-To: References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <46A453C7.9070407@acm.org> <20070723153031.D00273A403D@sparrow.telecommunity.com> <5d44f72f0707262227o6fcf8471ja6654910c7ee07e0@mail.gmail.com> <20070727162212.60E2F3A40E6@sparrow.telecommunity.com> <20070730201511.C14ED3A406B@sparrow.telecommunity.com> <46AF037C.9050902@gmail.com> <20070731162912.3E2C53A40A7@sparrow.telecommunity.com> <46B16C87.7060807@acm.org> Message-ID: <20070802162146.2B0DB3A406B@sparrow.telecommunity.com> At 07:46 AM 8/2/2007 -0700, Guido van Rossum wrote: >On 8/1/07, Talin wrote: > > I think that in order to 'mix' metaclasses, each metaclass needs to get > > a crack at the members as they are defined. The 'dict' object really > > isn't important - what's important is to be able to overload the > > creation of a class member. > > > > I can think of a couple ways to accomplish this. > > > > 1) The first, and most brute force idea is to pass to a metaclass's > > __prepare__ statement an extra parameter containing the result of the > > previous metaclass's __prepare__ method. This would allow the > > __prepare__ statement to *wrap* the earlier metaclass's dict, > > intercepting the insertion operations or passing them through as needed. > >I'm confused. The only way to mix metaclasses is by explicitly >multiply inheriting them. So the normal "super" machinery should work, >shouldn't it? Yes. As I said in the email Talin was replying to, it's sufficient to document the fact that a metaclass should call its super()'s __prepare__ and delegate operations to it; the additional stuff Talin is suggesting is unnecessary. > (Except for 'type' not defining __prepare__(), but that >can be fixed.) Yep. 
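
For concreteness, the delegation pattern described above might look
something like the following in Python 3. This is a hedged sketch, not
text from PEP 3115: the names `RecordingNamespace`, `RecordingMeta` and
`defined_names` are made up for illustration, and it assumes that
type defines a __prepare__ that returns a plain dict (the fix
mentioned above).

```python
from collections.abc import MutableMapping


class RecordingNamespace(MutableMapping):
    """Wraps another namespace mapping and records names as they are bound."""

    def __init__(self, inner):
        self._inner = inner   # mapping returned by the next __prepare__
        self.names = []       # names in order of first assignment

    def __getitem__(self, key):
        return self._inner[key]       # KeyError falls through to globals

    def __setitem__(self, key, value):
        if key not in self._inner:
            self.names.append(key)
        self._inner[key] = value      # delegate the actual storage

    def __delitem__(self, key):
        del self._inner[key]

    def __iter__(self):
        return iter(self._inner)

    def __len__(self):
        return len(self._inner)


class RecordingMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        # Ask the next metaclass in the MRO first, then wrap its result.
        inner = super().__prepare__(name, bases, **kwds)
        return RecordingNamespace(inner)

    def __new__(mcls, name, bases, namespace, **kwds):
        # type.__new__ wants a real dict, so hand it a copy.
        cls = super().__new__(mcls, name, bases, dict(namespace), **kwds)
        names = getattr(namespace, "names", namespace)
        cls.defined_names = tuple(n for n in names if not n.startswith("__"))
        return cls


class Point(metaclass=RecordingMeta):
    x = 0
    y = 0

    def norm(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5


print(Point.defined_names)   # ('x', 'y', 'norm')
```

The wrapper never subclasses dict; it only forwards operations to
whatever mapping super().__prepare__ returned, so a second metaclass
written the same way could sit further along the MRO and see the same
insertions.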
From guido at python.org Thu Aug 2 18:48:27 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 2 Aug 2007 09:48:27 -0700 Subject: [Python-3000] PEP 3115 chaining rules (was Re: pep 3124 plans) In-Reply-To: <20070802162146.2B0DB3A406B@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <5d44f72f0707262227o6fcf8471ja6654910c7ee07e0@mail.gmail.com> <20070727162212.60E2F3A40E6@sparrow.telecommunity.com> <20070730201511.C14ED3A406B@sparrow.telecommunity.com> <46AF037C.9050902@gmail.com> <20070731162912.3E2C53A40A7@sparrow.telecommunity.com> <46B16C87.7060807@acm.org> <20070802162146.2B0DB3A406B@sparrow.telecommunity.com> Message-ID: On 8/2/07, Phillip J. Eby wrote: > At 07:46 AM 8/2/2007 -0700, Guido van Rossum wrote: > >On 8/1/07, Talin wrote: > > > I think that in order to 'mix' metaclasses, each metaclass needs to get > > > a crack at the members as they are defined. The 'dict' object really > > > isn't important - what's important is to be able to overload the > > > creation of a class member. > > > > > > I can think of a couple ways to accomplish this. > > > > > > 1) The first, and most brute force idea is to pass to a metaclass's > > > __prepare__ statement an extra parameter containing the result of the > > > previous metaclass's __prepare__ method. This would allow the > > > __prepare__ statement to *wrap* the earlier metaclass's dict, > > > intercepting the insertion operations or passing them through as needed. > > > >I'm confused. The only way to mix metaclasses is by explicitly > >multiply inheriting them. So the normal "super" machinery should work, > >shouldn't it? > > Yes. As I said in the email Talin was replying to, it's sufficient > to document the fact that a metaclass should call its super()'s > __prepare__ and delegate operations to it; the additional stuff Talin > is suggesting is unnecessary. > > > (Except for 'type' not defining __prepare__(), but that > >can be fixed.) > > Yep. 
Committed revision 56672. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From nicko at nicko.org Thu Aug 2 18:47:45 2007 From: nicko at nicko.org (Nicko van Someren) Date: Thu, 2 Aug 2007 17:47:45 +0100 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B13ADE.7080901@acm.org> References: <46B13ADE.7080901@acm.org> Message-ID: <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> On 2 Aug 2007, at 03:01, Talin wrote: > In general, you can think of the difference between format > specifier and > alignment specifier as: > > Format Specifier: Controls how the value is converted to a > string. > Alignment Specifier: Controls how the string is placed on the > line. > > Another change in the behavior is that the __format__ special > method can > only be used to override the format specifier - it can't be used to > override the alignment specifier. The reason is simple: __format__ is > used to control how your object is string-ified. It shouldn't get > involved in things like left/right alignment or field width, which are > really properties of the field, not the object being printed. Say I format numbers in an accounting system and, in the absence of being able to colour my losses in red, I choose the parentheses sign representation style (). In this case I'd like to be able to have my numbers align thus: 1000 200 (3000) 40 (50000) I.e. with the bulk of the padding applied before the number but conditional padding after the number if there is no closing bracket. If the placement is done entirely outside the __format__ method then you need to make sure that it is documented that, when using the () style of sign indicator, positive numbers need to have a space placed either side, e.g. -100 goes to "(100)" but +100 goes to " 100 ". If you do this then it should all come out in the wash, but I think it deserves a note somewhere.
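A minimal sketch of the split described above, assuming the conversion step alone is responsible for the sign decoration and a generic alignment step only pads on the left. The function names accounting() and right_align() are hypothetical and not part of PEP 3101.

```python
def accounting(value):
    """Conversion step: negatives in parentheses, positives padded with a
    space on either side so that the digits line up in a column."""
    if value < 0:
        return "(" + str(-value) + ")"
    return " " + str(value) + " "


def right_align(text, width):
    """Alignment step: knows nothing about the sign convention."""
    return text.rjust(width)


values = [1000, 200, -3000, 40, -50000]
lines = [right_align(accounting(v), 8) for v in values]
print("\n".join(lines))
```

Because every converted string ends in either ")" or a space, plain right-justification is enough to make the last digit of each number fall in the same column.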
Cheers, Nicko From guido at python.org Thu Aug 2 20:30:58 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 2 Aug 2007 11:30:58 -0700 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> Message-ID: Personally I think support for the various accounting-style output is not worth it. I betcha any accounting system worth the name would not use this and instead have its own custom code for formatting anyway. My personal suggestion is to stay close to the .NET formatting language: name_specifier [',' width_specifier] [':' conversion_specifier] where width_specifier is a positive or negative number giving the minimum width (negative for left-alignment) and conversion_specifier is passed uninterpreted to the object's __format__ method. In order to support the use cases for %s and %r, I propose to allow appending a single letter 's', 'r' or 'f' to the width_specifier (*not* the conversion_specifier): 'r' always calls repr() on the object; 's' always calls str() on the object; 'f' calls the object's __format__() method passing it the conversion_specifier, or if it has no __format__() method, calls repr() on it. This is also the default. If no __format__() method was called (either because 'r' or 's' was used, or because there was no __format__() method on the object), the conversion_specifier (if given) is a *maximum* length; this handles the pretty common use cases of %.20s and %.20r (limiting the size of a printed value). The numeric types are the main types that must provide __format__(). (I also propose that for datetime types the format string ought to be interpreted as a strftime format string.) I think that float.__format__() should *not* support the integer formatting codes (d, x, o etc.) 
-- I find the current '%d' % 3.14 == '3' an abomination which is most likely an incidental effect of calling int() on the argument (should really be __index__()). But int.__format__() should support the float formatting codes; I think '%6.3f' % 12 should return ' 12.000'. This is in line with 1/2 returning 0.5; int values should produce results identical to the corresponding float values when used in the same context. I think this should be solved inside int.__format__() though; the generic formatting code should not have to know about this. --Guido On 8/2/07, Nicko van Someren wrote: > On 2 Aug 2007, at 03:01, Talin wrote: > > In general, you can think of the difference between format > > specifier and > > alignment specifier as: > > > > Format Specifier: Controls how the value is converted to a > > string. > > Alignment Specifier: Controls how the string is placed on the > > line. > > > > Another change in the behavior is that the __format__ special > > method can > > only be used to override the format specifier - it can't be used to > > override the alignment specifier. The reason is simple: __format__ is > > used to control how your object is string-ified. It shouldn't get > > involved in things like left/right alignment or field width, which are > > really properties of the field, not the object being printed. > > Say I format numbers in an accounting system and, in the absence of > being able to colour my losses in red, I choose the parentheses sign > representation style (). In this case I'd like to be able to have my > numbers align thus: > 1000 > 200 > (3000) > 40 > (50000) > I.e. with the bulk of the padding applied before the number but > conditional padding after the number if there is no closing bracket. > > If the placement is done entirely outside the __format__ method then > you to make sure that it is documented that, when using the () style > of sign indicator, positive numbers need to have a space placed > either side, e.g. 
-100 goes to "(100)" but +100 does to " 100 ". If > you do this then it should all come out in the wash, but I think it > deserves a note somewhere. > > Cheers, > Nicko > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From eric+python-dev at trueblade.com Thu Aug 2 20:47:44 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Thu, 02 Aug 2007 11:47:44 -0700 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B13ADE.7080901@acm.org> References: <46B13ADE.7080901@acm.org> Message-ID: <46B226D0.5010505@trueblade.com> Talin wrote: > The first change is to divide the conversion specifiers into two parts, > which we will call "alignment specifiers" and "format specifiers". So > the new syntax for a format field will be: > > valueSpec [,alignmentSpec] [:formatSpec] > > In other words, alignmentSpec is introduced by a comma, and conversion > spec is introduced by a colon. This use of comma and colon is taken > directly from .Net. although our alignment and conversion specifiers > themselves look nothing like the ones in .Net. Should the .format() method (and underlying machinery) interpret the formatSpec (and/or the alignmentSpec) at all? I believe .NET doesn't prescribe any meaning to its format specifiers, but instead passes them in to the object for parsing and interpretation. You would lose the ability to implicitly convert to ints, floats and strings, but maybe that should be explicit, anyway. And if we do that, why not require all objects to support a __format__ method? Maybe if it's not present we could convert to a string, and use the default string __format__ method.
This way, there would be less special purpose machinery, and .format() could just parse out the {}'s, extract the object from the parameters, and call the underlying object's __format__ method. Eric. From jyasskin at gmail.com Thu Aug 2 20:53:18 2007 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Thu, 2 Aug 2007 11:53:18 -0700 Subject: [Python-3000] Updated and simplified PEP 3141: A Type Hierarchy for Numbers In-Reply-To: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com> References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com> Message-ID: <5d44f72f0708021153u7ea1f443jfdee3c167b011011@mail.gmail.com> After some more discussion, I have another version of the PEP with a draft, partial implementation. Let me know what you think. PEP: 3141 Title: A Type Hierarchy for Numbers Version: $Revision: 56646 $ Last-Modified: $Date: 2007-08-01 10:11:55 -0700 (Wed, 01 Aug 2007) $ Author: Jeffrey Yasskin Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 23-Apr-2007 Post-History: 25-Apr-2007, 16-May-2007, 02-Aug-2007 Abstract ======== This proposal defines a hierarchy of Abstract Base Classes (ABCs) (PEP 3119) to represent number-like classes. It proposes a hierarchy of ``Number :> Complex :> Real :> Rational :> Integral`` where ``A :> B`` means "A is a supertype of B", and a pair of ``Exact``/``Inexact`` classes to capture the difference between ``floats`` and ``ints``. These types are significantly inspired by Scheme's numeric tower [#schemetower]_. Rationale ========= Functions that take numbers as arguments should be able to determine the properties of those numbers, and if and when overloading based on types is added to the language, should be overloadable based on the types of the arguments. For example, slicing requires its arguments to be ``Integrals``, and the functions in the ``math`` module require their arguments to be ``Real``. 
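As a rough illustration of the kind of type check the Rationale has in mind, the sketch below dispatches on the proposed ABCs. It assumes the hierarchy is importable as the numbers module, as in released Python 3 (this draft predates that module), and the helper name describe() is hypothetical.

```python
import numbers
from fractions import Fraction


def describe(x):
    """Return the narrowest ABC in the tower that x is an instance of."""
    # Test from the most specific class to the most general one.
    if isinstance(x, numbers.Integral):
        return "integral"
    if isinstance(x, numbers.Rational):
        return "rational"
    if isinstance(x, numbers.Real):
        return "real"
    if isinstance(x, numbers.Complex):
        return "complex"
    raise TypeError("not a number: %r" % (x,))


print(describe(3), describe(Fraction(1, 2)), describe(2.5), describe(1j))
```

A function written this way accepts any conforming third-party numeric type, not only the concrete built-ins.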
Specification ============= This PEP specifies a set of Abstract Base Classes, and suggests a general strategy for implementing some of the methods. It uses terminology from PEP 3119, but the hierarchy is intended to be meaningful for any systematic method of defining sets of classes. The type checks in the standard library should use these classes instead of the concrete built-ins. Numeric Classes --------------- We begin with a Number class to make it easy for people to be fuzzy about what kind of number they expect. This class only helps with overloading; it doesn't provide any operations. :: class Number(metaclass=ABCMeta): pass Most implementations of complex numbers will be hashable, but if you need to rely on that, you'll have to check it explicitly: mutable numbers are supported by this hierarchy. **Open issue:** Should __pos__ coerce the argument to be an instance of the type it's defined on? Why do the builtins do this? :: class Complex(Number): """Complex defines the operations that work on the builtin complex type. In short, those are: a conversion to complex, .real, .imag, +, -, *, /, abs(), .conjugate, ==, and !=. If it is given heterogeneous arguments, and doesn't have special knowledge about them, it should fall back to the builtin complex type as described below. """ @abstractmethod def __complex__(self): """Return a builtin complex instance.""" def __bool__(self): """True if self != 0.""" return self != 0 @abstractproperty def real(self): """Retrieve the real component of this number. This should subclass Real. """ raise NotImplementedError @abstractproperty def imag(self): """Retrieve the imaginary component of this number. This should subclass Real.
""" raise NotImplementedError @abstractmethod def __add__(self, other): raise NotImplementedError @abstractmethod def __radd__(self, other): raise NotImplementedError @abstractmethod def __neg__(self): raise NotImplementedError def __pos__(self): return self def __sub__(self, other): return self + -other def __rsub__(self, other): return -self + other @abstractmethod def __mul__(self, other): raise NotImplementedError @abstractmethod def __rmul__(self, other): raise NotImplementedError @abstractmethod def __div__(self, other): raise NotImplementedError @abstractmethod def __rdiv__(self, other): raise NotImplementedError @abstractmethod def __pow__(self, exponent): """Like division, a**b should promote to complex when necessary.""" raise NotImplementedError @abstractmethod def __rpow__(self, base): raise NotImplementedError @abstractmethod def __abs__(self): """Returns the Real distance from 0.""" raise NotImplementedError @abstractmethod def conjugate(self): """(x+y*i).conjugate() returns (x-y*i).""" raise NotImplementedError @abstractmethod def __eq__(self, other): raise NotImplementedError def __ne__(self, other): return not (self == other) The ``Real`` ABC indicates that the value is on the real line, and supports the operations of the ``float`` builtin. Real numbers are totally ordered except for NaNs (which this PEP basically ignores). :: class Real(Complex): """To Complex, Real adds the operations that work on real numbers. In short, those are: a conversion to float, trunc(), divmod, %, <, <=, >, and >=. Real also provides defaults for the derived operations. """ @abstractmethod def __float__(self): """Any Real can be converted to a native float object.""" raise NotImplementedError @abstractmethod def __trunc__(self): """Truncates self to an Integral. Returns an Integral i such that: * i>0 iff self>0 * abs(i) <= abs(self). """ raise NotImplementedError def __divmod__(self, other): """The pair (self // other, self % other). 
Sometimes this can be computed faster than the pair of operations. """ return (self // other, self % other) def __rdivmod__(self, other): """The pair (other // self, other % self). Sometimes this can be computed faster than the pair of operations. """ return (other // self, other % self) @abstractmethod def __floordiv__(self, other): """The floor() of self/other. Integral.""" raise NotImplementedError @abstractmethod def __rfloordiv__(self, other): """The floor() of other/self.""" raise NotImplementedError @abstractmethod def __mod__(self, other): raise NotImplementedError @abstractmethod def __rmod__(self, other): raise NotImplementedError @abstractmethod def __lt__(self, other): """< on Reals defines a total ordering, except perhaps for NaN.""" raise NotImplementedError @abstractmethod def __le__(self, other): raise NotImplementedError # Concrete implementations of Complex abstract methods. def __complex__(self): return complex(float(self)) @property def real(self): return self @property def imag(self): return 0 def conjugate(self): """Conjugate is a no-op for Reals.""" return self There is no built-in rational type, but it's straightforward to write, so we provide an ABC for it. **Open issue**: Add Demo/classes/Rat.py to the stdlib? :: class Rational(Real, Exact): """.numerator and .denominator should be in lowest terms.""" @abstractproperty def numerator(self): raise NotImplementedError @abstractproperty def denominator(self): raise NotImplementedError # Concrete implementation of Real's conversion to float. def __float__(self): return self.numerator / self.denominator And finally integers:: class Integral(Rational): """Integral adds a conversion to int and the bit-string operations.""" @abstractmethod def __int__(self): raise NotImplementedError def __index__(self): return int(self) @abstractmethod def __pow__(self, exponent, modulus): """self ** exponent % modulus, but maybe faster. Implement this if you want to support the 3-argument version of pow().
Otherwise, just implement the 2-argument version described in Complex. Raise a TypeError if exponent < 0 or any argument isn't Integral. """ raise NotImplementedError @abstractmethod def __lshift__(self, other): raise NotImplementedError @abstractmethod def __rlshift__(self, other): raise NotImplementedError @abstractmethod def __rshift__(self, other): raise NotImplementedError @abstractmethod def __rrshift__(self, other): raise NotImplementedError @abstractmethod def __and__(self, other): raise NotImplementedError @abstractmethod def __rand__(self, other): raise NotImplementedError @abstractmethod def __xor__(self, other): raise NotImplementedError @abstractmethod def __rxor__(self, other): raise NotImplementedError @abstractmethod def __or__(self, other): raise NotImplementedError @abstractmethod def __ror__(self, other): raise NotImplementedError @abstractmethod def __invert__(self): raise NotImplementedError # Concrete implementations of Rational and Real abstract methods. def __float__(self): return float(int(self)) @property def numerator(self): return self @property def denominator(self): return 1 Exact vs. Inexact Classes ------------------------- Floating point values may not exactly obey several of the properties you would expect. For example, it is possible for ``(X + -X) + 3 == 3``, but ``X + (-X + 3) == 0``. On the range of values that most functions deal with this isn't a problem, but it is something to be aware of. Therefore, I define ``Exact`` and ``Inexact`` ABCs to mark whether types have this problem. Every instance of ``Integral`` and ``Rational`` should be Exact, but ``Reals`` and ``Complexes`` may or may not be. (Do we really only need one of these, and the other is defined as ``not`` the first?) 
:: class Exact(Number): pass class Inexact(Number): pass Changes to operations and __magic__ methods ------------------------------------------- To support more precise narrowing from float to int (and more generally, from Real to Integral), I'm proposing the following new __magic__ methods, to be called from the corresponding library functions. All of these return Integrals rather than Reals. 1. ``__trunc__(self)``, called from a new builtin ``trunc(x)``, which returns the Integral closest to ``x`` between 0 and ``x``. 2. ``__floor__(self)``, called from ``math.floor(x)``, which returns the greatest Integral ``<= x``. 3. ``__ceil__(self)``, called from ``math.ceil(x)``, which returns the least Integral ``>= x``. 4. ``__round__(self)``, called from ``round(x)``, which returns the Integral closest to ``x``, rounding half toward even. **Open issue:** We could support the 2-argument version, but then we'd only return an Integral if the second argument were ``<= 0``. 5. ``__properfraction__(self)``, called from a new function, ``math.properfraction(x)``, which resembles C's ``modf()``: returns a pair ``(n:Integral, r:Real)`` where ``x == n + r``, both ``n`` and ``r`` have the same sign as ``x``, and ``abs(r) < 1``. **Open issue:** Oh, we already have ``math.modf``. What name do we want for this? Should we use divmod(x, 1) instead? Because the ``int()`` conversion from ``float`` is equivalent to but less explicit than ``trunc()``, let's remove it. (Or, if that breaks too much, just add a deprecation warning.) ``complex.__{divmod,mod,floordiv,int,float}__`` should also go away. These should continue to raise ``TypeError`` to help confused porters, but should not appear in ``help(complex)`` to avoid confusing more people. **Open issue:** This is difficult to do with the ``PyNumberMethods`` struct. What's the best way to accomplish it?
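The four narrowing operations listed above can be compared side by side with a short sketch. It assumes the behaviour of released Python 3, where trunc is spelled ``math.trunc`` rather than being a builtin (a difference from this draft); the helper name ``narrow`` is hypothetical.

```python
import math


def narrow(x):
    """Apply each Real -> Integral narrowing to x and collect the results."""
    return {
        "trunc": math.trunc(x),  # toward zero
        "floor": math.floor(x),  # greatest Integral <= x
        "ceil": math.ceil(x),    # least Integral >= x
        "round": round(x),       # nearest, ties go to the even neighbour
    }


result = narrow(-2.5)
print(result)

# The two candidates mentioned in the open issue disagree on negative input:
# math.modf keeps the sign of x on both parts, divmod(x, 1) floors instead.
print(math.modf(-2.5), divmod(-2.5, 1))
```

For ``-2.5`` the fractional part from ``math.modf`` is negative while the remainder from ``divmod(x, 1)`` is non-negative, which is the distinction the open issue on ``__properfraction__`` turns on.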
Notes for type implementors --------------------------- Implementors should be careful to make equal numbers equal and hash them to the same values. This may be subtle if there are two different extensions of the real numbers. For example, a complex type could reasonably implement hash() as follows:: def __hash__(self): return hash(complex(self)) but should be careful of any values that fall outside of the built in complex's range or precision. Adding More Numeric ABCs ~~~~~~~~~~~~~~~~~~~~~~~~ There are, of course, more possible ABCs for numbers, and this would be a poor hierarchy if it precluded the possibility of adding those. You can add ``MyFoo`` between ``Complex`` and ``Real`` with:: class MyFoo(Complex): ... MyFoo.register(Real) Implementing the arithmetic operations ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We want to implement the arithmetic operations so that mixed-mode operations either call an implementation whose author knew about the types of both arguments, or convert both to the nearest built in type and do the operation there. For subtypes of Integral, this means that __add__ and __radd__ should be defined as:: class MyIntegral(Integral): def __add__(self, other): if isinstance(other, MyIntegral): return do_my_adding_stuff(self, other) elif isinstance(other, OtherTypeIKnowAbout): return do_my_other_adding_stuff(self, other) else: return NotImplemented def __radd__(self, other): if isinstance(other, MyIntegral): return do_my_adding_stuff(other, self) elif isinstance(other, OtherTypeIKnowAbout): return do_my_other_adding_stuff(other, self) elif isinstance(other, Integral): return int(other) + int(self) elif isinstance(other, Real): return float(other) + float(self) elif isinstance(other, Complex): return complex(other) + complex(self) else: return NotImplemented There are 5 different cases for a mixed-type operation on subclasses of Complex. I'll refer to all of the above code that doesn't refer to MyIntegral and OtherTypeIKnowAbout as "boilerplate". 
``a`` will be an instance of ``A``, which is a subtype of ``Complex`` (``a : A <: Complex``), and ``b : B <: Complex``. I'll consider ``a + b``: 1. If A defines an __add__ which accepts b, all is well. 2. If A falls back to the boilerplate code, and it were to return a value from __add__, we'd miss the possibility that B defines a more intelligent __radd__, so the boilerplate should return NotImplemented from __add__. (Or A may not implement __add__ at all.) 3. Then B's __radd__ gets a chance. If it accepts a, all is well. 4. If it falls back to the boilerplate, there are no more possible methods to try, so this is where the default implementation should live. 5. If B <: A, Python tries B.__radd__ before A.__add__. This is ok, because it was implemented with knowledge of A, so it can handle those instances before delegating to Complex. If ``A<:Complex`` and ``B<:Real`` without sharing any other knowledge, then the appropriate shared operation is the one involving the built in complex, and both __radd__s land there, so ``a+b == b+a``. Rejected Alternatives ===================== The initial version of this PEP defined an algebraic hierarchy inspired by a Haskell Numeric Prelude [#numericprelude]_ including MonoidUnderPlus, AdditiveGroup, Ring, and Field, and mentioned several other possible algebraic types before getting to the numbers. I had expected this to be useful to people using vectors and matrices, but the NumPy community really wasn't interested, and we ran into the issue that even if ``x`` is an instance of ``X <: MonoidUnderPlus`` and ``y`` is an instance of ``Y <: MonoidUnderPlus``, ``x + y`` may still not make sense. Then I gave the numbers a much more branching structure to include things like the Gaussian Integers and Z/nZ, which could be Complex but wouldn't necessarily support things like division. 
The community decided that this was too much complication for Python, so I've now scaled back the proposal to resemble the Scheme numeric tower much more closely. References ========== .. [#pep3119] Introducing Abstract Base Classes (http://www.python.org/dev/peps/pep-3119/) .. [#classtree] Possible Python 3K Class Tree?, wiki page created by Bill Janssen (http://wiki.python.org/moin/AbstractBaseClasses) .. [#numericprelude] NumericPrelude: An experimental alternative hierarchy of numeric type classes (http://darcs.haskell.org/numericprelude/docs/html/index.html) .. [#schemetower] The Scheme numerical tower (http://www.swiss.ai.mit.edu/ftpdir/scheme-reports/r5rs-html/r5rs_8.html#SEC50) Acknowledgements ================ Thanks to Neal Norwitz for encouraging me to write this PEP in the first place, to Travis Oliphant for pointing out that the numpy people didn't really care about the algebraic concepts, to Alan Isaac for reminding me that Scheme had already done this, and to Guido van Rossum and lots of other people on the mailing list for refining the concept. Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: -------------- next part -------------- A non-text attachment was scrubbed... Name: numbers.diff Type: application/octet-stream Size: 28346 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070802/1ae0d946/attachment-0001.obj -------------- next part -------------- An embedded and charset-unspecified text was scrubbed...
Name: pep-3141.txt Url: http://mail.python.org/pipermail/python-3000/attachments/20070802/1ae0d946/attachment-0001.txt From martin at v.loewis.de Thu Aug 2 21:43:14 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Thu, 02 Aug 2007 21:43:14 +0200 Subject: [Python-3000] optimizing [x]range In-Reply-To: References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> Message-ID: <46B233D2.4030304@v.loewis.de> > The patch is based on the latest trunk/ checkout, Python 2.6. I don't > think this is a problem if nobody else made any effort towards making > xrange more sequence-like in the Python 3000 branch. The C source might > require some tab/space cleanup. Unfortunately, this is exactly what happened: In Py3k, the range object is defined in terms of PyObject*, so your patch won't apply to the 3k branch. Regards, Martin From guido at python.org Thu Aug 2 21:47:09 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 2 Aug 2007 12:47:09 -0700 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B2323E.6040206@cornell.edu> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2323E.6040206@cornell.edu> Message-ID: On 8/2/07, Joel Bender wrote: > > My personal suggestion is to stay close to the .NET formatting language > > If Microsoft formatting ideas are going to be used, why not use the > Excel language? In my mind it's not any worse than any other string of > characters with special meanings. It's widely understood (mostly), > clearly documented (kinda), and I think the date and time formatting is > clearer than strftime. > > I would expect [Red] to be omitted. You may be overestimating how widely understood it is. I betcha that most Python programmers have never heard of it. I certainly have no idea what the Excel language is (and I've had Excel on my various laptops for about a decade).
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From jjb5 at cornell.edu Thu Aug 2 21:36:30 2007 From: jjb5 at cornell.edu (Joel Bender) Date: Thu, 02 Aug 2007 15:36:30 -0400 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> Message-ID: <46B2323E.6040206@cornell.edu> Guido van Rossum wrote: > Personally I think support for the various accounting-style output is > not worth it. I betcha any accounting system worth the name would not > use this and instead have its own custom code for formatting anyway. > > My personal suggestion is to stay close to the .NET formatting language If Microsoft formatting ideas are going to be used, why not use the Excel language? In my mind it's not any worse than any other string of characters with special meanings. It's widely understood (mostly), clearly documented (kinda), and I think the date and time formatting is clearer than strftime. I would expect [Red] to be omitted. Joel From guido at python.org Fri Aug 3 00:25:36 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 2 Aug 2007 15:25:36 -0700 Subject: [Python-3000] optimizing [x]range In-Reply-To: <46B233D2.4030304@v.loewis.de> References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <46B233D2.4030304@v.loewis.de> Message-ID: On 8/2/07, "Martin v. Löwis" wrote: > > The patch is based on the latest trunk/ checkout, Python 2.6. I don't > > think this is a problem if nobody else made any effort towards making > > xrange more sequence-like in the Python 3000 branch. The C source might > > require some tab/space cleanup. > > Unfortunately, this is exactly what happened: In Py3k, the range object > is defined in terms PyObject*, so your patch won't apply to the 3k branch. FWIW, making xrange (or range in Py3k) "more sequence-like" is exactly what should *not* happen.
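For readers following the "x in xrange(y)" part of this thread, a constant-time membership test for integers can be sketched as below. This is not the snipped algorithm or the submitted patch, only a plain-Python illustration; the function name in_range is hypothetical.

```python
def in_range(x, start, stop, step=1):
    """O(1) equivalent of `x in range(start, stop, step)` for integer x."""
    if step == 0:
        raise ValueError("step must not be zero")
    if step > 0:
        in_bounds = start <= x < stop
    else:
        in_bounds = stop < x <= start
    # x is a member only if it sits a whole number of steps from start.
    return in_bounds and (x - start) % step == 0


print(in_range(7, 1, 20, 3), in_range(8, 1, 20, 3), in_range(4, 10, 0, -3))
```

The test needs one bounds comparison and one modulo, regardless of how many elements the range describes.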
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From tjreedy at udel.edu Fri Aug 3 01:01:39 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 2 Aug 2007 19:01:39 -0400 Subject: [Python-3000] Updated and simplified PEP 3141: A Type Hierarchyfor Numbers References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com> <5d44f72f0708021153u7ea1f443jfdee3c167b011011@mail.gmail.com> Message-ID: "Jeffrey Yasskin" wrote in message news:5d44f72f0708021153u7ea1f443jfdee3c167b011011 at mail.gmail.com... | def __bool__(self): | """True if self != 0.""" | return self != 0 Could this be a Number rather than Complex method? --------------- | There is no built-in rational type Floats constitute a bit-size bounded (like ints) set of rationals with denominators restricted to powers of two. Decimal literals and Decimals constitute a memory bounded (like longs) set of rationals with denominators instead restricted to powers of ten. I suspect that if both were presented as such, new programmers would be less likely to ask if >>> 1.1 1.1000000000000001 is a bug in Python. Math.frexp returns a disguised form of (numerator,denominator) (without common factor of two removal). If undisguised functions were added (and the same for Decimal), there would be no need, really, for class Real. If such were done, a .num_denom() method either supplementing or replacing .numerator() and .denominator() and returning (num, denom) would have the same efficiency justification of int.divmod. I would like to see a conforming Rat.py class with unrestricted denominators. -------------------- | And finally integers:: | | class Integral(Rational): | """Integral adds a conversion to int and the bit-string operations.""" The bit-string operations are not 'integer' operations. Rather they are 'integers represented as powers of two' operations. 
While << and >> can be interpreted (and implemented) as * and //, the other four are generally meaningless for other representations, such as prime factorization or fibonacci base. The Lib Ref agrees: 3.4.1 Bit-string Operations on Integer Types Plain and long integer types support additional operations that make sense only for bit-strings Other integer types should not have to support them to call themselves Integral. So I think at least |, ^, &, and ~ should be removed from Integral and put in a subclass thereof. Possible names are Pow2Int or BitStringInt or BitIntegral. ----------- In short, having read up to the beginning of Exact vs. Inexact Classes, my suggestion is to delete the unrealizable 'real' class and add an easily realizable non-bit-string integer class. Terry Jan Reedy From tjreedy at udel.edu Fri Aug 3 01:30:59 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 2 Aug 2007 19:30:59 -0400 Subject: [Python-3000] Updated and simplified PEP 3141: A TypeHierarchyfor Numbers References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com><5d44f72f0708021153u7ea1f443jfdee3c167b011011@mail.gmail.com> Message-ID: "Terry Reedy" wrote in message news:f8tnog$45f$1 at sea.gmane.org... || In short, having read up to the beginning of Exact vs. Inexact Classes, my | suggestion is to delete the unrealizable 'real' class Less than a minute after hitting Send, I realized that one could base a (restricted) class of non-rational reals on tuple of rationals, with one being an exponent of the other. But since operations on such pairs generally do not simplify to such a pair, the members of such a class would have to be expression trees. So computation would be mostly symbolic rather than actual. And I don't think we need an ABC for such a specialized symbolic computation class.
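The earlier observation in this exchange, that a float is a rational whose denominator is a power of two, can be checked directly. The sketch below assumes float.as_integer_ratio() and the fractions module from later Python releases, neither of which is mentioned in the thread; the helper name is_power_of_two is hypothetical.

```python
import math
from fractions import Fraction


def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...: exactly one bit set."""
    return n > 0 and n & (n - 1) == 0


# The exact (numerator, denominator) pair stored for the literal 1.1.
num, den = (1.1).as_integer_ratio()
exact = Fraction(num, den)

# math.frexp gives the "disguised" form: 1.1 == mantissa * 2**exponent.
mantissa, exponent = math.frexp(1.1)

print(den, is_power_of_two(den), exact == Fraction(11, 10))
```

Since 11/10 cannot be written with a power-of-two denominator, the stored value is only the nearest such rational, which is where the trailing digits in the repr of 1.1 come from.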
tjr From guido at python.org Fri Aug 3 01:34:50 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 2 Aug 2007 16:34:50 -0700 Subject: [Python-3000] Updated and simplified PEP 3141: A Type Hierarchyfor Numbers In-Reply-To: References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com> <5d44f72f0708021153u7ea1f443jfdee3c167b011011@mail.gmail.com> Message-ID: On 8/2/07, Terry Reedy wrote: > Floats constitute a bit-size bounded (like ints) set of rationals with > denominators restricted to powers of two. Decimal literals and Decimals > constitute a memory bounded (like longs) set of rationals with denominators > instead restricted to powers of ten. I suspect that if both were presented > as such, new programmers would be less likely to ask if > >>> 1.1 > 1.1000000000000001 > is a bug in Python. You gotta be kidding. That complaint mostly comes from people who would completely glaze over an explanation like the paragraph above. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From janssen at parc.com Fri Aug 3 03:26:58 2007 From: janssen at parc.com (Bill Janssen) Date: Thu, 2 Aug 2007 18:26:58 PDT Subject: [Python-3000] Updated and simplified PEP 3141: A Type Hierarchyfor Numbers In-Reply-To: References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com> <5d44f72f0708021153u7ea1f443jfdee3c167b011011@mail.gmail.com> Message-ID: <07Aug2.182707pdt."57996"@synergy1.parc.xerox.com> Terry, I liked these ideas so much I removed both "integer" and "float" from the HTTP-NG type system. See http://www.parc.com/janssen/pubs/http-next-generation-architecture.html, section 4.5.1. Though if I was doing it again, I'd go further, and make all fixed-point, floating-point, and string types abstract, so that only application-defined concrete subtypes could be instantiated. 
Bill From greg.ewing at canterbury.ac.nz Fri Aug 3 03:33:08 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Aug 2007 13:33:08 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> Message-ID: <46B285D4.1060207@canterbury.ac.nz> Nicko van Someren wrote: > 1000 > 200 > (3000) > 40 > (50000) > I.e. with the bulk of the padding applied before the number but > conditional padding after the number if there is no closing bracket. I think it should be the responsibility of the formatter to add the extra space when needed. Then the aligner can just do its usual thing with the result and doesn't have to know anything about the format. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From greg.ewing at canterbury.ac.nz Fri Aug 3 04:03:52 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Aug 2007 14:03:52 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> Message-ID: <46B28D08.9000700@canterbury.ac.nz> Guido van Rossum wrote: > In order to support the use cases for %s and %r, I propose to allow > appending a single letter 's', 'r' or 'f' to the width_specifier > (*not* the conversion_specifier): > > 'r' always calls repr() on the object; > 's' always calls str() on the object; > 'f' calls the object's __format__() method passing it the > conversion_specifier, or if it has no __format__() method, calls > repr() on it. This is also the default. Won't it seem a bit unintuitive that 'r' and 's' have to come before the colon, whereas all the others come after it? 
It would seem more logical to me if 'r' and 's' were treated as special cases of the conversion specifier that are recognised before calling __format__. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From brandon at rhodesmill.org Fri Aug 3 04:14:58 2007 From: brandon at rhodesmill.org (Brandon Craig Rhodes) Date: Thu, 02 Aug 2007 22:14:58 -0400 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: (Guido van Rossum's message of "Thu, 2 Aug 2007 11:30:58 -0700") References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> Message-ID: <87d4y5749p.fsf@ten22.rhodesmill.org> "Guido van Rossum" writes: > My personal suggestion is to stay close to the .NET formatting language: > > name_specifier [',' width_specifier] [':' conversion_specifier] A problem is that this format requires brute memorization to remember where to put things. If letters were used to prefix specifications, like "w" for width and "p" for precision, one could write something like: >>> 'The average is: {0:w8p2} today.'.format(avg) 'The average is: 7.24 today.' This would give users at least a shot at mnemonically parsing - and constructing - format strings, and eliminate the problem of having to decide what goes first. If, on the other hand, all we have to go on are some commas and colons, then I, for one, will probably always have to look things up - just like I always did for C-style percent-sign format specifications in the first place. 
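
To make the mnemonic idea concrete, here is a small sketch of how a spec such as 'w8p2' could be parsed. Everything in it is hypothetical: the letters 'w' and 'p' come from the suggestion above, while parse_mnemonic_spec and format_float are names invented for this illustration and are not part of any Python API.

```python
import re

# One letter ('w' for width, 'p' for precision) followed by digits.
_FIELD = re.compile(r"([wp])(\d+)")
_NAMES = {"w": "width", "p": "precision"}


def parse_mnemonic_spec(spec):
    """Turn 'w8p2' (or 'p2w8') into {'width': 8, 'precision': 2}."""
    settings = {}
    pos = 0
    for match in _FIELD.finditer(spec):
        if match.start() != pos:
            raise ValueError("unexpected text in spec %r" % spec)
        key = _NAMES[match.group(1)]
        if key in settings:
            raise ValueError("%s given twice in %r" % (key, spec))
        settings[key] = int(match.group(2))
        pos = match.end()
    if pos != len(spec):
        raise ValueError("unexpected text in spec %r" % spec)
    return settings


def format_float(value, spec):
    """Format a float using the hypothetical mnemonic spec."""
    settings = parse_mnemonic_spec(spec)
    text = "%.*f" % (settings.get("precision", 6), value)
    return text.rjust(settings.get("width", 0))


print(repr(format_float(7.2413, "w8p2")))
```

Because each setting carries its own letter, 'w8p2' and 'p2w8' parse to the same result, which is the "no need to decide what goes first" property argued for above.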
-- Brandon Craig Rhodes brandon at rhodesmill.org http://rhodesmill.org/brandon From jyasskin at gmail.com Fri Aug 3 05:06:32 2007 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Thu, 2 Aug 2007 20:06:32 -0700 Subject: [Python-3000] Updated and simplified PEP 3141: A Type Hierarchyfor Numbers In-Reply-To: References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com> <5d44f72f0708021153u7ea1f443jfdee3c167b011011@mail.gmail.com> Message-ID: <5d44f72f0708022006l51618c4du7223ffad3bb6a0b6@mail.gmail.com> On 8/2/07, Terry Reedy wrote: > "Jeffrey Yasskin" wrote in message > news:5d44f72f0708021153u7ea1f443jfdee3c167b011011 at mail.gmail.com... > > | def __bool__(self): > | """True if self != 0.""" > | return self != 0 > > Could this be a Number rather than Complex method? Yes, as could probably __add__, __sub__, __mul__, __abs__, and maybe a few others, but I didn't have a good criterion for distinguishing. What's fundamental about a "number"? I chose to punt for now, thinking that we can move operations up there later if we want. Remember that this is all duck typed anyway: if you overload a function based on Number vs Sequence, that doesn't stop you from dividing the numbers you got. > --------------- > > | There is no built-in rational type > > Floats constitute a bit-size bounded (like ints) set of rationals with > denominators restricted to powers of two. Decimal literals and Decimals > constitute a memory bounded (like longs) set of rationals with denominators > instead restricted to powers of ten. You are strictly correct, but I think people think of them as approximations to the real numbers, rather than restricted and inexact rationals. In particular, functions like exp, sin, etc. make sense on approximated reals, but not on rationals. > Math.frexp returns a disguised form of (numerator,denominator) (without > common factor of two removal). 
If undisguised functions were added (and > the same for Decimal), there would be no need, really, for class Real. > > If such were done, a .num_denom() method either supplementing or replacing > .numerator() and .denominator() and returning (num, denom) would have the > same efficiency justification of int.divmod. > > I would like to see a conforming Rat.py class with unrestricted > denominators. > -------------------- > > | And finally integers:: > | > | class Integral(Rational): > | """Integral adds a conversion to int and the bit-string > operations.""" > > The bit-string operations are not 'integer' operations. Rather they are > 'integers represented as powers of two' operations. While << and >> can be > interpreted (and implemented) as * and //, the other four are genernally > meaningless for other representations, such as prime factorization or > fibonacci base. The Lib Ref agrees: > 3.4.1 Bit-string Operations on Integer Types > Plain and long integer types support additional operations that make > sense > only for bit-strings > Other integer types should not have to support them to call themselves > Integral. So I think at least |, ^, &, and ~ should be removed from > Integral and put in a subclass thereof. Possible names are Pow2Int or > BitStringInt or BitIntegral. If some more people agree that they want to write integral types that aren't based on powers of 2 (but with operations like addition that a prime factorization representation wouldn't support), I wouldn't object to pulling those operators out of Integral. Then recall that Integral only needs to be in the standard library so that the std lib's type checks can check for it rather than int. Are there any type checks in the standard library that are looking for the bit-string operations? Can BitString go elsewhere until it's proven its worth? > ----------- > > In short, having read up to the beginning of Exact vs. 
Inexact Classes, my
> suggestion is to delete the unrealizable 'real' class and add an easily
> realizable non-bit-string integer class.

There are a couple of representations of non-rational subsets of the reals from the algebraic numbers all the way up to computable reals represented by Cauchy sequences. http://darcs.haskell.org/numericprelude/docs/html/index.html has a couple of these. I think RootSet and PowerSeries are the most concrete there.

--
Namasté,
Jeffrey Yasskin

From guido at python.org Fri Aug 3 05:14:30 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 2 Aug 2007 20:14:30 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B28D08.9000700@canterbury.ac.nz>
References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B28D08.9000700@canterbury.ac.nz>
Message-ID:

On 8/2/07, Greg Ewing wrote:
> Guido van Rossum wrote:
> > In order to support the use cases for %s and %r, I propose to allow
> > appending a single letter 's', 'r' or 'f' to the width_specifier
> > (*not* the conversion_specifier):
> >
> > 'r' always calls repr() on the object;
> > 's' always calls str() on the object;
> > 'f' calls the object's __format__() method passing it the
> > conversion_specifier, or if it has no __format__() method, calls
> > repr() on it. This is also the default.
>
> Won't it seem a bit unintuitive that 'r' and 's' have
> to come before the colon, whereas all the others come
> after it?

That depends on how you think of it. My point is that these determine which formatting API is used.

> It would seem more logical to me if 'r' and 's' were
> treated as special cases of the conversion specifier
> that are recognised before calling __format__.

But that would make it impossible to write a __format__ method that takes a string that *might* consist of just 'r' or 's'. The conversion specifier should be completely opaque (as it is in .NET).
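
The objection can be illustrated with a type that wants to use 'r' in its own specifier language. The sketch below uses the str.format syntax of later Python releases, which is an assumption relative to this thread, where the syntax was still under discussion; Angle is a toy class invented for the example. In that later syntax the repr request is written '!r' before the colon, so the text after the colon reaches __format__ untouched.

```python
import math


class Angle:
    """Toy type whose own format spec 'r' means radians, 'd' means degrees."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __repr__(self):
        return "Angle(%r)" % (self.degrees,)

    def __format__(self, spec):
        # The spec arrives uninterpreted; only this class decides its meaning.
        if spec == "r":
            return "%.4f rad" % math.radians(self.degrees)
        if spec in ("", "d"):
            return "%g deg" % self.degrees
        raise ValueError("unsupported spec %r" % spec)


half_turn = Angle(180)
print("{0:r}".format(half_turn))  # spec 'r' is handed to __format__
print("{0!r}".format(half_turn))  # '!r' selects repr() and skips __format__
```

If 'r' after the colon were intercepted by the generic machinery, the first call could never reach Angle.__format__.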
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Aug 3 05:16:53 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 2 Aug 2007 20:16:53 -0700 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <87d4y5749p.fsf@ten22.rhodesmill.org> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <87d4y5749p.fsf@ten22.rhodesmill.org> Message-ID: On 8/2/07, Brandon Craig Rhodes wrote: > "Guido van Rossum" writes: > > > My personal suggestion is to stay close to the .NET formatting language: > > > > name_specifier [',' width_specifier] [':' conversion_specifier] > > A problem is that this format requires brute memorization to remember > where to put things. If letters were used to prefix specifications, > like "w" for width and "p" for precision, one could write something > like: > > >>> 'The average is: {0:w8p2} today.'.format(avg) > 'The average is: 7.24 today.' > > This would give users at least a shot at mnemonically parsing - and > constructing - format strings, and eliminate the problem of having to > decide what goes first. > > If, on the other hand, all we have to go on are some commas and > colons, then I, for one, will probably always have to look things up - > just like I always did for C-style percent-sign format specifications > in the first place. I fully expect having to look up the *conversion specifier* syntax, which is specific to each type. But I expect that the conversion specifier is relatively rarely used, and instead *most* uses will just use the width specifier. The width specifier is so simple and universal that one will quickly remember it. (Experimentation is also easy enough.) 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From nnorwitz at gmail.com Fri Aug 3 07:59:01 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Thu, 2 Aug 2007 22:59:01 -0700 Subject: [Python-3000] removing __members__ and __methods__ Message-ID: __members__ and __methods__ are both deprecated as of 2.2 and there is the new __dir__. Is there any reason to keep them? I don't notice anything in PEP 3100, but it seems like they should be removed. Also PyMember_[GS]et are documented as obsolete and I plan to remove them unless I hear otherwise. n From nnorwitz at gmail.com Fri Aug 3 08:01:53 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Thu, 2 Aug 2007 23:01:53 -0700 Subject: [Python-3000] C API cleanup int/long Message-ID: Since there is a merged int/long type now, we need to decide how the C API should look. For example, should the APIs be prefixed with PyInt_* or PyLong_? What are the other issues? What do people want for the C API in 3k? n From greg.ewing at canterbury.ac.nz Fri Aug 3 08:19:12 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Aug 2007 18:19:12 +1200 Subject: [Python-3000] C API cleanup int/long In-Reply-To: References: Message-ID: <46B2C8E0.8080409@canterbury.ac.nz> Neal Norwitz wrote: > Since there is a merged int/long type now, we need to decide how the C > API should look. For example, should the APIs be prefixed with > PyInt_* or PyLong_? I've always assumed it would be Py_Int*. If any integer can be any length, it doesn't make sense to have any length-related words in the names. 
-- Greg From nnorwitz at gmail.com Fri Aug 3 08:31:13 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Thu, 2 Aug 2007 23:31:13 -0700 Subject: [Python-3000] C API cleanup int/long In-Reply-To: <46B2C8E0.8080409@canterbury.ac.nz> References: <46B2C8E0.8080409@canterbury.ac.nz> Message-ID: On 8/2/07, Greg Ewing wrote: > Neal Norwitz wrote: > > Since there is a merged int/long type now, we need to decide how the C > > API should look. For example, should the APIs be prefixed with > > PyInt_* or PyLong_? > > I've always assumed it would be Py_Int*. If any > integer can be any length, it doesn't make sense > to have any length-related words in the names. Aside from the name, are there other issues you can think of with any of the API changes? There are some small changes, things like macros only having a function form. Are these a problem? Str/unicode is going to be a big change. Any thoughts there? n From talin at acm.org Fri Aug 3 08:55:03 2007 From: talin at acm.org (Talin) Date: Thu, 02 Aug 2007 23:55:03 -0700 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> Message-ID: <46B2D147.90606@acm.org> Guido van Rossum wrote: > My personal suggestion is to stay close to the .NET formatting language: > > name_specifier [',' width_specifier] [':' conversion_specifier] > > where width_specifier is a positive or negative number giving the > minimum width (negative for left-alignment) and conversion_specifier > is passed uninterpreted to the object's __format__ method. Before I comment on this I think I need to clear up a mismatch between your understanding of how __format__ works and mine. In particular, why it won't work for float and int to define a __format__ method. Remember how I said in your office that it made sense to me there were two levels of format hooks in .Net? 
I realize that I wasn't being very clear at the time - as often happens when my thoughts are racing too fast for my mouth. What I meant was that conceptually, there are two stages of customization, which I will call "pre-coercion" and "post-coercion" customization.

Before I explain what that means, let me say that I don't think that this is actually how .Net works, and I'm not proposing that there actually be two customization hooks. What I want to do is describe an abstract conceptual model of formatting, in which formatting occurs in a number of stages.

Pre-coercion formatting means that the real type of the value is used to control formatting. We don't attempt to convert the value to an int or float or repr() or anything - instead it's allowed to completely dominate the interpretation of the format codes. So the case of the DateTime object interpreting its specifiers as a strftime argument falls into this case.

In most cases, there won't be a pre-coercion hook. In that case the formatting proceeds to the next two stages, which are type coercion and then post-coercion formatting. The type coercion is driven by a *standard interpretation* of the format specifier. After the value is converted to the type, we then apply formatting that is specific to that type.

Now, I always envisioned that __format__ would allow reinterpretation of the format specifier. Therefore, __format__ fits into this model as a pre-coercion customization hook - it has to come *before* the type coercion, because otherwise type information would be destroyed and __format__ wouldn't work.

But the formatters for int and float have to happen *after* type coercion. Therefore, those formatters can't be the same as __format__.
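
The staged model described above can be sketched in a few lines. This is only an illustration of the conceptual pipeline, not a proposal: staged_format, the hook name preformat, the COERCE table and the Stamp class are all invented for the example, and the last stage simply borrows the existing % operator.

```python
import datetime

# Stage 2 table: the last letter of the spec picks a standard coercion.
COERCE = {"d": int, "x": int, "f": float, "e": float, "s": str, "r": repr}


def staged_format(value, spec="s"):
    """Format value in the three conceptual stages described above."""
    # Stage 1: pre-coercion. If the value's own type supplies a hook, it
    # sees the raw spec and nothing else is consulted.
    hook = getattr(type(value), "preformat", None)  # hypothetical hook name
    if hook is not None:
        return hook(value, spec)

    # Stage 2: type coercion, driven by a standard reading of the spec.
    code = spec[-1:]
    if code not in COERCE:
        raise ValueError("no standard interpretation for %r" % spec)
    coerced = COERCE[code](value)

    # Stage 3: post-coercion formatting specific to the coerced type.
    # 'r' already produced a string in stage 2, so it is printed with %s.
    post = "s" if code == "r" else code
    return ("%" + spec[:-1] + post) % coerced


class Stamp:
    """Toy date wrapper that reads its spec as a strftime pattern."""

    def __init__(self, year, month, day):
        self.date = datetime.date(year, month, day)

    def preformat(self, spec):
        return self.date.strftime(spec)


print(staged_format(Stamp(2007, 8, 3), "%Y-%m-%d"))  # stage 1 only
print(staged_format(12, "6.3f"))                     # stages 2 and 3
```

In this sketch an int has no pre-coercion hook, so '6.3f' coerces 12 to a float before the float formatting is applied, which is why the int and float formatters sit after coercion rather than in the hook.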
> In order to support the use cases for %s and %r, I propose to allow > appending a single letter 's', 'r' or 'f' to the width_specifier > (*not* the conversion_specifier): > > 'r' always calls repr() on the object; > 's' always calls str() on the object; > 'f' calls the object's __format__() method passing it the > conversion_specifier, or if it has no __format__() method, calls > repr() on it. This is also the default. > > If no __format__() method was called (either because 'r' or 's' was > used, or because there was no __format__() method on the object), the > conversion_specifier (if given) is a *maximum* length; this handles > the pretty common use cases of %.20s and %.20r (limiting the size of a > printed value). > > The numeric types are the main types that must provide __format__(). > (I also propose that for datetime types the format string ought to be > interpreted as a strftime format string.) I think that > float.__format__() should *not* support the integer formatting codes > (d, x, o etc.) -- I find the current '%d' % 3.14 == '3' an abomination > which is most likely an incidental effect of calling int() on the > argument (should really be __index__()). But int.__format__() should > support the float formatting codes; I think '%6.3f' % 12 should return > ' 12.000'. This is in line with 1/2 returning 0.5; int values should > produce results identical to the corresponding float values when used > in the same context. I think this should be solved inside > int.__format__() though; the generic formatting code should not have > to know about this. I don't agree that using the 'd' format type to print floats is an abomination, but that's because of a difference in design philosophy. I'm inclined to be permissive in this, because I don't see the benefit of being pedantic here, and I do see the potential usefulness of considering 'd' to be the same as 'f' with a precision of 0. But that's a detail. I want to think about the larger picture. 
Earlier I said that there were 6 attributes being controlled by the various specifiers, but based on the previous discussion there are actually 8, in no particular order:

  -- minimum width
  -- maximum width
  -- decimal precision
  -- alignment
  -- padding
  -- treatment of signs and negative numbers
  -- type coercion options
  -- number formatting options for a given type, such as exponential notation.

That seems a lot of parameters to cram into a lowly format string, and I can't imagine that anyone would like a system that requires these all to be specified individually. It would be cumbersome and hard to remember.

Fortunately, we recognize that these parameters are not all independent. Many combinations of parameters are nonsensical, especially when talking about non-number types. Therefore, we can compress the visual specification of these attributes into a much smaller number of actual specified format codes.

Traditionally the C sprintf function has done two kinds of 'multiplexing' of these codes. The first is to change the interpretation of a particular field (such as precision) based on the number formatting type. The second is to use letters to represent combinations of attributes - so for example the letter 'd' implies both that it's an integer type, and also how that integer type should be formatted.

So the challenge is to try and figure out how to represent all of the sensible permutations of formatting attributes in a way which is both intuitive and mnemonic.

There are two approaches to making this system programmer friendly: We can either try to invent the best possible system out of whole cloth, or we can steal from the past in the hopes that programmers who already know a previous syntax for format strings will be able to employ their prior knowledge.

If we decide to create a new system out of whole cloth, then what do we have to work with?
Well, as I see it we have the following tools at our disposal for encoding meaning in a short form:

  -- Various delimiter characters: :,.!#$ and so on.
  -- Letters to represent one or more attributes.
  -- Numbers to represent scalar quantities
  -- The relative ordering of all of the above.

We also have to consider what it means to be 'intuitive'. In this case, we should consider that the various delimiter characters have connotations - such as the fact that '.' suggests a decimal point, or that '<' suggests a left-pointing arrow.

(I should also mention that "a:b,c" looks prettier to my eye than "a,b:c". There's a reason for this, and it's because of Python syntax. Now, in Python, ':' isn't an operator - but if it was, you would have to consider its precedence to be very low. Because when we look at an expression 'if x: a,b' we know that comma binds more tightly than the colon, and so it's the same thing as saying 'if x: (a,b)'. But in any case this is purely an aesthetic digression and not terribly weighty.)

That's all I have to say for the moment - I'm still thinking this through. In any case, I think it's worthwhile to be scrutinizing this issue at a very low level and examining all of the assumptions.

-- Talin

From pc at gafol.net Fri Aug 3 09:22:39 2007
From: pc at gafol.net (Paul Colomiets)
Date: Fri, 03 Aug 2007 10:22:39 +0300
Subject: [Python-3000] text_zipimport fix
Message-ID: <46B2D7BF.1010502@gafol.net>

Hi,

I've just uploaded a patch that fixes test_zipimport. http://www.python.org/sf/1766592

I'm still in doubt about some str/bytes issues. Correct me if I'm wrong.

1. imp.get_magic() should return bytes
2. loader.get_data() should return bytes
3. loader.get_source() should return str with encoding given from "# -*- coding: something -*-" header

How can the third be achieved without reinventing something? It seems that the compiler makes use of it somewhere through the AST, but I'm not sure.
Currently it does PyString_FromStringAndSize(bytes), which should be equivalent to str(the_bytes), and uses utf8 I think.

--
Paul.

From rrr at ronadam.com Fri Aug 3 10:08:05 2007
From: rrr at ronadam.com (Ron Adam)
Date: Fri, 03 Aug 2007 03:08:05 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B2D147.90606@acm.org>
References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org>
Message-ID: <46B2E265.5080905@ronadam.com>

Talin wrote:
> (I should also mention that "a:b,c" looks prettier to my eye than
> "a,b:c". There's a reason for this, and its because of Python syntax.
> Now, in Python, ':' isn't an operator - but if it was, you would have to
> consider its precedence to be very low. Because when we look at an
> expression 'if x: a,b' we know that comma binds more tightly than the
> colon, and so it's the same thing as saying 'if x: (a,b)'. But in any
> case this is purely an aesthetic digression and not terribly weighty.)

+1 See below! :-)

After a fair amount of experimenting today, I think I've found a nice middle ground that meets some of what both you and Guido are looking for. (And a bit of my own preference too.)

What I've come up with is...

    '{name: specifier_1, specifier_2}'

Where the order of specifier_1 and specifier_2 is not significant.

What that does is shorten the short common cases. It's also one less thing to remember. :-)

The field name is set off with a colon which I think helps strengthen its relationship as being a key when kwds are used as an argument source. And it also resembles Python's syntax pattern more closely as you mentioned above.

The requirement for this to work is that the alignment specifier and the format specifiers can never start with the same characters. Which turns out to be easy if you use type prefixes instead of postfixes on the format specifiers.

All of the following work:

The most common cases are very short and simple.

    '{0}, {1:s}, {2:r}, {3:d}'   # etc... for types
    '{0:10}'                     # min field width
    '{0:.20}'                    # max field width
    '{0:^20}'                    # centered
    '{0:-20}'                    # right justified
    '{0:+20}'                    # left justified (default)
    '{0:10.20}'                  # min & max field widths together

If it starts with a letter, it's a format specifier:

    '{0:d+}, {1:d()}, {2:d-}'
    '{0:f.3}, {1:f-.6}'

Or if it starts with '+', '-', or '^', or a digit, it's a field alignment specifier.

Combinations:

    '{0:10,r}'            # Specifiers are not order dependent.
    '{0:r,10}'            # Both of these produce the same output.
    '{0:-10,f+.3}'        -> " +123.346"
    '{0:-15,f()7.3}'      -> " ( 123.456)"
    '{0:^10,r}'           -> " 'Hello' "

For filled types such as numbers with leading zeros, a '/' character can separate the numeric_width from the fill character. A numeric_width isn't the same as field width, so it's part of the type formatter here.

    # width/char
    '{0:d7/0}'            -> '0000123'
    '{0:f7/0.3}'          -> '0000123.000'

Filled widths can be used in the alignment specifiers too and follow the same rules.

    '{0:^16/_,s}'         -> '____John Doe____'
    'Chapter {0:-10/.,d}' -> 'Chapter........10'

Some motivational thoughts:

- The prefix form may make remembering formatting character sequences easier. Or if not, an alphabetic lookup table would work nicely as a reference.

- The colon and comma don't move around or change places. That may help make it more readable and less likely to have bugs due to typos.

- If the format specifier style is too close to some other existing language's format, but different enough to not be interchangeable, it could be more confusing instead of less confusing.

I have a partial Python implementation with a doc test I can post or send to you if you want to see how the parsing is handled. The format specifier parsing isn't implemented, but the top level string, field, and alignment_specs parsing is. It's enough to see how it ties together. It's a rough sketch, but it should be easy to build on.
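
The order-independence rule above depends only on the first character of each specifier, so the top-level split is short. The sketch below is not the partial implementation mentioned in the message; split_field is a name invented here, and treating a leading '.' as an alignment specifier is an inference from the '{0:.20}' example rather than something stated in the rules.

```python
def split_field(field):
    """Split the inside of '{...}' into (name, alignment_spec, format_spec).

    A specifier starting with a letter is a format specifier; one starting
    with '+', '-', '^', '.' or a digit is an alignment specifier. Either
    may be missing, and they may appear in either order.
    """
    name, _, rest = field.partition(":")
    alignment = None
    format_spec = None
    parts = rest.split(",") if rest else []
    for part in parts:
        part = part.strip()
        if not part:
            raise ValueError("empty specifier in %r" % field)
        first = part[0]
        if first.isalpha():
            if format_spec is not None:
                raise ValueError("two format specifiers in %r" % field)
            format_spec = part
        elif first in "+-^." or first.isdigit():
            if alignment is not None:
                raise ValueError("two alignment specifiers in %r" % field)
            alignment = part
        else:
            raise ValueError("cannot classify %r in %r" % (part, field))
    return name.strip(), alignment, format_spec


print(split_field("0:10,r"))
print(split_field("0:r,10"))
```

Both calls print the same tuple. One limitation of splitting on ',' this way is that a comma could not be used as a fill character after '/'.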

Cheers,
Ron

From stargaming at gmail.com Fri Aug 3 10:13:58 2007
From: stargaming at gmail.com (Stargaming)
Date: Fri, 3 Aug 2007 08:13:58 +0000 (UTC)
Subject: [Python-3000] optimizing [x]range
References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <46B233D2.4030304@v.loewis.de>
Message-ID:

On Thu, 02 Aug 2007 15:25:36 -0700, Guido van Rossum wrote:

> On 8/2/07, "Martin v. Löwis" wrote:
>> > The patch is based on the latest trunk/ checkout, Python 2.6. I don't
>> > think this is a problem if nobody else made any effort towards making
>> > xrange more sequence-like in the Python 3000 branch. The C source
>> > might require some tab/space cleanup.
>>
>> Unfortunately, this is exactly what happened: In Py3k, the range object
>> is defined in terms PyObject*, so your patch won't apply to the 3k
>> branch.
>
> FWIW, making xrange (or range in Py3k) "more sequence-like" is exactly
> what should *not* happen.

No, that's exactly what *should* happen for optimization reasons.

xrange has never (neither in 2.6 nor 3.0) had an sq_contains slot. Growing such a slot is a precondition for implementing xrange.__contains__ as an optimized special case, and that makes it more sequence-like on the side of the implementation. This does not mean it becomes more like the 2.x range, which we're abandoning.

Sorry for the confusion.

From jeremy at alum.mit.edu Fri Aug 3 16:14:51 2007
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Fri, 3 Aug 2007 10:14:51 -0400
Subject: [Python-3000] socket makefile bug
Message-ID:

I'm looking into httplib problems on the struni branch. One unexpected problem is that socket.makefile() is not behaving correctly. The docs say "The file object references a dup()ped version of the socket file descriptor, so the file object and socket object may be closed or garbage-collected independently." In Python 3000, the object returned by makefile is not a dup()ped version of the file descriptor.
If I close the socket, I close the file returned by makefile().

I can dig into the makefile() problem, but I thought I'd mention it in the hopes that someone else thinks it's easy to fix.

Jeremy

From talin at acm.org Fri Aug 3 18:06:45 2007
From: talin at acm.org (Talin)
Date: Fri, 03 Aug 2007 09:06:45 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B2E265.5080905@ronadam.com>
References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com>
Message-ID: <46B35295.1030007@acm.org>

Ron Adam wrote:
> After a fair amount of experimenting today, I think I've found a nice
> middle ground that meets some of what both you and Guido are looking
> for. (And a bit my own preference too.)

First off, thank you very much for taking the time to think about this in such detail. There are a lot of good ideas here.

What's missing, however, is a description of how all of this interacts with the __format__ hook. The problem we are facing right now is sometimes we want to override the __format__ hook and sometimes we don't. Right now, the model that we want seems to be:

1) High precedence type coercion, i.e. 'r', which bypasses __format__.
2) Check for __format__, and let it interpret the format specifier.
3) Regular type coercion, i.e. 'd', 'f' and so on.
4) Regular formatting based on type.

-- Talin

From guido at python.org Fri Aug 3 18:27:30 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 3 Aug 2007 09:27:30 -0700
Subject: [Python-3000] socket makefile bug
In-Reply-To:
References:
Message-ID:

The docs are out of date, we don't dup() any more (that was needed only because we were using fdopen()). But what *should* happen is that when you close the file object the socket is still open. The socket wrapper's close() method should be fixed. I can look into that later today.

On 8/3/07, Jeremy Hylton wrote:
> I'm looking into httplib problems on the struni branch.
One > unexpected problem is that socket.makefile() is not behaving > correctly. The docs say "The file object references a dup()ped version > of the socket file descriptor, so the file object and socket object > may be closed or garbage-collected independently." In Python 3000, > the object returned by makefile is no a dup()ped versoin of the file > descriptor. If I close the socket, I close the file returned by > makefile(). > > I can dig into the makefile() problem, but I thought I'd mention in > the hopes that someone else thinks its easy to fix. > > Jeremy > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at alum.mit.edu Fri Aug 3 18:34:06 2007 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Fri, 3 Aug 2007 12:34:06 -0400 Subject: [Python-3000] socket makefile bug In-Reply-To: References: Message-ID: On 8/3/07, Guido van Rossum wrote: > The docs are out of date, we don't dup() any more (that was needed > only because we were using fdopen()). But what *should* happen is that > when you close the file object the socket is still open. The socket > wrapper's close() method should be fixed. I can look into that later > today. Ok. I confirmed that calling dup() fixes the problem, but that doesn't work on Windows. I also uncovered a bug in socket.py, which fails to set _can_dup_socket to True on platforms where you can dup a socket. Jeremy > > On 8/3/07, Jeremy Hylton wrote: > > I'm looking into httplib problems on the struni branch. One > > unexpected problem is that socket.makefile() is not behaving > > correctly. 
The docs say "The file object references a dup()ped version > > of the socket file descriptor, so the file object and socket object > > may be closed or garbage-collected independently." In Python 3000, > > the object returned by makefile is no a dup()ped versoin of the file > > descriptor. If I close the socket, I close the file returned by > > makefile(). > > > > I can dig into the makefile() problem, but I thought I'd mention in > > the hopes that someone else thinks its easy to fix. > > > > Jeremy > > _______________________________________________ > > Python-3000 mailing list > > Python-3000 at python.org > > http://mail.python.org/mailman/listinfo/python-3000 > > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > From guido at python.org Fri Aug 3 19:04:28 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 3 Aug 2007 10:04:28 -0700 Subject: [Python-3000] removing __members__ and __methods__ In-Reply-To: References: Message-ID: Yes, they should all go. Expect some cleanup though! On 8/2/07, Neal Norwitz wrote: > __members__ and __methods__ are both deprecated as of 2.2 and there is > the new __dir__. Is there any reason to keep them? I don't notice > anything in PEP 3100, but it seems like they should be removed. > > Also PyMember_[GS]et are documented as obsolete and I plan to remove > them unless I hear otherwise. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Aug 3 19:06:06 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 3 Aug 2007 10:06:06 -0700 Subject: [Python-3000] optimizing [x]range In-Reply-To: References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <46B233D2.4030304@v.loewis.de> Message-ID: On 8/3/07, Stargaming wrote: > On Thu, 02 Aug 2007 15:25:36 -0700, Guido van Rossum wrote: > > > On 8/2/07, "Martin v. 
L?wis" wrote: > >> > The patch is based on the latest trunk/ checkout, Python 2.6. I don't > >> > think this is a problem if nobody else made any effort towards making > >> > xrange more sequence-like in the Python 3000 branch. The C source > >> > might require some tab/space cleanup. > >> > >> Unfortunately, this is exactly what happened: In Py3k, the range object > >> is defined in terms PyObject*, so your patch won't apply to the 3k > >> branch. > > > > FWIW, making xrange (or range in Py3k) "more sequence-like" is exactly > > what should *not* happen. > > No, that's exactly what *should* happen for optimization reasons. > > xrange has never (neither in 2.6 nor 3.0) had an sq_contains slot. > Growing such a slot is a precondition for implementing > xrange.__contains__ as an optimized special case, and that makes it more > sequence-like on the side of the implementation. This does not mean it > becomes more like the 2.x range, which we're abandoning. > Sorry for the confusion. OK, gotcha. I was just warning not to add silliness like slicing. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Aug 3 19:20:50 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 3 Aug 2007 10:20:50 -0700 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B35295.1030007@acm.org> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> Message-ID: I have no time for a complete response, but a few quickies: - The more I think about it the, more I think putting knowledge of floating point formatting into the wrapper is wrong. I really think we should put this into float.__format__ (and int.__format__, and Decimal.__format__). I can't find a reason why you don't want this; perhaps it is an axiom? I think it needs to be challenged. - The relative priorities of colon and comma vary by context; e.g. 
in a[i:j, m:n] the colon binds tighter. - Interpreting X.Y as min.max, while conventional in C, is hard to remember. - If we're going to deviate from .NET, we should deviate strongly, and I propose using semicolon as delimiter. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Aug 3 20:41:52 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 3 Aug 2007 11:41:52 -0700 Subject: [Python-3000] text_zipimport fix In-Reply-To: <46B2D7BF.1010502@gafol.net> References: <46B2D7BF.1010502@gafol.net> Message-ID: I've checked this in as r56707. It looks fine at cursory inspection; if someone wants to test the handling of encodings more thoroughly, be my guest. --Guido On 8/3/07, Paul Colomiets wrote: > Hi, > > I've just uploaded patch that fixes test_zipimport. > http://www.python.org/sf/1766592 > > I'm still in doubt of some str/bytes issues. Fix me if I'm wrong. > 1. imp.get_magic() should return bytes > 2. loader.get_data() should return bytes > 3. loader.get_source() should return str with encoding given from "# -*- > coding: something -*-" header > > How to achieve third without reinventing something? Seems that compiler > makes use of it something thought the ast, but I'm not sure. > > Currently it does PyString_FromStringAndSize(bytes) which should be > equivalent of str(the_bytes), and uses utf8 I think. > > -- > Paul. 
> _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rrr at ronadam.com Fri Aug 3 22:37:36 2007 From: rrr at ronadam.com (Ron Adam) Date: Fri, 03 Aug 2007 15:37:36 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B35295.1030007@acm.org> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> Message-ID: <46B39210.6060202@ronadam.com> Talin wrote: > Ron Adam wrote: >> After a fair amount of experimenting today, I think I've found a nice >> middle ground that meets some of what both you and Guido are looking >> for. (And a bit my own preference too.) > > First off, thank you very much for taking the time to think about this > in such detail. There are a lot of good ideas here. Thanks, I use string operations a *lot* and I really do want it to work as easy as possible in a wide variety of situations. > What's missing, however, is a description of how all of this interacts > with the __format__ hook. The problem we are facing right now is > sometimes we want to override the __format__ hook and sometimes we > don't. Right now, the model that we want seems to be: > > 1) High precedence type coercion, i.e. 'r', which bypasses __format__. I think you are looking for an internal simplicity which isn't needed, and most people won't even think about. The exposed interface doesn't have any ambiguities if 'r' is a format specification just like 's', 'd', or 'f'. These are what the formatter will dispatch on. I think a few if/else's to catch types that will call __repr__ and __str__, instead of __format__ aren't that costly. 
I think there are other areas that can be optimized more, and/or other points where we can hook into and modify the results. Or am I missing something still? Maybe if you give an example where it makes a difference it would help us sort it out. > 2) Check for __format__, and let it interpret the format specifier. > 3) Regular type coercion, i.e. 'd', 'f' and so on. > 4) Regular formatting based on type. The sequence of parsing I have so far. 1. Split string into a dictionary of fields and a list of string parts. 'Partnumber: {0:10}, Price: ${1:f.2}'.format('123abc', 99.95) Results in... {'0':('', 10), '1':('f.2', '')} # key:(format_spec, align_spec) ['Partnumber: ', '{0}', ' Price: $', '{1}'] 2. Apply the format_spec and then the alignment_spec to the arguments. {'0':'123abc ', '1':'99.95'} * If the arguments are a sequence, they are enumerated to get keys. * If they are a dict, the existing keys are used. * Passing both *args and **kwds should also work. 3. Replace the keys in the string list with the corresponding formatted dictionary values. ['Partnumber: ', '123abc ', ' Price: $', '99.95'] 4. Join the string parts back together. 'Partnumber: 123abc Price: $99.95' It may be useful to expose some of these intermediate steps so that we could pre-process the specifications, or post-process the formatted results before it gets merged back into the string. Which seems to fit with some of your thoughts, although I think you are thinking more in the line of overriding methods instead of directly accessing the data. A little bit of both could go a long ways. 
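The four steps above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the PEP 3101 implementation: the field syntax is reduced to a hypothetical {key} or {key:width} form, the only formatting applied is str() plus left-justified padding, and the names FIELD, split_fields and sketch_format are made up for this sketch.

```python
import re

# Hypothetical, simplified field syntax for this sketch: {key} or {key:width}.
FIELD = re.compile(r"\{(\w+)(?::(\d+))?\}")


def split_fields(template):
    """Step 1: split a template into a dict of fields and a list of parts."""
    fields, parts, pos = {}, [], 0
    for match in FIELD.finditer(template):
        if match.start() > pos:
            parts.append(template[pos:match.start()])  # literal text
        key = match.group(1)
        fields[key] = int(match.group(2) or 0)          # minimum width
        parts.append("{%s}" % key)                      # placeholder
        pos = match.end()
    if pos < len(template):
        parts.append(template[pos:])
    return fields, parts


def sketch_format(template, *args, **kwds):
    fields, parts = split_fields(template)
    # Positional arguments are enumerated to get keys; keywords keep theirs.
    values = {str(i): arg for i, arg in enumerate(args)}
    values.update(kwds)
    # Step 2: format each value (here just str() padded to the width).
    formatted = {"{%s}" % key: str(values[key]).ljust(width)
                 for key, width in fields.items()}
    # Steps 3 and 4: swap placeholders for formatted values, then join.
    return "".join(formatted.get(part, part) for part in parts)
```

Because split_fields returns the intermediate dictionary and list, a caller could inspect or adjust either one before the final join, which is the kind of hook point described above.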
Cheers, Ron From rrr at ronadam.com Fri Aug 3 23:18:55 2007 From: rrr at ronadam.com (Ron Adam) Date: Fri, 03 Aug 2007 16:18:55 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> Message-ID: <46B39BBF.80809@ronadam.com> Guido van Rossum wrote: > I have no time for a complete response, but a few quickies: > > - The more I think about it the, more I think putting knowledge of > floating point formatting into the wrapper is wrong. I really think we > should put this into float.__format__ (and int.__format__, and > Decimal.__format__). I can't find a reason why you don't want this; > perhaps it is an axiom? I think it needs to be challenged. I agree. > - The relative priorities of colon and comma vary by context; e.g. in > a[i:j, m:n] the colon binds tighter. > > - Interpreting X.Y as min.max, while conventional in C, is hard to remember. > > - If we're going to deviate from .NET, we should deviate strongly, and > I propose using semicolon as delimiter. So ... '{0:10.20,f.2}' would become... '{0:10;20,f.2}' Works for me. I think this would be better because then decimal places would be recognizable right off because they *would* have a decimal before them. And nothing else would. And min;max would be recognizable right off because of the semicolon. And I can't think of anything that would be unique and work better. Cheers, Ron From guido at python.org Sat Aug 4 00:43:33 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 3 Aug 2007 15:43:33 -0700 Subject: [Python-3000] socket makefile bug In-Reply-To: References: Message-ID: On 8/3/07, Jeremy Hylton wrote: > On 8/3/07, Guido van Rossum wrote: > > The docs are out of date, we don't dup() any more (that was needed > > only because we were using fdopen()). 
But what *should* happen is that > > when you close the file object the socket is still open. The socket > > wrapper's close() method should be fixed. I can look into that later > > today. > > Ok. I confirmed that calling dup() fixes the problem, but that > doesn't work on Windows. I also uncovered a bug in socket.py, which > fails to set _can_dup_socket to True on platforms where you can dup a > socket. Followup: Jeremy fixed this by adding an explicit reference count to the socket object, counting how many makefile() streams are hanging off it. A few more unit tests (including httplib) are now working. However, things are still not all good. E.g. $ rm -f CP936.TXT $ ./python Lib/test/regrtest.py -uall test_codecmaps_cn test_codecmaps_cn fetching http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT ... test test_codecmaps_cn crashed -- : (9, 'Bad file descriptor') 1 test failed: test_codecmaps_cn [68157 refs] $ ./python ... >>> import urllib [46065 refs] >>> x = urllib.urlopen("http://python.org").read() Traceback (most recent call last): File "", line 1, in File "/usr/local/google/home/guido/python/py3k-struni/Lib/io.py", line 390, in read return self.readall() File "/usr/local/google/home/guido/python/py3k-struni/Lib/io.py", line 400, in readall data = self.read(DEFAULT_BUFFER_SIZE) File "/usr/local/google/home/guido/python/py3k-struni/Lib/io.py", line 392, in read n = self.readinto(b) File "/usr/local/google/home/guido/python/py3k-struni/Lib/socket.py", line 264, in readinto return self._sock.recv_into(b) socket.error: (9, 'Bad file descriptor') [60365 refs] >>> -- --Guido van Rossum (home page: http://www.python.org/~guido/) From dalcinl at gmail.com Sat Aug 4 01:44:23 2007 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Fri, 3 Aug 2007 20:44:23 -0300 Subject: [Python-3000] optimizing [x]range In-Reply-To: References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> Message-ID: On 8/2/07, Stargaming wrote: > >> made into an 
O(1) operation. here's a demo code (it should be trivial
> >> to implement this in CPython)
> [snipped algorithm]

Did you take into account that your patch is not backward compatible
with py2.5? Just try to do this with your patch:

$ python
Python 2.5.1 (r251:54863, Jun 1 2007, 12:15:26)
>>> class A:
...     def __eq__(self, other):
...         return other == 3
...
>>> A() in xrange(3)
False
>>> A() in xrange(4)
True
>>>

I know, my example is biased, but I have to insist. With this patch,
'a in xrange(...)' will in general not be the same as 'a in range(...)'.
I am fine with this for py3k, but not sure if all people will agree on
this for python 2.6.

-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From rrr at ronadam.com Sat Aug 4 02:12:17 2007
From: rrr at ronadam.com (Ron Adam)
Date: Fri, 03 Aug 2007 19:12:17 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B35295.1030007@acm.org>
References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org>
Message-ID: <46B3C461.80704@ronadam.com>

Talin wrote:
> What's missing, however, is a description of how all of this interacts
> with the __format__ hook. The problem we are facing right now is
> sometimes we want to override the __format__ hook and sometimes we
> don't. Right now, the model that we want seems to be:
>
> 1) High precedence type coercion, i.e. 'r', which bypasses __format__.
> 2) Check for __format__, and let it interpret the format specifier.
> 3) Regular type coercion, i.e. 'd', 'f' and so on.
> 4) Regular formatting based on type.

A few more thoughts regarding this..
We can divide these into concrete and abstract type specifiers. Concrete type specifiers only accept a specific type. They would raise an exception if the argument is of another type. Concrete type specifiers would always passed to an objects __format__ method. Abstract type specifications are more lenient and work with a wider range of objects because they use duck typing. So that splits things up as follows... (I added an abstract 't' text type, to resolve the ambiguity of the 's' type either calling __format__ or __str__) Concrete type specifiers: s - string type (not the __str__ method in this case) b,c,d,o,x,X - int type e,E,f,F - float type Abstract types specifiers: (uses duck typing) ! - calls __format__ method, fallbacks... (__str__, __repr__) t - (text) calls __str__ method, no fallback r - calls __repr__ method, fallback (__str__) The '!' type could be the default if no type is specified, it is needed if you also specify any formatting options. '{0:!xyz}' So '!xyz' would be passed to the __format__ method if it exists and it would be up to that type to know what to do with the xyz. The format_spec wouldn't be passed to __str__ or __repr__. Should there be an abstract numeric type? These would call an objects __int__ or __float__ method if it exists. One of the things I've noticed is you have 'd' instead of 'i'. Maybe 'i' should be the concrete integer type, and 'd' and abstract integer type that calls an objects __int__ method. Then we would need a floating point abstract type that calls __float__ ... 'g'? In the case of an abstract type that does a conversion the format spec could be forwarded to the objects __format__ method of the returned object. Then we have the following left over... 'g' - General format. This prints the number as a fixed-point number, unless the number is too large, in which case it switches to 'e' exponent notation. 'G' - General format. Same as 'g' except switches to 'E' if the number gets to large. 'n' - Number. 
This is the same as 'g', except that it uses the current locale setting to insert the appropriate number separator characters. '%' - Percentage. Multiplies the number by 100 and displays in fixed ('f') format, followed by a percent sign. Now I'm not sure how to best handle these. Are they abstract and call a particular __method__, or are they concrete and always call __format__ on a particular type? Cheers, Ron From adam at hupp.org Sat Aug 4 02:15:06 2007 From: adam at hupp.org (Adam Hupp) Date: Fri, 3 Aug 2007 19:15:06 -0500 Subject: [Python-3000] patch for csv test failures Message-ID: <20070804001505.GA22643@mouth.upl.cs.wisc.edu> I've uploaded a patch to SF[0] that fixes the csv struni test failures. The patch also implements unicode support in the _csv C module. Some questions: 1. The CSV PEP (305) lists Unicode support as a TODO. Is there a particular person I should talk to have this change reviewed? 2. PEP 7 (C style guide) says to use single tab indentation, except for py3k which uses 4 spaces per indent. _csv.c has a mix of both spaces and tabs. Should I reindent the whole thing or just leave it as-is? [0] http://www.python.org/sf/1767398 -- Adam Hupp | http://hupp.org/adam/ From rhamph at gmail.com Sat Aug 4 07:03:14 2007 From: rhamph at gmail.com (Adam Olsen) Date: Fri, 3 Aug 2007 23:03:14 -0600 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B35295.1030007@acm.org> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> Message-ID: On 8/3/07, Talin wrote: > Ron Adam wrote: > > After a fair amount of experimenting today, I think I've found a nice > > middle ground that meets some of what both you and Guido are looking > > for. (And a bit my own preference too.) > > First off, thank you very much for taking the time to think about this > in such detail. There are a lot of good ideas here. 
>
> What's missing, however, is a description of how all of this interacts
> with the __format__ hook. The problem we are facing right now is
> sometimes we want to override the __format__ hook and sometimes we
> don't. Right now, the model that we want seems to be:
>
> 1) High precedence type coercion, i.e. 'r', which bypasses __format__.
> 2) Check for __format__, and let it interpret the format specifier.
> 3) Regular type coercion, i.e. 'd', 'f' and so on.
> 4) Regular formatting based on type.

Why not let __format__ return NotImplemented as meaning "use a
fallback". E.g., 'd' would fall back to obj.__index__, 'r' to
repr(obj), etc. You'd then have code like this:

class float:
    def __format__(self, type, ...):
        if type == 'f':
            return formatted float
        else:
            return NotImplemented

class MyFloat:
    def __format__(self, type, ...):
        if type == 'D':
            return custom format
        else:
            return float(self).__format__(type, ...)

class Decimal:
    def __format__(self, type, ...):
        if type == 'f':
            return formatted similar to float
        else:
            return NotImplemented

def handle_format(obj, type, ...):
    if hasattr(obj, '__format__'):
        s = obj.__format__(type, ...)
    else:
        s = NotImplemented

    if s is NotImplemented:
        if type == 'f':
            s = float(obj).__format__(type, ...)
        elif type == 'd':
            s = operator.index(obj).__format__(type, ...)
        elif type == 'r':
            s = repr(obj)
        elif type == 's':
            s = str(obj)
        else:
            raise ValueError("Unsupported format type")
    return s

-- 
Adam Olsen, aka Rhamphoryncus

From kbk at shore.net Sat Aug 4 07:11:12 2007
From: kbk at shore.net (Kurt B. Kaiser)
Date: Sat, 04 Aug 2007 01:11:12 -0400
Subject: [Python-3000] map() Returns Iterator
Message-ID: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>

Although there has been quite a bit of discussion on dropping reduce()
and retaining map(), filter(), and zip(), there has been less discussion
(at least that I can find) on changing them to return iterators instead
of lists.

I think of map() and filter() as sequence transformers.
To me, it's an unexpected semantic change that the result is no longer a list. In existing Lib/ code, it's twice as likely that the result of map() will be assigned than to use it as an iterator in a flow control statement. If the statistics on the usage of map() stay the same, 2/3 of the time the current implementation will require code like foo = list(map(fcn, bar)). map() and filter() were retained primarily because they can produce more compact and readable code when used correctly. Adding list() most of the time seems to diminish this benefit, especially when combined with a lambda as the first arg. There are a number of instances where map() is called for its side effect, e.g. map(print, line_sequence) with the return result ignored. In py3k this has caused many silent failures. We've been weeding these out, and there are only a couple left, but there are no doubt many more in 3rd party code. The situation with filter() is similar, though it's not used purely for side effects. zip() is infrequently used. However, IMO for consistency they should all act the same way. I've seen GvR slides suggesting replacing map() et. al. with list comprehensions, but never with generator expressions. PEP 3100: "Make built-ins return an iterator where appropriate (e.g. range(), zip(), map(), filter(), etc.)" It makes sense for range() to return an iterator. I have my doubts on map(), filter(), and zip(). Having them return iterators seems to be a premature optimization. Could something be done in the ast phase of compilation instead? 
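The silent-failure case and the extra list() call can be seen in a few lines. This is a small sketch that assumes map() returns a lazy iterator, as PEP 3100 proposes and as Python 3 was eventually released; the list names are arbitrary.

```python
seen = []

# With a lazy map(), calling it for side effects does nothing by itself.
pending = map(seen.append, ["spam", "eggs"])
assert seen == []

# The calls only happen once something consumes the iterator.
for _ in pending:
    pass
assert seen == ["spam", "eggs"]

# Code that wants a real list has to ask for one explicitly.
lengths = list(map(len, ["spam", "eggs", "ham"]))
assert lengths == [4, 4, 3]

# The iterator is single-use: a second pass finds it exhausted.
assert list(pending) == []
```

No exception is raised at any point, which is why code that relied on map() for side effects fails quietly rather than loudly.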
-- 
KBK

From rhamph at gmail.com Sat Aug 4 07:13:05 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Fri, 3 Aug 2007 23:13:05 -0600
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To:
References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org>
Message-ID:

On 8/3/07, Adam Olsen wrote:
> class MyFloat:
>     def __format__(self, type, ...):
>         if type == 'D':
>             return custom format
>         else:
>             return float(self).__format__(type, ...)

Oops, explicitly falling back to float is unnecessary here. It should
instead be:

class MyFloat:
    def __float__(self):
        return self as float

    def __format__(self, type, ...):
        if type == 'D':
            return custom format
        else:
            return NotImplemented  # Falls back to self.__float__().__format__()

-- 
Adam Olsen, aka Rhamphoryncus

From jyasskin at gmail.com Sat Aug 4 09:56:04 2007
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Sat, 4 Aug 2007 00:56:04 -0700
Subject: [Python-3000] map() Returns Iterator
In-Reply-To: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>
References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com>
Message-ID: <5d44f72f0708040056y7f7b8f0ah2141ee7230860b2f@mail.gmail.com>

Is it possible to make the result of map() look like a list if people
are paying attention, but use memory like an iterator when they're not?
We'd want to distinguish between:

  x = map(...)

and

  for x in map(...)

Actually, to get any use out of it, we'd need to allow the first case,
as long as the first call to .__iter__() were also the last use of the
value. (I think this makes the return value of map() a 'view' rather
than an iterator?) How could we know that in time to do something about
it?

It looks to me that this could be accomplished if the last use of a
variable in a particular scope didn't increment the reference count when
passing that variable to a function.
(Of course, I don't know anything about how function calls actually work, which is why this is wild speculation.) Then when the map-view's .__iter__() method is called, it could check self's reference count. If that count is 1, just proceed to iterate down the list, throwing away values after computing them. If the count is >1, then create a list and fill it while computing the map. This could also be a handy optimization for plain list iterators, and maybe other types. If .__iter__() is called on the last reference to a particular value, then as it walks the list, it can decrement the reference count of the items it has passed, since nobody can ever again retrieve them through that list. Also, calling map() for its side-effects is a perversion of the concept and shouldn't be encouraged by the language. Write a for loop. ;) On 8/3/07, Kurt B. Kaiser wrote: > Although there has been quite a bit of discussion on dropping reduce() > and retaining map(), filter(), and zip(), there has been less discussion > (at least that I can find) on changing them to return iterators instead > of lists. > > I think of map() and filter() as sequence transformers. To me, it's > an unexpected semantic change that the result is no longer a list. > > In existing Lib/ code, it's twice as likely that the result of map() > will be assigned than to use it as an iterator in a flow control > statement. > > If the statistics on the usage of map() stay the same, 2/3 of the time > the current implementation will require code like > > foo = list(map(fcn, bar)). > > map() and filter() were retained primarily because they can produce > more compact and readable code when used correctly. Adding list() most > of the time seems to diminish this benefit, especially when combined with > a lambda as the first arg. > > There are a number of instances where map() is called for its side > effect, e.g. > > map(print, line_sequence) > > with the return result ignored. 
In py3k this has caused many silent > failures. We've been weeding these out, and there are only a couple > left, but there are no doubt many more in 3rd party code. > > The situation with filter() is similar, though it's not used purely > for side effects. zip() is infrequently used. However, IMO for > consistency they should all act the same way. > > I've seen GvR slides suggesting replacing map() et. al. with list > comprehensions, but never with generator expressions. > > PEP 3100: "Make built-ins return an iterator where appropriate > (e.g. range(), zip(), map(), filter(), etc.)" > > It makes sense for range() to return an iterator. I have my doubts on > map(), filter(), and zip(). Having them return iterators seems to > be a premature optimization. Could something be done in the ast phase > of compilation instead? > > > > > > > -- > KBK > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/jyasskin%40gmail.com > -- Namast?, Jeffrey Yasskin http://jeffrey.yasskin.info/ "Religion is an improper response to the Divine." ? "Skinny Legs and All", by Tom Robbins From greg.ewing at canterbury.ac.nz Sat Aug 4 13:55:23 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 04 Aug 2007 23:55:23 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B2D147.90606@acm.org> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> Message-ID: <46B4692B.7060308@canterbury.ac.nz> Talin wrote: > But the formatters for int and float have to happen *after* type > coercion. I don't see why. Couldn't the __format__ method for an int recognise float formats as well and coerce itself as necessary? > (I should also mention that "a:b,c" looks prettier to my eye than > "a,b:c". 
It seems more logical to me, too, for the colon to separate the value from all the stuff telling how to format it. -- Greg From greg.ewing at canterbury.ac.nz Sat Aug 4 14:15:29 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 05 Aug 2007 00:15:29 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B39BBF.80809@ronadam.com> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> Message-ID: <46B46DE1.1090403@canterbury.ac.nz> Ron Adam wrote: > '{0:10;20,f.2}' > > Works for me. It doesn't work for me, as it breaks up into 0:10; 20,f.2 i.e. semicolons separate more strongly than commas to my eyes. -- Greg From greg.ewing at canterbury.ac.nz Sat Aug 4 14:33:16 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 05 Aug 2007 00:33:16 +1200 Subject: [Python-3000] map() Returns Iterator In-Reply-To: <5d44f72f0708040056y7f7b8f0ah2141ee7230860b2f@mail.gmail.com> References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com> <5d44f72f0708040056y7f7b8f0ah2141ee7230860b2f@mail.gmail.com> Message-ID: <46B4720C.5050504@canterbury.ac.nz> Jeffrey Yasskin wrote: > > Is it possible to make the result of map() look like a list if people > are paying attention, but use memory like an iterator when they're > not? I suppose it could lazily materialise a list behind the scenes when needed (i.e. on the first __getitem__ or __len__ call), but the semantics still wouldn't be *exactly* the same, as it wouldn't be possible to iterate over it more than once. Also as any side-effects would be delayed. 
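A rough sketch of that lazy-materialisation idea, and of where its semantics drift from a real list, might look like this. The class name LazyMapView and its attributes are hypothetical and exist only for this illustration; nothing like it is part of the proposal.

```python
class LazyMapView:
    """Hypothetical wrapper: streams like an iterator until list-like
    access (len() or indexing) forces it to build a real list."""

    def __init__(self, func, iterable):
        self._source = map(func, iterable)  # lazy, single-use
        self._items = None                  # filled in on demand

    def _materialise(self):
        if self._items is None:
            self._items = list(self._source)
        return self._items

    def __len__(self):
        return len(self._materialise())

    def __getitem__(self, index):
        return self._materialise()[index]

    def __iter__(self):
        if self._items is not None:
            yield from self._items   # already a list: repeatable
        else:
            yield from self._source  # still streaming: one pass only


# List-like access first: the view behaves like a list from then on.
view = LazyMapView(str.upper, ["a", "b"])
assert len(view) == 2 and view[1] == "B"
assert [x for x in view] == ["A", "B"]
assert [x for x in view] == ["A", "B"]

# Iteration first: the values are gone afterwards, unlike a real list.
stream = LazyMapView(str.upper, ["a", "b"])
assert [x for x in stream] == ["A", "B"]
assert [x for x in stream] == []
assert len(stream) == 0
```

The second half is the caveat in concrete form: once the view has been consumed as an iterator, later list-style access sees an empty sequence.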
-- Greg From rrr at ronadam.com Sat Aug 4 17:02:39 2007 From: rrr at ronadam.com (Ron Adam) Date: Sat, 04 Aug 2007 10:02:39 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B46DE1.1090403@canterbury.ac.nz> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> Message-ID: <46B4950F.40905@ronadam.com> Greg Ewing wrote: > Ron Adam wrote: > >> '{0:10;20,f.2}' >> >> Works for me. > > It doesn't work for me, as it breaks up into > > 0:10; 20,f.2 > > i.e. semicolons separate more strongly than commas > to my eyes. And after my reply I realized this looks a bit odd. {0:;20,f.2} But I figured I could get used to it. An alternative I thought of this morning is to reuse the alignment symbols '^', '+', and '-' and require a minimum width if a maximum width is specified. Then the field aligner would also have instructions for how to align/trim something if it is wider than max_width. {0:0+7} 'Hello W' {0:0-7} 'o World' {0:0^7} 'llo Wor' Where this may make the most sense is if we are centering something in a minimum width field, but want to be sure we can see one end or the other if it's over the maximum field width. The trade off is it adds an extra character for cases where only max_width would be needed. {0:0+20,f.2} Separators in tkinter: The text widget has a line.column index. '1.0' is the first line, and first character. # window geometry is width x height + x_offset + y_offset (no spaces!) 
root.geometry("400x300+30+30") Cheers, Ron From rrr at ronadam.com Sat Aug 4 18:30:24 2007 From: rrr at ronadam.com (Ron Adam) Date: Sat, 04 Aug 2007 11:30:24 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B4950F.40905@ronadam.com> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> Message-ID: <46B4A9A0.9070206@ronadam.com> Ron Adam wrote: > An alternative I thought of this morning is to reuse the alignment symbols > '^', '+', and '-' and require a minimum width if a maximum width is specified. One more (or two) additions to this... In the common cases of generating columnar reports, the min_width and max_width values would be equal. So instead of repeating the numbers we could just prefix this case with double alignment symbols. So instead of: '{0:+20+20}, {1:^100+100}, {2:-15+15}' We could use: '{0:++20}, {1:^+100}, {2:-+15}' Which would result in a first column that right aligns, a second column that centers unless the value is longer than 100, in which case it right align, and cuts the end, and a third column that left aligns, but cuts off the right if it's over 15. One other feature might be to use the fill syntax form to specify an overflow replacement character... '{0:10+10/#}'.format('Python') -> 'Python ' '{0:10+10/#}'.format('To be, or not to be.') -> '##########' Another way to think of the double alignment specification term, is that it moves slicing and preformatting of exceptional cases into the string format operation so we don't have to do the following just to catch the rare possibility of exceptional cases. And it avoids altering the source data. if len(value1)>max_width: value = value[:max_width] # or [len(value)-max_with:] Etc... for value2, value3 ... 
line = format_string.format(value1, value2, value3, ...) Cheers, Ron From skip at pobox.com Sat Aug 4 23:06:30 2007 From: skip at pobox.com (skip at pobox.com) Date: Sat, 4 Aug 2007 16:06:30 -0500 Subject: [Python-3000] py3k conversion docs? Message-ID: <18100.59990.335150.692487@montanaro.dyndns.org> I'm looking at the recently submitted patch for the csv module and am scratching my head a bit trying to understand the code transformations. I've not looked at any py3k code yet, so this is all new to me. Is there any documentation about the Py3k conversion? I'm particularly interested in the string->unicode conversion. Here's one confusing conversion. I see PyString_FromStringAndSize replaced by PyUnicode_FromUnicode. In another place I see PyString_FromString replaced by PyUnicodeDecodeASCII. In some places I see a char left alone. In other places I see it replaced by PyUNICODE. Skip From skip at pobox.com Sat Aug 4 23:41:26 2007 From: skip at pobox.com (skip at pobox.com) Date: Sat, 4 Aug 2007 16:41:26 -0500 Subject: [Python-3000] atexit module problems/questions Message-ID: <18100.62086.177289.274444@montanaro.dyndns.org> During the recast of the atexit module into C it grew _clear and unregister functions. I can understand that a clear function might be handy, but why is it private? Given that sys.exitfunc is gone is there a reason to have _run_exitfuncs? Who's going to call it? Finally, I can see a situation where you might register the same function multiple times with different argument lists, yet unregister takes only the function as the discriminator. I think that's going to be of at-best minimal use, and error-prone. (A common use might be to register os.unlink for various files the program created during its run. Unregister would be quite useless there.) In GTK's gobject library, when you register idle or timer functions an integer id is returned. That id is what is used to later remove that function. 
If you decide to retain unregister (I would vote to remove it if it's not going to be fixed) I think you might as well break the register function's API and return ncallbacks instead of the function. Skip From guido at python.org Sat Aug 4 23:48:47 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 4 Aug 2007 14:48:47 -0700 Subject: [Python-3000] py3k conversion docs? In-Reply-To: <18100.59990.335150.692487@montanaro.dyndns.org> References: <18100.59990.335150.692487@montanaro.dyndns.org> Message-ID: I haven't seen the patch you mention, and unfortunately there aren't docs for the conversion yet. However, one thing to note is that in 2.x, the PyString type ('str') is used for binary data, encoded text data, and decoded text data. In 3.0, binary and encoded text are represented using PyBytes ('bytes'), and decoded text is represented as PyUnicode (now called 'str'). Perhaps it helps in understanding the patch to know that 'char*' is likely encoded text, while 'PyUNICODE*' is likely decoded text. Sorry, --Guido On 8/4/07, skip at pobox.com wrote: > I'm looking at the recently submitted patch for the csv module and am > scratching my head a bit trying to understand the code transformations. > I've not looked at any py3k code yet, so this is all new to me. Is there > any documentation about the Py3k conversion? I'm particularly interested in > the string->unicode conversion. > > Here's one confusing conversion. I see PyString_FromStringAndSize replaced > by PyUnicode_FromUnicode. In another place I see PyString_FromString > replaced by PyUnicodeDecodeASCII. In some places I see a char left alone. > In other places I see it replaced by PyUNICODE.
> > Skip > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Sat Aug 4 23:49:50 2007 From: skip at pobox.com (skip at pobox.com) Date: Sat, 4 Aug 2007 16:49:50 -0500 Subject: [Python-3000] atexit module problems/questions In-Reply-To: <18100.62086.177289.274444@montanaro.dyndns.org> References: <18100.62086.177289.274444@montanaro.dyndns.org> Message-ID: <18100.62590.744303.912325@montanaro.dyndns.org> skip> Given that sys.exitfunc is gone is there a reason to have skip> _run_exitfuncs? Who's going to call it? I should have elaborated. Clearly you need some way to call it, but since that is going to be called from C code (isn't it?), why expose it to Python code? Skip From lists at cheimes.de Sun Aug 5 02:44:44 2007 From: lists at cheimes.de (Christian Heimes) Date: Sun, 05 Aug 2007 02:44:44 +0200 Subject: [Python-3000] atexit module problems/questions In-Reply-To: <18100.62590.744303.912325@montanaro.dyndns.org> References: <18100.62086.177289.274444@montanaro.dyndns.org> <18100.62590.744303.912325@montanaro.dyndns.org> Message-ID: skip at pobox.com wrote: > skip> Given that sys.exitfunc is gone is there a reason to have > skip> _run_exitfuncs? Who's going to call it? > > I should have elaborated. Clearly you need some way to call it, but since > that is going to be called from C code (isn't it?), why expose it to Python > code? Unit tests? Some developers might want to test their registered functions. 
Christian From greg.ewing at canterbury.ac.nz Sun Aug 5 03:13:06 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 05 Aug 2007 13:13:06 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B4A9A0.9070206@ronadam.com> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> Message-ID: <46B52422.2090006@canterbury.ac.nz> Ron Adam wrote: > Which would result in a first column that right aligns, a second column > that centers unless the value is longer than 100, in which case it right > align, and cuts the end, and a third column that left aligns, but cuts off > the right if it's over 15. All this talk about cutting things off worries me. In the case of numbers at least, if you can't afford to expand the column width, normally the right thing to do is *not* to cut them off, but replace them with **** or some other thing that stands out. This suggests that the formatting and field width options may not be as easily separable as we would like. -- Greg From greg.ewing at canterbury.ac.nz Sun Aug 5 03:25:18 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 05 Aug 2007 13:25:18 +1200 Subject: [Python-3000] atexit module problems/questions In-Reply-To: <18100.62086.177289.274444@montanaro.dyndns.org> References: <18100.62086.177289.274444@montanaro.dyndns.org> Message-ID: <46B526FE.4060500@canterbury.ac.nz> skip at pobox.com wrote: > I can see a situation where you might register the same function > multiple times with different argument lists, yet unregister takes only the > function as the discriminator. One way to fix this would be to remove the ability to register arguments along with the function. It's not necessary, as you can always use a closure to get the same effect. 
Then you have a unique handle for each registered callback. -- Greg From jyasskin at gmail.com Sun Aug 5 03:53:45 2007 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Sat, 4 Aug 2007 18:53:45 -0700 Subject: [Python-3000] test_asyncore fails intermittently on Darwin In-Reply-To: <46AE943D.1040105@canterbury.ac.nz> References: <2cda2fc90707261505tdd9a0f1t861b5801c37ad11e@mail.gmail.com> <1d36917a0707261618oac94f20l98f464a2ab1edc4e@mail.gmail.com> <2cda2fc90707292338pff060c1i810737dcf6d5df54@mail.gmail.com> <2cda2fc90707292340k7eb11f2w82003e6f705438c3@mail.gmail.com> <46AE943D.1040105@canterbury.ac.nz> Message-ID: <5d44f72f0708041853m1bb0d005h9f1ff77103b9ebbe@mail.gmail.com> Well, regardless of the brokenness of the patch, I do get two different failures from this test on OSX. The first is caused by trying to socket.bind() a port that's already been bound recently: Exception in thread Thread-2: Traceback (most recent call last): File "/Users/jyasskin/src/python/test_asyncore/Lib/threading.py", line 464, in __bootstrap self.run() File "/Users/jyasskin/src/python/test_asyncore/Lib/threading.py", line 444, in run self.__target(*self.__args, **self.__kwargs) File "Lib/test/test_asyncore.py", line 59, in capture_server serv.bind(("", PORT)) File "", line 1, in bind socket.error: (48, 'Address already in use') That looks pretty easy to fix. 
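For reference, the usual way a test server avoids "Address already in use" is to set SO_REUSEADDR before binding, or to let the OS pick a free port. The following is only a minimal sketch under those assumptions; `make_listener` is a hypothetical helper name, not code from test_asyncore.py or from the pending patch.

```python
import socket

def make_listener(host="127.0.0.1", port=0):
    """Return a listening TCP socket that tolerates recently used ports.

    Hypothetical helper for illustration only.
    """
    serv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow rebinding an address that is still lingering from an earlier run.
    serv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # port=0 asks the operating system to choose any free port.
    serv.bind((host, port))
    serv.listen(1)
    return serv

serv = make_listener()
chosen_port = serv.getsockname()[1]  # the port the OS actually assigned
serv.close()
```

Note that SO_REUSEADDR is a flag passed to setsockopt(), which matches Greg's point quoted later in this message that it is not a timeout value.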
The second:

======================================================================
ERROR: test_send (__main__.DispatcherWithSendTests_UsePoll)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Lib/test/test_asyncore.py", line 351, in test_send
    d.send(data)
  File "/Users/jyasskin/src/python/test_asyncore/Lib/asyncore.py", line 468, in send
    self.initiate_send()
  File "/Users/jyasskin/src/python/test_asyncore/Lib/asyncore.py", line 455, in initiate_send
    num_sent = dispatcher.send(self, self.out_buffer[:512])
  File "/Users/jyasskin/src/python/test_asyncore/Lib/asyncore.py", line 335, in send
    if why[0] == EWOULDBLOCK:
TypeError: 'error' object is unindexable

seems to be caused by a change in exceptions. I've reduced the problem into the attached patch, which adds a test to Lib/test/test_socket.py. It looks like subscripting is no longer the way to get values out of socket.errors, but another way hasn't been implemented yet. On 7/30/07, Greg Ewing wrote: > Hasan Diwan wrote: > > The issue seems to be in the socket.py close method. It needs to sleep > > socket.SO_REUSEADDR seconds before returning. > > WHAT??? socket.SO_REUSEADDR is a flag that you pass when > creating a socket to tell it to re-use an existing address, > not something to be used as a timeout value, as far as > I know. > > -- > Greg > -- Namasté, Jeffrey Yasskin http://jeffrey.yasskin.info/ "Religion is an improper response to the Divine." -- "Skinny Legs and All", by Tom Robbins -------------- next part -------------- A non-text attachment was scrubbed...
Name: socket_breakage.diff Type: application/octet-stream Size: 1164 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070804/c8e25af9/attachment.obj From guido at python.org Sun Aug 5 04:09:45 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 4 Aug 2007 19:09:45 -0700 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B52422.2090006@canterbury.ac.nz> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> Message-ID: On 8/4/07, Greg Ewing wrote: > Ron Adam wrote: > > Which would result in a first column that right aligns, a second column > > that centers unless the value is longer than 100, in which case it right > > align, and cuts the end, and a third column that left aligns, but cuts off > > the right if it's over 15. > > All this talk about cutting things off worries me. In the > case of numbers at least, if you can't afford to expand the > column width, normally the right thing to do is *not* to cut > them off, but replace them with **** or some other thing that > stands out. > > This suggests that the formatting and field width options may > not be as easily separable as we would like. I remember a language that did the *** thing; it was called Fortran. It was an absolutely terrible feature. A later language (Pascal) solved it by ignoring the field width if the number didn't fit -- it would mess up your layout but at least you'd see the value. That strategy worked much better, and later languages (e.g. C) followed it. So I think a maximum width is quite unnecessary for numbers. For strings, of course, it's useful; it can be made part of the string-specific conversion specifier. 
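As an illustration of the behavior described above, Python's existing %-formatting (which follows C's printf) already treats a numeric field width as a minimum only, while a precision on a string acts as a maximum. A small sketch; the variable names are just for this example:

```python
# A field width is only a minimum: an oversized number is printed in full,
# which breaks the column layout but keeps the value visible.
wide_number = "%5d" % 1234567      # field of 5, value needs 7 characters
narrow_number = "%5d" % 42         # padded on the left up to width 5

# For strings, the precision acts as a maximum width and truncates.
clipped = "%.5s" % "Hello World"
```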
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Sun Aug 5 04:48:18 2007 From: skip at pobox.com (skip at pobox.com) Date: Sat, 4 Aug 2007 21:48:18 -0500 Subject: [Python-3000] atexit module problems/questions In-Reply-To: References: <18100.62086.177289.274444@montanaro.dyndns.org> <18100.62590.744303.912325@montanaro.dyndns.org> Message-ID: <18101.14962.541067.708881@montanaro.dyndns.org> skip> Given that sys.exitfunc is gone is there a reason to have skip> _run_exitfuncs? Who's going to call it? Christian> Unit tests? Some developers might want to test their Christian> registered functions. Your tests can just fork another instance of Python which prints: python -c 'import atexit def f(*args, **kwds): print("atexit", args, kwds) atexit.register(f, 1, x=2) ' and have your test case expect to see the appropriate output. Skip From skip at pobox.com Sun Aug 5 04:51:13 2007 From: skip at pobox.com (skip at pobox.com) Date: Sat, 4 Aug 2007 21:51:13 -0500 Subject: [Python-3000] atexit module problems/questions In-Reply-To: <46B526FE.4060500@canterbury.ac.nz> References: <18100.62086.177289.274444@montanaro.dyndns.org> <46B526FE.4060500@canterbury.ac.nz> Message-ID: <18101.15137.718715.98755@montanaro.dyndns.org> >> I can see a situation where you might register the same function >> multiple times with different argument lists, yet unregister takes >> only the function as the discriminator. Greg> One way to fix this would be to remove the ability to register Greg> arguments along with the function. It's not necessary, as you can Greg> always use a closure to get the same effect. Then you have a Greg> unique handle for each registered callback. Then you need to hang onto the closure. That might be some distance away from the point at which the function was registered. Returning a unique id corresponding to the specific call to atexit.register is much simpler than forcing the caller to build a closure.
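To make the id-based idea concrete, here is a rough sketch of a registry whose register() returns an integer handle. `CallbackRegistry` and its methods are hypothetical names for illustration; this is not the atexit module's API, only the shape of the proposal.

```python
import itertools

class CallbackRegistry:
    """Hypothetical id-based registry; not the atexit module's API."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._callbacks = {}

    def register(self, func, *args, **kwargs):
        # Each call gets its own id, even when func is registered repeatedly.
        callback_id = next(self._ids)
        self._callbacks[callback_id] = (func, args, kwargs)
        return callback_id

    def unregister(self, callback_id):
        # Removes exactly one registration; unknown ids are ignored.
        self._callbacks.pop(callback_id, None)

    def run_all(self):
        # Run in reverse registration order, as atexit does.
        for callback_id in sorted(self._callbacks, reverse=True):
            func, args, kwargs = self._callbacks[callback_id]
            func(*args, **kwargs)
        self._callbacks.clear()

# The same function registered twice with different arguments,
# standing in for os.unlink on two temporary files.
removed = []
registry = CallbackRegistry()
id_a = registry.register(removed.append, "a.tmp")
id_b = registry.register(removed.append, "b.tmp")
registry.unregister(id_a)   # cancels only the first registration
registry.run_all()
```

With a function-only discriminator, the unregister call above could not tell the two registrations apart.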
Skip From talin at acm.org Sun Aug 5 06:17:21 2007 From: talin at acm.org (Talin) Date: Sat, 04 Aug 2007 21:17:21 -0700 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B4A9A0.9070206@ronadam.com> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> Message-ID: <46B54F51.40705@acm.org> Ron Adam wrote: > Ron Adam wrote: > >> An alternative I thought of this morning is to reuse the alignment symbols >> '^', '+', and '-' and require a minimum width if a maximum width is specified. > > One more (or two) additions to this... (snipped) I've kind of lost track of what the proposal is at this specific point. I like several of the ideas you have proposed, but I think it needs to be slimmed down even more. I don't have a particular syntax in mind - yet - but I can tell you what I would like to see in general. Guido used the term "mini-language" to describe the conversion specifier syntax. I think that's a good term, because it implies that it's not just a set of isolated properties, but rather a grammar where the arrangement and ordering of things matters. Like real human languages, it has a "Huffman-coding" property, where the most commonly-uttered phrases are the shortest. This conciseness is achieved by sacrificing some degree of orthogonality (in the same way that a CISC machine instruction is shorter than an equivalent RISC instruction.) In practical terms it means that the interpretation of a symbol depends on what comes before it. So in general common cases should be short, uncommon cases should be possible. And we don't have to allow every possible combination of options, just the ones that are most important. 
Another thing I want to point out is that Guido and I (in a private discussion) have resolved our argument about the role of __format__. Well, not so much *agreed* I guess, more like I capitulated. But in any case, the deal is that int, float, and decimal all get to have a __format__ method which interprets the format string for those types. There is no longer any automatic coercion of types based on the format string - so simply defining an __int__ method for a type is insufficient if you want to use the 'd' format type. Instead, if you want to use 'd' you can simply write the following:

    class MyClass:
        def __format__(self, spec):
            return int(self).__format__(spec)

This at least has the advantage of simplifying the problem quite a bit. The global 'format(value, spec)' function now just does:

    1) check for the 'repr' override, if present return repr(val)
    2) call val.__format__(spec) if it exists
    3) call str(val).__format__(spec)

Note that this also means that float.__format__ will have to handle 'd' and int.__format__ will handle 'f', and so on, although this can be done by explicit type conversion in the __format__ method. (No need for float to handle 'x' and the like, even though it does work with %-formatting today.) > One other feature might be to use the fill syntax form to specify an > overflow replacement character... > > '{0:10+10/#}'.format('Python') -> 'Python ' > > '{0:10+10/#}'.format('To be, or not to be.') -> '##########' Yeah, as Guido pointed out in another message that's not going to fly. A few minor points on syntax of the minilanguage: -- I like your idea that :xxxx and ,yyyy can occur in any order. -- I'm leaning towards the .Net conversion spec syntax convention where the type letter comes first: ':f10'. The idea being that the first letter changes the interpretation of subsequent letters. Note that in the .Net case, the numeric quantity after the letter represents a *precision* specifier, not a min/max field width.
So for example, in .Net having a float field of minimum width 10 and a decimal precision of 3 digits would be ':f3,10'. Now, as stated above, there's no 'max field width' for any data type except strings. So in the case of strings, we can re-use the precision specifier just like C printf does: ':s10' to limit the string to 10 characters. So 's:10,5' to indicate a max width of 10, min width of 5. -- There's no decimal precision quantity for any data type except floats. So ':d10' doesn't mean anything I think, but ':d,10' is minimum 10 digits. -- I don't have an opinion yet on where the other stuff (sign options, padding, alignment) should go, except that sign should go next to the type letter, while the rest should go after the comma. -- For the 'repr' override, Guido suggests putting 'r' in the alignment field: '{0,r}'. How that mixes with alignment and padding is unknown, although frankly why anyone would want to pad and align a repr() is completely beyond me. -- Talin From rrr at ronadam.com Sun Aug 5 08:06:43 2007 From: rrr at ronadam.com (Ron Adam) Date: Sun, 05 Aug 2007 01:06:43 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> Message-ID: <46B568F3.9060105@ronadam.com> Guido van Rossum wrote: > On 8/4/07, Greg Ewing wrote: >> Ron Adam wrote: >>> Which would result in a first column that right aligns, a second column >>> that centers unless the value is longer than 100, in which case it right >>> align, and cuts the end, and a third column that left aligns, but cuts off >>> the right if it's over 15. >> All this talk about cutting things off worries me. 
In the >> case of numbers at least, if you can't afford to expand the >> column width, normally the right thing to do is *not* to cut >> them off, but replace them with **** or some other thing that >> stands out. >> >> This suggests that the formatting and field width options may >> not be as easily separable as we would like. > > I remember a language that did the *** thing; it was called Fortran. > It was an absolutely terrible feature. A later language (Pascal) > solved it by ignoring the field width if the number didn't fit -- it > would mess up your layout but at least you'd see the value. That > strategy worked much better, and later languages (e.g. C) followed it. > So I think a maximum width is quite unnecessary for numbers. For > strings, of course, it's useful; it can be made part of the > string-specific conversion specifier. I looked up Fortran's print descriptors and it seems they have only a single width descriptor which as you say automatically included the *** over flow behavior. So the programmer doesn't have a choice, they can either specify a width and get that too, or don't specify a width. I can see how that would be very annoying. See section 2... http://www-solar.mcs.st-and.ac.uk/~steveb/course/notes/set4.pdf The field width specification I've described is rich enough so that the programmer can choose the behavior they want. So it doesn't have the same problem. A programmer can choose to implement the Fortran behavior if they really want to. They would need to specify an overflow replacement character to turn that on. Other wise it never occurs. '{0:10+20/*,s}' In the above case the field width would normally be 10, but could expand upto 20, and only if it goes over 20 is the field filled with '*'s. But that behavior was explicitly specified by the programmer by supplying an overflow replacement character along with the max_width size. It's not automatically included as in the Fortran case. 
Truncating behavior is explicitly specified by giving a max_width size without a replacement character. And a minimum width is explicitly specified by supplying a min_width size. So the programmer has full and explicit control of the alignment behaviors in all cases. Since an alignment specification is always paired with a format specification, the programmer can choose the best alignment behavior to go along with a formatter in the context of their application. This is a good thing even though some programmers may not always make the best choices at first. I believe they will learn fairly quickly what not to do. So the choices are: 1 - Remove the replacement character alignment option. It may not be all that useful, and by removing it we protect programmers from making some mistakes, but limit others from this feature who may find it useful. So just how useful/desirable is this? 2 - Only use max_width inside string formatters. This further protects programmers from making silly choices. And further limits other that may want to use max_width with other types. It also breaks up the clean split of alignment and format specifiers. (But this may be a matter of perspective.) I'm +0 on (1), and -1 on (2) moving max_width to the string formatter. So what do others think about these features? If you do #2, then #1 also goes, unless it too is moved to the string formatter. Note: Moving these to the string type formatter doesn't prevent them from being used with numbers in all cases. A general text class would still be able to use them with numeric entries because it would call the __str__ method of the number to first convert the number to a string, but then call __format__ on that string and forward these string options. It just requires more thought to do, and a better understanding of the internal process. But also this depends on the choice of the underlying implementation. 
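The three alignment behaviors described above (pad to min_width, truncate at max_width, or replace on overflow) can be sketched as a plain function. This is only an illustration under simplifying assumptions: left alignment and space padding are hard-coded, and `fit_field` is a hypothetical name, not anything in PEP 3101.

```python
def fit_field(text, min_width, max_width=None, overflow_char=None):
    """Hypothetical helper mirroring the proposed field behavior."""
    if max_width is not None and len(text) > max_width:
        if overflow_char is not None:
            # Overflow replacement was explicitly requested.
            return overflow_char * max_width
        # No replacement character given: truncate, keeping the left end.
        return text[:max_width]
    # Within limits: pad on the right up to the minimum width.
    return text.ljust(min_width)

padded = fit_field("Python", 10, 10, "#")
replaced = fit_field("To be, or not to be.", 10, 10, "#")
truncated = fit_field("Hello World", 0, 7)
expanded = fit_field("x" * 15, 10, 20, "*")
```

The overflow character only takes effect once the text exceeds max_width, so a field that can expand from 10 to 20 shows a 15-character value unchanged.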
Cheers, Ron From hasan.diwan at gmail.com Sun Aug 5 09:12:35 2007 From: hasan.diwan at gmail.com (Hasan Diwan) Date: Sun, 5 Aug 2007 00:12:35 -0700 Subject: [Python-3000] test_asyncore fails intermittently on Darwin In-Reply-To: <5d44f72f0708041853m1bb0d005h9f1ff77103b9ebbe@mail.gmail.com> References: <2cda2fc90707261505tdd9a0f1t861b5801c37ad11e@mail.gmail.com> <1d36917a0707261618oac94f20l98f464a2ab1edc4e@mail.gmail.com> <2cda2fc90707292338pff060c1i810737dcf6d5df54@mail.gmail.com> <2cda2fc90707292340k7eb11f2w82003e6f705438c3@mail.gmail.com> <46AE943D.1040105@canterbury.ac.nz> <5d44f72f0708041853m1bb0d005h9f1ff77103b9ebbe@mail.gmail.com> Message-ID: <2cda2fc90708050012p49831ad5ga69f7a069acff3d2@mail.gmail.com> On 04/08/07, Jeffrey Yasskin wrote: > Well, regardless of the brokenness of the patch, I do get two > different failures from this test on OSX. The first is caused by > trying to socket.bind() a port that's already been bound recently: > > Exception in thread Thread-2: > Traceback (most recent call last): > File "/Users/jyasskin/src/python/test_asyncore/Lib/threading.py", > line 464, in __bootstrap > self.run() > File "/Users/jyasskin/src/python/test_asyncore/Lib/threading.py", > line 444, in run > self.__target(*self.__args, **self.__kwargs) > File "Lib/test/test_asyncore.py", line 59, in capture_server > serv.bind(("", PORT)) > File "", line 1, in bind > socket.error: (48, 'Address already in use') Patch number 1767834 -- uncommitted as yet -- fixes this problem. 
-- Cheers, Hasan Diwan From rrr at ronadam.com Sun Aug 5 11:57:25 2007 From: rrr at ronadam.com (Ron Adam) Date: Sun, 05 Aug 2007 04:57:25 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B54F51.40705@acm.org> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> Message-ID: <46B59F05.3070200@ronadam.com> Talin wrote: > Ron Adam wrote: > >> Ron Adam wrote: >> >>> An alternative I thought of this morning is to reuse the alignment >>> symbols '^', '+', and '-' and require a minimum width if a maximum >>> width is specified. >> >> One more (or two) additions to this... > > (snipped) > > I've kind of lost track of what the proposal is at this specific point. > I like several of the ideas you have proposed, but I think it needs to > be slimmed down even more. I put in a lot of implementation details, so it may seem heavier than it really is. > I don't have a particular syntax in mind - yet - but I can tell you what > I would like to see in general. > > Guido used the term "mini-language" to describe the conversion specifier > syntax. I think that's a good term, because it implies that it's not > just a set of isolated properties, but rather a grammar where the > arrangement and ordering of things matters. I agree, a mini-language also imply a richness that a simple option list doesn't have. > Like real human languages, it has a "Huffman-coding" property, where the > most commonly-uttered phrases are the shortest. This conciseness is > achieved by sacrificing some degree of orthogonality (in the same way > that a CISC machine instruction is shorter than an equivalent RISC > instruction.) 
In practical terms it means that the interpretation of a > symbol depends on what comes before it. Sounds good. > So in general common cases should be short, uncommon cases should be > possible. And we don't have to allow every possible combination of > options, just the ones that are most important. I figured some of what I suggested would be vetoed, but included them in case they are desirable. It's not always easy to know beforehand how the community, or Guido, ;-) is going to respond to any suggestion. > Another thing I want to point out is that Guido and I (in a private > discussion) have resolved our argument about the role of __format__. > Well, not so much *agreed* I guess, more like I capitulated. Refer to the message in this thread where I discuss the difference between concrete and abstract format specifiers. I think this is basically where you and Guido are differing on these issues. I got the impression you prefer the more abstract interpretation and Guido prefers a more traditional interpretation. We can have both as long as they are well defined and documented as being one or the other. It's when we try to make one format specifier have both qualities at different times that it gets messy. Here's how the apply_format function could look. We may not be in as much disagreement as you think.

    def apply_format(value, format_spec):
        abstract = False
        type = format_spec[0]
        if type in 'rtgd':
            abstract = True
            if format_spec[0] == 'r':      # abstract repr
                value = repr(value)
            elif format_spec[0] == 't':    # abstract text
                value = str(value)
            elif format_spec[0] == 'g':    # abstract float
                value = float(value)
            elif format_spec[0] == 'd':    # abstract int
                value = int(value)
        return value.__format__(format_spec, abstract)

The above abstract types use duck typing to convert to concrete types before calling the returned type's __format__ method. There aren't that many abstract types needed. We only need a few to cover the most common cases. That's it.
It's up to each type's __format__ method to figure out things from there. They can look at the original type spec passed to them and handle special cases if need be. If the abstract flag is False and the format_spec type doesn't match the type of the __format__ method's class, then an exception can be raised. This offers a wider range of strictness/leniency to string formatting. There are cases where you may want either. > But in any case, the deal is that int, float, and decimal all get to > have a __format__ method which interprets the format string for those > types. Good, +1 > There is no longer any automatic coercion of types based on the > format string Ever? This seems to contradict below where you say int needs to handle float, and float needs to handle int. Can you explain further? > - so simply defining an __int__ method for a type is > insufficient if you want to use the 'd' format type. Instead, if you > want to use 'd' you can simply write the following: > > class MyClass: > def __format__(self, spec): > return int(self).__format__(spec) So if an item has an __int__ method, but not a __format__ method, and you tried to print it with a 'd' format type, it would raise an exception? From your descriptions elsewhere in this reply it sounds like it would fall back to string output. Or am I missing something? > This at least has the advantage of simplifying the problem quite a bit. > The global 'format(value, spec)' function now just does: > > 1) check for the 'repr' override, if present return repr(val) > 2) call val.__format__(spec) if it exists > 3) call str(val).__format__(spec) The repr override is the same as in the above function, except in the above example any options after the 'r' would be interpreted by the string __format__ method. Since there aren't any string-specific options yet...
it can just be returned early as in #1 here, but if options are added to the string type, that could be changed to forward the format_spec to the string __format__ method. Number two is the same also. Number three could be the same... Just put the __format__() in a try/except and call str(value) on the exception. It sounds like we may be getting hung up on interpretation rather than a real difference. > Note that this also means that float.__format__ will have to handle 'd' > and int.__format__ will handle 'f', and so on, although this can be done > by explicit type conversion in the __format__ method. (No need for float > to handle 'x' and the like, even though it does work with %-formatting > today.) This happens in my example above in the case of 'g' and 'd' types specifiers, but I'm not sure when it happens in your description if no conversions are made? >> One other feature might be to use the fill syntax form to specify an >> overflow replacement character... >> >> '{0:10+10/#}'.format('Python') -> 'Python ' >> >> '{0:10+10/#}'.format('To be, or not to be.') -> '##########' > > Yeah, as Guido pointed out in another message that's not going to fly. This one was just a see if it fly's suggestion. It apparently didn't unless a bunch of people all of a sudden say they have actual and valid use cases for it that make sense. Some times you just have to punt and see what happens. ;-) > A few minor points on syntax of the minilanguage: > > -- I like your idea that :xxxx and ,yyyy can occur in any order. > > -- I'm leaning towards the .Net conversion spec syntax convention where > the type letter comes first: ':f10'. The idea being that the first > letter changes the interpretation of subsequent letters. > > Note that in the .Net case, the numeric quantity after the letter > represents a *precision* specifier, not a min/max field width. I agree with these points of course. 
> So for example, in .Net having a float field of minimum width 10 and a > decimal precision of 3 digits would be ':f3,10'. It looks ok to me, but there may be some cases where it could be ambiguous. How would you specify leading 0's. Or would we do that in the alignment specifier? {0:f3,-10/0} '000123.000' > Now, as stated above, there's no 'max field width' for any data type > except strings. So in the case of strings, we can re-use the precision > specifier just like C printf does: ':s10' to limit the string to 10 > characters. So 's:10,5' to indicate a max width of 10, min width of 5. I'm sure you meant '{0:s10,5}' here. What happens if the string is too long? Does it always cut the left side off? Or do we use +' - and ^ here too? > -- There's no decimal precision quantity for any data type except > floats. So ':d10' doesn't mean anything I think, but ':d,10' is minimum > 10 digits. This is fine... The maximum value is optional, so this works in my examples as well. If there's not enough cases where specifying a maximum width is useful I'm ok with not having it. The reason I prefer it in the alignment side, is it applies to all cases equally. A consistency I prefer, but maybe not one that's needed. > -- I don't have an opinion yet on where the other stuff (sign options, > padding, alignment) should go, except that sign should go next to the > type letter, while the rest should go after the comma. I think I agree here. > -- For the 'repr' override, Guido suggests putting 'r' in the alignment > field: '{0,r}'. How that mixes with alignment and padding is unknown, > although frankly why anyone would want to pad and align a repr() is > completely beyond me. Sometimes it's handy for formatting a variable repr output in columns. Mostly for debugging, learning exercises, or documentation purposes. Since there is no actual Repr type, it may seem like it shouldn't be a type specifier. 
But if you consider it as an indirect string type, an abstract type that converts to string type, the idea and implementation work fine and it can then forward its type specifier to the string's __format__ method. (or not) The exact behavior can be flexible. To me there is an underlying consistency with grouping abstract/indirect types with more concrete types rather than making an exception in the field alignment specifier. Moving repr to the format side sort of breaks the original clean idea of having a field alignment specifier and separate type format specifiers. I think if we continue to sort out the detail behaviors of the underlying implementation, the best overall solution will sort itself out. Good and complete example test cases will help too. I think we actually agree on quite a lot so far. :-) Cheers, Ron From martin at v.loewis.de Sun Aug 5 14:37:15 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sun, 05 Aug 2007 14:37:15 +0200 Subject: [Python-3000] C API cleanup str In-Reply-To: References: <46B2C8E0.8080409@canterbury.ac.nz> Message-ID: <46B5C47B.5090703@v.loewis.de> > Aside from the name, are there other issues you can think of with any > of the API changes? There are some small changes, things like macros > only having a function form. Are these a problem? > > Str/unicode is going to be a big change. Any thoughts there? We need some rules on what the character set is on the C level. E.g. if you do PyString_FromStringAndSize, is that ASCII, Latin-1, UTF-8? Likewise, what is the encoding in PyArg_ParseTuple for s and s# parameters?
Regards, Martin From ironfroggy at gmail.com Sun Aug 5 15:20:33 2007 From: ironfroggy at gmail.com (Calvin Spealman) Date: Sun, 5 Aug 2007 09:20:33 -0400 Subject: [Python-3000] map() Returns Iterator In-Reply-To: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com> References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com> Message-ID: <76fd5acf0708050620v25da595fi11a3e8d5f76306c1@mail.gmail.com> I can't remember specifics, but I had always expected map and filter to be replaced by their itertools counterparts. On 8/4/07, Kurt B. Kaiser wrote: > Although there has been quite a bit of discussion on dropping reduce() > and retaining map(), filter(), and zip(), there has been less discussion > (at least that I can find) on changing them to return iterators instead > of lists. > > I think of map() and filter() as sequence transformers. To me, it's > an unexpected semantic change that the result is no longer a list. > > In existing Lib/ code, it's twice as likely that the result of map() > will be assigned than to use it as an iterator in a flow control > statement. > > If the statistics on the usage of map() stay the same, 2/3 of the time > the current implementation will require code like > > foo = list(map(fcn, bar)). > > map() and filter() were retained primarily because they can produce > more compact and readable code when used correctly. Adding list() most > of the time seems to diminish this benefit, especially when combined with > a lambda as the first arg. > > There are a number of instances where map() is called for its side > effect, e.g. > > map(print, line_sequence) > > with the return result ignored. In py3k this has caused many silent > failures. We've been weeding these out, and there are only a couple > left, but there are no doubt many more in 3rd party code. > > The situation with filter() is similar, though it's not used purely > for side effects. zip() is infrequently used. However, IMO for > consistency they should all act the same way.
> > I've seen GvR slides suggesting replacing map() et. al. with list > comprehensions, but never with generator expressions. > > PEP 3100: "Make built-ins return an iterator where appropriate > (e.g. range(), zip(), map(), filter(), etc.)" > > It makes sense for range() to return an iterator. I have my doubts on > map(), filter(), and zip(). Having them return iterators seems to > be a premature optimization. Could something be done in the ast phase > of compilation instead? > > > > > > > -- > KBK > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/ironfroggy%40gmail.com > -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://ironfroggy-code.blogspot.com/ From alan.mcintyre at gmail.com Sun Aug 5 16:02:58 2007 From: alan.mcintyre at gmail.com (Alan McIntyre) Date: Sun, 5 Aug 2007 10:02:58 -0400 Subject: [Python-3000] test_asyncore fails intermittently on Darwin In-Reply-To: <5d44f72f0708041853m1bb0d005h9f1ff77103b9ebbe@mail.gmail.com> References: <2cda2fc90707261505tdd9a0f1t861b5801c37ad11e@mail.gmail.com> <1d36917a0707261618oac94f20l98f464a2ab1edc4e@mail.gmail.com> <2cda2fc90707292338pff060c1i810737dcf6d5df54@mail.gmail.com> <2cda2fc90707292340k7eb11f2w82003e6f705438c3@mail.gmail.com> <46AE943D.1040105@canterbury.ac.nz> <5d44f72f0708041853m1bb0d005h9f1ff77103b9ebbe@mail.gmail.com> Message-ID: <1d36917a0708050702n6b48594bn824bd97ea6622421@mail.gmail.com> On 8/4/07, Jeffrey Yasskin wrote: > Well, regardless of the brokenness of the patch, I do get two > different failures from this test on OSX. The first is caused by > trying to socket.bind() a port that's already been bound recently: > That looks pretty easy to fix. It was fixed in the trunk on July 28 as part of rev 56604, by letting the OS assign the port (binding to port 0). 
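The port-0 approach mentioned above can be sketched in a few lines. This is only an illustration, assuming a plain TCP socket on the loopback interface; the helper name bind_to_free_port is made up here and is not taken from rev 56604 or from test_asyncore.

```python
import socket


def bind_to_free_port(host="127.0.0.1"):
    """Bind a TCP socket to a port chosen by the OS and report the port.

    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Port 0 asks the OS to pick any currently unused port, so the test
    # never collides with a port that was bound a moment ago.
    sock.bind((host, 0))
    # getsockname() returns (address, port) for AF_INET sockets.
    port = sock.getsockname()[1]
    return sock, port


if __name__ == "__main__":
    server, chosen_port = bind_to_free_port()
    print("OS assigned port", chosen_port)
    server.close()
```

A test would then pass the reported port to the client side instead of hard-coding one.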
I apologize if everybody was expecting me to fix this in Python 3000; I thought the initial complaint was in reference to 2.6. I'm working on test improvements for 2.6, so I'm sort of fixated on the trunk at the moment. :) I wouldn't mind trying to roll my changes forward into Py3k after GSoC is done if I have the time, though. Alan From guido at python.org Sun Aug 5 17:08:28 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 5 Aug 2007 08:08:28 -0700 Subject: [Python-3000] C API cleanup str In-Reply-To: <46B5C47B.5090703@v.loewis.de> References: <46B2C8E0.8080409@canterbury.ac.nz> <46B5C47B.5090703@v.loewis.de> Message-ID: On 8/5/07, "Martin v. Löwis" wrote: > > Aside from the name, are there other issues you can think of with any > > of the API changes? There are some small changes, things like macros > > only having a function form. Are these a problem? > > > > Str/unicode is going to be a big change. Any thoughts there? > > We need some rules on what the character set is on the C level. > E.g. if you do PyString_FromStringAndSize, is that ASCII, Latin-1, > UTF-8? Likewise, what is the encoding in PyArg_ParseTuple for s > and s# parameters? IMO at the C level all conversions between bytes and Unicode that don't specify a conversion should use UTF-8. That's what most of the changes made so far do. An exception should be made for stuff that explicitly handles filenames; there the filesystem encoding should obviously be used. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Sun Aug 5 17:48:06 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sun, 05 Aug 2007 17:48:06 +0200 Subject: [Python-3000] C API cleanup str In-Reply-To: References: <46B2C8E0.8080409@canterbury.ac.nz> <46B5C47B.5090703@v.loewis.de> Message-ID: <46B5F136.4010502@v.loewis.de> > IMO at the C level all conversions between bytes and Unicode that > don't specify a conversion should use UTF-8.
That's what most of the > changes made so far do. I agree. We should specify that somewhere, so we have a recorded guideline to use in case of doubt. One function that misbehaves under this spec is PyUnicode_FromString[AndSize], which assumes the input is Latin-1 (i.e. it performs a codepoint-per-codepoint conversion). As a consequence, this now can fail because of encoding errors (which it previously couldn't). > An exception should be made for stuff that explicitly handles > filenames; there the filesystem encoding should obviously be used. In most cases, this still follows the rule, as the filename encoding is specified explicitly. I agree this should also be specified, in particular when the import code gets fixed (where strings typically denote file names). Regards, Martin From guido at python.org Sun Aug 5 17:59:38 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 5 Aug 2007 08:59:38 -0700 Subject: [Python-3000] C API cleanup str In-Reply-To: <46B5F136.4010502@v.loewis.de> References: <46B2C8E0.8080409@canterbury.ac.nz> <46B5C47B.5090703@v.loewis.de> <46B5F136.4010502@v.loewis.de> Message-ID: On 8/5/07, "Martin v. Löwis" wrote: > > IMO at the C level all conversions between bytes and Unicode that > > don't specify a conversion should use UTF-8. That's what most of the > > changes made so far do. > > I agree. We should specify that somewhere, so we have a recorded > guideline to use in case of doubt. But where? Time to start a PEP for the C API perhaps? > One function that misbehaves under this spec is > PyUnicode_FromString[AndSize], which assumes the input is Latin-1 > (i.e. it performs a codepoint-per-codepoint conversion). Ouch. > As a consequence, this now can fail because of encoding errors > (which it previously couldn't). You mean if it were fixed it could fail, right? Code calling it should be checking for errors anyway because it allocates memory. Have you tried making this particular change and seeing what fails?
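The difference between a Latin-1 and a UTF-8 interpretation of the same bytes can be seen from the Python level. This is a rough sketch using the bytes.decode() method rather than the C functions under discussion; the sample byte string is made up for illustration.

```python
# Illustration only: Python-level decoding, not PyUnicode_FromString itself.
raw = b"caf\xe9"  # 0xE9 is 'e' with acute accent in Latin-1

# Latin-1 maps every byte 0x00-0xFF to the code point with the same
# number, so this conversion cannot fail for any input.
as_latin1 = raw.decode("latin-1")
print(len(as_latin1))  # 4

# Under UTF-8, 0xE9 starts a multi-byte sequence; with no continuation
# bytes after it the input is invalid and decoding raises an error.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
print(utf8_failed)  # True

# Pure ASCII input decodes identically under both interpretations.
print(b"spam".decode("latin-1") == b"spam".decode("utf-8"))  # True
```

This matches the expectation in the thread that callers passing ASCII would not notice such a change.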
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Sun Aug 5 18:25:53 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 05 Aug 2007 18:25:53 +0200 Subject: [Python-3000] C API cleanup str In-Reply-To: References: <46B2C8E0.8080409@canterbury.ac.nz> <46B5C47B.5090703@v.loewis.de> <46B5F136.4010502@v.loewis.de> Message-ID: <46B5FA11.5040404@v.loewis.de> >> I agree. We should specify that somewhere, so we have a recorded >> guideline to use in case of doubt. > > But where? Time to start a PEP for the C API perhaps? I would put it into the API documentation. We can put a daily-generated version of the documentation online, just as the trunk documentation is updated daily. IMO, a PEP is necessary only for disputed cases. As the C API seems to get few if any disputes, we just need to record the decisions made. >> As a consequence, this now can fail because of encoding errors >> (which it previously couldn't). > > You mean if it were fixed it could fail, right? Right. > Have you tried making this particular change and seeing what fails? No. I suspect most callers pass ASCII, so they should be fine. In the cases where it really fails, the caller likely meant to create bytes. Regards, Martin From adam at hupp.org Sun Aug 5 18:31:32 2007 From: adam at hupp.org (Adam Hupp) Date: Sun, 5 Aug 2007 11:31:32 -0500 Subject: [Python-3000] py3k conversion docs? In-Reply-To: <18100.59990.335150.692487@montanaro.dyndns.org> References: <18100.59990.335150.692487@montanaro.dyndns.org> Message-ID: <20070805163132.GA10277@mouth.upl.cs.wisc.edu> On Sat, Aug 04, 2007 at 04:06:30PM -0500, skip at pobox.com wrote: > I'm looking at the recently submitted patch for the csv module and am > scratching my head a bit trying to understand the code transformations. > I've not looked at any py3k code yet, so this is all new to me. Is there > any documentation about the Py3k conversion? 
I'm particularly interested in > the string->unicode conversion. > > Here's one confusing conversion. I see PyString_FromStringAndSize replaced > by PyUnicode_FromUnicode. In that case the type of ReaderObj.field has changed from char* to Py_UNICODE*. _FromUnicode should be analogous to the _FromStringAndSize call here. > In another place I see PyString_FromString replaced by > PyUnicodeDecodeASCII. In some places I see a char left alone. In > other places I see it replaced by PyUNICODE. Actually, I missed one spot that should use Py_UNICODE instead of char. get_nullchar_as_None should be taking a Py_UNICODE instead of a char, and PyUnicode_DecodeASCII should really be a call to _FromUnicode. I'll say though that I'm not positive this patch is the Right Way to do the conversion. Review by someone who knows the Right Way would be appreciated. -- Adam Hupp | http://hupp.org/adam/ From talin at acm.org Sun Aug 5 18:33:29 2007 From: talin at acm.org (Talin) Date: Sun, 05 Aug 2007 09:33:29 -0700 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B59F05.3070200@ronadam.com> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> Message-ID: <46B5FBD9.4020301@acm.org> Ron Adam wrote: > Talin wrote: >> Another thing I want to point out is that Guido and I (in a private >> discussion) have resolved our argument about the role of __format__. >> Well, not so much *agreed* I guess, more like I capitulated. > > Refer to the message in this thread where I discuss the difference > between concrete and abstract format specifiers. I think this is > basically where you and Guido are differing on these issues.
I got the > impression you prefer the more abstract interpretation and Guido prefers > a more traditional interpretation. We can have both as long as they are > well defined and documented as being one or the other. It's when we try > to make one format specifier have both qualities at different times that > it gets messy. > > > Here's how the apply_format function could look, we may not be in as > much disagreement as you think. >
> def apply_format(value, format_spec):
>     abstract = False
>     type = format_spec[0]
>     if type in 'rtgd':
>         abstract = True
>         if format_spec[0] == 'r':     # abstract repr
>             value = repr(value)
>         elif format_spec[0] == 't':   # abstract text
>             value = str(value)
>         elif format_spec[0] == 'g':   # abstract float
>             value = float(value)
>         elif format_spec[0] == 'd':   # abstract int
>             value = int(value)
>     return value.__format__(format_spec, abstract)
>
> The above abstract types use duck typing to convert to concrete types > before calling the returned type's __format__ method. There aren't that > many abstract types needed. We only need a few to cover the most common > cases. > > That's it. It's up to each type's __format__ method to figure out things > from there. They can look at the original type spec passed to them and > handle special cases if need be. Let me define some terms again for the discussion. As noted before, the ',' part is called the alignment specifier. It's no longer appropriate to use the term 'conversion specifier', since we're not doing conversions, so I guess I will stick with the term 'format specifier' for the ':' part. What Guido wants is for the general 'apply_format' function to not examine the format specifier *at all*. The reason is that for some types, the __format__ method can define its own interpretation of the format string which may include the letters 'rtgd' as part of its regular syntax. Basically, he wants no constraints on what __format__ is allowed to do.
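To make the "no constraints on __format__" point concrete, here is a small sketch of a type whose format strings reuse the letters 'd', 'g' and 't' with meanings of their own. The Angle class is hypothetical, and the sketch assumes the format() builtin and str.format() method that eventually shipped with PEP 3101, not the exact syntax under discussion in this thread.

```python
import datetime


class Angle:
    """Hypothetical type whose format spec is its own tiny language."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # Here 'd', 'g' and 't' select a unit; they have nothing to do
        # with "integer", "general float" or "text".
        if spec == "d":
            return "%d deg" % self.degrees
        if spec == "g":
            return "%d grad" % (self.degrees * 10 // 9)
        if spec == "t":
            return "%g turn" % (self.degrees / 360)
        return str(self.degrees)


right_angle = Angle(90)
print(format(right_angle, "d"))     # 90 deg
print("{0:g}".format(right_angle))  # 100 grad
print(format(right_angle, "t"))     # 0.25 turn

# A standard-library case of the same idea: date objects treat the
# spec as a strftime() pattern, so '%d' means "day of the month".
print(format(datetime.date(2007, 8, 5), "%d"))  # 05
```

A dispatcher that inspected the first letter and called int() or float() before delegating would break both of these types, which is the argument for leaving the spec opaque.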
Given this constraint, it becomes pretty obvious which attributes go in which part. Attributes which are actually involved in generating the text (signs and leading digits) would have to go in the format specifier, and attributes which are interpreted by apply_format (such as left/right alignment) would have to go in the alignment specifier. Of course, the two can't be entirely isolated because there is interaction between the two specifiers for some types. For example, it would normally be the case that padding is applied by 'apply_format', which knows about the field width and the padding character. However, in the case of an integer that is printed with leading zeros, the sign must come *before* the padding: '+000000010'. It's not sufficient to simply apply padding blindly to the output of __format__, which would give you '000000+10'. (Maybe leading zeros and padding are different things? But the __format__ would still need to know the field width, which is usually part of the alignment spec, since it's usually applied as a post-processing step by 'apply_format') > If the abstract flag is False and the format_spec type doesn't match the > type of the __format__ methods class, then an exception can be raised. > This offers a wider range of strictness/leniency to string formatting. > There are cases where you may want either. > > >> But in any case, the deal is that int, float, and decimal all get to >> have a __format__ method which interprets the format string for those >> types. > > Good, +1 > >> There is no longer any automatic coercion of types based on the format >> string > > Ever? This seems to contradict below where you say int needs to handle > float, and float needs to handle int. Can you explain further? What I mean is that a float, upon receiving a format specifier of 'd', needs to print the number so that it 'looks like' an integer. It doesn't actually have to convert it to an int. So 'd' in this case is just a synonym for 'f0'.
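The sign-versus-padding interaction described a few paragraphs up can be sketched as two small helpers. Both function names are made up for illustration, and the sketch only shows why a generic post-processing step needs to know about the sign; it is not a proposal for how apply_format should be written.

```python
def pad_blindly(text, width, fill="0"):
    """Right-justify already formatted text, ignoring any sign."""
    return text.rjust(width, fill)


def pad_sign_aware(text, width, fill="0"):
    """Keep a leading '+' or '-' in front of the fill characters."""
    sign = ""
    if text.startswith(("+", "-")):
        sign, text = text[0], text[1:]
    # The sign occupies one column of the requested width.
    return sign + text.rjust(width - len(sign), fill)


formatted = "+10"  # what an integer __format__ might hand back
print(pad_blindly(formatted, 10))     # 0000000+10
print(pad_sign_aware(formatted, 10))  # +000000010
```

Either the generic step learns about signs, as in the second helper, or the numeric __format__ method has to be told the width so that it can insert the zeros itself.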
>> - so simply defining an __int__ method for a type is insufficient if >> you want to use the 'd' format type. Instead, if you want to use 'd' >> you can simply write the following: >> >> def MyClass: >> def __format__(self, spec): >> return int(self).__format__(spec) > > > So if an item has an __int__ method, but not a __format__ method, and > you tried to print it with a 'd' format type, it would raise an exception? > > From your descriptions elsewhere in this reply it sounds like it would > fall back to string output. Or am I missing something? Yes, we have to have some sort of fallback if there's no __format__ method at all. My thought here is to coerce to str() in this case. >> So for example, in .Net having a float field of minimum width 10 and a >> decimal precision of 3 digits would be ':f3,10'. > > It looks ok to me, but there may be some cases where it could be > ambiguous. How would you specify leading 0's. Or would we do that in > the alignment specifier? > > {0:f3,-10/0} '000123.000' I'm not sure. This is the one case where the two specifiers interact, as I mentioned above. >> Now, as stated above, there's no 'max field width' for any data type >> except strings. So in the case of strings, we can re-use the precision >> specifier just like C printf does: ':s10' to limit the string to 10 >> characters. So 's:10,5' to indicate a max width of 10, min width of 5. > > I'm sure you meant '{0:s10,5}' here. Right. >> -- For the 'repr' override, Guido suggests putting 'r' in the >> alignment field: '{0,r}'. How that mixes with alignment and padding is >> unknown, although frankly why anyone would want to pad and align a >> repr() is completely beyond me. > > Sometimes it's handy for formatting a variable repr output in columns. > Mostly for debugging, learning exercises, or documentation purposes. > > Since there is no actual Repr type, it may seem like it shouldn't be a > type specifier. 
But if you consider it as indirect string type, an > abstract type that converts to string type, the idea and implementation > works fine and it can then forward it's type specifier to the strings > __format__ method. (or not) > > The exact behavior can be flexible. > > To me there is an underlying consistency with grouping abstract/indirect > types with more concrete types rather than makeing an exception in the > field alignment specifier. > > Moving repr to the format side sort of breaks the original clean idea of > having a field alignment specifier and separate type format specifiers. The reason for this is because of the constraint that apply_format never looks at the format specifier, so overrides for repr() can only go in the thing that it does look at - the alignment spec. > I think if we continue to sort out the detail behaviors of the > underlying implementation, the best overall solution will sort it self > out. Good and complete example test cases will help too. > > I think we actually agree on quite a lot so far. :-) Me too. > Cheers, > Ron > From rhamph at gmail.com Sun Aug 5 20:05:48 2007 From: rhamph at gmail.com (Adam Olsen) Date: Sun, 5 Aug 2007 12:05:48 -0600 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B5FBD9.4020301@acm.org> References: <46B13ADE.7080901@acm.org> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org> Message-ID: On 8/5/07, Talin wrote: > Ron Adam wrote: > > To me there is an underlying consistency with grouping abstract/indirect > > types with more concrete types rather than makeing an exception in the > > field alignment specifier. > > > > Moving repr to the format side sort of breaks the original clean idea of > > having a field alignment specifier and separate type format specifiers. 
> > The reason for this is because of the constraint that apply_format never > looks at the format specifier, so overrides for repr() can only go in > the thing that it does look at - the alignment spec. How important is this constraint? In my proposal, apply_format (which I called handle_format, alas) immediately called __format__. Only if __format__ didn't exist or it returned NotImplemented would it check what type was expected and attempt a coercion (__float__, __index__, etc), then calling __format__ on that. -- Adam Olsen, aka Rhamphoryncus From rrr at ronadam.com Sun Aug 5 21:41:59 2007 From: rrr at ronadam.com (Ron Adam) Date: Sun, 05 Aug 2007 14:41:59 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B5FBD9.4020301@acm.org> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org> Message-ID: <46B62807.4030106@ronadam.com> Talin wrote: > Ron Adam wrote: >> Talin wrote: > Let me define some terms again for the discussion. As noted before, the > ',' part is called the alignment specifier. It's no longer appropriate > to use the term 'conversion specifier', since we're not doing > conversions, so I guess I will stick with the term 'format specifier' > for the ':' part. I don't consider them as conversions, it's all going to end up as either a string or an exception at the end. It's just a matter of the best way to get there. The only case where a conversion of any type *doesn't* happen is when a value is already a string and a string specifier is applied to it or there is no format specifier. 
In almost all other cases, some sort of converting process occurs, although it may be a manual reading of characters or bytes and not an explicit type cast. And in those cases, it's more a matter of when it happens rather than how it happens that is important. Also this is a one-directional data path. The process should never have side effects that may affect an object that is passed into a formatter. This isn't enforceable, but Python's built-in mechanisms should never do that. Creating new objects in an intermediate step doesn't do that. > What Guido wants is for the general 'apply_format' function to not > examine the format specifier *at all*. Hmmm... With this, it becomes much harder to determine what a format specifier will do because it depends totally on the object's __format__ method implementation. So the behavior of a specific format specifier may change depending on the argument object type. It also makes the __format__ methods much more complex because you need to have them know how to handle a wider variety of possibilities. What will the built-in types' __format__ methods do if they get a specifier they don't know how to handle? Raise an exception, or fall back to str, or repr? > The reason is that for some types, the __format__ method can define its > own interpretation of the format string which may include the letters > 'rtgd' as part of its regular syntax. Basically, he wants no constraints > on what __format__ is allowed to do. You suggested the format specification be interpreted like a mini language. That implies there may be a global format interpreter that an object's __format__ method can call. Such an interpreter would know how to handle the built-in types and be extendable. Or we could supply a __format__ method to change the behavior if we want something else. In effect, it moves any conversions/interpretations that may happen later in the event chain. Is this the direction he wants to go in?
Or does he want each built-in object to have its own __format__ method independent from each other? > Given this constraint, it becomes pretty obvious which attributes go in > which part. Attributes which are actually involved in generating the > text (signs and leading digits) would have to go in the > format_specifier, and attributes which are interpreted by > apply_format (such as left/right alignment) would have to go in the > alignment specifier. > > Of course, the two can't be entirely isolated because there is > interaction between the two specifiers for some types. For example, it > would normally be the case that padding is applied by 'apply_format', > which knows about the field width and the padding character. However, in > the case of an integer that is printed with leading zeros, the sign must > come *before* the padding: '+000000010'. It's not sufficient to simply > apply padding blindly to the output of __format__, which would give you > '000000+10'. > > (Maybe leading zeros and padding are different things? But the > __format__ would still need to know the field width, which is usually > part of the alignment spec, since it's usually applied as a > post-processing step by 'apply_format') They are different. That is why earlier I made the distinction between a numeric width and a field width. This would be a numeric width, and it would be inside a field which may have its own minimum width and possibly a different fill character. '{0:d+6/0,^15/_}'.format(123) -> '____+000123____' This way, the two terms don't have to know about each other. The same output in some cases can be generated in more than one way, but I don't think that is always a bad thing. Trying to avoid that makes things more complex. >>> There is no longer any automatic coercion of types based on the >>> format string >> >> Ever? This seems to contradict below where you say int needs to >> handle float, and float needs to handle int. Can you explain further?
> > What I mean is that a float, upon receiving a format specifier of 'd', > needs to print the number so that it 'looks like' an integer. It doesn't > actually have to convert it to an int. So 'd' in this case is just a > synonym for 'f0'. I will think about this a bit. It seems to me, the results are the same with more work. What about rounding behavior; isn't 'f0' different in that regard? >>> - so simply defining an __int__ method for a type is insufficient if >>> you want to use the 'd' format type. Instead, if you want to use 'd' >>> you can simply write the following: >>> >>> class MyClass: >>> def __format__(self, spec): >>> return int(self).__format__(spec) >> >> >> So if an item has an __int__ method, but not a __format__ method, and >> you tried to print it with a 'd' format type, it would raise an >> exception? >> >> From your descriptions elsewhere in this reply it sounds like it >> would fall back to string output. Or am I missing something? > > Yes, we have to have some sort of fallback if there's no __format__ > method at all. My thought here is to coerce to str() in this case. Will a string have a __format__ method and if so, will the format specifier term be forwarded to the string's __format__ method in this case too? >>> So for example, in .Net having a float field of minimum width 10 and >>> a decimal precision of 3 digits would be ':f3,10'. >> >> It looks ok to me, but there may be some cases where it could be >> ambiguous. How would you specify leading 0's. Or would we do that >> in the alignment specifier? >> >> {0:f3,-10/0} '000123.000' > > I'm not sure. This is the one case where the two specifiers interact, as > I mentioned above. Yes, that is why I asked about it. To avoid interaction you need for floats to have a 'numeric width'. And to avoid ambiguities with the precision term you need the '.'.
{0:f+6/0.3} '-000123.000' {0:f+6.3} '+ 456.000' {0:f6} ' 789.0' {0:f.3} '42.000' >> To me there is an underlying consistency with grouping >> abstract/indirect types with more concrete types rather than makeing >> an exception in the field alignment specifier. >> >> Moving repr to the format side sort of breaks the original clean idea >> of having a field alignment specifier and separate type format >> specifiers. > > The reason for this is because of the constraint that apply_format never > looks at the format specifier, so overrides for repr() can only go in > the thing that it does look at - the alignment spec. Ok. But I'm -1 on this for the record. It creates an exceptional case. ie... the format is applied first, except if the alignment term has an 'r' in it. Then what happens to the format specifier term if it exists? Is it forwarded to the string __format__ method here?, ignored?, or is an exception raised? I'm going to think about these issues some more. Maybe I'll change my mind or find another way to 'see' this. Cheers, Ron From martin at v.loewis.de Sun Aug 5 22:32:16 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 05 Aug 2007 22:32:16 +0200 Subject: [Python-3000] C API cleanup str In-Reply-To: References: <46B2C8E0.8080409@canterbury.ac.nz> <46B5C47B.5090703@v.loewis.de> <46B5F136.4010502@v.loewis.de> Message-ID: <46B633D0.7050902@v.loewis.de> > You mean if it were fixed it could fail, right? Code calling it should > be checking for errors anyway because it allocates memory. > > Have you tried making this particular change and seeing what fails? I now tried, and it turned out that bytes.__reduce__ would break (again); I fixed it and changed it in r56755. It turned out that PyUnicode_FromString was even documented to accept latin-1. While I was looking at it, I wondered why PyUnicode_FromStringAndSize allows a NULL first argument, creating a null-initialized Unicode object. 
This functionality is already available as PyUnicode_FromUnicode, and callers who previously wrote
    obuf = PyString_FromStringAndSize(NULL, bufsize);
    if (!obuf) return NULL;
    buf = PyString_AsString(obuf);
could be tricked into believing that they now can change the string object they just created - which they cannot, as buf will just be the UTF-8 encoded version of the real string. Regards, Martin From martin at v.loewis.de Sun Aug 5 22:49:33 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sun, 05 Aug 2007 22:49:33 +0200 Subject: [Python-3000] Immutable bytes type and dbm modules Message-ID: <46B637DD.7070905@v.loewis.de> I changed bsddb so that it consistently produces and consumes bytes only, and added convenience wrappers StringKeys and StringValues for people whose databases are known to store only strings as either keys or values; those get UTF-8 encoded. While I could fix test_bsddb with these changes, anydbm and whichdb broke, as they expect to use string keys. Changing them to use bytes keys then broke dumbdbm, which uses a dictionary internally for the index. This brings me to join others in the desire for immutable bytes objects: I think such a type is needed, and it should probably use the same hash algorithm as str8. I don't think it needs to be a separate type; instead, bytes objects could have an idempotent .freeze() method which switches the "immutable" bit on. There would be no way to switch it off again. If that is not acceptable, please tell me how else to fix the dbm modules. Regards, Martin From fdrake at acm.org Sun Aug 5 23:04:39 2007 From: fdrake at acm.org (Fred Drake) Date: Sun, 5 Aug 2007 17:04:39 -0400 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B637DD.7070905@v.loewis.de> References: <46B637DD.7070905@v.loewis.de> Message-ID: <393F0424-E74F-4E27-9AFB-45EC70704A56@acm.org> On Aug 5, 2007, at 4:49 PM, Martin v.
Löwis wrote: > I don't think it needs to be a separate type; > instead, bytes objects could have an idempotent > .freeze() method which switches the "immutable" > bit on. There would be no way to switch it off > again. +1 -Fred -- Fred Drake From greg.ewing at canterbury.ac.nz Mon Aug 6 02:10:20 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 06 Aug 2007 12:10:20 +1200 Subject: [Python-3000] More PEP 3101 changes incoming Message-ID: <46B666EC.2090807@canterbury.ac.nz> Guido wrote: > I remember a language that did the *** thing; it was called Fortran. > It was an absolutely terrible feature. I agree that expanding the field width is much to be preferred if possible. But if you *must* have a maximum field width, it's better to show no number at all than a number with some of its digits invisibly chopped off. > So I think a maximum width is quite unnecessary for numbers. I agree with that. By the way, I've always thought the reason C has maximum widths for string formats is so you can deal with strings which are not null-terminated, an issue that doesn't arise in Python. -- Greg From greg.ewing at canterbury.ac.nz Mon Aug 6 02:12:21 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 06 Aug 2007 12:12:21 +1200 Subject: [Python-3000] atexit module problems/questions In-Reply-To: <18101.15137.718715.98755@montanaro.dyndns.org> References: <18100.62086.177289.274444@montanaro.dyndns.org> <46B526FE.4060500@canterbury.ac.nz> <18101.15137.718715.98755@montanaro.dyndns.org> Message-ID: <46B66765.7060406@canterbury.ac.nz> skip at pobox.com wrote: > Then you need to hang onto the closure. That might be some distance away > from the point at which the function was registered. Well, if you get a unique ID, you need to hang onto that somehow, too.
-- Greg From greg.ewing at canterbury.ac.nz Mon Aug 6 02:24:20 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 06 Aug 2007 12:24:20 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B54F51.40705@acm.org> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> Message-ID: <46B66A34.4080202@canterbury.ac.nz> Talin wrote: > So 's:10,5' to indicate a max width of 10, min width of 5. If you just say ':s10' does this mean there's *no* minimum width, or that the minimum width is also 10? The former would be somewhat unintuitive, but if the latter, then the separation between format and width specifiers breaks down. -- Greg From greg.ewing at canterbury.ac.nz Mon Aug 6 02:31:28 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 06 Aug 2007 12:31:28 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B568F3.9060105@ronadam.com> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> Message-ID: <46B66BE0.7090005@canterbury.ac.nz> Ron Adam wrote: > > Truncating behavior is > explicitly specified by giving a max_width size without a replacement > character. I think that would be an extremely bad default for numbers. > I believe they will learn fairly quickly what not to do. Even if that's true, why make the behaviour that's desirable in the vast majority of cases more difficult to specify? 
-- Greg From greg.ewing at canterbury.ac.nz Mon Aug 6 02:42:38 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 06 Aug 2007 12:42:38 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B59F05.3070200@ronadam.com> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> Message-ID: <46B66E7E.4060209@canterbury.ac.nz> Ron Adam wrote: > return value.__format__(format_spec, abstract) Why would the __format__ method need to be passed an 'abstract' flag? It can tell from the format_spec if it needs to know. -- Greg From greg.ewing at canterbury.ac.nz Mon Aug 6 03:08:46 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 06 Aug 2007 13:08:46 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B5FBD9.4020301@acm.org> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org> Message-ID: <46B6749E.9020304@canterbury.ac.nz> Talin wrote: > in > the case of an integer that is printed with leading zeros, the sign must > come *before* the padding: '+000000010'. It's not sufficient to simply > apply padding blindly to the output of __format__, which would give you > '000000+10'. How about this, then: The apply_format function parses the alignment spec and passes the result to the __format__ method along with the format spec. 
The __format__ method can then choose to do its own alignment and padding to achieve the specified field width. If it returns something less than the specified width, apply_format then uses the default alignment algorithm. Then the __format__ method has complete control over the whole process if it wants, the only distinction being that the alignment spec has a fixed syntax whereas the format spec can be anything. -- Greg From skip at pobox.com Mon Aug 6 04:12:32 2007 From: skip at pobox.com (skip at pobox.com) Date: Sun, 5 Aug 2007 21:12:32 -0500 Subject: [Python-3000] atexit module problems/questions In-Reply-To: <46B66765.7060406@canterbury.ac.nz> References: <18100.62086.177289.274444@montanaro.dyndns.org> <46B526FE.4060500@canterbury.ac.nz> <18101.15137.718715.98755@montanaro.dyndns.org> <46B66765.7060406@canterbury.ac.nz> Message-ID: <18102.33680.607164.490861@montanaro.dyndns.org> >> Then you need to hang onto the closure. That might be some distance >> away from the point at which the function was registered. Greg> Well, if you get a unique ID, you need to hang onto that somehow, Greg> too. Yes, but an int is both much smaller than a function and can't be involved in cyclic garbage. 
Skip From rrr at ronadam.com Mon Aug 6 06:47:36 2007 From: rrr at ronadam.com (Ron Adam) Date: Sun, 05 Aug 2007 23:47:36 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B66E7E.4060209@canterbury.ac.nz> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B66E7E.4060209@canterbury.ac.nz> Message-ID: <46B6A7E8.7040001@ronadam.com> Greg Ewing wrote: > Ron Adam wrote: >> return value.__format__(format_spec, abstract) > > Why would the __format__ method need to be passed an > 'abstract' flag? It can tell from the format_spec if > it needs to know. I may have been thinking too far ahead on this one. I first wrote that without the abstract flag, but then changed it because it seemed there was an ambiguous situation that I thought this would clear up. I think I was looking for a generic way to tell a __format__ method whether to raise an exception or fall back to str or repr. Let's say a string __format__ method looks like the following...
    def __format__(self, specifier, abstract=False):
        if not specifier or specifier[0] == 's' or abstract:
            return self
        raise ValueError("invalid type for format specifier")
It would be more complex than this in most cases, but it doesn't need to know about any other specifier types to work. Of course string types don't need to fall back, but that doesn't mean it is always desirable for them to succeed. For example, if we have... '{0:k10}'.format('python') Should it even try to succeed, or should it complain immediately?
If the string __format__ method got 'k10' as a format specifier, it has no idea what the 'k10' is supposed to mean; it needs to make a choice to either fall back to str() or raise an exception that could be caught and handled. So, is it useful to be strict at some times and at other times forgive and fall back? And if so, how can we handle that best? (The exact mechanism can be figured out later; it's the desired behaviors that need to be determined for now.) Cheers, Ron From rrr at ronadam.com Mon Aug 6 06:48:51 2007 From: rrr at ronadam.com (Ron Adam) Date: Sun, 05 Aug 2007 23:48:51 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B66BE0.7090005@canterbury.ac.nz> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> Message-ID: <46B6A833.70007@ronadam.com> Greg Ewing wrote: > Ron Adam wrote: >> Truncating behavior is >> explicitly specified by giving a max_width size without a replacement >> character. > > I think that would be an extremely bad default for numbers. It's *not* a default. The default is to have no max_width. > > I believe they will learn fairly quickly what not to do. > > Even if that's true, why make the behaviour that's desirable > in the vast majority of cases more difficult to specify?
> > -- > Greg From talin at acm.org Mon Aug 6 07:40:07 2007 From: talin at acm.org (Talin) Date: Sun, 05 Aug 2007 22:40:07 -0700 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B6749E.9020304@canterbury.ac.nz> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org> <46B6749E.9020304@canterbury.ac.nz> Message-ID: <46B6B437.7080901@acm.org> Greg Ewing wrote: > Talin wrote: >> in >> the case of an integer that is printed with leading zeros, the sign must >> come *before* the padding: '+000000010'. It's not sufficient to simply >> apply padding blindly to the output of __format__, which would give you >> '000000+10'. > > How about this, then: The apply_format function parses > the alignment spec and passes the result to the __format__ > method along with the format spec. The __format__ method > can then choose to do its own alignment and padding to > achieve the specified field width. If it returns something > less than the specified width, apply_format then uses the > default alignment algorithm. > > Then the __format__ method has complete control over the > whole process if it wants, the only distinction being that > the alignment spec has a fixed syntax whereas the format > spec can be anything. I think that this is right - at least, I can't think of another way to do it.
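
For illustration only, here is a minimal sketch of the scheme quoted above, under stated assumptions. The "width/fill" alignment syntax, the parse_alignment helper, the extra alignment argument to __format__, the right-justified default, and the SignedInt and Plain classes are all hypothetical names made up for this sketch; none of them is the PEP's actual API.

```python
def parse_alignment(align_spec):
    """Hypothetical fixed syntax: 'WIDTH' or 'WIDTH/FILL', e.g. '10/0'."""
    width, _, fill = align_spec.partition("/")
    return int(width), (fill or " ")


class SignedInt:
    """Hypothetical type that does its own padding so the sign stays in front."""

    def __init__(self, value):
        self.value = value

    def __format__(self, format_spec, alignment=None):
        sign = "-" if self.value < 0 else "+"
        digits = str(abs(self.value))
        if alignment is not None:
            width, fill = alignment
            if fill == "0":
                # Zero padding goes between the sign and the digits.
                digits = digits.rjust(width - len(sign), "0")
        return sign + digits


class Plain:
    """Hypothetical type that ignores alignment and leaves it to the caller."""

    def __init__(self, text):
        self.text = text

    def __format__(self, format_spec, alignment=None):
        return self.text


def apply_format(value, align_spec, format_spec=""):
    """Parse the alignment spec, let the value format itself, then pad
    only if the result is still narrower than the requested width."""
    width, fill = parse_alignment(align_spec)
    text = value.__format__(format_spec, (width, fill))
    if len(text) < width:
        # Assumed default alignment for this sketch: right-justify.
        text = text.rjust(width, fill)
    return text


print(apply_format(SignedInt(10), "10/0"))  # sign comes before the zeros
print(apply_format(Plain("10"), "6"))       # padded by apply_format itself
print("+10".rjust(10, "0"))                 # blind padding misplaces the sign
```

The design point is the one made in the thread: a type that cares about where the padding goes handles the width itself, and every other type gets the default treatment for free.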
-- Talin From skip at pobox.com Mon Aug 6 08:03:02 2007 From: skip at pobox.com (skip at pobox.com) Date: Mon, 6 Aug 2007 01:03:02 -0500 Subject: [Python-3000] atexit module problems/questions In-Reply-To: <18102.33680.607164.490861@montanaro.dyndns.org> References: <18100.62086.177289.274444@montanaro.dyndns.org> <46B526FE.4060500@canterbury.ac.nz> <18101.15137.718715.98755@montanaro.dyndns.org> <46B66765.7060406@canterbury.ac.nz> <18102.33680.607164.490861@montanaro.dyndns.org> Message-ID: <18102.47510.862152.964621@montanaro.dyndns.org> skip> Yes, but an int is both much smaller than a function and can't be skip> involved in cyclic garbage. I also forgot to mention that inexperienced users will probably find it easier to hang onto an int than to create a closure. Skip From greg.ewing at canterbury.ac.nz Mon Aug 6 08:44:05 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 06 Aug 2007 18:44:05 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B6A7E8.7040001@ronadam.com> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B66E7E.4060209@canterbury.ac.nz> <46B6A7E8.7040001@ronadam.com> Message-ID: <46B6C335.4080504@canterbury.ac.nz> Ron Adam wrote: > If the string __format__ method got 'k10' as a format specifier, it has > no idea what the 'k10' is supposed to mean; it needs to make a choice to > either fall back to str() or raise an exception that could be caught > and handled. I think Guido's scheme handles this okay. Each type's __format__ decides whether it can handle the format spec, and if not, explicitly delegates to something else such as str(self).__format__.
Eventually you will get to a type that either understands the format or has nowhere left to delegate to and raises an exception. -- Greg From walter at livinglogic.de Mon Aug 6 09:51:08 2007 From: walter at livinglogic.de (Walter Dörwald) Date: Mon, 06 Aug 2007 09:51:08 +0200 Subject: [Python-3000] C API cleanup str In-Reply-To: <46B633D0.7050902@v.loewis.de> References: <46B2C8E0.8080409@canterbury.ac.nz> <46B5C47B.5090703@v.loewis.de> <46B5F136.4010502@v.loewis.de> <46B633D0.7050902@v.loewis.de> Message-ID: <46B6D2EC.601@livinglogic.de> Martin v. Löwis wrote: >> You mean if it were fixed it could fail, right? Code calling it should >> be checking for errors anyway because it allocates memory. >> >> Have you tried making this particular change and seeing what fails? > > I now tried, and it turned out that bytes.__reduce__ would break > (again); I fixed it and changed it in r56755. > > It turned out that PyUnicode_FromString was even documented to > accept latin-1. Yes, that seemed to me to be the most obvious interpretation. > While I was looking at it, I wondered why PyUnicode_FromStringAndSize > allows a NULL first argument, creating a null-initialized Unicode > object. Because that's what PyString_FromStringAndSize() does. > This functionality is already available as > PyUnicode_FromUnicode, and callers who previously wrote > > obuf = PyString_FromStringAndSize(NULL, bufsize); > if (!obuf) return NULL; > buf = PyString_AsString(obuf); > > could be tricked into believing that they now can change the > string object they just created - which they cannot, as > buf will just be the UTF-8 encoded version of the real string. True, this will no longer work. So should NULL support be dropped from PyUnicode_FromStringAndSize()?
Servus, Walter From martin at v.loewis.de Mon Aug 6 10:07:20 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Mon, 06 Aug 2007 10:07:20 +0200 Subject: [Python-3000] C API cleanup str In-Reply-To: <46B6D2EC.601@livinglogic.de> References: <46B2C8E0.8080409@canterbury.ac.nz> <46B5C47B.5090703@v.loewis.de> <46B5F136.4010502@v.loewis.de> <46B633D0.7050902@v.loewis.de> <46B6D2EC.601@livinglogic.de> Message-ID: <46B6D6B8.7000207@v.loewis.de> >> I now tried, and it turned out that bytes.__reduce__ would break >> (again); I fixed it and changed it in r56755. >> >> It turned out that PyUnicode_FromString was even documented to >> accept latin-1. > > Yes, that seemed to me to be the most obvious interpretation. Unfortunately, this made creating and retrieving asymmetric: when you do PyUnicode_AsString, you'll get a UTF-8 string; when you do PyUnicode_FromString, you had to pass Latin-1. Making AsString also return Latin-1 would, of course, restrict the number of cases where it works. >> While I was looking at it, I wondered why PyUnicode_FromStringAndSize >> allows a NULL first argument, creating a null-initialized Unicode >> object. > > Because that's what PyString_FromStringAndSize() does. I guessed that was the historic reason; I just wondered whether the rationale for having it in PyString_FromStringAndSize still applies to Unicode. > So should NULL support be dropped from PyUnicode_FromStringAndSize()? That's my proposal, yes.
Regards, Martin From rrr at ronadam.com Mon Aug 6 10:24:27 2007 From: rrr at ronadam.com (Ron Adam) Date: Mon, 06 Aug 2007 03:24:27 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B6C219.4040900@canterbury.ac.nz> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> Message-ID: <46B6DABB.3080509@ronadam.com> Greg Ewing wrote: > Ron Adam wrote: >> It's *not* a default. The default is to have no max_width. > > You're suggesting it would be a default if you > *did* specify a max width but no replacement > char. That's what I'm saying would be a bad > default. Absolutely, *for field widths*, which operate on a string of characters after the formatting step is done. We could add a numeric max_width that is specific to numbers and that defaults to the '*' character if the width overflows. As long as the field max_width isn't specified, or the numeric max_width is shorter than the field max_width, it would do what you want. We could have a pre_process step that adjusts the format width to be within the field min-max range or raises an exception if you really want that.
    {0:d,10+20}   # Field widths are just string operations done after
                  # formatting is done.
    {0:d10+20}    # Numeric widths differ from field widths.
                  # They are specific to the type, so they can handle
                  # special cases.
Now here's the problem with all of this. As we add the widths back into the format specifications, we are basically saying the idea of a separate field width specifier is wrong. So maybe it's not really a separate independent thing after all, and it's just a convenient grouping for readability purposes only.
So in that case there is no field alignment function, and it's up to the __format__ method to do both. :-/ Cheers, Ron From rrr at ronadam.com Mon Aug 6 10:40:32 2007 From: rrr at ronadam.com (Ron Adam) Date: Mon, 06 Aug 2007 03:40:32 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B6C335.4080504@canterbury.ac.nz> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B66E7E.4060209@canterbury.ac.nz> <46B6A7E8.7040001@ronadam.com> <46B6C335.4080504@canterbury.ac.nz> Message-ID: <46B6DE80.2050000@ronadam.com> Greg Ewing wrote: > Ron Adam wrote: >> If the string __format__ method got 'k10' as a format specifier, it has >> no idea what the 'k10' is supposed to mean; it needs to make a choice to >> either fall back to str() or raise an exception that could be caught >> and handled. > > I think Guido's scheme handles this okay. Each type's > __format__ decides whether it can handle the format > spec, and if not, explicitly delegates to something > else such as str(self).__format__. Eventually you > will get to a type that either understands the format > or has nowhere left to delegate to and raises an > exception. It sounds like we are describing the same thing but differ on "where" and "when" things are done, and we still haven't worked out the "what" and "how" yet. I think we need to work out the details of "what" from the bottom up and then see "how" we can do those. So the questions I asked are still important. What should happen in various situations of mismatched or invalid type specifiers? When should exceptions occur? Then after that, what features should each format specifier have?
Then we can determine what is best for "where" and "when" things should be done. Cheers, Ron From walter at livinglogic.de Mon Aug 6 11:14:21 2007 From: walter at livinglogic.de (Walter Dörwald) Date: Mon, 06 Aug 2007 11:14:21 +0200 Subject: [Python-3000] C API cleanup str In-Reply-To: <46B6D6B8.7000207@v.loewis.de> References: <46B2C8E0.8080409@canterbury.ac.nz> <46B5C47B.5090703@v.loewis.de> <46B5F136.4010502@v.loewis.de> <46B633D0.7050902@v.loewis.de> <46B6D2EC.601@livinglogic.de> <46B6D6B8.7000207@v.loewis.de> Message-ID: <46B6E66D.80301@livinglogic.de> Martin v. Löwis wrote: >>> I now tried, and it turned out that bytes.__reduce__ would break >>> (again); I fixed it and changed it in r56755. >>> >>> It turned out that PyUnicode_FromString was even documented to >>> accept latin-1. >> Yes, that seemed to me to be the most obvious interpretation. > > Unfortunately, this made creating and retrieving asymmetric: > when you do PyUnicode_AsString, you'll get a UTF-8 string; when > you do PyUnicode_FromString, you had to pass Latin-1. Making > AsString also return Latin-1 would, of course, restrict the number of > cases where it works. True, UTF-8 seems to be the better choice. However, all spots in the C source that call PyUnicode_FromString() only pass ASCII anyway, which will probably be the most common case. >>> While I was looking at it, I wondered why PyUnicode_FromStringAndSize >>> allows a NULL first argument, creating a null-initialized Unicode >>> object. >> Because that's what PyString_FromStringAndSize() does. > > I guessed that was the historic reason; I just wondered whether the > rationale for having it in PyString_FromStringAndSize still applies > to Unicode. > >> So should NULL support be dropped from PyUnicode_FromStringAndSize()? > > That's my proposal, yes. At least this would give a clear error message in case someone passes NULL.
Servus, Walter From ncoghlan at gmail.com Mon Aug 6 12:58:13 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 06 Aug 2007 20:58:13 +1000 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B5FBD9.4020301@acm.org> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org> Message-ID: <46B6FEC5.9040503@gmail.com> Talin wrote: > Of course, the two can't be entirely isolated because there is > interaction between the two specifiers for some types. For example, it > would normally be the case that padding is applied by 'apply_format', > which knows about the field width and the padding character. However, in > the case of an integer that is printed with leading zeros, the sign must > come *before* the padding: '+000000010'. It's not sufficient to simply > apply padding blindly to the output of __format__, which would give you > '000000+10'. > > (Maybe leading zeros and padding are different things? But the > __format__ would still need to know the field width, which is usually > part of the alignment spec, since it's usually applied as a > post-processing step by 'apply_format') Is the signature of __format__ up for negotiation? If __format__ receives both the alignment specifier and the format specifier as arguments, then the method would be free to return its own string that has already been adjusted to meet the minimum field width. Objects which don't care about alignment details can just return their formatted result and let the standard alignment handler deal with the minimum field width. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Mon Aug 6 13:02:35 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 06 Aug 2007 21:02:35 +1000 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B6B437.7080901@acm.org> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org> <46B6749E.9020304@canterbury.ac.nz> <46B6B437.7080901@acm.org> Message-ID: <46B6FFCB.6010106@gmail.com> Talin wrote: > Greg Ewing wrote: >> Talin wrote: >>> in >>> the case of an integer that is printed with leading zeros, the sign must >>> come *before* the padding: '+000000010'. It's not sufficient to simply >>> apply padding blindly to the output of __format__, which would give you >>> '000000+10'. >> How about this, then: The apply_format function parses >> the alignment spec and passes the result to the __format__ >> method along with the format spec. The __format__ method >> can then choose to do its own alignment and padding to >> achieve the specified field width. If it returns something >> less than the specified width, apply_format then uses the >> default alignment algorithm. >> >> Then the __format__ method has complete control over the >> whole process if it wants, the only distinction being that >> the alignment spec has a fixed syntax whereas the format >> spec can be anything. > > I think that this is right - at least, I can't think of another way to > do it. Heh, I could have saved myself some typing by reading more before replying. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Mon Aug 6 13:04:48 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 06 Aug 2007 21:04:48 +1000 Subject: [Python-3000] atexit module problems/questions In-Reply-To: <18102.47510.862152.964621@montanaro.dyndns.org> References: <18100.62086.177289.274444@montanaro.dyndns.org> <46B526FE.4060500@canterbury.ac.nz> <18101.15137.718715.98755@montanaro.dyndns.org> <46B66765.7060406@canterbury.ac.nz> <18102.33680.607164.490861@montanaro.dyndns.org> <18102.47510.862152.964621@montanaro.dyndns.org> Message-ID: <46B70050.5010006@gmail.com> skip at pobox.com wrote: > skip> Yes, but an int is both much smaller than a function and can't be > skip> involved in cyclic garbage. > > I also forgot to mention that inexperienced users will probably find it > easier to hang onto an int than create a closure a) functools.partial isn't that hard to use b) we could create it automatically in atexit.register and return it Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Mon Aug 6 13:11:33 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 06 Aug 2007 21:11:33 +1000 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B637DD.7070905@v.loewis.de> References: <46B637DD.7070905@v.loewis.de> Message-ID: <46B701E5.3030206@gmail.com> Martin v. Löwis wrote: > I don't think it needs to be a separate type; > instead, bytes objects could have an idempotent > .freeze() method which switches the "immutable" > bit on. There would be no way to switch it off > again. +1 here - hashable byte sequences are very handy for dealing with fragments of low level serial protocols.
It would also be nice if b"" literals set that immutable flag automatically - otherwise converting some of my lookup tables over to Py3k would be a serious pain (not a pain I'm likely to have to deal with personally given the relative time frames involved, but a pain nonetheless). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From stargaming at gmail.com Mon Aug 6 13:40:13 2007 From: stargaming at gmail.com (Stargaming) Date: Mon, 6 Aug 2007 11:40:13 +0000 (UTC) Subject: [Python-3000] optimizing [x]range References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <46B233D2.4030304@v.loewis.de> Message-ID: On Thu, 02 Aug 2007 21:43:14 +0200, Martin v. Löwis wrote: >> The patch is based on the latest trunk/ checkout, Python 2.6. I don't >> think this is a problem if nobody else made any effort towards making >> xrange more sequence-like in the Python 3000 branch. The C source might >> require some tab/space cleanup. > > Unfortunately, this is exactly what happened: In Py3k, the range object > is defined in terms of PyObject*, so your patch won't apply to the 3k > branch. > > Regards, > Martin Fixed. Rewrote the patch for the p3yk branch. I'm not sure if I used the PyNumber API correctly; I mostly modeled this patch on the other range_* methods.
See http://sourceforge.net/ tracker/index.php?func=detail&aid=1766304&group_id=5470&atid=305470 Regards, Stargaming From skip at pobox.com Mon Aug 6 14:02:12 2007 From: skip at pobox.com (skip at pobox.com) Date: Mon, 6 Aug 2007 07:02:12 -0500 Subject: [Python-3000] atexit module problems/questions In-Reply-To: <46B70050.5010006@gmail.com> References: <18100.62086.177289.274444@montanaro.dyndns.org> <46B526FE.4060500@canterbury.ac.nz> <18101.15137.718715.98755@montanaro.dyndns.org> <46B66765.7060406@canterbury.ac.nz> <18102.33680.607164.490861@montanaro.dyndns.org> <18102.47510.862152.964621@montanaro.dyndns.org> <46B70050.5010006@gmail.com> Message-ID: <18103.3524.814185.479602@montanaro.dyndns.org> Nick> a) functools.partial isn't that hard to use Never heard of it and I've been writing Python since the mid-90's. The point is that not everybody dreams in a functional programming style. Nick> b) we could create it automatically in atexit.register and return Nick> it That's a possibility, though I'm still inclined to think returning an ever-increasing int (which is already available as the index into the array) is cleaner and would be microscopically more efficient) is the way to go. In the no-arg case you'd just return the function which was passed in. Is creating and returning a closure going to be a challenge for Jython or IronPython? Changing the focus of this thread a bit, this all seems to be getting a bit baroque. Maybe we should back up and ask why atexit needed to be recast in C in the first place. Can someone enlighten me? At some level it seems more like gratuitous bug insertion than a true necessity. 
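[Archive note, not part of the original message: a minimal sketch of the
functools.partial approach Nick mentions, using the atexit API of released
Python 3, where register() returns the callable it was given and
unregister() removes it. The handler and its arguments are made up.]

```python
import atexit
import functools


def close_log(name, log):
    """Cleanup handler; records which resource was closed."""
    log.append("closed " + name)


events = []

# register() returns the callable it was given, so the partial object
# itself can serve as the handle for later removal.
handle = atexit.register(functools.partial(close_log, "db", events))

# Later: cancel the handler so it does not run at interpreter exit.
atexit.unregister(handle)
```

Whether an int or a callable makes the better handle is exactly the
question under discussion; this only shows that the callable route needs
no closure written by hand.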
Skip From nicko at nicko.org Mon Aug 6 14:10:11 2007 From: nicko at nicko.org (Nicko van Someren) Date: Mon, 6 Aug 2007 13:10:11 +0100 Subject: [Python-3000] map() Returns Iterator In-Reply-To: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com> References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com> Message-ID: <918D7BB8-3EAE-4295-A387-6B9AEB06D921@nicko.org> On 4 Aug 2007, at 06:11, Kurt B. Kaiser wrote: > Although there has been quite a bit of discussion on dropping reduce() > and retaining map(), filter(), and zip(), there has been less > discussion > (at least that I can find) on changing them to return iterators > instead > of lists. > > I think of map() and filter() as sequence transformers. To me, it's > an unexpected semantic change that the result is no longer a list. I agree. In almost all of the cases where I would naturally use map rather than a list comprehension either I want the transformed list (rather something that can generate it) or I want the function explicitly called on all the elements of the source list right away (rather than some time later, or perhaps never). > In existing Lib/ code, it's twice as likely that the result of map() > will be assigned than to use it as an iterator in a flow control > statement. > > If the statistics on the usage of map() stay the same, 2/3 of the time > the current implementation will require code like > > foo = list(map(fcn, bar)). I presume that if this semantic change stays we are going to have to add something to 2to3 which will force the creation of a list from the result of any call to map. > map() and filter() were retained primarily because they can produce > more compact and readable code when used correctly. Adding list() > most > of the time seems to diminish this benefit, especially when > combined with > a lambda as the first arg. > > There are a number of instances where map() is called for its side > effect, e.g. > > map(print, line_sequence) > > with the return result ignored. 
In py3k this has caused many silent > failures. We've been weeding these out, and there are only a couple > left, but there are no doubt many more in 3rd party code. I'm sure that there are lots of these. Other scenarios which will make for ugly bugs include things like map(db_commit, changed_record_list). > The situation with filter() is similar, though it's not used purely > for side effects. zip() is infrequently used. However, IMO for > consistency they should all act the same way. Filter returning an iterator is going to break lots of code which says things like: interesting_things = filter(predicate, things) ... if foo in interesting_things: ... Again, if this semantic stays then 2to3 better fix it. Arguably 2to3 could translate a call to filter() to a list comprehension. > I've seen GvR slides suggesting replacing map() et. al. with list > comprehensions, but never with generator expressions. > > PEP 3100: "Make built-ins return an iterator where appropriate > (e.g. range(), zip(), map(), filter(), etc.)" > > It makes sense for range() to return an iterator. I have my doubts on > map(), filter(), and zip(). Having them return iterators seems to > be a premature optimization. Could something be done in the ast phase > of compilation instead? Looking through code I've written, I suspect that basically whenever I use map(), filter() or zip() in any context other than in a for... loop I am after the concrete list and not an iterator for it. I would hesitate to suggest that it be optimised at compile time, irrespective of the issues resulting from these being built-ins rather that keywords (and thus can be reassigned). Consider we have a function f() has a printing side effect, then we have: for j in [f(i) for i in range(3)]: print j f: 0 f: 1 f: 2 0 1 2 And we have for j in (f(i) for i in range(3)): print j f: 0 0 f: 1 1 f: 2 2 We're talking about changing the behaviour of: for j in map(f, range(3)): print j from the former to the later. 
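[Archive note, not part of the original message: a small sketch that
reproduces the two orderings above in Python 3, recording events in a list
instead of printing. The helper names f and run are illustrative only.]

```python
def f(i, log):
    """Stand-in for a function with a visible side effect."""
    log.append("f: %d" % i)
    return i


def run(make_iterable):
    """Loop over the iterable and record each value as it is consumed."""
    log = []
    for j in make_iterable(log):
        log.append(str(j))
    return log


# List comprehension: every call to f happens before the loop body runs.
eager = run(lambda log: [f(i, log) for i in range(3)])

# Generator expression: calls to f interleave with the loop body.
lazy = run(lambda log: (f(i, log) for i in range(3)))

# Python 3 map() behaves like the generator expression.
mapped = run(lambda log: map(lambda i: f(i, log), range(3)))
```

Here eager is ['f: 0', 'f: 1', 'f: 2', '0', '1', '2'] while lazy and mapped
are both ['f: 0', '0', 'f: 1', '1', 'f: 2', '2'].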
If we did some AST phase optimisation so that most of the time map() returned a list but it gave an iterator if it was used inside a for... loop I think it would be dreadfully confusing. IMHO, when I read "Make built-ins return an iterator where appropriate..." I'm inclined to think that it's appropriate for range () and but not for the others. Nicko From jeremy at alum.mit.edu Mon Aug 6 15:30:13 2007 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 6 Aug 2007 09:30:13 -0400 Subject: [Python-3000] should rfc822 accept text io or binary io? Message-ID: This is a fairly specific question, but it gets at a more general issue I don't fully understand. I recently updated httplib and urllib so that they work on the struni branch. A recurring problem with these libraries is that they call methods like strip() and split(). On a string object, calling these methods with no arguments means strip/split whitespace. The bytes object has no corresponding default arguments; whitespace may not be well-defined for bytes. (Or is it?) In general, the approach was to read data as bytes off the socket and convert header lines to iso-8859-1 before processing them. test_urllib2_localnet still fails. One of the problems is that BaseHTTPServer doesn't process HTTP responses correctly. Like httplib, it converts the HTTP status line to iso-8859-1. But it parses the rest of the headers by calling mimetools.Message, which is really rfc822.Message. The header lines of an RFC 822 message (really, RFC 2822) are ascii, so it should be easy to do the conversion. rfc822.Message assumes it is reading from a text file and that readline() returns a string. So the short question is: Should rfc822.Message require a text io object or a binary io object? Or should it except either (via some new constructor or extra arguments to the existing constructor)? I'm not sure how to design an API for bytes vs strings. 
The API used to be equally well suited for reading from a file or a
socket, but they don't behave the same way anymore.

Jeremy

From steven.bethard at gmail.com Mon Aug 6 17:22:58 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Mon, 6 Aug 2007 09:22:58 -0600
Subject: [Python-3000] map() Returns Iterator
In-Reply-To: <918D7BB8-3EAE-4295-A387-6B9AEB06D921@nicko.org>
References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com> <918D7BB8-3EAE-4295-A387-6B9AEB06D921@nicko.org>
Message-ID:

On 8/6/07, Nicko van Someren wrote:
> On 4 Aug 2007, at 06:11, Kurt B. Kaiser wrote:
> > There are a number of instances where map() is called for its side
> > effect, e.g.
> >
> > map(print, line_sequence)
> >
> > with the return result ignored. In py3k this has caused many silent
> > failures. We've been weeding these out, and there are only a couple
> > left, but there are no doubt many more in 3rd party code.
>
> I'm sure that there are lots of these. Other scenarios which will
> make for ugly bugs include things like map(db_commit,
> changed_record_list).

I'd just like to say that I'll be glad to see these kind of things go
away. This is really a confusing abuse of map() for a reader of the
code.

> Filter returning an iterator is going to break lots of code which
> says things like:
> interesting_things = filter(predicate, things)
> ...
> if foo in interesting_things: ...

Actually, as written, that code will work just fine::

    >>> from itertools import ifilter as filter
    >>> interesting_things = filter(str.isalnum, 'a 1 . a1 a1.'.split())
    >>> if 'a1' in interesting_things:
    ...     print 'it worked!'
    ...
    it worked!

Perhaps you meant to have multiple if clauses?

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
--- Bucky Katt, Get Fuzzy

From guido at python.org Mon Aug 6 19:58:27 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 6 Aug 2007 10:58:27 -0700
Subject: [Python-3000] C API cleanup str
In-Reply-To: <46B6E66D.80301@livinglogic.de>
References: <46B5C47B.5090703@v.loewis.de> <46B5F136.4010502@v.loewis.de> <46B633D0.7050902@v.loewis.de> <46B6D2EC.601@livinglogic.de> <46B6D6B8.7000207@v.loewis.de> <46B6E66D.80301@livinglogic.de>
Message-ID:

Do you guys need more guidance on this? It seems Martin's checkin
didn't make things worse in the tests department -- I find (on Ubuntu)
that test_ctypes is now failing, but test_threaded_import started
passing.

One issue with just putting this in the C API docs is that I believe
(tell me if I'm wrong) that these haven't been kept up to date in the
struni branch so we'll need to make a lot more changes than just this
one...

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Mon Aug 6 20:06:26 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 6 Aug 2007 11:06:26 -0700
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To: <46B637DD.7070905@v.loewis.de>
References: <46B637DD.7070905@v.loewis.de>
Message-ID:

On 8/5/07, "Martin v. Löwis" wrote:
> This brings me to join others in the desire for
> immutable bytes objects: I think such a type is
> needed, and it should probably use the same
> hash algorithm as str8.
>
> I don't think it needs to be a separate type,
> instead, bytes objects could have a idem-potent
> .freeze() method which switches the "immutable"
> bit on. There would be no way to switch it off
> again.

I'm sorry, but this is unacceptable. It would make all reasoning based
upon the type of the object unsound: if type(X) == bytes, is it
hashable? Can we append, delete or replace values? What is the type of
a slice of it? If it is currently mutable, will it still be mutable
after I call some other function on it?
Python has traditionally always used a separate type for this purpose: list vs. tuple, set vs. frozenset. If we are have to have a frozen bytes type, it should be a separate type. > If that is not acceptable, please tell me how else > to fix the dbm modules. By fixing the code that uses them? By using str8 (perhaps renamed to frozenbytes and certainly stripped of its locale-dependent APIs)? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Aug 6 20:18:23 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Aug 2007 11:18:23 -0700 Subject: [Python-3000] map() Returns Iterator In-Reply-To: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com> References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com> Message-ID: On 8/3/07, Kurt B. Kaiser wrote: > Although there has been quite a bit of discussion on dropping reduce() > and retaining map(), filter(), and zip(), there has been less discussion > (at least that I can find) on changing them to return iterators instead > of lists. That's probably because over the years that this has been on my list of things I'd change most people agreed silently. > I think of map() and filter() as sequence transformers. To me, it's > an unexpected semantic change that the result is no longer a list. Well, enough people thought of them as iteratables to request imap(), ifilter() and izip() added to the itertools library. > In existing Lib/ code, it's twice as likely that the result of map() > will be assigned than to use it as an iterator in a flow control > statement. Did you take into account the number of calls to imap()? > If the statistics on the usage of map() stay the same, 2/3 of the time > the current implementation will require code like > > foo = list(map(fcn, bar)). And the 2to3 tool should make this transformation (unless it can tell from context that it's unnecessary, e.g. in a for-loop, or in a call to list(), tuple() or sorted(). 
We didn't write the 2to3 transform, but it's easier than some others we already did (e.g. keys()). > map() and filter() were retained primarily because they can produce > more compact and readable code when used correctly. Adding list() most > of the time seems to diminish this benefit, especially when combined with > a lambda as the first arg. When you have a lambda as the first argument the better translation is *definitely* a list comprehension, as it saves creating stack frames for the lambda calls. > There are a number of instances where map() is called for its side > effect, e.g. > > map(print, line_sequence) > > with the return result ignored. I'd just call that bad style. > In py3k this has caused many silent > failures. We've been weeding these out, and there are only a couple > left, but there are no doubt many more in 3rd party code. And that's why the 2to3 tool needs improvement. > The situation with filter() is similar, though it's not used purely > for side effects. Same reply though. Also, have you ever liked the behavior that filter returns a string if the input is a string, a tuple if the input is a tuple, and a list for all other cases? That really sucks IMO. > zip() is infrequently used. It was especially designed for use in for-loops (to end the fruitless discussions trying to come up with parallel iteration syntax). If it wasn't for the fact that iterators hadn't been invented yet at the time, zip() would definitely have returned an iterator right from the start, just as enumerate(). > However, IMO for consistency they should all act the same way. That's not much of consistency. > I've seen GvR slides suggesting replacing map() et. al. with list > comprehensions, but never with generator expressions. Depends purely on context. Also, it's easier to talk about "list comprehensions" than about "list comprehensions or generator expressions" all the time, so I may have abbreviated my suggestions occasionally. 
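[Archive note, not part of the original message: a minimal sketch of the
two translations discussed above, written for Python 3. The sample data is
invented, and this is not the output of the actual 2to3 tool.]

```python
values = [1, 2, 3, 4]

# Mechanical translation: wrap the call in list() to keep the old result type.
wrapped = list(map(lambda x: x * x, values))

# The list comprehension suggested when the first argument is a lambda;
# it avoids one function call per element.
squares = [x * x for x in values]

# filter() with a lambda translates the same way.
evens = [x for x in values if x % 2 == 0]
```

Both forms build the same list; the comprehension simply drops the lambda.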
> PEP 3100: "Make built-ins return an iterator where appropriate > (e.g. range(), zip(), map(), filter(), etc.)" > > It makes sense for range() to return an iterator. I have my doubts on > map(), filter(), and zip(). Having them return iterators seems to > be a premature optimization. Could something be done in the ast phase > of compilation instead? Not likely, the compiler doesn't know enough about the state of builtins. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From nicko at nicko.org Mon Aug 6 20:33:03 2007 From: nicko at nicko.org (Nicko van Someren) Date: Mon, 6 Aug 2007 19:33:03 +0100 Subject: [Python-3000] map() Returns Iterator In-Reply-To: References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com> <918D7BB8-3EAE-4295-A387-6B9AEB06D921@nicko.org> Message-ID: On 6 Aug 2007, at 16:22, Steven Bethard wrote: > On 8/6/07, Nicko van Someren wrote: ... >> Filter returning an iterator is going to break lots of code which >> says things like: >> interesting_things = filter(predicate, things) >> ... >> if foo in interesting_things: ... > > Actually, as written, that code will work just fine:: > ... > Perhaps you meant to have multiple if clauses? You're right, I did! I wrote an much longer example with multiple ifs and it looked too verbose, so I edited it down, and lost the meaning. In fact, from a bug tracing point of view it's even worse, since, reusing your example, code with currently reads: interesting_things = filter(str.isalnum, 'a 1 . a1 a1.'.split()) if '1' in interesting_things: print "Failure" if 'a1' in interesting_things: print "... is not an option" will currently do the same as, but in v3 will be different from: interesting_things = filter(str.isalnum, 'a 1 . a1 a1.'.split()) if 'a1' in interesting_things: print "Failure" if '1' in interesting_things: print "... 
is not an option" If filter() really has to be made to return an iterator then (a) 2to3 is going to have to make lists of its output and (b) the behaviour is going to need to be very clearly documented. I do think that many people are going to be confused by this change. Nicko From bioinformed at gmail.com Mon Aug 6 22:18:22 2007 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Mon, 6 Aug 2007 16:18:22 -0400 Subject: [Python-3000] map() Returns Iterator In-Reply-To: References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com> Message-ID: <2e1434c10708061318x1a2edf89x9faf352ecee58f55@mail.gmail.com> On 8/6/07, Guido van Rossum wrote: > > On 8/3/07, Kurt B. Kaiser wrote: > > If the statistics on the usage of map() stay the same, 2/3 of the time > > the current implementation will require code like > > > > foo = list(map(fcn, bar)). > > And the 2to3 tool should make this transformation (unless it can tell > from context that it's unnecessary, e.g. in a for-loop, or in a call > to list(), tuple() or sorted(). I hate to be pedantic, but it is not possible for 2to3 to tell, in general, that it is safe to elide the list() because the result is used directly in a for loop (for the usual arguments that use the Halting Problem as a trump card). e.g., foo = map(f,seq) for i in foo: if g(i): break cannot be correctly transformed to (in Python 2.5 parlance): foo = imap(f,seq) for i in foo: if g(i): break or equivalently: for x in seq: if g(f(x)): break when f has side-effects, since: 1. the loop need not evaluate the entire sequence, resulting in f(i) being called for some prefix of seq 2. g(i) may depend on the side-effects of f not yet determined Given that Python revels in being a non-pure functional language, we can poke fun of examples like: map(db.commit, transactions) but we need to recognize that the semantics are very specific, like: map(db.commit_phase2, map(db.commit_phase1, transactions)) that performs all phase 1 commits before any phase 2 commits. 
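[Archive note, not part of the original message: a small sketch of the
two-phase ordering point, in Python 3. The phase functions are stand-ins
that only record the order of calls; no real database API is implied.]

```python
def make_db():
    """Return a call log plus two fake commit phases that append to it."""
    log = []

    def phase1(tx):
        log.append("p1:" + tx)
        return tx

    def phase2(tx):
        log.append("p2:" + tx)
        return tx

    return log, phase1, phase2


transactions = ["a", "b"]

# Python 2 semantics, emulated with list(): phase 1 finishes for every
# transaction before phase 2 starts.
log_eager, p1, p2 = make_db()
list(map(p2, list(map(p1, transactions))))

# Lazy map objects: the two phases interleave per transaction.
log_lazy, p1, p2 = make_db()
list(map(p2, map(p1, transactions)))
```

log_eager is ['p1:a', 'p1:b', 'p2:a', 'p2:b'], while log_lazy is
['p1:a', 'p2:a', 'p1:b', 'p2:b'].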
More so, the former idiom is much more sensible when 'commit' returns a meaningful return value that must be stored. Would we blink too much at: results = [] for tx in transactions: results.append(db.commit(tx)) except to consider rewriting in 'modern' syntax as a list comprehension: results = [ db.commit(tx) for tx in transactions ] I'm all for 2to3 being dumb. Dumb but correct. It should always put list() around all uses of map(), filter(), dict.keys(), etc to maintain the exact behavior from 2.6. Let the author of the code optimize away the extra work if/when they feel comfortable doing so. After all, it is their job/reputation/life on the line, not ours. ~Kevin -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070806/de0b5034/attachment.html From brett at python.org Mon Aug 6 22:29:07 2007 From: brett at python.org (Brett Cannon) Date: Mon, 6 Aug 2007 13:29:07 -0700 Subject: [Python-3000] should rfc822 accept text io or binary io? In-Reply-To: References: Message-ID: On 8/6/07, Jeremy Hylton wrote: > This is a fairly specific question, but it gets at a more general > issue I don't fully understand. > > I recently updated httplib and urllib so that they work on the struni > branch. A recurring problem with these libraries is that they call > methods like strip() and split(). On a string object, calling these > methods with no arguments means strip/split whitespace. The bytes > object has no corresponding default arguments; whitespace may not be > well-defined for bytes. (Or is it?) > In general, the approach was to read data as bytes off the socket and > convert header lines to iso-8859-1 before processing them. > > test_urllib2_localnet still fails. One of the problems is that > BaseHTTPServer doesn't process HTTP responses correctly. Like > httplib, it converts the HTTP status line to iso-8859-1. 
But it
> parses the rest of the headers by calling mimetools.Message, which is
> really rfc822.Message. The header lines of an RFC 822 message
> (really, RFC 2822) are ascii, so it should be easy to do the
> conversion. rfc822.Message assumes it is reading from a text file and
> that readline() returns a string.
>
> So the short question is: Should rfc822.Message require a text io
> object or a binary io object? Or should it accept either (via some
> new constructor or extra arguments to the existing constructor)? I'm
> not sure how to design an API for bytes vs strings. The API used to
> be equally well suited for reading from a file or a socket, but they
> don't behave the same way anymore.
>

This really should be redirected as a question for the email package
as rfc822 and mimetools have been deprecated for a while now in favor
of using 'email'.

-Brett

From brett at python.org Mon Aug 6 22:30:23 2007
From: brett at python.org (Brett Cannon)
Date: Mon, 6 Aug 2007 13:30:23 -0700
Subject: [Python-3000] C API cleanup str
In-Reply-To:
References: <46B5C47B.5090703@v.loewis.de> <46B5F136.4010502@v.loewis.de> <46B633D0.7050902@v.loewis.de> <46B6D2EC.601@livinglogic.de> <46B6D6B8.7000207@v.loewis.de> <46B6E66D.80301@livinglogic.de>
Message-ID:

On 8/6/07, Guido van Rossum wrote:
> Do you guys need more guidance on this? It seems Martin's checkin
> didn't make things worse in the tests department -- I find (on Ubuntu)
> that test_ctypes is now failing, but test_threaded_import started
> passing.

The test_threaded_import pass is from a fix I checked in, not Martin
(I think).
-Brett

From guido at python.org Mon Aug 6 22:43:43 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 6 Aug 2007 13:43:43 -0700
Subject: [Python-3000] map() Returns Iterator
In-Reply-To: <2e1434c10708061318x1a2edf89x9faf352ecee58f55@mail.gmail.com>
References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com> <2e1434c10708061318x1a2edf89x9faf352ecee58f55@mail.gmail.com>
Message-ID:

On 8/6/07, Kevin Jacobs wrote:
> I hate to be pedantic, but it is not possible for 2to3 to tell, in general,
> that it is safe to elide the list() because the result is used directly in a
> for loop (for the usual arguments that use the Halting Problem as a trump
> card).

Of course not. 2to3 is a pragmatic tool and doesn't really care.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com Mon Aug 6 22:46:15 2007
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 6 Aug 2007 15:46:15 -0500
Subject: [Python-3000] should rfc822 accept text io or binary io?
In-Reply-To:
References:
Message-ID: <18103.34967.170146.660275@montanaro.dyndns.org>

I thought rfc822 was going away. From the current module documentation:

    Deprecated since release 2.3. The email package should be used in
    preference to the rfc822 module. This module is present only to
    maintain backward compatibility.

Shouldn't rfc822 be gone altogether in Python 3?
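[Archive note, not part of the original message: a minimal sketch of
header parsing with the email package, which the thread names as the
replacement for rfc822. It assumes a later Python 3 release in which
email.message_from_bytes exists alongside email.message_from_string. The
sample headers are invented.]

```python
from email import message_from_bytes, message_from_string

# Invented header block, shaped like what arrives on an HTTP socket.
raw = (b"Content-Type: text/html; charset=iso-8859-1\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"hello")

# Binary route: hand the bytes read from the socket straight to the parser.
msg_from_bytes = message_from_bytes(raw)

# Text route: decode first (iso-8859-1 maps every byte to a code point).
msg_from_text = message_from_string(raw.decode("iso-8859-1"))

content_type = msg_from_bytes["Content-Type"]
# Header lookup is case-insensitive.
length = int(msg_from_text["content-length"])
```

Offering one entry point per input type is one possible answer to the
"text io or binary io" question; it is not presented here as the design
the thread settled on.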
Skip

From martin at v.loewis.de Mon Aug 6 23:00:43 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 06 Aug 2007 23:00:43 +0200
Subject: [Python-3000] C API cleanup str
In-Reply-To: <46B6E66D.80301@livinglogic.de>
References: <46B2C8E0.8080409@canterbury.ac.nz> <46B5C47B.5090703@v.loewis.de> <46B5F136.4010502@v.loewis.de> <46B633D0.7050902@v.loewis.de> <46B6D2EC.601@livinglogic.de> <46B6D6B8.7000207@v.loewis.de> <46B6E66D.80301@livinglogic.de>
Message-ID: <46B78BFB.7000005@v.loewis.de>

>> Unfortunately, this made creating and retrieving asymmetric:
>> when you do PyUnicode_AsString, you'll get an UTF-8 string; when
>> you do PyUnicode_FromString, you did have to pass Latin-1. Making
>> AsString also return Latin-1 would, of course, restrict the number of
>> cases where it works.
>
> True, UTF-8 seems to be the better choice. However all spots in the C
> source that call PyUnicode_FromString() only pass ASCII anyway, which
> will probably be the most common case.

Right - so from a practical point of view, it makes no difference.
However, we still need to agree, then standardize, now, so we can
give people a consistent picture.

>>> So should NULL support be dropped from PyUnicode_FromStringAndSize()?
>> That's my proposal, yes.
>
> At least this would give a clear error message in case someone passes NULL.

Ok, so I'm going to change it, then.
Regards,
Martin

From martin at v.loewis.de Mon Aug 6 23:07:34 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 06 Aug 2007 23:07:34 +0200
Subject: [Python-3000] C API cleanup str
In-Reply-To:
References: <46B5C47B.5090703@v.loewis.de> <46B5F136.4010502@v.loewis.de> <46B633D0.7050902@v.loewis.de> <46B6D2EC.601@livinglogic.de> <46B6D6B8.7000207@v.loewis.de> <46B6E66D.80301@livinglogic.de>
Message-ID: <46B78D96.4090901@v.loewis.de>

> One issue with just putting this in the C API docs is that I believe
> (tell me if I'm wrong) that these haven't been kept up to date in the
> struni branch so we'll need to make a lot more changes than just this
> one...

That's certainly the case. However, if we end up deleting the str8
type entirely, I'd be in favor of recycling the PyString_* names
for Unicode, in which case everything needs to be edited, anyway.

Regards,
Martin

From martin at v.loewis.de Mon Aug 6 23:15:50 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 06 Aug 2007 23:15:50 +0200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To:
References: <46B637DD.7070905@v.loewis.de>
Message-ID: <46B78F86.9000505@v.loewis.de>

>> If that is not acceptable, please tell me how else
>> to fix the dbm modules.
>
> By fixing the code that uses them?

I don't know how to do that. All implementation strategies I
can think of have significant drawbacks.

> By using str8 (perhaps renamed to
> frozenbytes and certainly stripped of its locale-dependent APIs)?

Ok. So you could agree to a frozenbytes type, then? I'll add one,
reusing the implementation of the bytes object.

If that is done:

a) should one of these be the base type of the other?
b) should bytes literals be regular or frozen bytes?
c) is it still ok to provide a .freeze() method on
   bytes returning frozenbytes?
d) should unicode.defenc be frozen?
e) should (or may) codecs return frozenbytes?
Regards, Martin From fdrake at acm.org Mon Aug 6 23:18:21 2007 From: fdrake at acm.org (Fred Drake) Date: Mon, 6 Aug 2007 17:18:21 -0400 Subject: [Python-3000] should rfc822 accept text io or binary io? In-Reply-To: <18103.34967.170146.660275@montanaro.dyndns.org> References: <18103.34967.170146.660275@montanaro.dyndns.org> Message-ID: <3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org> On Aug 6, 2007, at 4:46 PM, skip at pobox.com wrote: > I thought rfc822 was going away. From the current module > documentation: > ... > Shouldn't rfc822 be gone altogether in Python 3? Yes. And the answers to Jeremy's questions about what sort of IO is appropriate for the email package should be left to the email-sig as well, I suspect. It's good that they've come up. -Fred -- Fred Drake From talex5 at gmail.com Mon Aug 6 21:33:18 2007 From: talex5 at gmail.com (Thomas Leonard) Date: Mon, 6 Aug 2007 20:33:18 +0100 Subject: [Python-3000] Binary compatibility Message-ID: Hi all, I recently asked about the UCS2 / UCS4 binary compatibility issues with Python on Guido's blog, and Guido suggested I continue the discussion here: http://www.artima.com/forums/flat.jsp?forum=106&thread=211430 The issue is that Python has a compile-time configuration setting which changes its ABI. For example, on Ubuntu we have: $ objdump -T /usr/bin/python|grep UCS 080ac3e0 g DF .text 00000206 Base PyUnicodeUCS4_EncodeUTF8 080b2810 g DF .text 000000ba Base PyUnicodeUCS4_DecodeLatin1 080b6c20 g DF .text 000002b3 Base PyUnicodeUCS4_RSplit ... Whereas on some other systems, including compiled-from-source Python, you get: $ objdump -T python|grep UCS 080abc80 g DF .text 00000201 Base PyUnicodeUCS2_EncodeUTF8 080b32e0 g DF .text 000000c7 Base PyUnicodeUCS2_DecodeLatin1 080b6740 g DF .text 000002b9 Base PyUnicodeUCS2_RSplit (note "UCS2" vs "UCS4") This means that I can't distribute Python extensions as binaries. Any extension built on Ubuntu may fail on some other system. 
I confess I haven't tried this recently, but it has caused me trouble in the past. I'd like to be sure it won't happen with Python 3. I've hit this problem with both of the open source projects I work on; the ROX desktop (http://rox.sf.net) and Zero Install (http://0install.net). ROX is a desktop environment. Most of our programs are written in (pure) Python. Some, including ROX-Filer, are pure C. Sometimes it would have been useful to combine the two: for example we could write the pager applet in Python if it could use C to talk to the libwnck library, or we could add Python scripting to the filer and gradually migrate more of the code to Python. Zero Install is a decentralised software installation system, itself written entirely in Python, in which software authors publish GPG-signed XML feed files on their websites. These feeds list versions of their programs along with a cryptographic digest of each version's contents (think GIT tree IDs here). This allows installing software without needing root access, while still sharing libraries and programs automatically between (mutually suspicious) users. Although we don't need to use C extensions for the system itself, distributing Python/C hybrid programs with it has been problematic. Another group having similar problems is the Autopackage project: http://trac.autopackage.org/wiki/LinuxProblems#Python http://trac.autopackage.org/wiki/PackagingPythonApps http://plan99.net/~mike/blog/2006/05/24/python-unicode-abi/ Finally, the issue has also been brought up before on the Python lists: http://mail.python.org/pipermail/python-dev/2005-September/056837.html Guido suggested: "Why don't you distribute a Python interpreter binary built with the right options? Depending on users having installed the correct Python version (especially if your users are not programmers) is asking for trouble." 
There are several problems for us with this approach: - We have to maintain our own version of Python, including pushing out security updates. - We also have to maintain all the Python modules, in particular python-gnome, in a similar way. - Our users have to download Python twice whenever there's a new release. - If some programs are using the distribution's Python and some are using ours (libraries installed using Zero Install are only used by software itself installed the same way; distribution packages aren't affected), two copies of Python must be loaded into memory. This is slow and wasteful of memory. This is assuming all third-party code uses Zero Install for distribution, so that only one extra version of Python is required. For people distributing programs by other means, they would also have to include their own copies of Python, leading to even more waste. >From our point of view, it would be better if the format of strings was an internal implementation detail. For most users, it doesn't matter what the setting is, as long as the public interface doesn't change! The cost of converting between formats is small, and in any case most software outside of Python (the GNOME stack, for example) uses UTF-8, so all strings have to be converted when going in or out of Python anyway. An alternative would be to default to UCS4, and give the option an alarming name such as --with-unicode-for-space-limited-devices or something so that packagers don't mess with it. Thanks, -- Dr Thomas Leonard http://rox.sourceforge.net GPG: 9242 9807 C985 3C07 44A6 8B9A AE07 8280 59A5 3CC1 From guido at python.org Mon Aug 6 23:33:34 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Aug 2007 14:33:34 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B78F86.9000505@v.loewis.de> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> Message-ID: On 8/6/07, "Martin v. 
Löwis" wrote:
> >> If that is not acceptable, please tell me how else
> >> to fix the dbm modules.
> >
> > By fixing the code that uses them?
>
> I don't know how to do that. All implementation strategies I
> can think of have significant drawbacks.

Can you elaborate about the issues?

> > By using str8 (perhaps renamed to
> > frozenbytes and certainly stripped of its locale-dependent APIs)?
>
> Ok. So you could agree to a frozenbytes type, then? I'll add one,
> reusing the implementation of the bytes object.

Not quite. It's the least evil. I'm hoping to put off the decision.

Could you start using str8 instead for now? Or is that not usable for a fix? (If so, why not?)

> If that is done:
>
> a) should one of these be the base type of the other?

No. List and tuple don't inherit from each other, nor do set and frozenset. A common base class is okay. (We didn't quite do this for sets but it makes sense for Py3k to change this.)

> b) should bytes literals be regular or frozen bytes?

Regular -- set literals produce mutable sets, too.

> c) is it still ok to provide a .freeze() method on
> bytes returning frozenbytes?

I'd rather use the same kind of API used between set and frozenset: each constructor takes the other as argument.

> d) should unicode.defenc be frozen?

Yes. It's currently a str8 isn't it? So that's already the case.

> e) should (or may) codecs return frozenbytes?

I think it would be more convenient and consistent if all APIs returned mutable bytes and the only API that creates frozen bytes was the frozen bytes constructor. (defenc excepted as it's a C-level API and having it be mutable would be bad.)
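The "each constructor takes the other as argument" pattern can be sketched as follows. This is only an illustration: set/frozenset are the model named above, while the bytes/bytearray pair is an assumption borrowed from today's Python 3 (immutable bytes, mutable bytearray), not the bytes/frozenbytes naming under discussion in this thread.

```python
# Sketch only: set/frozenset are the model named in the message above;
# bytes/bytearray are today's Python 3 names, not the thread's proposal.

mutable_set = {1, 2, 3}
frozen_set = frozenset(mutable_set)   # freeze by calling the other constructor
thawed_set = set(frozen_set)          # and back again; no .freeze() method needed

scratch = bytearray(b"key")           # mutable, therefore unhashable
frozen_key = bytes(scratch)           # immutable copy, usable as a dict key
index = {frozen_key: 42}

scratch[0] = ord("K")                 # mutating the original leaves the copy alone

try:
    hash(scratch)
    bytearray_hashable = True
except TypeError:
    bytearray_hashable = False

print(frozen_set == mutable_set, index[b"key"], bytes(scratch), bytearray_hashable)
```

The dict lookup still succeeds after the mutable buffer changes, which is the property the dumbdbm _index needs from its keys.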
-- --Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de Mon Aug 6 23:52:21 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Mon, 06 Aug 2007 23:52:21 +0200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> Message-ID: <46B79815.1030504@v.loewis.de>

>> I don't know how to do that. All implementation strategies I
>> can think of have significant drawbacks.
>
> Can you elaborate about the issues?

It's a decision tree:

0. whichdb fails

1. should the DB APIs use strings or bytes as keys and values?
   Given the discussion of bsddb, I went for "bytes". I replace

   f["1"] = b"1"
   with
   f[b"1"] = b"1"

2. then, dumbdbm fails, with TypeError: keys must be strings.
   I change __setitem__ to expect bytes instead of basestring

3. it fails with unhashable type: 'bytes' in line 166:

   if key not in self._index:

   _index is a dictionary. It's really essential that the key can be found quickly in _index, since this is how it finds the data in the database (so using, say, a linear search would be no option)

> Not quite. It's the least evil. I'm hoping to put off the decision.

For how long? Do you expect to receive further information that will make a decision simpler?

> Could you start using str8 instead for now? Or is that not usable for
> a fix? (If so, why not?)

It should work, although I probably will have to fix the index file generation in dumbdbm (either way), since it uses %r to generate the index; this would put s prefixes into the file which won't be understood on reading (it uses eval() to process the index, which might need fixing, anyway)

> No. List and tuple don't inherit from each other, nor do set and
> frozenset. A common base class is okay. (We didn't quite do this for
> sets but it makes sense for Py3k to change this.)

Ok, so there would be basebytes, I assume.

>> d) should unicode.defenc be frozen?
>
> Yes.
It's currently a str8 isn't it? So that's already the case.

Right. I think I will have to bite the bullet and use str8 explicitly, although doing so makes me shudder.

Regards, Martin

From guido at python.org Tue Aug 7 00:33:57 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Aug 2007 15:33:57 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B79815.1030504@v.loewis.de> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B79815.1030504@v.loewis.de> Message-ID:

On 8/6/07, "Martin v. Löwis" wrote:
> I think I will have to bite the bullet and use str8 explicitly,
> although doing so makes me shudder.

This is the right short-term solution IMO. We'll rename or reconsider later -- either closer to the a1 release or after.

-- --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Tue Aug 7 01:55:18 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Aug 2007 16:55:18 -0700 Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests in the struni branch! Message-ID:

We're down to 11 failing tests in the struni branch. I'd like to get this down to zero ASAP so that we can retire the old p3yk (yes, with typo!) branch and rename py3k-struni to py3k. Please help! Here's the list of failing tests:

test_ctypes
    Recently one test started failing again, after Martin changed PyUnicode_FromStringAndSize() to use UTF8 instead of Latin1.

test_email
test_email_codecs
test_email_renamed
    Can someone contact the email-sig and ask for help with these?

test_minidom
    Recently started failing again; probably shallow.

test_sqlite
    Virgin territory, probably best done by whoever wrote the code or at least someone with time to spare.

test_tarfile
    Virgin territory again (but different owner :-).

test_urllib2_localnet
test_urllib2net
    I think Jeremy Hylton may be close to fixing these, he's done a lot of work on urllib and httplib.

test_xml_etree_c
    Virgin territory again.
There are also a few tests that only fail on CYGWIN or OSX; I won't bother listing these. If you want to help, please refer to this wiki page: http://wiki.python.org/moin/Py3kStrUniTests There are also other tasks; see http://wiki.python.org/moin/Py3kToDo -- --Guido van Rossum (home page: http://www.python.org/~guido/) From shiblon at gmail.com Tue Aug 7 02:09:11 2007 From: shiblon at gmail.com (Chris Monson) Date: Mon, 6 Aug 2007 20:09:11 -0400 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> Message-ID: On 8/6/07, Guido van Rossum wrote: > > On 8/6/07, "Martin v. Löwis" wrote: > > >> If that is not acceptable, please tell me how else > > >> to fix the dbm modules. > > > > > > By fixing the code that uses them? > > > > I don't know how to do that. All implementation strategies I > > can think of have significant drawbacks. > > Can you elaborate about the issues? > > > > By using str8 (perhaps renamed to > > > frozenbytes and certainly stripped of its locale-dependent APIs)? > > > > Ok. So you could agree to a frozenbytes type, then? I'll add one, > > reusing the implementation of the bytes object. > > Not quite. It's the least evil. I'm hoping to put off the decision. > > Could you start using str8 instead for now? Or is that not usable for > a fix? (If so, why not?) > > > If that is done: > > > > a) should one of these be the base type of the other? > > No. List and tuple don't inherit from each other, nor do set and > frozenset. A common base class is okay. (We didn't quite do this for > sets but it makes sense for Py3k to change this.) > > > b) should bytes literals be regular or frozen bytes? > > Regular -- set literals produce mutable sets, too. But all other string literals produce immutable types: "" r"" u"" (going away, but still) and hopefully b"" Wouldn't it be confusing to have b"" be the only mutable quote-delimited literal?
For everything else, there's bytes(). :-) - C > c) is it still ok to provide a .freeze() method on > > bytes returning frozenbytes? > > I'd rather use the same kind of API used between set and frozenset: > each constructor takes the other as argument. > > > d) should unicode.defenc be frozen? > > Yes. It's currently a str8 isn't it? So that's already the case. > > > e) should (or may) codecs return frozenbytes? > > I think it would be more convenient and consistent if all APIs > returned mutable bytes and the only API that creates frozen bytes was > the frozen bytes constructor. (defenc excepted as it's a C-level API > and having it be mutable would be bad.) > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/shiblon%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070806/5973ca68/attachment.html From guido at python.org Tue Aug 7 02:19:12 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Aug 2007 17:19:12 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> Message-ID: On 8/6/07, Chris Monson wrote: > On 8/6/07, Guido van Rossum wrote: > > On 8/6/07, "Martin v. Löwis" wrote: > > > b) should bytes literals be regular or frozen bytes? > > > > Regular -- set literals produce mutable sets, too. > > But all other string literals produce immutable types: > > "" > r"" > u"" (going away, but still) > and hopefully b"" > > Wouldn't it be confusing to have b"" be the only mutable quote-delimited > literal? For everything else, there's bytes().
Well, it would be just as confusing to have a bytes literal and not have it return a bytes object. The frozenbytes type is intended (if I understand the use case correctly) for the relatively rare case where bytes must be used as dict keys and we can't assume that the bytes use any particular encoding. Personally, I still think that converting to the latin-1 encoding is probably just as good for this particular use case. So perhaps I don't understand the use case(s?) correctly.

> :-)

What does the :-) mean? That you're not seriously objecting?

-- --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Tue Aug 7 02:39:15 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Aug 2007 17:39:15 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B79815.1030504@v.loewis.de> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B79815.1030504@v.loewis.de> Message-ID:

On 8/6/07, "Martin v. Löwis" wrote:
> >> I don't know how to do that. All implementation strategies I
> >> can think of have significant drawbacks.
> >
> > Can you elaborate about the issues?
>
> It's a decision tree:
>
> 0. whichdb fails
>
> 1. should the DB APIs use strings or bytes as keys and values?
>    Given the discussion of bsddb, I went for "bytes". I replace
>
>    f["1"] = b"1"
>    with
>    f[b"1"] = b"1"
>
> 2. then, dumbdbm fails, with TypeError: keys must be strings.
>    I change __setitem__ to expect bytes instead of basestring
>
> 3. it fails with unhashable type: 'bytes' in line 166:
>
>    if key not in self._index:
>
>    _index is a dictionary. It's really essential that the key
>    can be found quickly in _index, since this is how it finds
>    the data in the database (so using, say, a linear search would
>    be no option)

I thought about this issue some more.
Given that the *dbm types strive for emulating dicts, I think it makes sense to use strings for the keys, and bytes for the values; this makes them more plug-compatible with real dicts. (We should ideally also change the keys() method etc. to return views.) This of course requires that we know the encoding used for the keys. Perhaps it would be acceptable to pick a conservative default encoding (e.g. ASCII) and add an encoding argument to the open() method. Perhaps this will work? It seems better than using str8 or bytes for the keys. > > Not quite. It's the least evil. I'm hoping to put off the decision. > > For how long? Do you expect to receive further information that will > make a decision simpler? I'm waiting for a show-stopper issue that can't be solved without having an immutable bytes type. It would be great if we could prove to ourselves that such a show-stopper will never happen; or if we found one quickly. But so far the show-stopper candidates aren't convincing. At the same time we still have enough uses of str8 and PyString left in the code base that we can't kill str8 yet. It would be great if we had the decision before alpha 1 but I'm okay if it remains open a bit longer (1-2 months past alpha 1). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From kbk at shore.net Tue Aug 7 02:49:04 2007 From: kbk at shore.net (Kurt B. Kaiser) Date: Mon, 06 Aug 2007 20:49:04 -0400 Subject: [Python-3000] map() Returns Iterator In-Reply-To: (Guido van Rossum's message of "Mon, 6 Aug 2007 11:18:23 -0700") References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com> Message-ID: <87k5s8i2yn.fsf@hydra.hampton.thirdcreek.com> "Guido van Rossum" writes: >> I think of map() and filter() as sequence transformers. To me, it's >> an unexpected semantic change that the result is no longer a list. > > Well, enough people thought of them as iterables to request imap(), > ifilter() and izip() added to the itertools library. Agreed.
When processing (possibly very long) streams, the lazy versions have great advantages.

    3>> def ones():
            while True:
                yield 1

    3>> a = map(lambda x: x+x, ones())
    3>> b = map(lambda x: x+1, a)
    3>> b.__next__()
    3

If you try that in 2.6 you fill memory.

However, IMHO eliminating the strict versions of map() and filter() in favor of the lazy versions from itertools kicks the degree of sophistication necessary to understand these functions up a notch (or three).

    3>> c, d = map(int, ('1', '2'))
    3>> c, d
    (1, 2)
    3>> e = map(int, ('1', '2'))
    3>> f, g = e
    3>> f, g
    (1, 2)
    3>> f, g = e
    Traceback (most recent call last):
      File "", line 1, in
        f, g = e
    ValueError: need more than 0 values to unpack

To say nothing of remembering having to use

    3>> foo = (list(map(bar)))

most of the time. I'd say keep map(), filter(), imap() and ifilter(), and use the latter when you're working with streams.

"Explicit is better than implicit."

Then there's the silent failure to process the side-effects of

    3>> map(print, lines)

which is rather unexpected. To me, this code is quite readable and not at all pathological (no more than any print statement :). It may not be Pythonic in the modern idiom (that pattern is found mostly in code.py and IDLE, and it's very rare), but it's legal and it's a little surprising that it's necessary to spell it

    3>> list(map(print, lines))

now to get any action. It took me awhile to track down the failures in the interactive interpreter emulator because that pattern was being used to print the exceptions; the thing just produced no output at all.

The alternatives

    3>> print('\n'.join(lines))

or

    3>> (print(line) for line in lines)    # oops, nothing happened
    3>> [print(line) for line in lines]

aren't much of an improvement.

>> In existing Lib/ code, it's twice as likely that the result of map()
>> will be assigned than to use it as an iterator in a flow control
>> statement.
>
> Did you take into account the number of calls to imap()?

No.
Since the py3k branch is partially converted, I went back to 2.6, where skipping Lib/test/, there are (approximately!!): 87 assignments of the output of map(), passing a list 21 assignments involving map(), but not directly. Many of these involve 'tuple' or 'join' and could accept an iterator. 58 return statements involving map() (39 directly) 1 use to construct a list used as an argument 2 for ... in map() (!!) and 1 for ... in enumerate(map(...)) 1 use as map(foo, bar) == baz_list 5 uses of imap() [...] > We didn't write the 2to3 transform, but it's easier than some others > we already did (e.g. keys()). I see a transform in svn. As an aside, is there any accepted process for running these transforms over the p3yk branch? Some parts of Lib/ are converted, possibly by hand, possibly by 2to3, and other parts are not. [...] > When you have a lambda as the first argument the better translation is > *definitely* a list comprehension, as it saves creating stack frames > for the lambda calls. Thanks, good tip. [...] > Also, have you ever liked the behavior that filter returns a string if > the input is a string, a tuple if the input is a tuple, and a list for > all other cases? That really sucks IMO. If you look at map() and filter() as sequence transformers, that makes some sense: preserve the type of the sequence if possible. But clearly map() and filter() should act the same way! >> zip() is infrequently used. > > It was especially designed for use in for-loops (to end the fruitless > discussions trying to come up with parallel iteration syntax). If it > wasn't for the fact that iterators hadn't been invented yet at the > time, zip() would definitely have returned an iterator right from the > start, just as enumerate(). > >> However, IMO for consistency they should all act the same way. > > That's not much of consistency. It's used only fifteen times in 2.6 Lib/ and four of those are izip(). Eight are assignments, mostly to build dicts. Six are in for-loops. 
One is a return. >> Could something be done in the ast phase of compilation instead? > > Not likely, the compiler doesn't know enough about the state of builtins. OK, thanks for the reply. -- KBK From greg.ewing at canterbury.ac.nz Tue Aug 7 02:53:27 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 07 Aug 2007 12:53:27 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B6DABB.3080509@ronadam.com> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> Message-ID: <46B7C287.3030007@canterbury.ac.nz> Ron Adam wrote: > {0:d,10+20} # Field width are just string operations done after > # formatting is done. > > {0:d10+20} # Numeric widths, differ from field widths. > # They are specific to the type so can handle special > # cases. Still not good - if you confuse two very similar and easily confusable things, your numbers get chopped off. I'm with Guido on this one: only strings should have a max width, and it should be part of the string format spec, so you can't accidentally apply it to numbers. 
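For comparison, a small sketch of the "only strings get a max width" behaviour, assuming the format-spec mini-language as it later shipped in format() and str.format(); the syntax proposals debated in this thread differ from it, and the variable names are purely illustrative.

```python
# Sketch: in released Python 3, precision acts as a maximum width for strings
# only; width is always a minimum, so numbers are never chopped off.

text = "abcdefgh"
number = 12345678

clipped = format(text, ".3")         # precision truncates a string
padded = format(text, "10")          # width pads a string (left-aligned by default)
wide_number = format(number, "3d")   # width too small: the number is kept whole

try:
    format(number, ".3d")            # precision on an integer is refused outright
    precision_rejected = False
except ValueError:
    precision_rejected = True

print(repr(clipped), repr(padded), repr(wide_number), precision_rejected)
```

Because the truncating part of the spec is only accepted for strings, confusing the two kinds of width raises an error instead of silently losing digits.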
-- Greg From greg.ewing at canterbury.ac.nz Tue Aug 7 02:57:13 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 07 Aug 2007 12:57:13 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B6DE80.2050000@ronadam.com> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B66E7E.4060209@canterbury.ac.nz> <46B6A7E8.7040001@ronadam.com> <46B6C335.4080504@canterbury.ac.nz> <46B6DE80.2050000@ronadam.com> Message-ID: <46B7C369.3040509@canterbury.ac.nz> Ron Adam wrote: > What should happen in various situations of mismatched or invalid type > specifiers? I think that a format string that is not understood by any part of the system should raise an exception (rather than, e.g. falling back on str()). Refuse the temptation to guess. -- Greg From greg.ewing at canterbury.ac.nz Tue Aug 7 03:30:29 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 07 Aug 2007 13:30:29 +1200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> Message-ID: <46B7CB35.5020204@canterbury.ac.nz> Guido van Rossum wrote: > The frozenbytes type is intended (if I > understand the use case correctly) as for the relatively rare case > where bytes must be used as dict keys Another issue I can see is that not having a frozen bytes literal means that there is no efficient way of embedding constant bytes data in a program. You end up with extra overhead in both time and space (two copies of the data in memory, and extra time needed to make the copy). 
If the literal form is frozen, on the other hand, you only incur these overheads when you really need a mutable copy of the data. -- Greg From talin at acm.org Tue Aug 7 03:40:28 2007 From: talin at acm.org (Talin) Date: Mon, 06 Aug 2007 18:40:28 -0700 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B6DABB.3080509@ronadam.com> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> Message-ID: <46B7CD8C.5070807@acm.org> Ron Adam wrote: > Now here's the problem with all of this. As we add the widths back into > the format specifications, we are basically saying the idea of a separate > field width specifier is wrong. > > So maybe it's not really a separate independent thing after all, and it > just a convenient grouping for readability purposes only. I'm beginning to suspect that this is indeed the case. Before we go too much further, let me give out the URLs for the .Net documentation on these topics, since much of the current design we're discussing has been inspired by .Net: http://msdn2.microsoft.com/en-us/library/dwhawy9k.aspx http://msdn2.microsoft.com/en-us/library/0c899ak8.aspx http://msdn2.microsoft.com/en-us/library/0asazeez.aspx http://msdn2.microsoft.com/en-us/library/c3s1ez6e.aspx http://msdn2.microsoft.com/en-us/library/az4se3k1.aspx http://msdn2.microsoft.com/en-us/library/txafckwd.aspx I'd suggest some study of these. Although I would warn against adopting this wholesale, as there are a huge number of features described in these documents, more than I think we need. 
One other URL for people who want to play around with implementing this stuff is my Python prototype of the original version of the PEP. It has all the code you need to format floats with decimal precision, exponents, and so on: http://www.viridia.org/hg/python/string_format?f=5e4b833ed285;file=StringFormat.py;style=raw -- Talin From fdrake at acm.org Tue Aug 7 03:41:40 2007 From: fdrake at acm.org (Fred Drake) Date: Mon, 6 Aug 2007 21:41:40 -0400 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B7CB35.5020204@canterbury.ac.nz> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B7CB35.5020204@canterbury.ac.nz> Message-ID: On Aug 6, 2007, at 9:30 PM, Greg Ewing wrote: > If the literal form is frozen, on the other hand, > you only incur these overheads when you really need > a mutable copy of the data. Indeed. I have no reason to think the desire for a frozen form is the oddball case; I suspect it will be much more common than the need for mutable bytes objects from literal data in my own code. -Fred -- Fred Drake From talin at acm.org Tue Aug 7 03:43:03 2007 From: talin at acm.org (Talin) Date: Mon, 06 Aug 2007 18:43:03 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B701E5.3030206@gmail.com> References: <46B637DD.7070905@v.loewis.de> <46B701E5.3030206@gmail.com> Message-ID: <46B7CE27.3030103@acm.org> Nick Coghlan wrote: > Martin v. L?wis wrote: >> I don't think it needs to be a separate type, >> instead, bytes objects could have a idem-potent >> .freeze() method which switches the "immutable" >> bit on. There would be no way to switch it off >> again. > > +1 here - hashable byte sequences are very handy for dealing with > fragments of low level serial protocols. 
> > It would also be nice if b"" literals set that immutable flag > automatically - otherwise converting some of my lookup tables over to > Py3k would be a serious pain (not a pain I'm likely to have to deal with > personally given the relative time frames involved, but a pain nonetheless). I'm also for an immutable bytes type - but I'm not so sure about freezing in place. The most efficient representation of immutable bytes is quite different from the most efficient representation of mutable bytes. Rather, I think that they should both share a common Abstract Base Class defining what you can do with immutable byte strings, but the actual storage of the bytes themselves should be implemented in the subclass. -- Talin From mike.klaas at gmail.com Tue Aug 7 03:57:07 2007 From: mike.klaas at gmail.com (Mike Klaas) Date: Mon, 6 Aug 2007 18:57:07 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules References: Message-ID: On 6-Aug-07, at 5:39 PM, Guido van Rossum wrote: > > I thought about this issue some more. > > Given that the *dbm types strive for emulating dicts, I think it makes > sense to use strings for the keys, and bytes for the values; this > makes them more plug-compatible with real dicts. (We should ideally > also change the keys() method etc. to return views.) This of course > requires that we know the encoding used for the keys. Perhaps it would > be acceptable to pick a conservative default encoding (e.g. ASCII) and > add an encoding argument to the open() method. > > Perhaps this will work? It seems better than using str8 or bytes > for the keys. There are some scenarios that might be difficult under such a regime. The berkeley api provides means for efficiently mapping a bytestring to another bytestring. Often, the data is not text, and the performance of the database is sensitive to the means of serialization. For instance, it is quite common to use integers as keys. 
If you are inserting keys in order, it is about a hundred times faster to encode the ints in big-endian byte order than little-endian:

    class MyIntDB(object):
        def __setitem__(self, key, item):
            self.db.put(struct.pack('>Q', key), serializer(item))
        def __getitem__(self, key):
            return unserializer(self.db.get(struct.pack('>Q', key)))

How do you envision these types of tasks being accomplished with unicode keys? It is conceivable that one could write a custom unicode encoding that accomplishes this, convert the key to unicode, and pass the custom encoding name to the constructor.

regards, -Mike

From guido at python.org Tue Aug 7 04:06:34 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Aug 2007 19:06:34 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: Message-ID:

On 8/6/07, Mike Klaas wrote:
> On 6-Aug-07, at 5:39 PM, Guido van Rossum wrote:
> > Given that the *dbm types strive for emulating dicts, I think it makes
> > sense to use strings for the keys, and bytes for the values; this
> > makes them more plug-compatible with real dicts. (We should ideally
> > also change the keys() method etc. to return views.) This of course
> > requires that we know the encoding used for the keys. Perhaps it would
> > be acceptable to pick a conservative default encoding (e.g. ASCII) and
> > add an encoding argument to the open() method.
> >
> > Perhaps this will work? It seems better than using str8 or bytes
> > for the keys.
>
> There are some scenarios that might be difficult under such a regime.
>
> The berkeley api provides means for efficiently mapping a bytestring
> to another bytestring. Often, the data is not text, and the
> performance of the database is sensitive to the means of serialization.
>
> For instance, it is quite common to use integers as keys.
If you are
> inserting keys in order, it is about a hundred times faster to encode
> the ints in big-endian byte order than little-endian:

I'm assuming that this speed difference says something about the implementation of the underlying dbm package. Which package did you use to measure this?

>      class MyIntDB(object):
>          def __setitem__(self, key, item):
>              self.db.put(struct.pack('>Q', key), serializer(item))
>          def __getitem__(self, key):
>              return unserializer(self.db.get(struct.pack('>Q', key)))
>
> How do you envision these types of tasks being accomplished with
> unicode keys? It is conceivable that one could write a custom
> unicode encoding that accomplishes this, convert the key to unicode,
> and pass the custom encoding name to the constructor.

Well, the *easiest* (I don't know about simplest) way to use ints as keys is of course to use the decimal representation. You'd use str(key) instead of struct.pack(). This would of course not maintain key order -- is that important? If you need to be compatible with struct.pack(), and we were to choose Unicode strings for the keys in the API, then you might have to do something like struct.pack(...).encode("latin-1") and specify latin-1 as the database's key encoding.

Of course this may not be compatible with an external constraint (e.g. another application that already has a key format) but in that case you may have to use arbitrary tricks anyway (the latin-1 encoding might still be helpful).

However, I give you that a pure bytes API would be more convenient at times. How about we define two APIs, one using raw bytes and one using strings + a given encoding? Or perhaps a special value of the encoding argument passed to *dbm.open() (maybe None, maybe the default, maybe "raw" or "bytes"?) to specify that the key values are to be bytes?
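A minimal sketch of the two points discussed above: big-endian packing keeps integer keys in sorted order, and a latin-1 round trip turns packed keys into text and back without loss. It assumes today's Python 3, where the bytes-to-text direction is spelled decode(); pack_key and the other names are hypothetical helpers, not part of any dbm API.

```python
import struct

def pack_key(n, order=">"):
    """Pack an unsigned 64-bit integer key; '>' is big-endian, '<' little-endian."""
    return struct.pack(order + "Q", n)

numbers = list(range(0, 1000, 7))                 # keys already in insertion order
big = [pack_key(n, ">") for n in numbers]
little = [pack_key(n, "<") for n in numbers]

big_stays_sorted = big == sorted(big)             # byte order matches numeric order
little_stays_sorted = little == sorted(little)    # low byte first scrambles the order

# latin-1 maps every byte value to the code point with the same number, so the
# round trip is lossless and the text keys sort exactly like the byte keys.
as_text = [b.decode("latin-1") for b in big]
round_trip_ok = [s.encode("latin-1") for s in as_text] == big
text_stays_sorted = as_text == sorted(as_text)

print(big_stays_sorted, little_stays_sorted, round_trip_ok, text_stays_sorted)
```

A btree-backed store fed the big-endian keys therefore sees them in key order, which is the case described as much faster above.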
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Aug 7 04:22:40 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Aug 2007 19:22:40 -0700 Subject: [Python-3000] map() Returns Iterator In-Reply-To: <87k5s8i2yn.fsf@hydra.hampton.thirdcreek.com> References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com> <87k5s8i2yn.fsf@hydra.hampton.thirdcreek.com> Message-ID: On 8/6/07, Kurt B. Kaiser wrote: > "Guido van Rossum" writes: [...pushback...] > However, IMHO eliminating the strict versions of map() and filter() in > favor of the lazy versions from itertools kicks the degree of > sophistication necessary to understand these functions up a notch (or > three). I wonder how bad this is given that range() and dict.keys() and friends will also stop returning lists? I don't think you ever saw any of my Py3k presentations (the slides of the latest one are here: http://conferences.oreillynet.com/presentations/os2007/os_vanrossum.ppt). I've always made a point of suggesting that we're switching to returning iterators instead of lists from as many APIs as makes sense (I stop at str.split() though, as I can't think of a use case where the list would be so big as to be bothersome). > To say nothing of remembering having to use > > 3>> foo = (list(map(bar))) > > most of the time. I think you're overreacting due to your experience with conversion of existing code. I expect that new use cases where a list is needed will generally be written using list comprehensions in Py3k-specific tutorials, and generator expressions for situations where a list isn't needed (as a slightly more advanced feature). Then map() and filter() can be shown as more advanced optimizations of certain end cases. > I'd say keep map(), filter(), imap() and ifilter(), and > use the latter when you're working with streams. > > "Explicit is better than implicit."
> > Then there's the silent failure to process the side-effects of > > 3>> map(print, lines) > > which is rather unexpected. To me, this code is quite readable and not > at all pathological (no more than any print statement :). It may not be > Pythonic in the modern idiom (that pattern is found mostly in code.py > and IDLE, and it's very rare), but it's legal and it's a little > surprising that it's necessary to spell it > > 3>> list(map(print, lines)) > > now to get any action. Aren't you a little too fond of this idiom? I've always found it a little surprising when I encountered it, and replaced it with the more straightforward for line in lines: print(line) > It took me awhile to track down the failures in > the interactive interpreter emulator because that pattern was being used > to print the exceptions; the thing just produced no output at all. I think that's just a side effect of the conversion. I take it you didn't use 2to3? > The alternatives > > 3>> print('\n'.join(lines)) > > or > > 3>> (print(line) for line in lines) # oops, nothing happened > 3>> [print(line) for line in lines] > > aren't much of an improvement. Well duh. Really. What's wrong with writing it as a plain old for-loop? > >> In existing Lib/ code, it's twice as likely that the result of map() > >> will be assigned than to use it as an iterator in a flow control > >> statement. > > > > Did you take into account the number of calls to imap()? > > No. Since the py3k branch is partially converted, I went back to 2.6, > where skipping Lib/test/, there are (approximately!!): > > 87 assignments of the output of map(), passing a list > 21 assignments involving map(), but not directly. Many of these involve > 'tuple' or 'join' and could accept an iterator. > 58 return statements involving map() (39 directly) > 1 use to construct a list used as an argument > 2 for ... in map() (!!) and 1 for ... 
in enumerate(map(...)) > 1 use as map(foo, bar) == baz_list > 5 uses of imap() I'm not sure what the relevance of assignments is. I can assign an iterator to a variable and do stuff with it and never require it to be a list. I can also pass a map() call into a function and then it depends on what the function does to that argument. > [...] > > > We didn't write the 2to3 transform, but it's easier than some others > > we already did (e.g. keys()). > > I see a transform in svn. I guess I didn't look well enough. > As an aside, is there any accepted process > for running these transforms over the p3yk branch? Some parts of Lib/ > are converted, possibly by hand, possibly by 2to3, and other parts are > not. (Aside: Please skip the p3yk branch and use the py3k-struni branch -- it's the way of the future.) I tend to do manual conversion of the stdlib because it's on the bleeding edge. At times I've regretted this, and gone back and run a particular transform over some of the code. I rarely use the full set of transforms on a whole subtree, although others sometimes do that. Do note the options that help convert doctests and deal with print() already being a function. [zip()] > It's used only fifteen times in 2.6 Lib/ and four of those are > izip(). Eight are assignments, mostly to build dicts. I don't understand. What's an "assignment" to build a dict? Do you mean something like dict(zip(keys, values)) ? That's an ideal use case for an iterator. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From mike.klaas at gmail.com Tue Aug 7 04:47:37 2007 From: mike.klaas at gmail.com (Mike Klaas) Date: Mon, 6 Aug 2007 19:47:37 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: Message-ID: On 6-Aug-07, at 7:06 PM, Guido van Rossum wrote: > On 8/6/07, Mike Klaas wrote: >> For instance, it is quite common to use integers as keys. 
If you are >> inserting keys in order, it is about a hundred times faster to encode >> the ints in big-endian byte order than little-endian: > > I'm assuming that this speed difference says something about the > implementation of the underlying dbm package. Which package did you > use to measure this? This is true for bsddb backed by Berkeley DB, but it should be true to some extent in any btree-based database. btrees are much more efficiently constructed if built in key-order (especially when they don't fit in memory), and the difference stems purely from the nature of the representation: the little-endian byte representation of sorted integers is no longer sorted. The big-endian representation preserves the sort order. >> class MyIntDB(object): >> def __setitem__(self, key, item): >> self.db.put(struct.pack('>Q', key), serializer(item)) >> def __getitem__(self, key): >> return unserializer(self.db.get(struct.pack('>Q', >> key))) >> >> How do you envision these types of tasks being accomplished with >> unicode keys? It is conceivable that one could write a custom >> unicode encoding that accomplishes this, convert the key to unicode, >> and pass the custom encoding name to the constructor. > > Well, the *easiest* (I don't know about simplest) way to use ints as > keys is of course to use the decimal representation. You'd use > str(key) instead of struct.pack(). This would of course not maintain > key order -- is that important? If you need to be compatible with > struct.pack(), and we were to choose Unicode strings for the keys in > the API, then you might have to do something like > struct.pack(...).encode("latin-1") and specify latin-1 as the > database's key encoding. The decimal representation would work if it were left-padded appropriately, though it would be somewhat space-inefficient. The second option you propose is likely the most feasible. > Of course this may not be compatible with an external constraint (e.g. 
> another application that already has a key format) but in that case > you may have to use arbitrary tricks anyway (the latin-1 encoding > might still be helpful). > > However, I give you that a pure bytes API would be more convenient > at times. > > How about we define two APIs, using raw bytes and one using strings + > a given encoding? > > Or perhaps a special value of the encoding argument passed to > *dbm.open() (maybe None, maybe the default, maybe "raw" or "bytes"?) > to specify that the key values are to be bytes? Either option sounds fine, but you're still left with the need to implement the raw bytes version in dumbdbm. ISTM that this issue boils down to the question of if and how byte sequences can be hashed in py3k. Assuming you are trying to implement a file parser that dispatches to various methods based on binary data, what is the pythonic way to do this in py3k? One option is to .decode('latin-1') and dispatch on the (meaningless) text. Another is for something like str8 to be kept around. Yet another is to use non-hash-based datastructures (like trees) to implement these algorithms. 
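As a minimal sketch of the first option named above (decode as latin-1 and dispatch on the resulting, meaningless, text), and not a recommendation: the two-byte record tags and the handler functions below are invented purely for illustration. The only property relied on is that latin-1 maps each byte 0..255 to the code point of the same value, so the decoding cannot fail and is reversible.

```python
# Hypothetical handlers, invented for illustration only.
def parse_header(payload):
    return ("header", len(payload))


def parse_data(payload):
    return ("data", len(payload))


# Dispatch table keyed on text. Each key is a 2-byte tag decoded as
# latin-1, which maps byte value N to code point N, so no tag can
# fail to decode and distinct tags stay distinct.
HANDLERS = {
    b"\x89H".decode("latin-1"): parse_header,
    b"\xffD".decode("latin-1"): parse_data,
}


def dispatch(record):
    """Route a binary record to a handler based on its first two bytes."""
    tag, payload = record[:2], record[2:]
    # bytes(tag) copies the tag, so a mutable buffer works as input too.
    handler = HANDLERS.get(bytes(tag).decode("latin-1"))
    if handler is None:
        raise ValueError("unknown record tag: %r" % (bytes(tag),))
    return handler(payload)
```

The per-lookup copy and decode of the tag is the cost of this approach; the other two options mentioned above trade that cost for a hashable byte-string type or for a non-hash-based structure.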
-Mike From rrr at ronadam.com Tue Aug 7 04:53:26 2007 From: rrr at ronadam.com (Ron Adam) Date: Mon, 06 Aug 2007 21:53:26 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B7C369.3040509@canterbury.ac.nz> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B66E7E.4060209@canterbury.ac.nz> <46B6A7E8.7040001@ronadam.com> <46B6C335.4080504@canterbury.ac.nz> <46B6DE80.2050000@ronadam.com> <46B7C369.3040509@canterbury.ac.nz> Message-ID: <46B7DEA6.5050609@ronadam.com> Greg Ewing wrote: > Ron Adam wrote: >> What should happen in various situations of mismatched or invalid type >> specifiers? > > I think that a format string that is not understood > by any part of the system should raise an exception > (rather than, e.g. falling back on str()). Refuse the > temptation to guess. That handles invalid type specifiers. What about mismatched specifiers? Try to convert the data? Raise an exception? Either depending on what the type specifier is? I think the opinion so far is to let the object's __format__ method determine this, but we need to figure out what the built-in types will do. Ron From martin at v.loewis.de Tue Aug 7 05:13:43 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 07 Aug 2007 05:13:43 +0200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> Message-ID: <46B7E367.5060409@v.loewis.de> > Personally, I still think that converting to the latin-1 encoding is > probably just as good for this particular use case. So perhaps I don't > understand the use case(s?) correctly. 
No, it rather means that this solution didn't occur to me. It's a bit expensive, since every access (getitem or setitem) will cause a recoding, if the parameters are required to be bytes - but so would any other solution that you can accept (i.e. use str8, use a separate frozenbytes) - they all require that you copy the key parameter in setitem/getitem. So this sounds better than using str8. Regards, Martin From martin at v.loewis.de Tue Aug 7 05:27:40 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 07 Aug 2007 05:27:40 +0200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B79815.1030504@v.loewis.de> Message-ID: <46B7E6AC.3020102@v.loewis.de> > I thought about this issue some more. > > Given that the *dbm types strive for emulating dicts, I think it makes > sense to use strings for the keys, and bytes for the values; this > makes them more plug-compatible with real dicts. (We should ideally > also change the keys() method etc. to return views.) This of course > requires that we know the encoding used for the keys. Perhaps it would > be acceptable to pick a conservative default encoding (e.g. ASCII) and > add an encoding argument to the open() method. > > Perhaps this will work? It seems better than using str8 or bytes for the keys. It would work, but it would not be good. The dbm files traditionally did not have any notion of character encoding for keys or values; they are really bytes:bytes mappings. The encoding used for the keys might not be known, or it might not be consistent across all keys. Furthermore, for the specific case of bsddb, some users pointed out that they absolutely think that keys must be bytes, since they *conceptually* aren't text at all. 
"Big" users of bsddb create databases where some tables are index tables for other tables; in such tables, the keys are combinations of fields where the byte representation allows for efficient lookup (akin postgres "create index foo_idx on foo(f1, f2, f3);" where the key to the index becomes the concatenation of f1, f2, and f3 - and f2 may be INTEGER, f3 TIMESTAMP WITHOUT TIME ZONE, say). It's always possible to treat these as if they were latin-1, but this is so unnaturally hacky that I didn't think of it. Regards, Martin From martin at v.loewis.de Tue Aug 7 05:29:33 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 07 Aug 2007 05:29:33 +0200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B7CE27.3030103@acm.org> References: <46B637DD.7070905@v.loewis.de> <46B701E5.3030206@gmail.com> <46B7CE27.3030103@acm.org> Message-ID: <46B7E71D.1050705@v.loewis.de> > The most efficient representation of immutable bytes is quite different > from the most efficient representation of mutable bytes. In what way? Curious, Martin From martin at v.loewis.de Tue Aug 7 05:41:58 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 07 Aug 2007 05:41:58 +0200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: Message-ID: <46B7EA06.5040106@v.loewis.de> > Or perhaps a special value of the encoding argument passed to > *dbm.open() (maybe None, maybe the default, maybe "raw" or "bytes"?) > to specify that the key values are to be bytes? This is essentially the state of the bsddb module in the struni branch right now. The default is bytes keys and values; if you want string keys, you write db = bsddb.open(...) db = bsddb.StringKeys(db) which arranges for transparent UTF-8 encoding; it would be possible to extend this to db = bsddb.open(...) 
db = bsddb.StringKeys(db, encoding="latin-1") However, this has the view that there is a single "proper" key representation, which is bytes, and then reinterpretations. Now if you say that the dbm files are dicts conceptually, and bytes are not allowed as dict keys, then any API that allows for bytes as dbm keys (whether by default or as an option) is conceptually inconsistent - as you now do have dict-like objects which use bytes keys. This causes confusion if you pass one of them to, say, .update of a "real" dict, which then fails. IOW, I couldn't do d = {} d.update(db) if db is in the "keys are bytes" mode. Regards, Martin From martin at v.loewis.de Tue Aug 7 05:45:00 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 07 Aug 2007 05:45:00 +0200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: Message-ID: <46B7EABC.1060909@v.loewis.de> > For instance, it is quite common to use integers as keys. If you are > inserting keys in order, it is about a hundred times faster to encode > the ints in big-endian byte order than little-endian: > > class MyIntDB(object): > def __setitem__(self, key, item): > self.db.put(struct.pack('>Q', key), serializer(item)) > def __getitem__(self, key): > return unserializer(self.db.get(struct.pack('>Q', key))) I guess Guido wants you to write class MyIntDB(object): def __setitem__(self, key, item): self.db.put(struct.pack('>Q', key).encode("latin-1"), serializer(item)) def __getitem__(self, key): return unserializer(self.db.get( struct.pack('>Q', key).encode("latin-1"))) here. > How do you envision these types of tasks being accomplished with > unicode keys? It is conceivable that one could write a custom > unicode encoding that accomplishes this, convert the key to unicode, > and pass the custom encoding name to the constructor. See above. It's always trivial to do that with latin-1 as the encoding (I'm glad you didn't see that, either :-). 
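The ordering claim behind the big-endian keys, and the latin-1 round trip, can be checked with a small sketch. It assumes present-day Python 3, where struct.pack() returns bytes and the bytes-to-text direction is spelled .decode(); the sample key values are arbitrary.

```python
import struct

# Arbitrary non-negative keys, already in ascending numeric order.
keys = [1, 2, 255, 256, 65536]

# Pack each key as an unsigned 64-bit integer in both byte orders.
big = [struct.pack(">Q", k) for k in keys]
little = [struct.pack("<Q", k) for k in keys]

# Big-endian puts the most significant byte first, so comparing the
# byte strings lexicographically agrees with comparing the numbers.
big_sorted = sorted(big) == big

# Little-endian puts the least significant byte first, so 256
# (00 01 00 ...) sorts before 1 (01 00 00 ...).
little_sorted = sorted(little) == little

# Treating the packed bytes as latin-1 text is lossless, and text
# compares by code point, so the ordering survives the detour.
text_keys = [b.decode("latin-1") for b in big]
round_trip = [t.encode("latin-1") for t in text_keys] == big
text_sorted = sorted(text_keys) == text_keys
```

Here big_sorted, round_trip and text_sorted come out true while little_sorted is false, which is the property a btree-backed store benefits from when keys are inserted in order.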
Regards, Martin From python at rcn.com Tue Aug 7 05:45:09 2007 From: python at rcn.com (Raymond Hettinger) Date: Mon, 6 Aug 2007 20:45:09 -0700 Subject: [Python-3000] map() Returns Iterator References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com> <87k5s8i2yn.fsf@hydra.hampton.thirdcreek.com> Message-ID: <006201c7d8a5$5aff9050$f001a8c0@RaymondLaptop1> From: "Kurt B. Kaiser" > However, IMHO eliminating the strict versions of map() and filter() in > favor of the lazy versions from itertools kicks the degree of > sophistication necessary to understand these functions up a notch (or > three). Not really. Once range() starts returning an iterator, that will be the new, basic norm. With that as a foundation, it would be surprising if map() and enumerate() and zip() did not return iterators. Learn once, use everywhere. Raymond From talin at acm.org Tue Aug 7 05:50:27 2007 From: talin at acm.org (Talin) Date: Mon, 06 Aug 2007 20:50:27 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B7E71D.1050705@v.loewis.de> References: <46B637DD.7070905@v.loewis.de> <46B701E5.3030206@gmail.com> <46B7CE27.3030103@acm.org> <46B7E71D.1050705@v.loewis.de> Message-ID: <46B7EC03.7000605@acm.org> Martin v. Löwis wrote: >> The most efficient representation of immutable bytes is quite different >> from the most efficient representation of mutable bytes. > > In what way? Well, in some runtime environments (I'm not sure about Python), for immutables you can combine the object header and the bytes array into a single allocation. Further, the header need not contain an explicit pointer to the bytes themselves, instead the bytes are obtained by doing pointer arithmetic on the header address. For a mutable bytes object, you'll need to allocate the actual bytes separately from the header. Typically you'll also need a second 'length' field to represent the current physical capacity of the allocated memory block, in addition to the logical length of the byte array. 
So in other words, the in-memory layout of the two structs is different enough that attempting to combine them into a single struct is kind of awkward. > Curious, > Martin > From martin at v.loewis.de Tue Aug 7 05:51:26 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 07 Aug 2007 05:51:26 +0200 Subject: [Python-3000] Binary compatibility In-Reply-To: References: Message-ID: <46B7EC3E.3070802@v.loewis.de> > This means that I can't distribute Python extensions as binaries. I think this conclusion is completely wrong. Why do you come to it? If you want to distribute extension modules for Ubuntu, just distribute the UCS-4 module. You need separate binary packages for different microprocessors and operating systems, anyway, as you can't use the same binary for Windows, OSX, Ubuntu, or Solaris. > Any extension built on Ubuntu may fail on some other system. Every extension built on Ubuntu *will* fail on other processors or operating systems - even if the Unicode issue was solved, it would still be a different instruction set (x86 vs. SPARC or Itanium, say), and even for a single microprocessor, it will fail if the OS ABI is different (different C libraries etc). Now, you seem to talk about different *Linux* systems. On Linux, use UCS-4. Regards, Martin From guido at python.org Tue Aug 7 05:56:35 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Aug 2007 20:56:35 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B7EA06.5040106@v.loewis.de> References: <46B7EA06.5040106@v.loewis.de> Message-ID: On 8/6/07, "Martin v. Löwis" wrote: > > Or perhaps a special value of the encoding argument passed to > > *dbm.open() (maybe None, maybe the default, maybe "raw" or "bytes"?) > > to specify that the key values are to be bytes? > > This is essentially the state of the bsddb module in the struni branch > right now. 
The default is bytes keys and values; if you want string > keys, you write > > db = bsddb.open(...) > db = bsddb.StringKeys(db) > > which arranges for transparent UTF-8 encoding; Ah. I hadn't realized that this was the API. It sounds like as good a solution as mine. > it would be possible to extend this to > > db = bsddb.open(...) > db = bsddb.StringKeys(db, encoding="latin-1") This would be even better. > However, this has the view that there is a single "proper" key > representation, which is bytes, and then reinterpretations. > > Now if you say that the dbm files are dicts conceptually, and > bytes are not allowed as dict keys, then any API that allows > for bytes as dbm keys (whether by default or as an option) is > conceptually inconsistent - as you now do have dict-like objects > which use bytes keys. This causes confusion if you pass one of > them to, say, .update of a "real" dict, which then fails. IOW, > I couldn't do > > d = {} > d.update(db) > > if db is in the "keys are bytes" mode. I guess we have to rethink our use of these databases somewhat. I think I'm fine with the model that the basic dbm implementations map bytes to bytes, and aren't particularly compatible with dicts. (They aren't, really, anyway -- the key and value types are typically restricted, and the reference semantics are different.) But, just like for regular files we have TextIOWrapper which wraps a binary file with a layer for encoded text I/O, I think it would be very useful to have a layer around the *dbm modules for making them handle text. Perhaps the StringKeys and/or StringValues wrappers can be generalized? Or perhaps we could borrow from io.open(), and use a combination of the mode and the encoding to determine how to stack wrappers. Another approach might be to generalize shelve. It already supports pickling values. 
There could be a few variants for dealing with keys that are either strings or arbitrary immutables; the keys used for the underlying *dbm file would then be either an encoding (if the keys are limited to strings) or a pickle (if they aren't). (The latter would require some kind of canonical pickling version, so may not be practical; there also may not be enough of a use case to bother.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Aug 7 06:03:45 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Aug 2007 21:03:45 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B7EC03.7000605@acm.org> References: <46B637DD.7070905@v.loewis.de> <46B701E5.3030206@gmail.com> <46B7CE27.3030103@acm.org> <46B7E71D.1050705@v.loewis.de> <46B7EC03.7000605@acm.org> Message-ID: On 8/6/07, Talin wrote: > Martin v. Löwis wrote: > >> The most efficient representation of immutable bytes is quite different > >> from the most efficient representation of mutable bytes. > > > > In what way? > > Well, in some runtime environments (I'm not sure about Python), for > immutables you can combine the object header and the bytes array into a > single allocation. Further, the header need not contain an explicit > pointer to the bytes themselves, instead the bytes are obtained by doing > pointer arithmetic on the header address. > > For a mutable bytes object, you'll need to allocate the actual bytes > separately from the header. Typically you'll also need a second 'length' > field to represent the current physical capacity of the allocated memory > block, in addition to the logical length of the byte array. > > So in other words, the in-memory layout of the two structs is different > enough that attempting to combine them into a single struct is kind of > awkward. Right. 
You've described exactly the difference between str8 and bytes (PyString and PyBytes) in the struni branch (or in the future in Python 2.6 for that matter). There are two savings here: (1) the string object uses less memory (only a single instance of the malloc header and round-off waste); (2) the string object uses less time to allocate and free (only a single call to malloc() or free()). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Tue Aug 7 06:22:24 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 07 Aug 2007 06:22:24 +0200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B7EC03.7000605@acm.org> References: <46B637DD.7070905@v.loewis.de> <46B701E5.3030206@gmail.com> <46B7CE27.3030103@acm.org> <46B7E71D.1050705@v.loewis.de> <46B7EC03.7000605@acm.org> Message-ID: <46B7F380.1050805@v.loewis.de> >>> The most efficient representation of immutable bytes is quite different >>> from the most efficient representation of mutable bytes. >> >> In what way? > > Well, in some runtime environments (I'm not sure about Python), for > immutables you can combine the object header and the bytes array into a > single allocation. Further, the header need not contain an explicit > pointer to the bytes themselves, instead the bytes are obtained by doing > pointer arithmetic on the header address. Hmm. That assumes that the mutable bytes type also supports changes to its length. I see that the Python bytes type does that, but I don't think it's really necessary - I'm not even sure it's useful. For a bytes array, you don't need a separate allocation, and it still can be mutable. > So in other words, the in-memory layout of the two structs is different > enough that attempting to combine them into a single struct is kind of > awkward. ... assuming the mutable bytes type behaves like a Python list, that is. If it behaved like a Java/C byte[], this issue would not exist. 
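The fixed-length, mutable-contents case can be sketched in present-day Python 3 terms. This is an assumption-laden illustration: bytearray and memoryview are names from current Python, not from this thread, and bytearray itself is resizable, so the fixed length here is a convention the code keeps rather than something the type enforces. The buffer size and the fill() helper are hypothetical.

```python
# One block allocated up front; its length is never changed below.
BUF_SIZE = 16
buf = bytearray(BUF_SIZE)

# Slicing a memoryview gives a window onto the same memory, no copy.
view = memoryview(buf)


def fill(chunk):
    """Overwrite the start of the buffer in place; return bytes used."""
    n = len(chunk)
    if n > BUF_SIZE:
        raise ValueError("chunk does not fit in the fixed-size buffer")
    # Slice assignment of equal length mutates the contents in place.
    view[:n] = chunk
    return n


used = fill(b"hello")
first = bytes(view[:used])   # copy out only the part that was filled

used = fill(b"hi")           # reuse the same block for the next message
second = bytes(view[:used])
```

The logical length ("used") is tracked beside the buffer instead of inside it, which is the byte[]-style arrangement: the contents change, the allocation does not.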
Regards, Martin From jyasskin at gmail.com Tue Aug 7 06:26:32 2007 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Mon, 6 Aug 2007 21:26:32 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B79815.1030504@v.loewis.de> Message-ID: <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com> On 8/6/07, Guido van Rossum wrote: > On 8/6/07, "Martin v. Löwis" wrote: > > For how long? Do you expect to receive further information that will > > make a decision simpler? > > I'm waiting for a show-stopper issue that can't be solved without > having an immutable bytes type. Apologies if this has been answered before, but why are you waiting for a show-stopper that requires an immutable bytes type rather than one that requires a mutable one? This being software, there isn't likely to be a real show-stopper (especially if you're willing to copy the whole object), just things that are unnecessarily annoying or confusing. Hashing seems to be one of those. Taking TOOWTDI as a guideline: If you have immutable bytes and need a mutable object, just use list(). If you have mutable bytes and need an immutable object, you could 1) convert it to an int (probably big-endian), 2) convert it to a latin-1 unicode object (containing garbage, of course), 3) figure out an encoding in which to assume the bytes represent text and create a unicode string from that, or 4) use the deprecated str8 type. Why isn't this a clear win for immutable bytes? Jeffrey From stephen at xemacs.org Tue Aug 7 06:53:06 2007 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 07 Aug 2007 13:53:06 +0900 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B7E6AC.3020102@v.loewis.de> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B79815.1030504@v.loewis.de> <46B7E6AC.3020102@v.loewis.de> Message-ID: <87ejig7xot.fsf@uwakimon.sk.tsukuba.ac.jp> "Martin v. Löwis" writes: > It's always possible to treat these as if they were latin-1, but this > is so unnaturally hacky that I didn't think of it. Emacs and XEmacs have both suffered (in different ways) from treating raw bytes as ISO 8859-1. Python is very different (among other things, the Unicode type is already well-developed and the preferred representation for text), but I think it's just as well that you avoid this. Even if it costs a little extra work. From martin at v.loewis.de Tue Aug 7 06:43:21 2007 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Tue, 07 Aug 2007 06:43:21 +0200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B79815.1030504@v.loewis.de> <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com> Message-ID: <46B7F869.6080007@v.loewis.de> > Apologies if this has been answered before, but why are you waiting > for a show-stopper that requires an immutable bytes type rather than > one that requires a mutable one? You mean, the need for a mutable bytes type might not be clear yet? Code that has been ported to the bytes type probably doesn't use it correctly yet, but to me, the need for a buffery thing where you can allocate some buffer, and then fill it byte-for-byte is quite obvious. It's a standard thing in all kinds of communication protocols: in sending, you allocate plenty of memory, fill it, and then send the fraction you actually consumed. 
In receiving, you allocate plenty of memory (not knowing yet how much you will receive), then only process as much as you needed. You do all that without creating new buffers all the time - you use a single one over and over again. Code that has been ported to bytes from str8 often tends to still follow the immutable pattern, creating a list of bytes objects to be joined later - this can be improved in code reviews. > Taking TOOWTDI as a guideline: If you have immutable bytes and need a > mutable object, just use list(). I don't think this is adequate. Too much lower-level API relies on having memory blocks, and that couldn't be implemented efficiently with a list. Regards, Martin From martin at v.loewis.de Tue Aug 7 06:53:32 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 07 Aug 2007 06:53:32 +0200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B7EA06.5040106@v.loewis.de> Message-ID: <46B7FACC.8030503@v.loewis.de> > I guess we have to rethink our use of these databases somewhat. Ok. In the interest of progress, I'll be looking at coming up with some fixes for the code base right now; as we agree that the underlying semantics is bytes:bytes, any encoding wrappers on top of it can be added later. > Perhaps the StringKeys and/or StringValues wrappers can be > generalized? Or perhaps we could borrow from io.open(), and use a > combination of the mode and the encoding to determine how to stack > wrappers. I thought about this, and couldn't think of a place where to put them. Also, the bsddb versions provide additional functions (such as .first() and .last()) which don't belong to the dict API. Furthermore, for dumbdbm, it would indeed be better if the dumbdbm object knew that keys are meant to be strings. It could support that natively - although not in a binary-backwards compatible manner with 2.x. Doing so would be more efficient in the implementation, as you'd avoid recoding. 
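A layer of the kind discussed here, text keys stacked on a bytes:bytes store, might look roughly like the following. This is a sketch only: TextKeys is a hypothetical name, it is not the bsddb StringKeys implementation, it ignores bsddb extras such as .first() and .last(), and a plain dict stands in for the dbm object.

```python
from collections.abc import MutableMapping


class TextKeys(MutableMapping):
    """Hypothetical wrapper exposing str keys over a bytes:bytes mapping."""

    def __init__(self, db, encoding="utf-8"):
        self._db = db              # underlying bytes:bytes mapping
        self._encoding = encoding  # how str keys are stored on disk

    def _raw(self, key):
        # Every access recodes the key; this is the per-access cost
        # mentioned in the discussion.
        return key.encode(self._encoding)

    def __getitem__(self, key):
        return self._db[self._raw(key)]

    def __setitem__(self, key, value):
        self._db[self._raw(key)] = value

    def __delitem__(self, key):
        del self._db[self._raw(key)]

    def __iter__(self):
        for raw in self._db:
            yield raw.decode(self._encoding)

    def __len__(self):
        return len(self._db)


raw_store = {}                       # stand-in for a dbm-style object
db = TextKeys(raw_store, encoding="latin-1")
db["caf\xe9"] = b"value"

plain = {}
plain.update(db)                     # works: keys arrive as str
```

With the wrapper in place, the on-disk keys stay bytes (raw_store holds b"caf\xe9") while dict-style consumers only ever see text keys.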
> Another approach might be to generalize shelve. It already supports > pickling values. There could be a few variants for dealing with keys > that are either strings or arbitrary immutables; the keys used for the > underlying *dbm file would then be either an encoding (if the keys are > limited to strings) or a pickle (if they aren't). (The latter would > require some kind of canonical pickling version, so may not be > practical; there also may not be enough of a use case to bother.) My concern is that people need to access existing databases. It's all fine that the code accessing them breaks, and that they have to actively port to Py3k. However, telling them that they have to represent the keys in their dbm disk files in a different manner might cause a revolt... Regards, Martin From kbk at shore.net Tue Aug 7 07:02:12 2007 From: kbk at shore.net (Kurt B. Kaiser) Date: Tue, 07 Aug 2007 01:02:12 -0400 Subject: [Python-3000] map() Returns Iterator In-Reply-To: (Guido van Rossum's message of "Mon, 6 Aug 2007 19:22:40 -0700") References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com> <87k5s8i2yn.fsf@hydra.hampton.thirdcreek.com> Message-ID: <87fy2whr8r.fsf@hydra.hampton.thirdcreek.com> "Guido van Rossum" writes: > [...pushback...] > >> However, IMHO eliminating the strict versions of map() and filter() in >> favor of the lazy versions from itertools kicks the degree of >> sophistication necessary to understand these functions up a notch (or >> three). > > I wonder how bad this is given that range() and dict.keys() and > friends will also stop returning lists? Don't know. It's straightforward for us, but we use it every day. I'm with you on the dict methods; I just view map() and filter() differently. I'll get used to it. Let's see what we hear from the high schools in a few years. > I don't think you ever saw any of my Py3k presentations (the slides of > the latest one at here: > http://conferences.oreillynet.com/presentations/os2007/os_vanrossum.ppt). 
Yes, I had dug them out. This link is the best so far, thanks! > I've always made a point of suggesting that we're switching to > returning iterators instead of lists from as many APIs as makes sense > (I stop at str.split() though, as I can't think of a use case where > the list would be so big as to be bothersome). It's not your father's snake :-) [...] > I think you're overreacting due to your experience with conversion of > existing code. I expect that new use cases where a list is needed will > generally be written using list comprehensions in Py3k-specific > tutorials, and generator expressions for situations where a list isn't > needed (as a slightly more advanced feature). Then map() and filter() > can be shown as more advanced optimizations of certain end cases. I think you are correct. [...] > (Aside: Please skip the p3yk branch and use the py3k-struni branch -- > it's the way of the future.) I was working on IDLE in p3yk because I expect a whole new set of failures when I jump it to py3k-struni. Maybe I'm wrong about that. It's mostly working now; I've been editing/testing Python 3000 with it for several weeks. > I tend to do manual conversion of the stdlib because it's on the > bleeding edge. At times I've regretted this, and gone back and run a > particular transform over some of the code. I rarely use the full set > of transforms on a whole subtree, although others sometimes do that. > Do note the options that help convert doctests and deal with print() > already being a function. I'll give it a shot. It probably would have helped me get IDLE going sooner; I had to trace the interpreter failure through IDLE into code.py. The biggest problem was those four map(print...) statements which I'll wager you wrote back in your salad days :-) I have my answer, thanks! See you in py3k-struni! > [zip()] >> It's used only fifteen times in 2.6 Lib/ and four of those are >> izip(). Eight are assignments, mostly to build dicts. > > I don't understand. 
What's an "assignment" to build a dict? Do you > mean something like > > dict(zip(keys, values)) > > ? That's an ideal use case for an iterator. Yup, typical lines are Lib/filecmp.py: a = dict(izip(imap(os.path.normcase, self.left_list), self.left_list)) Lib/mailbox.py: self._toc = dict(enumerate(zip(starts, stops))) -- KBK From kbk at shore.net Tue Aug 7 07:09:15 2007 From: kbk at shore.net (Kurt B. Kaiser) Date: Tue, 07 Aug 2007 01:09:15 -0400 Subject: [Python-3000] map() Returns Iterator In-Reply-To: <006201c7d8a5$5aff9050$f001a8c0@RaymondLaptop1> (Raymond Hettinger's message of "Mon, 6 Aug 2007 20:45:09 -0700") References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com> <87k5s8i2yn.fsf@hydra.hampton.thirdcreek.com> <006201c7d8a5$5aff9050$f001a8c0@RaymondLaptop1> Message-ID: <87bqdkhqx0.fsf@hydra.hampton.thirdcreek.com> "Raymond Hettinger" writes: > Not really. Once range() starts returning an iterator, > that will be the new, basic norm. With that as a foundation, > it would be suprising if map() and enumerate() and zip() > did not return iterators. Learn once, use everywhere. Except that range() is usually used in a loop, while map() and filter() are not. It seems to me that these two functions are going to expose naked iterators to beginners (well, intermediates) more than the other changes will. -- KBK From jyasskin at gmail.com Tue Aug 7 07:45:34 2007 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Mon, 6 Aug 2007 22:45:34 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B7F869.6080007@v.loewis.de> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B79815.1030504@v.loewis.de> <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com> <46B7F869.6080007@v.loewis.de> Message-ID: <5d44f72f0708062245k32e79de4s4b545c59a974612a@mail.gmail.com> On 8/6/07, "Martin v. 
Löwis" wrote: > > Apologies if this has been answered before, but why are you waiting > > for a show-stopper that requires an immutable bytes type rather than > > one that requires a mutable one? > > You mean, the need for a mutable bytes type might not be clear yet? > > Code that has been ported to the bytes type probably doesn't use it > correctly yet, but to me, the need for a buffery thing where you > can allocate some buffer, and then fill it byte-for-byte is quite > obvious. It's a standard thing in all kinds of communication > protocols: in sending, you allocate plenty of memory, fill it, and > then send the fraction you actually consumed. In receiving, you > allocate plenty of memory (not knowing yet how much you will receive), > then only process as much as you needed. You do all that without > creating new buffers all the time - you use a single one over and > over again. For low-level I/O code, I totally agree that a mutable buffery object is needed. What I'm wondering about is why that object needs to bleed up into the code the struni branch is fixing. The bytes type isn't even going to serve that function without some significant interface changes. For example, to support re-using bytes buffers, socket.send() would need to take start and end offsets into its bytes argument. Otherwise, you have to slice the object to select the right data, which *because bytes are mutable* requires a copy. PEP 3116's .write() method has the same problem. Making those changes is, of course, doable, but it seems like something that should be consciously committed to. Python 2 seems to have gotten away with doing all the buffery stuff in C. Is there a reason Python 3 shouldn't do the same? I was about to wonder if the performance was even worth the nuisance, but then I realized that I could run my own (naïve) benchmark.
Running revision 56747 of the p3yk branch, I get: $ ./python.exe -m timeit 'b = bytes(v % 256 for v in range(1000))' 1000 loops, best of 3: 272 usec per loop $ ./python.exe -m timeit -s 'b=bytes(v%256 for v in range(2000))' 'for v in range(1000): b[v] = v % 256' 1000 loops, best of 3: 298 usec per loop which seems to demonstrate that pre-allocating the bytes object is slightly _more_ expensive than re-allocating it each time. In any case, if people want to use bytes as both the low-level buffery I/O thing and the high-level byte string, I think PEP 358 should document it, since right now it just asserts that bytes are mutable without any reason why. > Code that has been ported to bytes from str8 often tends to still > follow the immutable pattern, creating a list of bytes objects to > be joined later - this can be improved in code reviews. > > > Taking TOOWTDI as a guideline: If you have immutable bytes and need a > > mutable object, just use list(). > > I don't think this is adequate. Too much lower-level API relies on > having memory blocks, and that couldn't be implemented efficiently > with a list. > > Regards, > Martin > -- Namasté, Jeffrey Yasskin From lists at cheimes.de Tue Aug 7 08:13:07 2007 From: lists at cheimes.de (Christian Heimes) Date: Tue, 07 Aug 2007 08:13:07 +0200 Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests in the struni branch! In-Reply-To: References: Message-ID: <46B80D73.5050009@cheimes.de> Guido van Rossum wrote: > test_minidom > Recently started failing again; probably shallow. test_minidom is passing for me (Ubuntu 7.04, r56793, UCS2 build). > test_tarfile > Virgin territory again (but different owner :-). The tarfile module should be addressed by either its original author or somebody with lots of spare time. As stated earlier, it's a beast. I tried to fix it several weeks ago because I thought it was low-hanging fruit. I was totally wrong.
:/ Christian From martin at v.loewis.de Tue Aug 7 08:22:02 2007 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Tue, 07 Aug 2007 08:22:02 +0200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <5d44f72f0708062245k32e79de4s4b545c59a974612a@mail.gmail.com> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B79815.1030504@v.loewis.de> <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com> <46B7F869.6080007@v.loewis.de> <5d44f72f0708062245k32e79de4s4b545c59a974612a@mail.gmail.com> Message-ID: <46B80F8A.7060906@v.loewis.de> > For low-level I/O code, I totally agree that a mutable buffery object > is needed. The code we are looking at right now (dbm interfaces) *is* low-level I/O code. > For example, to support re-using bytes buffers, socket.send() > would need to take start and end offsets into its bytes argument. > Otherwise, you have to slice the object to select the right data, > which *because bytes are mutable* requires a copy. PEP 3116's .write() > method has the same problem. Making those changes is, of course, > doable, but it seems like something that should be consciously > committed to. Sure. There are several ways to do that, including producing view objects - which would be possible even though the underlying buffer is mutable; the view would then be just as mutable. > Python 2 seems to have gotten away with doing all the buffery stuff in > C. Is there a reason Python 3 shouldn't do the same? I think Python 2 has demonstrated that this doesn't really work. People repeatedly did += on strings (leading to quadratic performance), invented the buffer interface (which is semantically flawed), added direct support for mmap, and so on. 
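[Editorial sketch, not part of Martin's message.] The "view object" idea above can be illustrated with the names that released Python 3 ended up using, bytearray for the mutable buffer and memoryview for the view. Those names are an assumption relative to this thread, which had not settled on them. The sketch only shows that slicing the buffer copies while slicing a view does not, and that the view is as mutable as the buffer behind it.

```python
# Sketch: a view onto a mutable buffer selects a sub-range without copying.
buf = bytearray(b"header:payload-bytes")

copied = buf[7:14]              # slicing a bytearray copies the selected data
view = memoryview(buf)[7:14]    # slicing a memoryview shares the same memory

buf[7] = ord("P")               # modify the underlying buffer in place

copied_bytes = bytes(copied)    # the copy still holds the old contents
viewed_bytes = bytes(view)      # the view reflects the modification

view[0] = ord("p")              # writing through the view changes the buffer
first_byte = buf[7]
```

A send() that accepts such a view would not need separate start and end offsets, which is one way to address the objection quoted above.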
> $ ./python.exe -m timeit 'b = bytes(v % 256 for v in range(1000))' > 1000 loops, best of 3: 272 usec per loop > $ ./python.exe -m timeit -s 'b=bytes(v%256 for v in range(2000))' 'for > v in range(1000): b[v] = v % 256' > 1000 loops, best of 3: 298 usec per loop > > which seems to demonstrate that pre-allocating the bytes object is > slightly _more_ expensive than re-allocating it each time. There must be more conditions to it; I get martin at mira:~/work/3k$ ./python -m timeit 'b = bytes(v % 256 for v in range(1000))' 1000 loops, best of 3: 434 usec per loop martin at mira:~/work/3k$ ./python -m timeit -s 'b=bytes(v%256 for v in range(2000))' 'for v in range(1000): b[v] = v % 256' 1000 loops, best of 3: 394 usec per loop which is the reverse result. > In any case, if people want to use bytes as both the low-level buffery > I/O thing and the high-level byte string, I think PEP 358 should > document it, since right now it just asserts that bytes are mutable > without any reason why. That point is moot now; the PEP has been accepted. Documenting things is always good, but the time for objections to the PEP is over now - that's what the PEP process is for. Regards, Martin From talex5 at gmail.com Tue Aug 7 09:15:06 2007 From: talex5 at gmail.com (Thomas Leonard) Date: Tue, 7 Aug 2007 08:15:06 +0100 Subject: [Python-3000] Binary compatibility In-Reply-To: <46B7EC3E.3070802@v.loewis.de> References: <46B7EC3E.3070802@v.loewis.de> Message-ID: On 8/7/07, "Martin v. Löwis" wrote: > > This means that I can't distribute Python extensions as binaries. > > I think this conclusion is completely wrong. Why do you come to it? > > If you want to distribute extension modules for Ubuntu, just distribute > the UCS-4 module. You need separate binary packages for different > microprocessors and operating systems, anyway, as you can't use the > same binary for Windows, OSX, Ubuntu, or Solaris.
You're right that we already have to provide several binaries (although OSX and Windows users usually aren't all that interested in running Unix desktop environments like ROX ;-), but each new combination is more work for us. Linux/x86 covers pretty much all our non-technical users, I think. Autopackage double-compiles C++ programs (C++ being the other piece of Linux infrastructure with an unstable ABI), for example, but if they want to provide binaries for a C++ program using Python, that's 4 binaries per architecture! (You also have to special-case the selection logic. Every installation system understands about different versions and different processors, but they need custom code to figure out which of two flavours of Python is installed). > > Any extension built on Ubuntu may fail on some other system. > > Every extension built on Ubuntu *will* fail on other processors > or operating systems - even if the Unicode issue was solved, it > would still be a different instruction set (x86 vs. SPARC > or Itanium, say), > and even for a single microprocessor, it will > fail if the OS ABI is different (different C libraries etc). Generally it doesn't. Our ROX-Filer/x86 binary using GTK+ runs on all Linux/x86 systems (as far as I know). Linux binary compatibility is currently very good, provided you avoid C++ and Python extensions. > Now, you seem to talk about different *Linux* systems. On Linux, > use UCS-4. Yes, that's what we want. But Python 2.5 defaults to UCS-2 (at least last time I tried), while many distros have used UCS-4. If Linux always used UCS-4, that would be fine, but currently there's no guarantee of that.
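[Editorial sketch, not part of Thomas's message.] The "custom code to figure out which of two flavours of Python is installed" can be as small as a check of sys.maxunicode, which reports the largest code point the interpreter can hold in one string element. The helper name below is hypothetical. Note that since Python 3.3 the narrow/wide distinction no longer exists and this value is always 0x10FFFF, so the check is only meaningful on the interpreters discussed in this thread.

```python
import sys

def unicode_build_width():
    """Return 'UCS-4' for a wide build and 'UCS-2' for a narrow build.

    Hypothetical helper: a narrow build reports sys.maxunicode == 0xFFFF,
    a wide build reports 0x10FFFF.
    """
    return "UCS-4" if sys.maxunicode > 0xFFFF else "UCS-2"

build_width = unicode_build_width()
```

An installer could run this with the target interpreter and pick the matching extension binary from the result.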
-- Dr Thomas Leonard http://rox.sourceforge.net GPG: 9242 9807 C985 3C07 44A6 8B9A AE07 8280 59A5 3CC1 From pj at place.org Tue Aug 7 06:37:49 2007 From: pj at place.org (Paul Jimenez) Date: Mon, 06 Aug 2007 23:37:49 -0500 Subject: [Python-3000] Plea for help: python/branches/py3k-struni/Lib/tarfile.py Message-ID: <20070807043749.E11A8179C7E@place.org> This evening I had a couple of hours to spare and happened to read Guido's plea for help near the beginning of it. I picked up a failing testcase that no one had claimed and did what I could: it's not finished, but it fixes approximately 75% of the errors in test_tarfile. I concentrated on fixing problems that the testcase turned up; a pure inspection of the source would turn up lots of things I missed, I'm sure. I hope it's useful; it probably needs minor attention from me on what the Right Thing to do is in the case of encoding and decoding: ascii? I had to do a .decode('latin-1') to pass the umlaut-in-a-filename test, but I'm not at all sure that that's the true Right Thing. Anyway, here's a start; I'm explicitly *not* claiming that I'll ever touch this source code again; I don't want to block anyone else from working on it. Enjoy.
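[Editorial sketch, not part of Paul's message.] The encode/decode question raised above concerns two small helpers in the patch that follows, stn() and nts(). A minimal standalone version, written for released Python 3, might look like the code below. The encoding parameters are hypothetical additions for illustration; the defaults mirror the choices in the patch (ascii on the way in, latin-1 on the way out) and are not a claim about what the Right Thing is.

```python
NUL = b"\0"

def stn(s, length, encoding="ascii"):
    """Encode s if it is text, then truncate or NUL-pad it to length bytes."""
    if isinstance(s, str):
        s = s.encode(encoding)
    # A negative multiplier yields b"", so over-long input is only truncated.
    return s[:length] + (length - len(s)) * NUL

def nts(field, encoding="latin-1"):
    """Decode a NUL-terminated header field back to text."""
    end = field.find(NUL)
    if end != -1:
        field = field[:end]
    return field.decode(encoding)

padded = stn("abc", 5)
truncated = stn("abcdefg", 5)
name = nts(b"f\xfcr\0\0")
```

latin-1 never fails to decode because every byte value maps to a code point, which is why it gets the umlaut test to pass regardless of what encoding the archive really used.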
--pj Index: tarfile.py =================================================================== --- tarfile.py (revision 56785) +++ tarfile.py (working copy) @@ -72,33 +72,33 @@ #--------------------------------------------------------- # tar constants #--------------------------------------------------------- -NUL = "\0" # the null character +NUL = b"\0" # the null character BLOCKSIZE = 512 # length of processing blocks RECORDSIZE = BLOCKSIZE * 20 # length of records -GNU_MAGIC = "ustar \0" # magic gnu tar string -POSIX_MAGIC = "ustar\x0000" # magic posix tar string +GNU_MAGIC = b"ustar \0" # magic gnu tar string +POSIX_MAGIC = b"ustar\x0000" # magic posix tar string LENGTH_NAME = 100 # maximum length of a filename LENGTH_LINK = 100 # maximum length of a linkname LENGTH_PREFIX = 155 # maximum length of the prefix field -REGTYPE = "0" # regular file -AREGTYPE = "\0" # regular file -LNKTYPE = "1" # link (inside tarfile) -SYMTYPE = "2" # symbolic link -CHRTYPE = "3" # character special device -BLKTYPE = "4" # block special device -DIRTYPE = "5" # directory -FIFOTYPE = "6" # fifo special device -CONTTYPE = "7" # contiguous file +REGTYPE = b"0" # regular file +AREGTYPE = b"\0" # regular file +LNKTYPE = b"1" # link (inside tarfile) +SYMTYPE = b"2" # symbolic link +CHRTYPE = b"3" # character special device +BLKTYPE = b"4" # block special device +DIRTYPE = b"5" # directory +FIFOTYPE = b"6" # fifo special device +CONTTYPE = b"7" # contiguous file -GNUTYPE_LONGNAME = "L" # GNU tar longname -GNUTYPE_LONGLINK = "K" # GNU tar longlink -GNUTYPE_SPARSE = "S" # GNU tar sparse file +GNUTYPE_LONGNAME = b"L" # GNU tar longname +GNUTYPE_LONGLINK = b"K" # GNU tar longlink +GNUTYPE_SPARSE = b"S" # GNU tar sparse file -XHDTYPE = "x" # POSIX.1-2001 extended header -XGLTYPE = "g" # POSIX.1-2001 global header -SOLARIS_XHDTYPE = "X" # Solaris extended header +XHDTYPE = b"x" # POSIX.1-2001 extended header +XGLTYPE = b"g" # POSIX.1-2001 global header +SOLARIS_XHDTYPE = b"X" # Solaris extended 
header USTAR_FORMAT = 0 # POSIX.1-1988 (ustar) format GNU_FORMAT = 1 # GNU tar format @@ -176,6 +176,9 @@ def stn(s, length): """Convert a python string to a null-terminated string buffer. """ + #return s[:length].encode('ascii') + (length - len(s)) * NUL + if type(s) != type(b''): + s = s.encode('ascii') return s[:length] + (length - len(s)) * NUL def nts(s): @@ -184,8 +187,8 @@ # Use the string up to the first null char. p = s.find("\0") if p == -1: - return s - return s[:p] + return s.decode('latin-1') + return s[:p].decode('latin-1') def nti(s): """Convert a number field to a python number. @@ -214,7 +217,7 @@ # encoding, the following digits-1 bytes are a big-endian # representation. This allows values up to (256**(digits-1))-1. if 0 <= n < 8 ** (digits - 1): - s = "%0*o" % (digits - 1, n) + NUL + s = ("%0*o" % (digits - 1, n)).encode('ascii') + NUL else: if format != GNU_FORMAT or n >= 256 ** (digits - 1): raise ValueError("overflow in number field") @@ -412,7 +415,7 @@ self.comptype = comptype self.fileobj = fileobj self.bufsize = bufsize - self.buf = "" + self.buf = b"" self.pos = 0 self.closed = False @@ -434,7 +437,7 @@ except ImportError: raise CompressionError("bz2 module is not available") if mode == "r": - self.dbuf = "" + self.dbuf = b"" self.cmp = bz2.BZ2Decompressor() else: self.cmp = bz2.BZ2Compressor() @@ -451,10 +454,10 @@ self.zlib.DEF_MEM_LEVEL, 0) timestamp = struct.pack(" LENGTH_LINK: buf += self._create_gnu_long_header(info["linkname"], GNUTYPE_LONGLINK) @@ -1071,7 +1074,7 @@ if pax_headers: buf = self._create_pax_generic_header(pax_headers) else: - buf = "" + buf = b"" return buf + self._create_header(info, USTAR_FORMAT) @@ -1108,7 +1111,7 @@ itn(info.get("gid", 0), 8, format), itn(info.get("size", 0), 12, format), itn(info.get("mtime", 0), 12, format), - " ", # checksum field + b" ", # checksum field info.get("type", REGTYPE), stn(info.get("linkname", ""), 100), stn(info.get("magic", POSIX_MAGIC), 8), @@ -1119,9 +1122,9 @@ 
stn(info.get("prefix", ""), 155) ] - buf = struct.pack("%ds" % BLOCKSIZE, "".join(parts)) + buf = struct.pack("%ds" % BLOCKSIZE, b"".join(parts)) chksum = calc_chksums(buf[-BLOCKSIZE:])[0] - buf = buf[:-364] + "%06o\0" % chksum + buf[-357:] + buf = buf[:-364] + ("%06o\0" % chksum).encode('ascii') + buf[-357:] return buf @staticmethod @@ -1139,10 +1142,10 @@ """Return a GNUTYPE_LONGNAME or GNUTYPE_LONGLINK sequence for name. """ - name += NUL + name = name.encode('ascii') + NUL info = {} - info["name"] = "././@LongLink" + info["name"] = b"././@LongLink" info["type"] = type info["size"] = len(name) info["magic"] = GNU_MAGIC @@ -1324,7 +1327,7 @@ lastpos = offset + numbytes pos += 24 - isextended = ord(buf[482]) + isextended = buf[482] origsize = nti(buf[483:495]) # If the isextended flag is given, @@ -1344,7 +1347,7 @@ realpos += numbytes lastpos = offset + numbytes pos += 24 - isextended = ord(buf[504]) + isextended = buf[504] if lastpos < origsize: sp.append(_hole(lastpos, origsize - lastpos)) Index: test/test_tarfile.py =================================================================== --- test/test_tarfile.py (revision 56784) +++ test/test_tarfile.py (working copy) @@ -115,7 +115,7 @@ fobj.seek(0, 2) self.assertEqual(tarinfo.size, fobj.tell(), "seek() to file's end failed") - self.assert_(fobj.read() == "", + self.assert_(fobj.read() == b"", "read() at file's end did not return empty string") fobj.seek(-tarinfo.size, 2) self.assertEqual(0, fobj.tell(), From greg.ewing at canterbury.ac.nz Tue Aug 7 10:20:23 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 07 Aug 2007 20:20:23 +1200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B79815.1030504@v.loewis.de> Message-ID: <46B82B47.9090108@canterbury.ac.nz> Guido van Rossum wrote: > At the same time we still have enough uses of str9 ^^^^ For holding data from 9-track tapes? 
:-) -- Greg From greg.ewing at canterbury.ac.nz Tue Aug 7 10:21:28 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 07 Aug 2007 20:21:28 +1200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> Message-ID: <46B82B88.5000804@canterbury.ac.nz> Guido van Rossum wrote: > Personally, I still think that converting to the latin-1 encoding is > probably just as good for this particular use case. Although that's a conceptually screwy thing to do when your data has nothing to do with characters. -- Greg From greg.ewing at canterbury.ac.nz Tue Aug 7 10:47:20 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 07 Aug 2007 20:47:20 +1200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B7EA06.5040106@v.loewis.de> References: <46B7EA06.5040106@v.loewis.de> Message-ID: <46B83198.5090502@canterbury.ac.nz> Martin v. Löwis wrote: > Now if you say that the dbm files are dicts conceptually, I wouldn't say they're dicts, rather they're mappings. Restriction of keys to immutable values is a peculiarity of dicts, not a required feature of mappings in general. -- Greg From greg.ewing at canterbury.ac.nz Tue Aug 7 10:51:50 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 07 Aug 2007 20:51:50 +1200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B7F380.1050805@v.loewis.de> References: <46B637DD.7070905@v.loewis.de> <46B701E5.3030206@gmail.com> <46B7CE27.3030103@acm.org> <46B7E71D.1050705@v.loewis.de> <46B7EC03.7000605@acm.org> <46B7F380.1050805@v.loewis.de> Message-ID: <46B832A6.5000104@canterbury.ac.nz> Martin v. Löwis wrote: > That assumes that the mutable bytes type also supports changes to > its length. It would be surprising if it didn't, because that would make it different from all the other builtin mutable sequences.
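[Editorial sketch, not part of Greg's message.] The point about length changes can be shown with bytearray, which is the name the mutable bytes type carries in released Python 3 (an assumption relative to this thread, where it was still called bytes). It supports the same resizing operations as a list.

```python
buf = bytearray(b"abc")

buf.extend(b"def")      # grow at the end, like list.extend
buf += b"gh"            # in-place concatenation also grows the same object
del buf[:2]             # shrink by deleting a slice
buf[0:1] = b"XYZ"       # slice assignment may change the length too

final_contents = bytes(buf)
final_length = len(buf)
```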
-- Greg From greg.ewing at canterbury.ac.nz Tue Aug 7 10:54:49 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 07 Aug 2007 20:54:49 +1200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B79815.1030504@v.loewis.de> <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com> Message-ID: <46B83359.8020204@canterbury.ac.nz> Jeffrey Yasskin wrote: > If you have mutable bytes and need an > immutable object, you could 1) convert it to an int (probably > big-endian), That's not a reversible transformation, because you lose information about leading zero bits. > 4) use the deprecated str8 type Which won't exist in Py3k, so it'll be a bit hard to use... -- Greg From greg.ewing at canterbury.ac.nz Tue Aug 7 11:01:42 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 07 Aug 2007 21:01:42 +1200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B7F869.6080007@v.loewis.de> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B79815.1030504@v.loewis.de> <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com> <46B7F869.6080007@v.loewis.de> Message-ID: <46B834F6.7050307@canterbury.ac.nz> Martin v. Löwis wrote: > Code that has been ported to the bytes type probably doesn't use it > correctly yet, but to me, the need for a buffery thing where you > can allocate some buffer, and then fill it byte-for-byte is quite > obvious. We actually already *have* something like that, i.e. array.array('B'). So I don't think it's a priori a silly idea to consider making the bytes type immutable only, and using the array type for when you want a mutable buffer.
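[Editorial sketch, not part of Greg's message.] The allocate-once, fill-byte-for-byte pattern Martin describes can be written with array.array('B') roughly as follows. The tobytes() call is the spelling in released Python 3; the buffer size and payload are made up for illustration.

```python
from array import array

buf = array("B", bytes(8))      # preallocate eight zero bytes, once
payload = b"ping"

# Fill the front of the buffer byte-for-byte, reusing the same object.
for index, byte in enumerate(payload):
    buf[index] = byte

used = buf[:len(payload)].tobytes()   # only the fraction actually consumed
capacity = len(buf)
```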
-- Greg From greg.ewing at canterbury.ac.nz Tue Aug 7 11:33:51 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 07 Aug 2007 21:33:51 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B7DEA6.5050609@ronadam.com> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B66E7E.4060209@canterbury.ac.nz> <46B6A7E8.7040001@ronadam.com> <46B6C335.4080504@canterbury.ac.nz> <46B6DE80.2050000@ronadam.com> <46B7C369.3040509@canterbury.ac.nz> <46B7DEA6.5050609@ronadam.com> Message-ID: <46B83C7F.603@canterbury.ac.nz> Ron Adam wrote: > What about mismatched specifiers? It's not clear exactly what you mean by a "mismatched" specifier. Some types may recognise when they're being passed a format spec that belongs to another type, and try to convert themselves to that type (e.g. applying 'f' to an int or 'd' to a float). If the type doesn't recognise the format at all, and doesn't have a fallback type to delegate to (as will probably be the case with str) then you will get an exception. > I think the opinion so far is to let the objects __format__ method > determine this, but we need to figure this out what the built in types > will do. My suggestions would be: int - understands all the 'integer' formats (d, x, o, etc.) - recognises the 'float' formats ('f', 'e', etc.) and delegates to float - delegates anything it doesn't recognise to str float - understands all the 'float' formats - recognises the 'integer' formats and delegates to int - delegates anything it doesn't recognise to str str - recognises the 'string' formats (only one?) 
- raises an exception for anything it doesn't understand I've forgotten where 'r' was supposed to fit into this scheme. Can anyone remind me? -- Greg From theller at ctypes.org Tue Aug 7 14:06:27 2007 From: theller at ctypes.org (Thomas Heller) Date: Tue, 07 Aug 2007 14:06:27 +0200 Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests in the struni branch! In-Reply-To: References: Message-ID: Guido van Rossum schrieb: > We're down to 11 failing test in the struni branch. I'd like to get > this down to zero ASAP so that we can retire the old p3yk (yes, with > typo!) branch and rename py3k-struni to py3k. > > Please help! Here's the list of failing tests: > > test_ctypes > Recently one test started failing again, after Martin changed > PyUnicode_FromStringAndSize() to use UTF8 instead of Latin1. I wanted to look into this and noticed that 'import time' on Windows doesn't work anymore on my machine. The reason is that PyUnicode_FromStringAndSize() is called for the string 'Westeuropäische Normalzeit', and then fails with UnicodeDecodeError: 'utf8' codec can't decode bytes in position 9-11: invalid data Thomas From theller at ctypes.org Tue Aug 7 14:12:26 2007 From: theller at ctypes.org (Thomas Heller) Date: Tue, 07 Aug 2007 14:12:26 +0200 Subject: [Python-3000] C API cleanup str In-Reply-To: <46B78D96.4090901@v.loewis.de> References: <46B5C47B.5090703@v.loewis.de> <46B5F136.4010502@v.loewis.de> <46B633D0.7050902@v.loewis.de> <46B6D2EC.601@livinglogic.de> <46B6D6B8.7000207@v.loewis.de> <46B6E66D.80301@livinglogic.de> <46B78D96.4090901@v.loewis.de> Message-ID: Martin v. Löwis schrieb: >> One issue with just putting this in the C API docs is that I believe >> (tell me if I'm wrong) that these haven't been kept up to date in the >> struni branch so we'll need to make a lot more changes than just this >> one... > > That's certainly the case.
However, if we end up deleting the str8 > type entirely, I'd be in favor of recycling the PyString_* names > for Unicode, in which case everything needs to be edited, anyway. PyText_*, maybe? Thomas From ncoghlan at gmail.com Tue Aug 7 14:33:36 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 07 Aug 2007 22:33:36 +1000 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B82B88.5000804@canterbury.ac.nz> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B82B88.5000804@canterbury.ac.nz> Message-ID: <46B866A0.2040800@gmail.com> Greg Ewing wrote: > Guido van Rossum wrote: >> Personally, I still think that converting to the latin-1 encoding is >> probably just as good for this particular use case. > > Although that's a conceptually screwy thing to do > when your data has nothing to do with characters. Yeah, this approach seems to run counter to the whole point of getting rid of the current str type: "for binary data use bytes, for text use Unicode, unless you need your binary data to be hashable, and then you decode it to gibberish Unicode via the latin-1 codec" This would mean that the Unicode type would acquire all of the ambiguity currently associated with the 8-bit str type: does it contain actual text, or does it contain arbitrary latin-1 decoded binary data? A separate frozenbytes type (with the bytes API instead of the string API) would solve the problem far more cleanly. Easy-for-me-to-say-when-I'm-not-providing-the-code-'ly yours, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From guido at python.org Tue Aug 7 16:36:20 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Aug 2007 07:36:20 -0700 Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests in the struni branch!
In-Reply-To: References: Message-ID: On 8/7/07, Thomas Heller wrote: > Guido van Rossum schrieb: > > test_ctypes > > Recently one test started failing again, after Martin changed > > PyUnicode_FromStringAndSize() to use UTF8 instead of Latin1. > > I wanted to look into this and noticed that 'import time' on Windows > doesn't work anymore on my machine. The reason is that PyUnicode_FromStringAndSize() > is called for the string 'Westeurop?ische Normalzeit', and then fails with > > UnicodeDecodeError: 'utf8' codec can't decode bytes in position 9-11: invalid data I'm assuming that's a literal somewhere? In what encoding is it? That function was recently changed to require the input to be UTF-8. If the input isn't UTF-8, you'll have to use another API with an explicit encoding, PyUnicode_Decode(). I'm pretty sure this change is also responsible for the one failure (as it started around the time that change was made) but I don't understand the failure well enough to track it down. (It looked like uninitialized memory was being accessed though.) In case you wonder why it was changed, it's for symmetry with _PyUnicode_AsDefaultEncodedString(), which is the most common way to turn Unicode back into a char* without specifying an encoding. (And yes, that name needs to be changed.) See recent posts here. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Tue Aug 7 16:36:01 2007 From: skip at pobox.com (skip at pobox.com) Date: Tue, 7 Aug 2007 09:36:01 -0500 Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests in the struni branch! In-Reply-To: References: Message-ID: <18104.33617.706079.853923@montanaro.dyndns.org> test_csv got removed from the failing list after Guido applied Adam Hupp's patch. (I checked in a small update for one thing Adam missed.) 
I'm still getting test failures though: ====================================================================== FAIL: test_reader_attrs (__main__.Test_Csv) ---------------------------------------------------------------------- Traceback (most recent call last): File "Lib/test/test_csv.py", line 63, in test_reader_attrs self._test_default_attrs(csv.reader, []) File "Lib/test/test_csv.py", line 47, in _test_default_attrs self.assertEqual(obj.dialect.delimiter, ',') AssertionError: s'\x00' != ',' This same exception crops up six times. Maybe this isn't str->unicode-related, but it sure seems like it to me. I spent some time over the past few days trying to figure it out, but I struck out. Skip From guido at python.org Tue Aug 7 16:42:42 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Aug 2007 07:42:42 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B7F380.1050805@v.loewis.de> References: <46B637DD.7070905@v.loewis.de> <46B701E5.3030206@gmail.com> <46B7CE27.3030103@acm.org> <46B7E71D.1050705@v.loewis.de> <46B7EC03.7000605@acm.org> <46B7F380.1050805@v.loewis.de> Message-ID: On 8/6/07, "Martin v. Löwis" wrote: > >>> The most efficient representation of immutable bytes is quite different > >>> from the most efficient representation of mutable bytes. > >> > >> In what way? > > > > Well, in some runtime environments (I'm not sure about Python), for > > immutables you can combine the object header and the bytes array into a > > single allocation. Further, the header need not contain an explicit > > pointer to the bytes themselves, instead the bytes are obtained by doing > > pointer arithmetic on the header address. > > Hmm. That assumes that the mutable bytes type also supports changes to > its length. I see that the Python bytes type does that, but I don't > think it's really necessary - I'm not even sure it's useful. It is.
The I/O library uses it extensively for buffers: instead of allocating a new object each time some data is added to a buffer, the buffer is simply extended. This saves the malloc/free calls for the object header, and in some cases realloc is also free (there is some overallocation in the bytes type and sometimes realloc can extend an object without moving it, if the space after it happens to be free). > For a bytes array, you don't need a separate allocation, and it still > can be mutable. > > > So in other words, the in-memory layout of the two structs is different > > enough that attempting to combine them into a single struct is kind of > > awkward. > > ... assuming the mutable bytes type behaves like a Python list, that > is. If it behaved like a Java/C byte[], this issue would not exist. There is no requirement to copy bad features from Java. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Aug 7 16:48:46 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Aug 2007 07:48:46 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B79815.1030504@v.loewis.de> <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com> Message-ID: On 8/6/07, Jeffrey Yasskin wrote: > On 8/6/07, Guido van Rossum wrote: > > I'm waiting for a show-stopper issue that can't be solved without > > having an immutable bytes type. > > Apologies if this has been answered before, but why are you waiting > for a show-stopper that requires an immutable bytes type rather than > one that requires a mutable one? This being software, there isn't > likely to be a real show-stopper (especially if you're willing to copy > the whole object), just things that are unnecessarily annoying or > confusing. Hashing seems to be one of those. 
Well one reason of course is that we currently have a mutable bytes object and that it works well in most situations. > Taking TOOWTDI as a guideline: If you have immutable bytes and need a > mutable object, just use list(). That would not work with low-level I/O (sometimes readinto() is useful), and in general list(b) (where b is a bytes object) takes up an order of magnitude more memory than b. > If you have mutable bytes and need an > immutable object, you could 1) convert it to an int (probably > big-endian), 2) convert it to a latin-1 unicode object (containing > garbage, of course), 3) figure out an encoding in which to assume the > bytes represent text and create a unicode string from that, or 4) use > the deprecated str8 type. Why isn't this a clear win for immutable > bytes? IMO there are some use cases where mutable bytes are the only realistic solution. These mostly have to do with doing large amounts of I/O reusing a buffer. Currently the array module can be used for this but I would like to get rid of it in favor of bytes and Travis Oliphant's new buffer API (which serves a similar purpose as the array module but has a much more powerful mini-language to describe the internal structure of the elements, similar to the struct module.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz at pythoncraft.com Tue Aug 7 16:56:02 2007 From: aahz at pythoncraft.com (Aahz) Date: Tue, 7 Aug 2007 07:56:02 -0700 Subject: [Python-3000] map() Returns Iterator In-Reply-To: References: <87r6mjj34f.fsf@hydra.hampton.thirdcreek.com> <87k5s8i2yn.fsf@hydra.hampton.thirdcreek.com> Message-ID: <20070807145602.GA20333@panix.com> On Mon, Aug 06, 2007, Guido van Rossum wrote: > > I've always made a point of suggesting that we're switching to > returning iterators instead of lists from as many APIs as makes sense > (I stop at str.split() though, as I can't think of a use case where > the list would be so big as to be bothersome). 
    s = ('123456789' * 10) + '\n'
    s = s * 10**9
    s.split('\n')

Now, maybe we "shouldn't" be processing all that in memory, but if your argument applies to other things, I don't see why it shouldn't apply to split(). Keep in mind that because split() generates a new string for each line, that really does eat lots of memory, even if you switch to 10**6 instead of 10**9, which seems like a very common use case. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ This is Python. We don't care much about theory, except where it intersects with useful practice. From guido at python.org Tue Aug 7 16:59:05 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Aug 2007 07:59:05 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B866A0.2040800@gmail.com> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com> Message-ID: On 8/7/07, Nick Coghlan wrote: > Yeah, this approach seems to run counter to the whole point of getting > rid of the current str type: "for binary data use bytes, for text use > Unicode, unless you need your binary data to be hashable, and then you > decode it to gibberish Unicode via the latin-1 codec" > > This would mean that the Unicode type would acquire all of the ambiguity > currently associated with the 8-bit str type: does it contain actual > text, or does it contain arbitrary latin-1 decoded binary data? Not necessarily, as this kind of use is typically very localized. Remember practicality beats purity. > A separate frozenbytes type (with the bytes API instead of the string > API) would solve the problem far more cleanly. But at a cost: an extra data type, more code to maintain, more docs to write, thicker books, etc. To me, the most important cost is that every time you need to use bytes you would have to think about whether to use frozen or mutable bytes.
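The latin-1 workaround debated above can be sketched as follows. This is only an illustration, assuming a present-day Python 3 interpreter, where the mutable type discussed in this thread is spelled `bytearray` and plain `bytes` is immutable and hashable; the variable names are made up for the example.

```python
raw = bytes([0x00, 0xff, 0x80, 0x41])   # arbitrary binary key, not text

# Workaround from the thread: decode via latin-1 to obtain a hashable str.
# latin-1 maps every byte value 0..255 to the code point of the same value,
# so the round trip is lossless, but the resulting str is not real text.
as_text_key = raw.decode("latin-1")
table = {as_text_key: "value"}
round_trip = as_text_key.encode("latin-1")

# A mutable byte buffer cannot serve as a dict key at all.
try:
    {bytearray(raw): "value"}
    mutable_is_hashable = True
except TypeError:
    mutable_is_hashable = False

# Assumption: modern Python 3, where immutable bytes hash directly.
direct = {raw: "value"}
```

The str key works, but nothing in its type says whether it holds text or decoded binary data, which is the ambiguity Nick objects to.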
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From shiblon at gmail.com Tue Aug 7 16:59:31 2007 From: shiblon at gmail.com (Chris Monson) Date: Tue, 7 Aug 2007 10:59:31 -0400 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> Message-ID: On 8/6/07, Guido van Rossum wrote: > > On 8/6/07, Chris Monson wrote: > > On 8/6/07, Guido van Rossum wrote: > > > On 8/6/07, "Martin v. Löwis" wrote: > > > > b) should bytes literals be regular or frozen bytes? > > > > > > Regular -- set literals produce mutable sets, too. > > > > But all other string literals produce immutable types: > > > > "" > > r"" > > u"" (going away, but still) > > and hopefully b"" > > > > Wouldn't it be confusing to have b"" be the only mutable quote-delimited > > literal? For everything else, there's bytes(). > > Well, it would be just as confusing to have a bytes literal and not > have it return a bytes object. The frozenbytes type is intended (if I > understand the use case correctly) as for the relatively rare case > where bytes must be used as dict keys and we can't assume that the > bytes use any particular encoding. > > Personally, I still think that converting to the latin-1 encoding is > probably just as good for this particular use case. So perhaps I don't > understand the use case(s?) correctly. > > > :-) > > What does the :-) mean? That you're not seriously objecting? No, just that I'm friendly. (just a smile, not a wink). I still think that having b"" be the only mutable string-looking thing is a bad idea. Just because the types are named "bytes" and "frozenbytes" instead of "bytes" and "BytesIO" or something similar doesn't mean that the syntax magically looks right. -- > --Guido van Rossum (home page: http://www.python.org/~guido/) >
From guido at python.org Tue Aug 7 17:01:41 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Aug 2007 08:01:41 -0700 Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests in the struni branch! In-Reply-To: <18104.33617.706079.853923@montanaro.dyndns.org> References: <18104.33617.706079.853923@montanaro.dyndns.org> Message-ID: Odd. It passes for me. What platform? What locale? Have you tried svn up and rebuilding? Do you have any local changes (svn st)? Note that in s'\x00', the 's' prefix is produced by the repr() of a str8 object; this may be enough of a hint to track it down. Perhaps there's a call to PyString_From... that got missed by the conversion and only matters for certain locales? --Guido On 8/7/07, skip at pobox.com wrote: > test_csv got removed from the failing list after Guido applied Adam Hupp's > patch. (I checked in a small update for one thing Adam missed.) I'm still > getting test failures though: > > ====================================================================== > FAIL: test_reader_attrs (__main__.Test_Csv) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "Lib/test/test_csv.py", line 63, in test_reader_attrs > self._test_default_attrs(csv.reader, []) > File "Lib/test/test_csv.py", line 47, in _test_default_attrs > self.assertEqual(obj.dialect.delimiter, ',') > AssertionError: s'\x00' != ',' > > This same exception crops up six times. Maybe this isn't > str->unicode-related, but it sure seems like it to me. I spent some time > over the past few days trying to figure it out, but I struck out.
> > Skip > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From p.f.moore at gmail.com Tue Aug 7 17:02:40 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 7 Aug 2007 16:02:40 +0100 Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests in the struni branch! In-Reply-To: References: Message-ID: <79990c6b0708070802y112f5750o5b5e79a2833a19b8@mail.gmail.com> On 07/08/07, Guido van Rossum wrote: > On 8/7/07, Thomas Heller wrote: > > I wanted to look into this and noticed that 'import time' on Windows > > doesn't work anymore on my machine. The reason is that PyUnicode_FromStringAndSize() > > is called for the string 'Westeuropäische Normalzeit', and then fails with > > > > UnicodeDecodeError: 'utf8' codec can't decode bytes in position 9-11: invalid data > > I'm assuming that's a literal somewhere? In what encoding is it? That > function was recently changed to require the input to be UTF-8. If the > input isn't UTF-8, you'll have to use another API with an explicit > encoding, PyUnicode_Decode(). I'd guess it's coming from a call to a Windows API somewhere. The correct fix is probably to switch to using the "wide character" Windows APIs, which will give Unicode values as results directly. A shorter-term fix is possibly to use Windows' default code page to decode all strings coming back from Windows APIs (although I'm not sure it'll be any quicker in practice!). Paul. From lists at cheimes.de Tue Aug 7 17:21:55 2007 From: lists at cheimes.de (Christian Heimes) Date: Tue, 07 Aug 2007 17:21:55 +0200 Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests in the struni branch!
In-Reply-To: References: <46B80D73.5050009@cheimes.de> Message-ID: <46B88E13.7070908@cheimes.de> Guido van Rossum wrote: > Alas, not for me (Ubuntu 6.06 LTS, UCS2 build): > > ====================================================================== > ERROR: testEncodings (__main__.MinidomTest) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "Lib/test/test_minidom.py", line 872, in testEncodings > self.assertEqual(doc.toxml(), > File "/usr/local/google/home/guido/python/py3k-struni/Lib/xml/dom/minidom.py", > line 46, in toxml > return self.toprettyxml("", "", encoding) > File "/usr/local/google/home/guido/python/py3k-struni/Lib/xml/dom/minidom.py", > line 54, in toprettyxml > self.writexml(writer, "", indent, newl, encoding) > File "/usr/local/google/home/guido/python/py3k-struni/Lib/xml/dom/minidom.py", > line 1747, in writexml > node.writexml(writer, indent, addindent, newl) > File "/usr/local/google/home/guido/python/py3k-struni/Lib/xml/dom/minidom.py", > line 817, in writexml > node.writexml(writer,indent+addindent,addindent,newl) > File "/usr/local/google/home/guido/python/py3k-struni/Lib/xml/dom/minidom.py", > line 1036, in writexml > _write_data(writer, "%s%s%s"%(indent, self.data, newl)) > File "/usr/local/google/home/guido/python/py3k-struni/Lib/xml/dom/minidom.py", > line 301, in _write_data > writer.write(data) > File "/usr/local/google/home/guido/python/py3k-struni/Lib/io.py", > line 1023, in write > b = s.encode(self._encoding) > UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in > position 0: ordinal not in range(256) What's your locale? My locale setting is de_DE.UTF-8. When I run the unit test of minidom with "LC_ALL=C ./python Lib/test/test_minidom.py" testEncoding is failing, too. > So true. I'm hoping the real author will identify himself. :-) His name is Lars Gustbel (probably Gustäbel).
Christian From theller at ctypes.org Tue Aug 7 17:50:27 2007 From: theller at ctypes.org (Thomas Heller) Date: Tue, 07 Aug 2007 17:50:27 +0200 Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests in the struni branch! In-Reply-To: References: Message-ID: Guido van Rossum wrote: > On 8/7/07, Thomas Heller wrote: >> Guido van Rossum wrote: >> > test_ctypes >> > Recently one test started failing again, after Martin changed >> > PyUnicode_FromStringAndSize() to use UTF8 instead of Latin1. >> >> I wanted to look into this and noticed that 'import time' on Windows >> doesn't work anymore on my machine. The reason is that PyUnicode_FromStringAndSize() >> is called for the string 'Westeuropäische Normalzeit', and then fails with >> >> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 9-11: invalid data > > I'm assuming that's a literal somewhere? In what encoding is it? That > function was recently changed to require the input to be UTF-8. If the > input isn't UTF-8, you'll have to use another API with an explicit > encoding, PyUnicode_Decode(). It's in Modules/timemodule.c, line 691:

    PyModule_AddObject(m, "tzname",
                       Py_BuildValue("(zz)", tzname[0], tzname[1]));

According to MSDN, tzname is a global variable; its contents are somehow derived from the TZ environment variable (which is not set in my case). Is there another Py_BuildValue code that should be used? BTW: There are other occurrences of Py_BuildValue("(zz)", ...) in this file; someone should probably check whether UTF-8 can be assumed as input. > I'm pretty sure this change is also responsible for the one failure > (as it started around the time that change was made) but I don't > understand the failure well enough to track it down. (It looked like > uninitialized memory was being accessed though.) I'm not sure what failure you are talking about here.
> In case you wonder why it was changed, it's for symmetry with > _PyUnicode_AsDefaultEncodedString(), which is the most common way to > turn Unicode back into a char* without specifying an encoding. (And > yes, that name needs to be changed.) > > See recent posts here. > From jeremy at alum.mit.edu Tue Aug 7 17:52:05 2007 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Tue, 7 Aug 2007 11:52:05 -0400 Subject: [Python-3000] should rfc822 accept text io or binary io? In-Reply-To: <3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org> References: <18103.34967.170146.660275@montanaro.dyndns.org> <3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org> Message-ID: On 8/6/07, Fred Drake wrote: > On Aug 6, 2007, at 4:46 PM, skip at pobox.com wrote: > > I thought rfc822 was going away. From the current module > > documentation: > > ... > > Shouldn't rfc822 be gone altogether in Python 3? > > Yes. And the answers to Jeremy's questions about what sort of IO is > appropriate for the email package should be left to the email-sig as > well, I suspect. It's good that they've come up. Hmmm. Should we be using the email package to parse HTTP headers? RFC 2616 says that HTTP headers follow the "same generic format" as RFC 822, but RFC 822 says headers are ASCII and RFC 2616 says headers are arbitrary 8-bit values. You'd need to parse them differently. I also wonder if it makes sense for httplib to depend on email. If it is possible to write generic code, maybe it belongs in a common library rather than in either email or httplib. I meant my original email to ask a more general question: Does anyone have some suggestions about how to design libraries that could deal with bytes or strings? If an HTTP header value contains 8-bit binary data, does the client application expect bytes or a string in some encoding? If you have a library that consumes file-like objects, how do you deal with bytes vs. strings?
Do you have two constructor options so that the client can specify what kind of output the file-like object produces? Do you try to guess? Do you just write code assuming strings and let it fail on a bad lower() call when it gets bytes? Jeremy > > > -Fred > > -- > Fred Drake > > > > From jimjjewett at gmail.com Tue Aug 7 17:56:49 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 7 Aug 2007 11:56:49 -0400 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com> Message-ID: On 8/7/07, Guido van Rossum wrote: > On 8/7/07, Nick Coghlan wrote: > > This would mean that the Unicode type would acquire all of the ambiguity > > currently associated with the 8-bit str type: does it contain actual > > text, or does it contain arbitrary latin-1 decoded binary data? ... > > A separate frozenbytes type (with the bytes API instead of the string > > API) would solve the problem far more cleanly. > But at a cost: an extra data type, more code to maintain, more docs to > write, thicker books, etc. I think that cost is already there, and we're making it even worse by trying to use the same name for two distinct concepts.
(1) A mutable buffer
(2) A literal which isn't "characters"
Historically, most of the type(2) examples have just used ASCII (or at least Latin-1) for convenience, so that they *look* like characters. The actual requirements are on the bytes, though, so recoding them to a different output format is not OK. Also note that for type(2), immutability is important, not just for efficiency, but conceptually. These are generally compile-time constants, and letting them change *will* lead to confusion. (Even letting them get replaced is confusing, but that sort of monkey-patching is sufficiently rare and obvious that it seems to work out OK today.)
-jJ From collinw at gmail.com Tue Aug 7 18:22:47 2007 From: collinw at gmail.com (Collin Winter) Date: Tue, 7 Aug 2007 09:22:47 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B834F6.7050307@canterbury.ac.nz> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B79815.1030504@v.loewis.de> <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com> <46B7F869.6080007@v.loewis.de> <46B834F6.7050307@canterbury.ac.nz> Message-ID: <43aa6ff70708070922m645189dbvc744a1fbbbb88800@mail.gmail.com> On 8/7/07, Greg Ewing wrote: > Martin v. Löwis wrote: > > Code that has been ported to the bytes type probably doesn't use it > > correctly yet, but to me, the need for a buffery thing where you > > can allocate some buffer, and then fill it byte-for-byte is quite > > obvious. > > We actually already *have* something like that, > i.e. array.array('B'). Could someone please explain to me the conceptual difference between array.array('B'), bytes(), buffer objects and simple lists of integers? I'm confused about when I should use which. Collin Winter From ncoghlan at gmail.com Tue Aug 7 18:22:51 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 08 Aug 2007 02:22:51 +1000 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com> Message-ID: <46B89C5B.7090104@gmail.com> Guido van Rossum wrote: > On 8/7/07, Nick Coghlan wrote: >> A separate frozenbytes type (with the bytes API instead of the >> string API) would solve the problem far more cleanly. > > But at a cost: an extra data type, more code to maintain, more docs > to write, thicker books, etc. > > To me, the most important cost is that every time you need to use > bytes you would have to think about whether to use frozen or mutable > bytes. I agree this cost exists, but I don't think it is very high.
I would expect the situation to be the same as with sets - you'd use the mutable version by default, unless there was some specific reason to want the frozen version (usually because you want something that is hashable, or easy to share safely amongst multiple clients). However, code also talks louder than words in this case, and I don't have any relevant code, so I am going to try to stay out of this thread now. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From guido at python.org Tue Aug 7 18:35:00 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Aug 2007 09:35:00 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <43aa6ff70708070922m645189dbvc744a1fbbbb88800@mail.gmail.com> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B79815.1030504@v.loewis.de> <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com> <46B7F869.6080007@v.loewis.de> <46B834F6.7050307@canterbury.ac.nz> <43aa6ff70708070922m645189dbvc744a1fbbbb88800@mail.gmail.com> Message-ID: On 8/7/07, Collin Winter wrote: > Could someone please explain to me the conceptual difference between > array.array('B'), bytes(), buffer objects and simple lists of > integers? I'm confused about when I should use which. Assuming you weren't being sarcastic, array('B') and bytes() are very close except bytes have a literal notation and many string-ish methods. The buffer objects returned by the buffer() builtin provide a read-only view on other objects that happen to have an internal buffer, like strings, bytes, arrays, PIL images, and numpy arrays. Lists of integers don't have the property that the other three share which is that their C representation is a contiguous array of bytes (char* in C). This representation is important because to do efficient I/O in C you need char*. 
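The distinction drawn above can be sketched in code. This is a rough illustration only, assuming a present-day Python 3 interpreter rather than the 2007 py3k branch: the mutable bytes type is spelled `bytearray` there, and `memoryview` stands in for the buffer() builtin mentioned in the message.

```python
import array
import io

source = io.BytesIO(b"hello world")

# A mutable bytes object is one contiguous block, so low-level I/O can
# fill it in place instead of allocating a new object for every read.
buf = bytearray(5)
n = source.readinto(buf)
first_chunk = bytes(buf[:n])

n = source.readinto(buf)          # the same buffer is reused
second_chunk = bytes(buf[:n])

# array('B') is also a contiguous run of unsigned bytes, but without
# the literal notation or the string-like methods.
arr = array.array('B', b"abc")
view = memoryview(arr)            # a view on arr's memory, not a copy
arr[0] = ord('x')
seen_through_view = view[0]

# A list of integers holds separate int objects, not a char* block,
# so it cannot be handed to readinto()-style APIs.
as_list = list(b"abc")
```

Because the view shares memory with the array, the change made through `arr` is visible through `view` without any copying.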
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Aug 7 18:39:26 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Aug 2007 09:39:26 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B89C5B.7090104@gmail.com> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com> <46B89C5B.7090104@gmail.com> Message-ID: On 8/7/07, Nick Coghlan wrote: > Guido van Rossum wrote: > > On 8/7/07, Nick Coghlan wrote: > >> A separate frozenbytes type (with the bytes API instead of the > >> string API) would solve the problem far more cleanly. > > > > But at a cost: an extra data type, more code to maintain, more docs > > to write, thicker books, etc. > > > > To me, the most important cost is that every time you need to use > > bytes you would have to think about whether to use frozen or mutable > > bytes. > > I agree this cost exists, but I don't think it is very high. I would > expect the situation to be the same as with sets - you'd use the mutable > version by default, unless there was some specific reason to want the > frozen version (usually because you want something that is hashable, or > easy to share safely amongst multiple clients). That would imply that b"..." should return a mutable bytes object, which many people have objected to. If b"..." is immutable, the immutable bytes type is in your face all the time and you'll have to deal with the difference all the time. E.g. is the result of concatenating a mutable and an immutable bytes object mutable? Does it matter whether the mutable operand is first or second? Is a slice of an immutable bytes array immutable itself? 
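For comparison only, here is how the questions above are answered in later Python 3 releases, where b"..." is immutable `bytes` and the mutable type is `bytearray`. This is a sketch of present-day behavior, not a statement of what was being proposed in this thread.

```python
frozen = b"abc"                  # immutable bytes literal
mutable = bytearray(b"xyz")      # mutable counterpart

# Mixed concatenation: the result takes the type of the left operand.
left_frozen = frozen + mutable
left_mutable = mutable + frozen

# Slicing preserves the type of the object being sliced.
frozen_slice = frozen[1:]
mutable_slice = mutable[1:]

kinds = [type(x).__name__ for x in
         (left_frozen, left_mutable, frozen_slice, mutable_slice)]
```

So the order of the operands does matter for mutability, which is exactly the kind of detail the message warns users would have to keep in mind.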
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Tue Aug 7 18:46:50 2007 From: skip at pobox.com (skip at pobox.com) Date: Tue, 7 Aug 2007 11:46:50 -0500 Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests in the struni branch! In-Reply-To: References: <18104.33617.706079.853923@montanaro.dyndns.org> Message-ID: <18104.41466.798544.931265@montanaro.dyndns.org> Guido> Odd. It passes for me. What platform? What locale? Have you tried Guido> svn up and rebuilding? Do you have any local changes (svn st)? I am completely up-to-date:

    >>> sys.subversion
    ('CPython', 'branches/py3k-struni', '56800')

Running on Mac OS X (G4 Powerbook), no local modifications. Configured like so:

    ./configure --prefix=/Users/skip/local LDFLAGS=-L/opt/local/lib CPPFLAGS=-I/opt/local/include --with-pydebug

Locale:

    >>> locale.getdefaultlocale()
    (None, 'mac-roman')

Is there some environment variable I can set to run in a different locale? Guido> Note that in s'\x00', the 's' prefix is produced by the repr() of a Guido> str8 object; this may be enough of a hint to track it Guido> down. Perhaps there's a call to PyString_From... that got missed Guido> by the conversion and only matters for certain locales? I don't see any PyString_From... calls left in Modules/_csv.c. Should I be looking elsewhere? Skip From guido at python.org Tue Aug 7 18:51:34 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Aug 2007 09:51:34 -0700 Subject: [Python-3000] should rfc822 accept text io or binary io? In-Reply-To: References: <18103.34967.170146.660275@montanaro.dyndns.org> <3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org> Message-ID: On 8/7/07, Jeremy Hylton wrote: > On 8/6/07, Fred Drake wrote: > > On Aug 6, 2007, at 4:46 PM, skip at pobox.com wrote: > > > I thought rfc822 was going away. From the current module > > > documentation: > > > ... > > > Shouldn't rfc822 be gone altogether in Python 3? > > > > Yes.
And the answers to Jeremy's questions about what sort of IO is > > appropriate for the email package should be left to the email-sig as > > well, I suspect. It's good that they've come up. > > Hmmm. Should we being using the email package to parse HTTP headers? > RFC 2616 says that HTTP headers follow the "same generic format" as > RFC 822, but RFC 822 says headers are ASCII and RFC 2616 says headers > are arbitrary 8-bit values. You'd need to parse them differently. I'm confused (and too lazy to read the RFCs). How can you have case insensitivity (as HTTP clearly has) if the headers are arbitrary 8-bit values? Assuming they mean it's an ASCII superset, does that mean that HTTP doesn't have case insensitivity for bytes with values > 127? > I also wonder if it makes sense for httplib to depend on email. If it > is possible to write generic code, maybe it belongs in a common > library rather than in either email or httplib. > > I meant my original email to ask a more general question: Does anyone > have some suggestions about how to design libraries that could deal > with bytes or strings? If an HTTP header value contains 8-bit binary > data, does the client application expect bytes or a string in some > encoding? > > If you have a library that consumes file-like objects, how do you deal > with bytes vs. strings? Do you have two constructor options so that > the client can specify what kind of output the file-like object > products? Do you try to guess? Do you just write code assuming > strings and let it fail on a bad lower() call when it gets bytes? In general I'm against writing polymorphic code that tries to work for strings as well as bytes, except very small algorithms. For larger amounts of code, you almost always run into the need for literals or hashing or case conversion or other differences (e.g. \n vs. \r\n when doing I/O). I think it's conceptually cleaner to pick a particular type for an API and stick to it. E.g. 
sockets, binary files (io.RawIOBase) and *dbm files read/write bytes; text files (io.TextIOBase) read/write strings. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Aug 7 18:55:48 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Aug 2007 09:55:48 -0700 Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests in the struni branch! In-Reply-To: References: Message-ID: On 8/7/07, Thomas Heller wrote: > Guido van Rossum wrote: > > On 8/7/07, Thomas Heller wrote: > >> Guido van Rossum wrote: > >> > test_ctypes > >> > Recently one test started failing again, after Martin changed > >> > PyUnicode_FromStringAndSize() to use UTF8 instead of Latin1. > >> > >> I wanted to look into this and noticed that 'import time' on Windows > >> doesn't work anymore on my machine. The reason is that PyUnicode_FromStringAndSize() > >> is called for the string 'Westeuropäische Normalzeit', and then fails with > >> > >> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 9-11: invalid data > > > > I'm assuming that's a literal somewhere? In what encoding is it? That > > function was recently changed to require the input to be UTF-8. If the > > input isn't UTF-8, you'll have to use another API with an explicit > > encoding, PyUnicode_Decode(). > > It's in Modules/timemodule.c, line 691: > PyModule_AddObject(m, "tzname", > Py_BuildValue("(zz)", tzname[0], tzname[1])); > > According to MSDN, tzname is a global variable; the contents is somehow > derived from the TZ environment variable (which is not set in my case). Is there anything from which you can guess the encoding (e.g. the filesystem encoding?). > Is there another Py_BuildValue code that should be used? BTW: There are > other occurrences of Py_BuildValue("(zz)", ...) in this file; someone should > probably check if the UTF8 can be assumed as input. These are all externally-provided strings. It will depend on the platform what the encoding is.
I wonder if we need to add another format code to Py_BuildValue (and its friends) to designate "platform default encoding" instead of UTF-8. > > I'm pretty sure this change is also responsible for the one failure > > (as it started around the time that change was made) but I don't > > understand the failure well enough to track it down. (It looked like > > uninitialized memory was being accessed though.) > > I'm not sure what failure you are talking about here. When I run test_ctypes I get this (1 error out of 301 tests): ====================================================================== ERROR: test_functions (ctypes.test.test_stringptr.StringPtrTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/google/home/guido/python/py3k-struni/Lib/ctypes/test/test_stringptr.py", line 72, in test_functions x1 = r[0], r[1], r[2], r[3], r[4] UnicodeDecodeError: 'utf8' codec can't decode byte 0xdb in position 0: unexpected end of data -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at alum.mit.edu Tue Aug 7 19:38:44 2007 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Tue, 7 Aug 2007 13:38:44 -0400 Subject: [Python-3000] should rfc822 accept text io or binary io? In-Reply-To: References: <18103.34967.170146.660275@montanaro.dyndns.org> <3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org> Message-ID: On 8/7/07, Guido van Rossum wrote: > On 8/7/07, Jeremy Hylton wrote: > > On 8/6/07, Fred Drake wrote: > > > On Aug 6, 2007, at 4:46 PM, skip at pobox.com wrote: > > > > I thought rfc822 was going away. From the current module > > > > documentation: > > > > ... > > > > Shouldn't rfc822 be gone altogether in Python 3? > > > > > > Yes. And the answers to Jeremy's questions about what sort of IO is > > > appropriate for the email package should be left to the email-sig as > > > well, I suspect. It's good that they've come up. > > > > Hmmm. 
Should we being using the email package to parse HTTP headers? > > RFC 2616 says that HTTP headers follow the "same generic format" as > > RFC 822, but RFC 822 says headers are ASCII and RFC 2616 says headers > > are arbitrary 8-bit values. You'd need to parse them differently. > > I'm confused (and too lazy to read the RFCs). How can you have case > insensitivity (as HTTP clearly has) if the headers are arbitrary 8-bit > values? Assuming they mean it's an ASCII superset, does that mean that > HTTP doesn't have case insensitivity for bytes with values > 127? For HTTP, the header names need to be ASCII, but the values can be great > 127. I haven't read enough of the spec to know which header values might include binary data and how you are supposed to interpret them. Assuming that the spec allows OCTET instead of token (which is ASCII) for a reason, it suggests that the header values need to be bytes. > > I also wonder if it makes sense for httplib to depend on email. If it > > is possible to write generic code, maybe it belongs in a common > > library rather than in either email or httplib. > > > > I meant my original email to ask a more general question: Does anyone > > have some suggestions about how to design libraries that could deal > > with bytes or strings? If an HTTP header value contains 8-bit binary > > data, does the client application expect bytes or a string in some > > encoding? > > > > If you have a library that consumes file-like objects, how do you deal > > with bytes vs. strings? Do you have two constructor options so that > > the client can specify what kind of output the file-like object > > products? Do you try to guess? Do you just write code assuming > > strings and let it fail on a bad lower() call when it gets bytes? > > In general I'm against writing polymorphic code that tries to work for > strings as well as bytes, except very small algorithms. 
For larger > amounts of code, you almost always run into the need for literals or > hashing or case conversion or other differences (e.g. \n vs. \r\n when > doing I/O). > > I think it's conceptually cleaner to pick a particular type for an API > and stick to it. E.g. sockets, binary files (io.RawIOBase) and *dbm > files read/write bytes; text files (io.TextIOBase) read/write strings. It certainly makes rfc822 tricky to update. Is it intended to work with files or sockets? In Python 2.x, it works with either. If we have some future email/rfc822/httpheaders library that parses the "generic format," will it work with sockets or files or will we have two versions? Jeremy From guido at python.org Tue Aug 7 19:52:26 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Aug 2007 10:52:26 -0700 Subject: [Python-3000] should rfc822 accept text io or binary io? In-Reply-To: References: <18103.34967.170146.660275@montanaro.dyndns.org> <3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org> Message-ID: On 8/7/07, Jeremy Hylton wrote: > On 8/7/07, Guido van Rossum wrote: > > On 8/7/07, Jeremy Hylton wrote: > > > Hmmm. Should we being using the email package to parse HTTP headers? > > > RFC 2616 says that HTTP headers follow the "same generic format" as > > > RFC 822, but RFC 822 says headers are ASCII and RFC 2616 says headers > > > are arbitrary 8-bit values. You'd need to parse them differently. > > > > I'm confused (and too lazy to read the RFCs). How can you have case > > insensitivity (as HTTP clearly has) if the headers are arbitrary 8-bit > > values? Assuming they mean it's an ASCII superset, does that mean that > > HTTP doesn't have case insensitivity for bytes with values > 127? > > For HTTP, the header names need to be ASCII, but the values can be > great > 127. I haven't read enough of the spec to know which header > values might include binary data and how you are supposed to interpret > them. 
Assuming that the spec allows OCTET instead of token (which is > ASCII) for a reason, it suggests that the header values need to be > bytes. Bizarre. I'm not aware of any HTTP header that requires *binary* values. I can imagine though that they may contain *encoded* text and that they are leaving the encoding up to separate negotiations between client and server, or another header, or specified explicitly by the header, etc. It can't be pure binary because it's still subject to the \r\n line terminator. > > In general I'm against writing polymorphic code that tries to work for > > strings as well as bytes, except very small algorithms. For larger > > amounts of code, you almost always run into the need for literals or > > hashing or case conversion or other differences (e.g. \n vs. \r\n when > > doing I/O). > > > > I think it's conceptually cleaner to pick a particular type for an API > > and stick to it. E.g. sockets, binary files (io.RawIOBase) and *dbm > > files read/write bytes; text files (io.TextIOBase) read/write strings. > > It certainly makes rfc822 tricky to update. Is it intended to work > with files or sockets? In Python 2.x, it works with either. If we > have some future email/rfc822/httpheaders library that parses the > "generic format," will it work with sockets or files or will we have > two versions? It never worked with socket object, did it? If it worked with the objects returned by makefile(), why not use text mode ("r" or "w") as the mode arg? (Then you can even specify an encoding.) IMO it makes more sense to treat rfc822 headers as text, since they are for all intents and purposes meant to be human-readable, and there's case insensitivity implied. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at alum.mit.edu Tue Aug 7 20:31:30 2007 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Tue, 7 Aug 2007 14:31:30 -0400 Subject: [Python-3000] should rfc822 accept text io or binary io? 
In-Reply-To: References: <18103.34967.170146.660275@montanaro.dyndns.org> <3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org> Message-ID: On 8/7/07, Guido van Rossum wrote: > On 8/7/07, Jeremy Hylton wrote: > > On 8/7/07, Guido van Rossum wrote: > > > On 8/7/07, Jeremy Hylton wrote: > > > > Hmmm. Should we being using the email package to parse HTTP headers? > > > > RFC 2616 says that HTTP headers follow the "same generic format" as > > > > RFC 822, but RFC 822 says headers are ASCII and RFC 2616 says headers > > > > are arbitrary 8-bit values. You'd need to parse them differently. > > > > > > I'm confused (and too lazy to read the RFCs). How can you have case > > > insensitivity (as HTTP clearly has) if the headers are arbitrary 8-bit > > > values? Assuming they mean it's an ASCII superset, does that mean that > > > HTTP doesn't have case insensitivity for bytes with values > 127? > > > > For HTTP, the header names need to be ASCII, but the values can be > > great > 127. I haven't read enough of the spec to know which header > > values might include binary data and how you are supposed to interpret > > them. Assuming that the spec allows OCTET instead of token (which is > > ASCII) for a reason, it suggests that the header values need to be > > bytes. > > Bizarre. I'm not aware of any HTTP header that requires *binary* > values. I can imagine though that they may contain *encoded* text and > that they are leaving the encoding up to separate negotiations between > client and server, or another header, or specified explicitly by the > header, etc. It can't be pure binary because it's still subject to the > \r\n line terminator. I did a little more reading. """The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO- 8859-1 [22] only when encoded according to the rules of RFC 2047 [14]. 
TEXT = """ The odd thing here is that RFC 2047 (MIME) seems to be about encoding non-ASCII character sets in ASCII. So the spec is kind of odd here. The actual bytes on the wire seem to be ASCII, but they may an interpretation where those ASCII bytes represent a non-ASCII string. So the shared parsing with email/rfc822 does seem reasonable. > > > In general I'm against writing polymorphic code that tries to work for > > > strings as well as bytes, except very small algorithms. For larger > > > amounts of code, you almost always run into the need for literals or > > > hashing or case conversion or other differences (e.g. \n vs. \r\n when > > > doing I/O). > > > > > > I think it's conceptually cleaner to pick a particular type for an API > > > and stick to it. E.g. sockets, binary files (io.RawIOBase) and *dbm > > > files read/write bytes; text files (io.TextIOBase) read/write strings. > > > > It certainly makes rfc822 tricky to update. Is it intended to work > > with files or sockets? In Python 2.x, it works with either. If we > > have some future email/rfc822/httpheaders library that parses the > > "generic format," will it work with sockets or files or will we have > > two versions? > > It never worked with socket object, did it? If it worked with the > objects returned by makefile(), why not use text mode ("r" or "w") as > the mode arg? (Then you can even specify an encoding.) IMO it makes > more sense to treat rfc822 headers as text, since they are for all > intents and purposes meant to be human-readable, and there's case > insensitivity implied. We use the same makefile() object to read the headers and the body. We can't trust the body is text. I guess we could change the code to use two different makefile() calls--a text one for headers that is closed when the headers are done, and a binary one for the body. Jeremy From stephen at xemacs.org Tue Aug 7 20:49:53 2007 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Wed, 08 Aug 2007 03:49:53 +0900 Subject: [Python-3000] should rfc822 accept text io or binary io? In-Reply-To: References: <18103.34967.170146.660275@montanaro.dyndns.org> <3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org> Message-ID: <873ayv89im.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > Bizarre. I'm not aware of any HTTP header that requires *binary* > values. I can imagine though that they may contain *encoded* text and > that they are leaving the encoding up to separate negotiations between > client and server, or another header, or specified explicitly by the > header, etc. It can't be pure binary because it's still subject to the > \r\n line terminator. I assume that the relevant explanation is from RFC 2616, sec 2.2 : The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO- 8859-1 [22] only when encoded according to the rules of RFC 2047 [14]. TEXT = A CRLF is allowed in the definition of TEXT only as part of a header field continuation. It is expected that the folding LWS will be replaced with a single SP before interpretation of the TEXT value. Many parsed fields are made up of tokens, whose components are a subset of CHAR, which is US-ASCII characters as octets (also sec. 2.2). This is the ASCII coded character set (EBCDIC encoding of the ASCII repertoire won't do). Other parsed fields contain special data, such as dates, written with some subset of ASCII. From guido at python.org Tue Aug 7 20:38:25 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Aug 2007 11:38:25 -0700 Subject: [Python-3000] should rfc822 accept text io or binary io? In-Reply-To: References: <18103.34967.170146.660275@montanaro.dyndns.org> <3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org> Message-ID: On 8/7/07, Jeremy Hylton wrote: > We use the same makefile() object to read the headers and the body. 
> We can't trust the body is text. I guess we could change the code to > use two different makefile() calls--a text one for headers that is > closed when the headers are done, and a binary one for the body. That would cause problems with the buffering, but it is safe to extract the underlying binary buffered stream from the TextIOWrapper instance using the .buffer attribute -- this is intentionally not prefixed with an underscore. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From lars at gustaebel.de Tue Aug 7 20:40:00 2007 From: lars at gustaebel.de (Lars =?iso-8859-15?Q?Gust=E4bel?=) Date: Tue, 7 Aug 2007 20:40:00 +0200 Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests in the struni branch! In-Reply-To: <46B80D73.5050009@cheimes.de> References: <46B80D73.5050009@cheimes.de> Message-ID: <20070807184000.GA26947@core.g33x.de> On Tue, Aug 07, 2007 at 08:13:07AM +0200, Christian Heimes wrote: > > test_tarfile > > Virgin territory again (but different owner :-). > > The tarfile should be addressed by either its original author or > somebody with lots of spare time. As stated earlier it's a beast. I > tried to fix it several weeks ago because I thought it is a low hanging > fruit. I was totally wrong. :/ Okay, I fixed tarfile.py. It isn't that hard if know how to tame the beast ;-) I hope everything works fine now. -- Lars Gust?bel lars at gustaebel.de Der Mensch kann zwar tun, was er will, aber er kann nicht wollen, was er will. (Arthur Schopenhauer) From guido at python.org Tue Aug 7 21:27:40 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Aug 2007 12:27:40 -0700 Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests in the struni branch! 
In-Reply-To: <20070807184000.GA26947@core.g33x.de>
References: <46B80D73.5050009@cheimes.de> <20070807184000.GA26947@core.g33x.de>
Message-ID:

I still get these three failures on Ubuntu dapper:

======================================================================
ERROR: test_fileobj_iter (test.test_tarfile.Bz2UstarReadTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/test/test_tarfile.py", line 83, in test_fileobj_iter
    tarinfo = self.tar.getmember("ustar/regtype")
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py", line 2055, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py", line 2131, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py", line 2169, in makefile
    copyfileobj(source, target)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py", line 254, in copyfileobj
    shutil.copyfileobj(src, dst)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/shutil.py", line 21, in copyfileobj
    buf = fsrc.read(length)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py", line 809, in read
    buf += self.fileobj.read(size - len(buf))
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py", line 718, in read
    return self.readnormal(size)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py", line 727, in readnormal
    return self.fileobj.read(size)
ValueError: the bz2 library has received wrong parameters

======================================================================
ERROR: test_fileobj_readlines (test.test_tarfile.Bz2UstarReadTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/test/test_tarfile.py", line 67, in test_fileobj_readlines
    tarinfo = self.tar.getmember("ustar/regtype")
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py", line 2055, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py", line 2131, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py", line 2169, in makefile
    copyfileobj(source, target)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py", line 254, in copyfileobj
    shutil.copyfileobj(src, dst)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/shutil.py", line 21, in copyfileobj
    buf = fsrc.read(length)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py", line 809, in read
    buf += self.fileobj.read(size - len(buf))
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py", line 718, in read
    return self.readnormal(size)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py", line 727, in readnormal
    return self.fileobj.read(size)
ValueError: the bz2 library has received wrong parameters

======================================================================
ERROR: test_fileobj_seek (test.test_tarfile.Bz2UstarReadTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/test/test_tarfile.py", line 93, in test_fileobj_seek
    fobj = open(os.path.join(TEMPDIR, "ustar/regtype"), "rb")
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py", line 2055, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py", line 2131, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py", line 2169, in makefile
    copyfileobj(source, target)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py", line 254, in copyfileobj
    shutil.copyfileobj(src, dst)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/shutil.py", line 21, in copyfileobj
    buf = fsrc.read(length)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py", line 809, in read
    buf += self.fileobj.read(size - len(buf))
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py", line 718, in read
    return self.readnormal(size)
  File "/usr/local/google/home/guido/python/py3k-struni/Lib/tarfile.py", line 727, in readnormal
    return self.fileobj.read(size)
ValueError: the bz2 library has received wrong parameters

----------------------------------------------------------------------
Ran 140 tests in 5.346s

FAILED (errors=3)
test test_tarfile failed -- errors occurred; run in verbose mode for details
1 test failed:
    test_tarfile
[69852 refs]

On 8/7/07, Lars Gustäbel wrote:
> On Tue, Aug 07, 2007 at 08:13:07AM +0200, Christian Heimes wrote:
> > > test_tarfile
> > > Virgin territory again (but different owner :-).
> >
> > The tarfile should be addressed by either its original author or
> > somebody with lots of spare time. As stated earlier it's a beast. I
> > tried to fix it several weeks ago because I thought it is a low hanging
> > fruit. I was totally wrong. :/
>
> Okay, I fixed tarfile.py. It isn't that hard if know how to tame
> the beast ;-) I hope everything works fine now.
>
> --
> Lars Gustäbel
> lars at gustaebel.de
>
> Der Mensch kann zwar tun, was er will,
> aber er kann nicht wollen, was er will.
> (Arthur Schopenhauer) > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rrr at ronadam.com Tue Aug 7 21:52:55 2007 From: rrr at ronadam.com (Ron Adam) Date: Tue, 07 Aug 2007 14:52:55 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B83C7F.603@canterbury.ac.nz> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B66E7E.4060209@canterbury.ac.nz> <46B6A7E8.7040001@ronadam.com> <46B6C335.4080504@canterbury.ac.nz> <46B6DE80.2050000@ronadam.com> <46B7C369.3040509@canterbury.ac.nz> <46B7DEA6.5050609@ronadam.com> <46B83C7F.603@canterbury.ac.nz> Message-ID: <46B8CD97.8020000@ronadam.com> Greg Ewing wrote: > Ron Adam wrote: >> What about mismatched specifiers? > > It's not clear exactly what you mean by a "mismatched" > specifier. > > Some types may recognise when they're being passed > a format spec that belongs to another type, and try > to convert themselves to that type (e.g. applying > 'f' to an int or 'd' to a float). After thinking about this a bit more, I think specifiers don't have types and don't belong to types. They are the instructions to convert an object to a string, and to format that string in a special ways. It seems they are mapped one to many, and not one to one. > If the type doesn't recognise the format at all, > and doesn't have a fallback type to delegate to > (as will probably be the case with str) then > you will get an exception. I agree. 
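
The delegation idea quoted above (a type recognises a format code that belongs to another type and converts itself, and raises an exception for anything it does not recognise) can be sketched as follows. This is only an illustration, not code from the thread: the Celsius class is hypothetical, and it relies on the format() builtin and the trailing type code of the mini-language as they exist in current Python 3, which is an assumption relative to the draft being discussed here (where the code is the first character of the specifier).

```python
class Celsius:
    """Hypothetical value type, used only to illustrate delegation."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # An empty spec falls back to str(), mirroring object.__format__.
        if not spec:
            return str(self.degrees)
        code = spec[-1]  # assumption: the type code is the last character
        if code in "dxob":
            # 'integer' formats: convert to int and let int do the work.
            return format(int(self.degrees), spec)
        if code in "feg%":
            # 'float' formats: convert to float and let float do the work.
            return format(float(self.degrees), spec)
        # No fallback type left to delegate to, so report the problem.
        raise ValueError("unknown format code %r for Celsius" % code)
```

With this sketch, format(Celsius(21.456), ".1f") goes through float and format(Celsius(255), "x") goes through int, while an unknown code such as "q" raises ValueError.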
>> I think the opinion so far is to let the objects __format__ method
>> determine this, but we need to figure this out what the built in types
>> will do.
>
> My suggestions would be:
>
>    int   - understands all the 'integer' formats
>            (d, x, o, etc.)
>          - recognises the 'float' formats ('f', 'e', etc.)
>            and delegates to float
>          - delegates anything it doesn't recognise to str
>
>    float - understands all the 'float' formats
>          - recognises the 'integer' formats and delegates to int
>          - delegates anything it doesn't recognise to str
>
>    str   - recognises the 'string' formats (only one?)
>          - raises an exception for anything it doesn't understand
>
> I've forgotten where 'r' was supposed to fit into
> this scheme. Can anyone remind me?

So if I want my own objects to work with these other specifiers, I need to
do something like the following in every object?

    def __format__(self, specifier):
        if specifier[0] in ['i', 'x', 'o', etc]:
            return int(self).format(specifier)
        if specifier[0] in ['f', 'e', etc]:
            return float(self).format(specifier)
        if specifier[0] == 'r':
            return repr(self)
        if specifier[0] == 's':
            return str(self).format(specifier)
        if specifier[0] in '...':
            ...
            ... my own specifier handler ...
        raise ValueError('invalid specifier for this object type')

I'm currently playing with a model where specifiers are objects. This
seems to simplify some things. The specifier object parses the specifier
term and has a method to apply it to a value. It can know about all the
standard built in specifiers.

It could call an object's __format__ method for an unknown specifier, or we
can have the __format__ method have first crack at it and write the default
__format__ method like this...

    def __format__(self, specifier):
        return specifier.apply(self)

Then we can override it in various ways...

    def __format__(self, specifier):
        ...
        ... my specifier handler ...
        return result

    def __format__(self, specifier):
        if specifier[0] in '...':
            ...
            ... my specifier handler ...
            return result
        return specifier.apply(self)

    def __format__(self, specifier):
        try:
            return specifier.apply(self)
        except ValueError:
            pass
        ...
        ... my specifier handler ...
        return result

Or we can say the standard ones don't call your __format__ method, but if
you use the '!' specifier, it will call your __format__ method. More
limiting, but much simpler.

I'm not sure I have a preference here yet.

Cheers,
   Ron

From guido at python.org Tue Aug 7 22:13:03 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Aug 2007 13:13:03 -0700
Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests in the struni branch!
In-Reply-To: <18104.33617.706079.853923@montanaro.dyndns.org>
References: <18104.33617.706079.853923@montanaro.dyndns.org>
Message-ID:

I see this too now, but only on OSX.

On 8/7/07, skip at pobox.com wrote:
> test_csv got removed from the failing list after Guido applied Adam Hupp's
> patch. (I checked in a small update for one thing Adam missed.) I'm still
> getting test failures though:
>
> ======================================================================
> FAIL: test_reader_attrs (__main__.Test_Csv)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "Lib/test/test_csv.py", line 63, in test_reader_attrs
>     self._test_default_attrs(csv.reader, [])
>   File "Lib/test/test_csv.py", line 47, in _test_default_attrs
>     self.assertEqual(obj.dialect.delimiter, ',')
> AssertionError: s'\x00' != ','
>
> This same exception crops up six times. Maybe this isn't
> str->unicode-related, but it sure seems like it to me. I spent some time
> over the past few days trying to figure it out, but I struck out.
>
> Skip
>

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rrr at ronadam.com Tue Aug 7 22:26:54 2007
From: rrr at ronadam.com (Ron Adam)
Date: Tue, 07 Aug 2007 15:26:54 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46B7CD8C.5070807@acm.org>
References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org>
Message-ID: <46B8D58E.5040501@ronadam.com>

Talin wrote:
> Ron Adam wrote:
>> Now here's the problem with all of this. As we add the widths back
>> into the format specifications, we are basically saying the idea of a
>> separate field width specifier is wrong.
>>
>> So maybe it's not really a separate independent thing after all, and
>> it just a convenient grouping for readability purposes only.
>
> I'm beginning to suspect that this is indeed the case.

Yes, I believe so even more after experimenting last night with specifier
objects.

For now I'm using ','s for separating *all* the terms. I don't intend
that to be used for a final version, but for now it makes parsing the
terms and getting the behavior right much easier.

    f,.3,>7      Right justify in field width 7, with 3 decimal places.

    s,^10,w20    Center in field 10, expands up to width 20.

    f,.3,%

This allows me to just split on ',' and experiment with ordering and see
how some terms might need to interact with other terms and how to do that
without having to fight the syntax problem for now.

Later the syntax can be compressed and tested with a fairly complete
doctest as a separate problem.

> Before we go too much further, let me give out the URLs for the .Net
> documentation on these topics, since much of the current design we're
> discussing has been inspired by .Net:
>
> http://msdn2.microsoft.com/en-us/library/dwhawy9k.aspx
> http://msdn2.microsoft.com/en-us/library/0c899ak8.aspx
> http://msdn2.microsoft.com/en-us/library/0asazeez.aspx
> http://msdn2.microsoft.com/en-us/library/c3s1ez6e.aspx
> http://msdn2.microsoft.com/en-us/library/az4se3k1.aspx
> http://msdn2.microsoft.com/en-us/library/txafckwd.aspx
>
> I'd suggest some study of these. Although I would warn against adopting
> this wholesale, as there are a huge number of features described in
> these documents, more than I think we need.
>
> One other URL for people who want to play around with implementing this
> stuff is my Python prototype of the original version of the PEP. It has
> all the code you need to format floats with decimal precision,
> exponents, and so on:
>
> http://www.viridia.org/hg/python/string_format?f=5e4b833ed285;file=StringFormat.py;style=raw

Thanks, I'll take a look at it.
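
The comma-separated experiment described in this message ("just split on ','") could look roughly like the sketch below. It is not the prototype referred to in the thread: apply_spec is a hypothetical helper, it handles only an alignment term and a precision term, and it reassembles them into the format mini-language of current Python 3, which is an assumption. Terms such as w20 or a bare % are rejected rather than guessed at.

```python
def apply_spec(value, spec):
    """Apply an experimental comma-separated specifier such as 'f,.3,>7'.

    Hypothetical helper: the first term is the type code, the remaining
    terms may appear in any order.
    """
    kind, *terms = spec.split(",")
    align = ""
    precision = ""
    for term in terms:
        if term[:1] in ("<", ">", "^"):
            align = term          # e.g. '>7': right justify in width 7
        elif term[:1] == ".":
            precision = term      # e.g. '.3': three decimal places
        else:
            raise ValueError("unsupported term: %r" % term)
    # Reassemble into the order the built-in mini-language expects.
    return format(value, align + precision + kind)
```

Because the terms are collected by kind before being reassembled, apply_spec(3.14159, "f,.3,>7") and apply_spec(3.14159, "f,>7,.3") give the same result, which is the sort of ordering experiment the comma form makes easy.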
Cheers, Ron From jimjjewett at gmail.com Tue Aug 7 23:03:37 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 7 Aug 2007 17:03:37 -0400 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B79815.1030504@v.loewis.de> <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com> <46B7F869.6080007@v.loewis.de> <46B834F6.7050307@canterbury.ac.nz> <43aa6ff70708070922m645189dbvc744a1fbbbb88800@mail.gmail.com> Message-ID: On 8/7/07, Guido van Rossum wrote: > On 8/7/07, Collin Winter wrote: > > Could someone please explain to me the conceptual difference between > > array.array('B'), bytes(), buffer objects and simple lists of > > integers? I'm confused about when I should use which. [bytes and array.array are similar, but bytes have extra methods and a literal notation] [buffer is read-only to your code, but may not be immutable] > Lists of integers don't have the property that the other three share > which is that their C representation is a contiguous array of bytes > (char* in C). This representation is important because to do efficient > I/O in C you need char*. This sounds almost as if they were all interchangable implementations of the same interface, and you should choose based on quality of implementation. If the need for immutable isn't worth a distinct type, I'm not sure why "I want it to be fast" is worth two (or three) extra types, distinguished by the equivalent of cursor isolation level. FWLIW, I think the migration path for the existing three types makes sense, but I would like b" ... " to be an immutable bytes object, and bytes(" ... ") to be the constructor for something that can mutate. -jJ From nas at arctrix.com Tue Aug 7 23:12:01 2007 From: nas at arctrix.com (Neil Schemenauer) Date: Tue, 7 Aug 2007 21:12:01 +0000 (UTC) Subject: [Python-3000] should rfc822 accept text io or binary io? 
References: <18103.34967.170146.660275@montanaro.dyndns.org> <3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org> Message-ID: Jeremy Hylton wrote: > Hmmm. Should we being using the email package to parse HTTP headers? > RFC 2616 says that HTTP headers follow the "same generic format" as > RFC 822, but RFC 822 says headers are ASCII and RFC 2616 says headers > are arbitrary 8-bit values. You'd need to parse them differently. It would be good to have a good RFC 2616 header parser in the standard library. I believe every Python web framework implements it's own. Neil From jimjjewett at gmail.com Tue Aug 7 23:14:00 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 7 Aug 2007 17:14:00 -0400 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com> <46B89C5B.7090104@gmail.com> Message-ID: On 8/7/07, Guido van Rossum wrote: > If b"..." is immutable, the > immutable bytes type is in your face all the time and you'll have to > deal with the difference all the time. There is a conceptual difference between the main use cases for mutable (a buffer) and the main use cases for immutable (a protocol constant). I'm not sure why you would need a literal for the mutable version. How often do you create a new buffer with initial values? (Note: not pointing to existing memory; creating a new one.) > E.g. is the result of > concatenating a mutable and an immutable bytes object mutable? > Does it matter whether the mutable operand is first or second? I would say immutable; you're taking a snapshot. (I would have some sympathy for taking the type of the first operand, but then you need to worry about + vs +=, and whether the start of the new object will notice later state changes.) > Is a slice of an immutable bytes array immutable itself? Why wouldn't it be? The question is what to do with a slice from a *mutable* array. 
Most of python uses copies (and keeps type, so the result is also mutable). Numpy often shares state for efficiency. Making an immutable copy makes sense for every sane use case *I* can come up with. (The insane case is that you might want to pass output_buffer[60:70] to some other object for its status output.) -jJ From guido at python.org Wed Aug 8 00:41:40 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Aug 2007 15:41:40 -0700 Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests in the struni branch! In-Reply-To: References: Message-ID: Here's a followup. We need help from someone with a 64-bit Linux box; these tests are failing on 64-bit only: test_io, test_largefile, test_ossaudiodev, test_poll, test_shelve, test_socket_ssl. I suspect that the _fileio.c module probably is one of the culprits. Other news: On 8/6/07, Guido van Rossum wrote: > We're down to 11 failing test in the struni branch. I'd like to get > this down to zero ASAP so that we can retire the old p3yk (yes, with > typo!) branch and rename py3k-struni to py3k. > > Please help! Here's the list of failing tests: > > test_ctypes > Recently one test started failing again, after Martin changed > PyUnicode_FromStringAndSize() to use UTF8 instead of Latin1. > > test_email > test_email_codecs > test_email_renamed > Can someone contact the email-sig and ask for help with these? > > test_minidom > Recently started failing again; probably shallow. > > test_sqlite > Virgin territory, probably best done by whoever wrote the code or at > least someone with time to spare. > > test_tarfile > Virgin territory again (but different owner :-). Lars Gustaebel fixed this except for a few bz2-related tests. > test_urllib2_localnet > test_urllib2net > I think Jeremy Hylton may be close to fixing these, he's done a lot of > work on urllib and httplib. > > test_xml_etree_c > Virgin territory again. 
> > There are also a few tests that only fail on CYGWIN or OSX; I won't > bother listing these. The two OSX tests listed at the time were fixed, thanks to those volunteers! We now only have an OSX-specific failure in test_csv. > If you want to help, please refer to this wiki page: > http://wiki.python.org/moin/Py3kStrUniTests > > There are also other tasks; see http://wiki.python.org/moin/Py3kToDo -- --Guido van Rossum (home page: http://www.python.org/~guido/) From lars at gustaebel.de Wed Aug 8 01:21:14 2007 From: lars at gustaebel.de (Lars =?iso-8859-15?Q?Gust=E4bel?=) Date: Wed, 8 Aug 2007 01:21:14 +0200 Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests in the struni branch! In-Reply-To: References: <46B80D73.5050009@cheimes.de> <20070807184000.GA26947@core.g33x.de> Message-ID: <20070807232114.GA29701@core.g33x.de> On Tue, Aug 07, 2007 at 12:27:40PM -0700, Guido van Rossum wrote: > I still get these three failures on Ubuntu dapper: > > > ====================================================================== > ERROR: test_fileobj_iter (test.test_tarfile.Bz2UstarReadTest) > ---------------------------------------------------------------------- [...] > ValueError: the bz2 library has received wrong parameters This is actually a bug in the bz2 module. The read() method of bz2.BZ2File raises this ValueError with a size argument of 0. -- Lars Gust?bel lars at gustaebel.de A chicken is an egg's way of producing more eggs. (Anonymous) From guido at python.org Wed Aug 8 01:29:29 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Aug 2007 16:29:29 -0700 Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests in the struni branch! In-Reply-To: <20070807232114.GA29701@core.g33x.de> References: <46B80D73.5050009@cheimes.de> <20070807184000.GA26947@core.g33x.de> <20070807232114.GA29701@core.g33x.de> Message-ID: Thanks -- fixed! Committed revision 56814. 
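
For reference, the behaviour the tarfile tests depend on is that a zero-sized read on a bz2 stream returns an empty bytes object instead of raising ValueError. A minimal check, assuming a current Python 3 interpreter where bz2.BZ2File accepts a file object (this is not the struni-branch code, and the payload is made up):

```python
import bz2
import io

# Made-up payload, compressed in memory so no files are needed.
payload = b"contents of ustar/regtype"
compressed = io.BytesIO(bz2.compress(payload))

with bz2.BZ2File(compressed) as f:
    empty = f.read(0)   # the call that used to raise ValueError
    rest = f.read()     # the stream is still usable afterwards
```

Here empty is an empty bytes object and rest holds the whole payload.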
On 8/7/07, Lars Gust?bel wrote: > On Tue, Aug 07, 2007 at 12:27:40PM -0700, Guido van Rossum wrote: > > I still get these three failures on Ubuntu dapper: > > > > > > ====================================================================== > > ERROR: test_fileobj_iter (test.test_tarfile.Bz2UstarReadTest) > > ---------------------------------------------------------------------- > [...] > > ValueError: the bz2 library has received wrong parameters > > This is actually a bug in the bz2 module. The read() method of > bz2.BZ2File raises this ValueError with a size argument of 0. > > -- > Lars Gust?bel > lars at gustaebel.de > > A chicken is an egg's way of producing more eggs. > (Anonymous) > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Wed Aug 8 02:04:56 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 08 Aug 2007 12:04:56 +1200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com> Message-ID: <46B908A8.8040605@canterbury.ac.nz> Guido van Rossum wrote: > On 8/7/07, Nick Coghlan wrote: > >>This would mean that the Unicode type would acquire all of the ambiquity >>currently associated with the 8-bit str type > > Not necessarily, as this kind of use is typically very localized. > Remember practicality beats purity. Has anyone considered that, depending on the implementation, a latin1-decoded unicode string could take 2-4 times as much memory? -- Greg From guido at python.org Wed Aug 8 02:11:53 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Aug 2007 17:11:53 -0700 Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests in the struni branch! In-Reply-To: <18104.33617.706079.853923@montanaro.dyndns.org> References: <18104.33617.706079.853923@montanaro.dyndns.org> Message-ID: Fixed now. 
This was OSX only due to an endianness issue; but the bug was universal: we were treating a unicode character using structmodule's T_CHAR. Since other similar fields of the dialect type were dealt with properly it seems this was merely an oversight. On 8/7/07, skip at pobox.com wrote: > test_csv got removed from the failing list after Guido applied Adam Hupp's > patch. (I checked in a small update for one thing Adam missed.) I'm still > getting test failures though: > > ====================================================================== > FAIL: test_reader_attrs (__main__.Test_Csv) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "Lib/test/test_csv.py", line 63, in test_reader_attrs > self._test_default_attrs(csv.reader, []) > File "Lib/test/test_csv.py", line 47, in _test_default_attrs > self.assertEqual(obj.dialect.delimiter, ',') > AssertionError: s'\x00' != ',' > > This same exception crops up six times. Maybe this isn't > str->unicode-related, but it sure seems like it to me. I spent some time > over the past few days trying to figure it out, but I struck out. 
> > Skip > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Wed Aug 8 02:26:22 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 08 Aug 2007 12:26:22 +1200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B89C5B.7090104@gmail.com> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com> <46B89C5B.7090104@gmail.com> Message-ID: <46B90DAE.8050506@canterbury.ac.nz> Nick Coghlan wrote: > I would > expect the situation to be the same as with sets - you'd use the mutable > version by default, unless there was some specific reason to want the > frozen version (usually because you want something that is hashable, or > easy to share safely amongst multiple clients). My instinct with regard to sets is the other way around, i.e. use immutable sets unless there's a reason they need to be mutable. The reason is safety -- accidentally trying to mutate an immutable object fails more quickly and obviously than the converse. If Python had had both mutable and immutable strings from the beginning, would you be giving the same advice, i.e. use mutable strings unless they need to be immutable? If not, what makes strings different from sets in this regard? 
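A small sketch of the fail-fast argument above, using set and frozenset as stand-ins. The helper name add_default is hypothetical, made up purely for illustration; the point is only that the immutable argument fails at the call site, while the mutable one is changed silently.

```python
def add_default(flags):
    """Hypothetical helper that (perhaps by accident) mutates its argument."""
    flags.add("verbose")
    return flags

# Immutable argument: the mistake surfaces immediately as an exception,
# because frozenset has no add() method.
shared = frozenset({"debug"})
try:
    add_default(shared)
    failure = None
except AttributeError as exc:
    failure = type(exc).__name__

# Mutable argument: the call succeeds and quietly changes the caller's set.
mutable_shared = {"debug"}
add_default(mutable_shared)
```

Whether that safety is worth the cost of building sets immutably is a separate question, and this sketch does not address it.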
-- Greg From greg.ewing at canterbury.ac.nz Wed Aug 8 02:57:35 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 08 Aug 2007 12:57:35 +1200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B79815.1030504@v.loewis.de> <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com> Message-ID: <46B914FF.6030606@canterbury.ac.nz> Guido van Rossum wrote: > Currently the array module can be used for > this but I would like to get rid of it in favor of bytes and Travis > Oliphant's new buffer API I thought the plan was to *enhance* the array module so that it provides multi-dimensional arrays that support the new buffer protocol. If the plan is instead to axe it completely, then I'm disappointed. Bytes is only a replacement for array.array('B'), not any of the other types. -- Greg From greg.ewing at canterbury.ac.nz Wed Aug 8 03:14:47 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 08 Aug 2007 13:14:47 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B8CD97.8020000@ronadam.com> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B66E7E.4060209@canterbury.ac.nz> <46B6A7E8.7040001@ronadam.com> <46B6C335.4080504@canterbury.ac.nz> <46B6DE80.2050000@ronadam.com> <46B7C369.3040509@canterbury.ac.nz> <46B7DEA6.5050609@ronadam.com> <46B83C7F.603@canterbury.ac.nz> <46B8CD97.8020000@ronadam.com> Message-ID: <46B91907.6030705@canterbury.ac.nz> Ron Adam wrote: > > Greg Ewing wrote: > >> Some types may recognise when they're being passed >> a format spec that belongs to another type, and try >> to convert themselves to that type 
(e.g. applying >> 'f' to an int or 'd' to a float). > > After thinking about this a bit more, I think specifiers don't have > types and don't belong to types. I agree - I was kind of speaking in shorthand there. What I really meant was that some types have some knowledge of format specifiers recognised by other types. E.g. int doesn't itself know how to format something using a spec that starts with 'f', but it knows that float *does* know, so it converts itself to a float and lets float handle it from there. If you were to pass an 'f' format to something that had no clue about it at all, e.g. a datetime, you would ultimately get an exception. And there's nothing stopping another type from recognising 'f' and doing something of its own with it that doesn't involve conversion to float (e.g. decimal). So the one-to-many mapping you mention is accommodated. -- Greg From greg.ewing at canterbury.ac.nz Wed Aug 8 01:53:42 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 08 Aug 2007 11:53:42 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B6FEC5.9040503@gmail.com> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org> <46B6FEC5.9040503@gmail.com> Message-ID: <46B90606.9070006@canterbury.ac.nz> Nick Coghlan wrote: > If __format__ receives both the alignment specifier and the format > specifier as arguments, My suggestion would be for it to receive the alignment spec pre-parsed, since apply_format has to at least partially parse it itself, and there doesn't seem to be anything gained by having *both* the format and alignment specs arbitrary, as anything type-specific can go in the format spec. 
So the alignment spec might as well have a fixed syntax. -- Greg From greg.ewing at canterbury.ac.nz Wed Aug 8 03:25:43 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 08 Aug 2007 13:25:43 +1200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com> <46B89C5B.7090104@gmail.com> Message-ID: <46B91B97.30603@canterbury.ac.nz> Guido van Rossum wrote: > That would imply that b"..." should return a mutable bytes object, > which many people have objected to. I'm still very uncomfortable about this. It's so completely unlike anything else in the language. I have a strong feeling that it is going to trip people up a lot, and end up being one of the Famous Warts To Be Fixed In Py4k. There's some evidence of this already in the way we're referring to it as a "bytes literal", when it's *not* actually a literal, but a constructor. Or at least it's a literal with an implied construction operation around it. > is the result of > concatenating a mutable and an immutable bytes object mutable? Does it > matter whether the mutable operand is first or second? Is a slice of > an immutable bytes array immutable itself? These are valid questions, but I don't think they're imponderable enough to be show-stoppers. With lists/tuples it's resolved by not allowing them to be concatenated with each other, but that's probably too restrictive here. My feeling is that like should produce like, and where there's a conflict, immutability should win. Mutable buffers tend to be used as an internal part of something else, such as an IOStream, and aren't exposed to the outside, or if they are, they're exposed in a read-only kind of way. So concatenating mutable and immutable should give the same result as concatenating two immutables, i.e. an immutable. 
If you need to add something to the end of your buffer, while keeping it mutable, you use extend(). This gives us immutable + immutable -> immutable mutable + immutable -> immutable immutable + mutable -> immutable mutable + mutable -> mutable (*) immutable[:] -> immutable mutable[:] -> mutable (*) (*) One might argue that these should be immutable, on the grounds of safety, but I think that would be too surprisingly different from the way other mutable sequence work. -- Greg From guido at python.org Wed Aug 8 03:52:58 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Aug 2007 18:52:58 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B914FF.6030606@canterbury.ac.nz> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B79815.1030504@v.loewis.de> <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com> <46B914FF.6030606@canterbury.ac.nz> Message-ID: On 8/7/07, Greg Ewing wrote: > Guido van Rossum wrote: > > Currently the array module can be used for > > this but I would like to get rid of it in favor of bytes and Travis > > Oliphant's new buffer API > > I thought the plan was to *enhance* the array module > so that it provides multi-dimensional arrays that > support the new buffer protocol. > > If the plan is instead to axe it completely, then > I'm disappointed. Bytes is only a replacement for > array.array('B'), not any of the other types. I wouldn't ax it unless there was a replacement. But I'm not holding the replacement to any kind of compatibility with the old array module, and I expect it would more likely take the form of a wrapper around anything that supports the (new) buffer API, such as bytes. This would render the array module obsolete. 
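One way to picture "a wrapper around anything that supports the buffer API" is the memoryview type found in later Python 3 releases. Treating it as the replacement meant here is an assumption, not something this message states. A minimal sketch that views eight raw bytes as four native 16-bit integers without copying:

```python
raw = bytearray(8)                  # mutable, contiguous byte storage
words = memoryview(raw).cast("H")   # same memory seen as unsigned 16-bit items
words[0] = 0x0102                   # writes through to the underlying bytearray

item_count = len(words)             # 8 bytes / 2 bytes per item
first_two = bytes(raw[:2])          # byte order depends on the platform
```

The view owns no data of its own; it reinterprets the storage of whatever object exports the buffer.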
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Aug 8 04:01:35 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Aug 2007 19:01:35 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B90DAE.8050506@canterbury.ac.nz> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com> <46B89C5B.7090104@gmail.com> <46B90DAE.8050506@canterbury.ac.nz> Message-ID: On 8/7/07, Greg Ewing wrote: > My instinct with regard to sets is the other way around, > i.e. use immutable sets unless there's a reason they > need to be mutable. The reason is safety -- accidentally > trying to mutate an immutable object fails more quickly > and obviously than the converse. But this is impractical -- a very common way to work is to build up a set incrementally. With immutable sets this would quickly become O(N**2). That's why set() is mutable and {...} creates a set, and the only way to create an immutable set is to use frozenset(...). > If Python had had both mutable and immutable strings > from the beginning, would you be giving the same > advice, i.e. use mutable strings unless they need to > be immutable? If not, what makes strings different from > sets in this regard? That's easy. sets are mutable for the same reason lists are mutable -- lists are conceptually containers for "larger" amounts of data than strings. I don't adhere to the "let's just make copying really fast by using tricks like refcounting etc." school -- that was a pain in the B* for ABC. 
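The cost argument above can be sketched as follows. The two function names are hypothetical. Both build the same contents, but the frozenset version makes a fresh copy of everything collected so far on every step, which is where the O(N**2) comes from.

```python
def build_mutable(items):
    """Incremental build with a mutable set: each add() is amortized O(1)."""
    result = set()
    for item in items:
        result.add(item)
    return result

def build_frozen(items):
    """Incremental build with an immutable set.

    Each union creates a brand-new frozenset containing all earlier
    elements, so step k costs O(k) and the whole loop is O(N**2).
    """
    result = frozenset()
    for item in items:
        result = result | {item}
    return result
```

In practice one would collect into a set (or pass an iterable) and call frozenset(...) once at the end.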
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From talin at acm.org Wed Aug 8 04:16:16 2007 From: talin at acm.org (Talin) Date: Tue, 07 Aug 2007 19:16:16 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B79815.1030504@v.loewis.de> <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com> <46B7F869.6080007@v.loewis.de> <46B834F6.7050307@canterbury.ac.nz> <43aa6ff70708070922m645189dbvc744a1fbbbb88800@mail.gmail.com> Message-ID: <46B92770.1080907@acm.org> Guido van Rossum wrote: > Assuming you weren't being sarcastic, array('B') and bytes() are very > close except bytes have a literal notation and many string-ish > methods. The buffer objects returned by the buffer() builtin provide a > read-only view on other objects that happen to have an internal > buffer, like strings, bytes, arrays, PIL images, and numpy arrays. > Lists of integers don't have the property that the other three share > which is that their C representation is a contiguous array of bytes > (char* in C). This representation is important because to do efficient > I/O in C you need char*. I've been following the discussion in a cursory way, and I can see that there is a lot of disagreement and confusion around the whole issue of mutable vs. immutable bytes. If it were up to me, and I was starting over from scratch, here's the design I would create: 1) I'd reserve the term 'bytes' to refer to an immutable byte string. 2) I would re-purpose the existing term 'buffer' to refer to a mutable, resizable byte buffer. Rationale: "buffer" has been used historically in many computer languages to refer to a mutable area of memory. The word 'bytes', on the other hand, seems to imply a *value* rather than a *location*, and values (like numbers) are generally considered immutable. 3) Both 'bytes' and 'buffer' would be derived from an abstract base class called ByteSequence. 
ByteSequence defines all of the read-only accessor methods common to both classes. 4) Literals of both types are available - using a prefix of small 'b' for bytes, and capital B for 'buffer'. 5) Both 'bytes' and 'buffer' would support the 'buffer protocol', although in the former case it would be read-only. Other things which are not buffers could also support this protocol. 6) Library APIs that required a byte sequence would be written to test vs. the abstract ByteSequence type. 7) Both bytes and buffer objects would be inter-convertible using the appropriate constructors. -- Talin From shiblon at gmail.com Wed Aug 8 04:30:44 2007 From: shiblon at gmail.com (Chris Monson) Date: Tue, 7 Aug 2007 22:30:44 -0400 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B92770.1080907@acm.org> References: <46B637DD.7070905@v.loewis.de> <46B79815.1030504@v.loewis.de> <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com> <46B7F869.6080007@v.loewis.de> <46B834F6.7050307@canterbury.ac.nz> <43aa6ff70708070922m645189dbvc744a1fbbbb88800@mail.gmail.com> <46B92770.1080907@acm.org> Message-ID: Wow. +1 for pure lucid reasoning. (Sorry for top-posting; blame the crackberry) On 8/7/07, Talin wrote: > Guido van Rossum wrote: > > > Assuming you weren't being sarcastic, array('B') and bytes() are very > > close except bytes have a literal notation and many string-ish > > methods. The buffer objects returned by the buffer() builtin provide a > > read-only view on other objects that happen to have an internal > > buffer, like strings, bytes, arrays, PIL images, and numpy arrays. > > Lists of integers don't have the property that the other three share > > which is that their C representation is a contiguous array of bytes > > (char* in C). This representation is important because to do efficient > > I/O in C you need char*.
> > I've been following the discussion in a cursory way, and I can see that > there is a lot of disagreement and confusion around the whole issue of > mutable vs. immutable bytes. > > If it were up to me, and I was starting over from scratch, here's the > design I would create: > > 1) I'd reserve the term 'bytes' to refer to an immutable byte string. > > 2) I would re-purpose the existing term 'buffer' to refer to a mutable, > resizable byte buffer. > > Rationale: "buffer" has been used historically in many computer > languages to refer to a mutable area of memory. The word 'bytes', on the > other hand, seems to imply a *value* rather than a *location*, and > values (like numbers) are generally considered immutable. > > 3) Both 'bytes' and 'buffer' would be derived from an abstract base > class called ByteSequence. ByteSequence defines all of the read-only > accessor methods common to both classes. > > 4) Literals of both types are available - using a prefix of small 'b' > for bytes, and capital B for 'buffer'. > > 5) Both 'bytes' and 'buffer' would support the 'buffer protocol', > although in the former case it would be read-only. Other things which > are not buffers could also support this protocol. > > 6) Library APIs that required a byte sequence would be written to test > vs. the abstract ByteSequence type. > > 7) Both bytes and buffer objects would be inter-convertible using the > appropriate constructors. > > -- Talin > From skip at pobox.com Wed Aug 8 04:49:52 2007 From: skip at pobox.com (skip at pobox.com) Date: Tue, 7 Aug 2007 21:49:52 -0500 Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests in the struni branch!
In-Reply-To: References: <18104.33617.706079.853923@montanaro.dyndns.org> Message-ID: <18105.12112.271066.494963@montanaro.dyndns.org> Guido> Fixed now. This was OSX only due to an endianness issue; but the Guido> bug was universal: we were treating a unicode character using Guido> structmodule's T_CHAR. Since other similar fields of the dialect Guido> type were dealt with properly it seems this was merely an Guido> oversight. Thanks. I'm sure I would not have figured that out for quite awhile. Skip From ntoronto at cs.byu.edu Wed Aug 8 04:37:51 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Tue, 07 Aug 2007 20:37:51 -0600 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B91B97.30603@canterbury.ac.nz> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com> <46B89C5B.7090104@gmail.com> <46B91B97.30603@canterbury.ac.nz> Message-ID: <46B92C7F.3090500@cs.byu.edu> Greg Ewing wrote: > Guido van Rossum wrote: > >> That would imply that b"..." should return a mutable bytes object, >> which many people have objected to. >> > > I'm still very uncomfortable about this. It's so > completely unlike anything else in the language. > I have a strong feeling that it is going to trip > people up a lot, and end up being one of the > Famous Warts To Be Fixed In Py4k. > > There's some evidence of this already in the way > we're referring to it as a "bytes literal", when > it's *not* actually a literal, but a constructor. > Or at least it's a literal with an implied > construction operation around it. > Not only that, but it's the only *string prefix* that causes the interpreter to create and return a mutable object. It's not too late to go with Talin's suggestions (bytes = immutable, buffer = mutable), is it? I got warm fuzzies reading that. 
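For reference, the arrangement in Talin's points 1 to 7 can be sketched roughly as below. This is only an illustration under stated assumptions: the built-in bytes and bytearray types stand in for the proposed immutable 'bytes' and mutable 'buffer', ByteSequence is built with the abc module, and checksum is a hypothetical library function.

```python
from abc import ABC

class ByteSequence(ABC):
    """Hypothetical abstract base (point 3); the name comes from the proposal."""

# Stand-ins (assumption): bytes is the immutable type, bytearray the mutable one.
ByteSequence.register(bytes)
ByteSequence.register(bytearray)

def checksum(data):
    """Hypothetical library API that tests against the abstract type (point 6)."""
    if not isinstance(data, ByteSequence):
        raise TypeError("expected a byte sequence")
    return sum(data) % 256

frozen = b"ab"                   # immutable value
scratch = bytearray(frozen)      # point 7: convert through the constructor
scratch.extend(b"c")             # only the mutable type can grow in place
round_trip = bytes(scratch)      # and back to an immutable value
```

Code that only reads the data accepts either type, while code that needs to mutate asks for the mutable one explicitly.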
Neil From greg.ewing at canterbury.ac.nz Wed Aug 8 04:57:30 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 08 Aug 2007 14:57:30 +1200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B92770.1080907@acm.org> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B79815.1030504@v.loewis.de> <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com> <46B7F869.6080007@v.loewis.de> <46B834F6.7050307@canterbury.ac.nz> <43aa6ff70708070922m645189dbvc744a1fbbbb88800@mail.gmail.com> <46B92770.1080907@acm.org> Message-ID: <46B9311A.7020908@canterbury.ac.nz> Talin wrote: > 4) Literals of both types are available - using a prefix of small 'b' > for bytes, and capital B for 'buffer'. I don't see that it would be really necessary to have a distinct syntax for a buffer constructor (no literal!) because you could always write buffer(b"...") This is what it would have to be doing underneath anyway. -- Greg From rhamph at gmail.com Wed Aug 8 05:52:31 2007 From: rhamph at gmail.com (Adam Olsen) Date: Tue, 7 Aug 2007 21:52:31 -0600 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B91B97.30603@canterbury.ac.nz> References: <46B637DD.7070905@v.loewis.de> <46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com> <46B89C5B.7090104@gmail.com> <46B91B97.30603@canterbury.ac.nz> Message-ID: On 8/7/07, Greg Ewing wrote: > So concatenating mutable and immutable should give the > same result as concatenating two immutables, i.e. an > immutable. If you need to add something to the end of > your buffer, while keeping it mutable, you use extend(). > > This gives us > > immutable + immutable -> immutable > mutable + immutable -> immutable > immutable + mutable -> immutable > mutable + mutable -> mutable (*) >>> () + [] Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: can only concatenate tuple (not "list") to tuple Less confusing to prohibit concatenation of mismatched types.
There's always trivial workarounds (i.e. () + tuple([]) or .extend()). -- Adam Olsen, aka Rhamphoryncus From martin at v.loewis.de Wed Aug 8 07:38:42 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 08 Aug 2007 07:38:42 +0200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B908A8.8040605@canterbury.ac.nz> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com> <46B908A8.8040605@canterbury.ac.nz> Message-ID: <46B956E2.2040600@v.loewis.de> >>> This would mean that the Unicode type would acquire all of the ambiguity >>> currently associated with the 8-bit str type >> Not necessarily, as this kind of use is typically very localized. >> Remember practicality beats purity. > > Has anyone considered that, depending on the implementation, > a latin1-decoded unicode string could take 2-4 times as much > memory? I considered it, then ignored it. If you have the need for hashing, the string won't be long. Regards, Martin From martin at v.loewis.de Wed Aug 8 07:45:57 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 08 Aug 2007 07:45:57 +0200 Subject: [Python-3000] Binary compatibility In-Reply-To: References: <46B7EC3E.3070802@v.loewis.de> Message-ID: <46B95895.20705@v.loewis.de> >> Now, you seem to talk about different *Linux* systems. On Linux, >> use UCS-4. > > Yes, that's what we want. But Python 2.5 defaults to UCS-2 (at least > last time I tried), while many distros have used UCS-4. If Linux > always used UCS-4, that would be fine, but currently there's no > guarantee of that. I see why a guarantee would help, but I don't think it's necessary. Just provide UCS-4 binaries only on Linux, and when somebody complains, tell them to recompile Python, or to recompile your software themselves. The defaults in 2.5.x cannot be changed anymore.
The defaults could be changed for Linux in 2.6, but then the question is: why just for Linux? Regards, Martin From nnorwitz at gmail.com Wed Aug 8 07:57:32 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Tue, 7 Aug 2007 22:57:32 -0700 Subject: [Python-3000] infinite recursion with python -v Message-ID: The wiki seems to be done, so sorry for the spam. python -v crashes due to infinite recursion (well, it tried to be infinite until it got a stack overflow :-) The problem seems to be that Lib/encodings/latin_1.py is loaded, but it tries to be converted to latin_1, so it tries to load the module, and ... Or something like that. See below for a call stack. Minimal version: PyFile_WriteString (s= "# Lib/encodings/latin_1.pyc matches Lib/encodings/latin_1.py\n", f=) at Objects/fileobject.c:184 mywrite (name= "stderr", fp=, format= "# %s matches %s\n", va=) at Python/sysmodule.c:1350 PySys_WriteStderr (format= "# %s matches %s\n") at Python/sysmodule.c:1380 check_compiled_module (pathname= "Lib/encodings/latin_1.py", mtime=, cpathname= "Lib/encodings/latin_1.pyc") at Python/import.c:755 load_source_module (name= "encodings.latin_1", pathname= "Lib/encodings/latin_1.py", fp=) at Python/import.c:938 load_module (name= "encodings.latin_1", fp=,buf= "Lib/encodings/latin_1.py", type=1, loader=) at Python/import.c:1733 import_submodule (mod=, subname= "latin_1",fullname= "encodings.latin_1") at Python/import.c:2418 load_next (mod=,altmod=, p_name=,buf= "encodings.latin_1", p_buflen=) at Python/import.c:2213 import_module_level (name=, globals=, locals=, fromlist=, level=0) at Python/import.c:1992 PyImport_ImportModuleLevel (name= "encodings.latin_1", globals=, locals=, fromlist=, level=0) at Python/import.c:2056 builtin___import__ () at Python/bltinmodule.c:151 [...] 
_PyCodec_Lookup (encoding= "latin-1") at Python/codecs.c:147 codec_getitem (encoding= "latin-1",index=0) at Python/codecs.c:211 PyCodec_Encoder (encoding= "latin-1") at Python/codecs.c:275 PyCodec_Encode (object=,encoding= "latin-1", errors=) at Python/codecs.c:322 PyString_AsEncodedObject (str=,encoding= "latin-1", errors=) at Objects/stringobject.c:459 string_encode () at Objects/stringobject.c:3138 [...] PyFile_WriteObject (v=, f=, flags=1) at Objects/fileobject.c:159 PyFile_WriteString (s= "# Lib/encodings/latin_1.pyc matches Lib/encodings/latin_1.py\n",f=) at Objects/fileobject.c:184 == Stack trace for python -v recursion (argument values are mostly trimmed) == PyFile_WriteString (s= "# Lib/encodings/latin_1.pyc matches Lib/encodings/latin_1.py\n", f=) at Objects/fileobject.c:184 mywrite (name= "stderr", fp=, format= "# %s matches %s\n", va=) at Python/sysmodule.c:1350 PySys_WriteStderr (format= "# %s matches %s\n") at Python/sysmodule.c:1380 check_compiled_module (pathname= "Lib/encodings/latin_1.py", mtime=, cpathname= "Lib/encodings/latin_1.pyc") at Python/import.c:755 load_source_module (name= "encodings.latin_1", pathname= "Lib/encodings/latin_1.py", fp=) at Python/import.c:938 load_module (name= "encodings.latin_1", fp=,buf= "Lib/encodings/latin_1.py", type=1, loader=) at Python/import.c:1733 import_submodule (mod=, subname= "latin_1",fullname= "encodings.latin_1") at Python/import.c:2418 load_next (mod=,altmod=, p_name=,buf= "encodings.latin_1", p_buflen=) at Python/import.c:2213 import_module_level (name=, globals=, locals=, fromlist=, level=0) at Python/import.c:1992 PyImport_ImportModuleLevel (name= "encodings.latin_1", globals=, locals=, fromlist=, level=0) at Python/import.c:2056 builtin___import__ () at Python/bltinmodule.c:151 PyCFunction_Call () at Objects/methodobject.c:77 PyObject_Call () at Objects/abstract.c:1736 do_call () at Python/ceval.c:3764 call_function (pp_stack=, oparg=513) at Python/ceval.c:3574 PyEval_EvalFrameEx (f=, throwflag=0) 
at Python/ceval.c:2216 PyEval_EvalCodeEx () at Python/ceval.c:2835 function_call () at Objects/funcobject.c:634 PyObject_Call () at Objects/abstract.c:1736 PyEval_CallObjectWithKeywords () at Python/ceval.c:3431 _PyCodec_Lookup (encoding= "latin-1") at Python/codecs.c:147 codec_getitem (encoding= "latin-1",index=0) at Python/codecs.c:211 PyCodec_Encoder (encoding= "latin-1") at Python/codecs.c:275 PyCodec_Encode (object=,encoding= "latin-1", errors=) at Python/codecs.c:322 PyString_AsEncodedObject (str=,encoding= "latin-1", errors=) at Objects/stringobject.c:459 string_encode () at Objects/stringobject.c:3138 PyCFunction_Call () at Objects/methodobject.c:73 call_function () at Python/ceval.c:3551 PyEval_EvalFrameEx (f=, throwflag=0) at Python/ceval.c:2216 PyEval_EvalCodeEx () at Python/ceval.c:2835 function_call () at Objects/funcobject.c:634 PyObject_Call () at Objects/abstract.c:1736 method_call () at Objects/classobject.c:397 PyObject_Call () at Objects/abstract.c:1736 PyEval_CallObjectWithKeywords () at Python/ceval.c:3431 PyFile_WriteObject (v=, f=, flags=1) at Objects/fileobject.c:159 PyFile_WriteString (s= "# Lib/encodings/latin_1.pyc matches Lib/encodings/latin_1.py\n",f=) at Objects/fileobject.c:184 From g.brandl at gmx.net Wed Aug 8 08:38:11 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 08 Aug 2007 08:38:11 +0200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com> <46B89C5B.7090104@gmail.com> Message-ID: Jim Jewett schrieb: > On 8/7/07, Guido van Rossum wrote: > >> If b"..." is immutable, the >> immutable bytes type is in your face all the time and you'll have to >> deal with the difference all the time. > > There is a conceptual difference between the main use cases for > mutable (a buffer) and the main use cases for immutable (a protocol > constant). 
> > I'm not sure why you would need a literal for the mutable version. > How often do you create a new buffer with initial values? (Note: not > pointing to existing memory; creating a new one.) The same reason that you might create empty lists or dicts: to fill them. >> E.g. is the result of >> concatenating a mutable and an immutable bytes object mutable? >> Does it matter whether the mutable operand is first or second? > > I would say immutable; you're taking a snapshot. (I would have some > sympathy for taking the type of the first operand, but then you need > to worry about + vs +=, and whether the start of the new object will > notice later state changes.) But what about mutable = mutable + immutable mutable += immutable I'd expect it to stay mutable in both cases. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From g.brandl at gmx.net Wed Aug 8 08:40:02 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 08 Aug 2007 08:40:02 +0200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B91B97.30603@canterbury.ac.nz> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com> <46B89C5B.7090104@gmail.com> <46B91B97.30603@canterbury.ac.nz> Message-ID: Greg Ewing schrieb: > So concatenating mutable and immutable should give the > same result as concatenating two immutables, i.e. an > immutable. If you need to add something to the end of > your buffer, while keeping it mutable, you use extend(). 
> > This gives us > > immutable + immutable -> immutable > mutable + immutable -> immutable > immutable + mutable -> immutable > mutable + mutable -> mutable (*) NB: when dealing with sets and frozensets, you get the type of the first operand. Doing something different here is confusing. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From martin at v.loewis.de Wed Aug 8 09:04:45 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 08 Aug 2007 09:04:45 +0200 Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests in the struni branch! In-Reply-To: References: Message-ID: <46B96B0D.6080605@v.loewis.de> >> It's in Modules/timemodule.c, line 691: >> PyModule_AddObject(m, "tzname", >> Py_BuildValue("(zz)", tzname[0], tzname[1])); >> >> According to MSDN, tzname is a global variable; the contents is somehow >> derived from the TZ environment variable (which is not set in my case). > > Is there anything from which you can guess the encoding (e.g. the > filesystem encoding?). It's in the locale's encoding. On Windows, that will be "mbcs"; on other systems, the timezone names are typically all in ASCII - this would allow for a quick work-around. Using the filesystemencoding would also work, although it would be an equal hack: it's *meant* to be used only for file names (and on OSX at least, it deviates from the locale's encoding - although I have no idea what tzname is encoded in on OSX). > These are all externally-provided strings. It will depend on the > platform what the encoding is. > > I wonder if we need to add another format code to Py_BuildValue (and > its friends) to designate "platform default encoding" instead of > UTF-8.
For symmetry with ParseTuple, there could be the 'e' versions (es, ez, ...) which would take a codec name also. "platform default encoding" is a tricky concept, of course: Windows alone has two of them on each installation. Regards, Martin From jyasskin at gmail.com Wed Aug 8 09:57:05 2007 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Wed, 8 Aug 2007 00:57:05 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B80F8A.7060906@v.loewis.de> References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B79815.1030504@v.loewis.de> <5d44f72f0708062126u1853d99fx500990f91841e6b6@mail.gmail.com> <46B7F869.6080007@v.loewis.de> <5d44f72f0708062245k32e79de4s4b545c59a974612a@mail.gmail.com> <46B80F8A.7060906@v.loewis.de> Message-ID: <5d44f72f0708080057u16e72916rf3bf369ea2889859@mail.gmail.com> I agree completely with Talin's suggestion for the arrangement of the mutable and immutable alternatives, but there are a couple points here that I wanted to answer. On 8/6/07, "Martin v. L?wis" wrote: > > For low-level I/O code, I totally agree that a mutable buffery object > > is needed. > > The code we are looking at right now (dbm interfaces) *is* low-level > I/O code. But you want an immutable interface to it that looks like a dict. I think that's entirely appropriate because the underlying C code is the real low-level I/O code, while the Python wrapper is actually pretty high-level. > > For example, to support re-using bytes buffers, socket.send() > > would need to take start and end offsets into its bytes argument. > > Otherwise, you have to slice the object to select the right data, > > which *because bytes are mutable* requires a copy. PEP 3116's .write() > > method has the same problem. Making those changes is, of course, > > doable, but it seems like something that should be consciously > > committed to. > > Sure. 
There are several ways to do that, including producing view > objects - which would be possible even though the underlying buffer > is mutable; the view would then be just as mutable. > > > Python 2 seems to have gotten away with doing all the buffery stuff in > > C. Is there a reason Python 3 shouldn't do the same? > > I think Python 2 has demonstrated that this doesn't really work. People > repeatedly did += on strings (leading to quadratic performance), This argues for mutable strings at least as much as it argues for mutable high-level bytes. Now that they exist, generators are a pretty natural way to build up immutable objects, so people certainly have the option to avoid quadratic performance whatever the mutability of their objects. > invented the buffer interface (which is semantically flawed), added > direct support for mmap, and so on. And those still exist in Python 3 (perhaps in an updated form). A mutable bytes doesn't obsolete them. It may be a handy concrete type for the buffer interface, but then so is array. > > me: [benchmarks showing 10% faster construction] [Probably this just means that something hasn't been optimized enough on Intel Macs] > Martin: [same benchmarks showing 10% faster copying] I'd really say it's the same result (and shouldn't have claimed otherwise in my email. Sorry). A 10% difference either way is likely to be dwarfed by the costs of actually doing I/O. Before picking interfaces around the notion that either allocation or copying is expensive, it would be wise to run benchmarks to figure out what the performance actually looks like. On 8/7/07, Guido van Rossum wrote: > [list()] would not work with low-level I/O (sometimes readinto() is useful) When is "sometimes"? Is it the same times that rewriting into C would be a good idea? I'd really like to see any benchmarks people have written to decide this. 
In any case, the obvious thing to do may well be different when you're writing performance-critical code and when you're writing code that just needs to be readable. I haven't seen any such distinguishing circumstance for the various hashing techniques. On 8/7/07, Guido van Rossum wrote: > On 8/6/07, Jeffrey Yasskin wrote: > ...why are you waiting > > for a show-stopper that requires an immutable bytes type rather than > > one that requires a mutable one? > > Well one reason of course is that we currently have a mutable bytes > object and that it works well in most situations. The status quo argument must be weaker given that bytes hasn't existed in any released Python. I was really asking why you picked mutable as the first type to experiment with, and I guess I/O is the answer to that, although it seems to me like a case of the tail wagging the dog. On 8/7/07, Greg Ewing wrote: > Jeffrey Yasskin wrote: > > If you have mutable bytes and need an > > immutable object, you could 1) convert it to an int (probably > > big-endian), > > That's not a reversible transformation, because you lose > information about leading zero bits. Good point. You'd need a length along with the data, unless you're dealing with a fixed-length thing like 4CCs. This is still probably among the most efficient representations. On 8/7/07, Guido van Rossum wrote: > But this is impractical -- a very common way to work is to build up a > set incrementally. With immutable sets this would quickly become > O(N**2). That's why set() is mutable and {...} creates a set, and the > only way to create an immutable set is to use frozenset(...). I would probably default to constructing an immutable set with a generator. If I needed to do something more complicated, I'd fall back to a mutable set. Of course, making the name for the immutable version 3 times as long biases the language toward the mutable version. 
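As a rough illustration of that preference, here is a minimal sketch in current Python (the sample words and variable names are made up for the example): an immutable set can be built in one pass from a generator expression, and unlike a plain set it is hashable.

```python
# Hypothetical sample data; any iterable would do.
words = ["spam", "eggs", "spam", "ham"]

# Build the immutable set directly from a generator expression,
# with no intermediate mutable set to grow step by step.
lengths = frozenset(len(w) for w in words)

# A frozenset is hashable, so it can serve as a dict key.
index = {lengths: "word lengths"}

# The mutable counterpart is not hashable.
try:
    hash(set(lengths))
    mutable_is_hashable = True
except TypeError:
    mutable_is_hashable = False
```

Whether the longer name nudges people toward set() is a separate question; the construction itself costs no more than a single pass.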
--
Namasté,
Jeffrey Yasskin

From greg.ewing at canterbury.ac.nz Wed Aug 8 12:57:24 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 08 Aug 2007 22:57:24 +1200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To:
References: <46B637DD.7070905@v.loewis.de> <46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com> <46B89C5B.7090104@gmail.com> <46B91B97.30603@canterbury.ac.nz>
Message-ID: <46B9A194.90009@canterbury.ac.nz>

Adam Olsen wrote:
> Less confusing to prohibit concatenation of mismatched types. There's
> always trivial workarounds (ie () + tuple([]) or .extend()).

Normally I would agree, but in this case I feel that it would be inconvenient. With the scheme I proposed, code that treats bytes as read-only doesn't have to care whether it has a mutable or immutable object. If they were as rigidly separated as lists and tuples, every API would have to be strictly aware of whether it dealt with mutable or immutable bytes.

I could be wrong, though. It may turn out that keeping them separate is the right thing to do.

--
Greg

From greg.ewing at canterbury.ac.nz Wed Aug 8 13:03:15 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 08 Aug 2007 23:03:15 +1200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To:
References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com> <46B89C5B.7090104@gmail.com>
Message-ID: <46B9A2F3.9050305@canterbury.ac.nz>

Georg Brandl wrote:
> mutable = mutable + immutable
>
> mutable += immutable

I wouldn't have a problem with these being different. They're already different with list + tuple (although in that case, one of them is disallowed).
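The list/tuple asymmetry mentioned here can be checked directly. A minimal sketch, assuming a current CPython and using plain lists and tuples rather than the bytes types under discussion:

```python
items = [1, 2]

# Binary + between a list and a tuple is disallowed.
try:
    combined = items + (3, 4)
except TypeError:
    combined = None

# In-place += accepts any iterable and extends the list, like extend().
items += (3, 4)
```

So the two spellings already behave differently for an existing mutable/immutable pair: one raises, the other mutates the left operand and keeps its type.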
--
Greg

From greg.ewing at canterbury.ac.nz Wed Aug 8 13:04:58 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 08 Aug 2007 23:04:58 +1200
Subject: [Python-3000] Immutable bytes type and dbm modules
In-Reply-To:
References: <46B637DD.7070905@v.loewis.de> <46B78F86.9000505@v.loewis.de> <46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com> <46B89C5B.7090104@gmail.com> <46B91B97.30603@canterbury.ac.nz>
Message-ID: <46B9A35A.7030406@canterbury.ac.nz>

Georg Brandl wrote:
> NB: when dealing with sets and frozensets, you get the type of
> the first operand. Doing something different here is confusing.

Hmmm, I don't think I would have designed it that way. I might be willing to go along with that precedent, though.

--
Greg

From victor.stinner at haypocalc.com Wed Aug 8 18:14:05 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 8 Aug 2007 18:14:05 +0200
Subject: [Python-3000] py3k-struni: proposition to fix ctypes bug, ctypes c_char creates bytes
Message-ID: <200708081814.05073.victor.stinner@haypocalc.com>

Hi,

I hear Guido's request to fix last py3k-struni bugs. I downloaded subversion trunk and started to work on ctypes tests.

The problem is that ctypes c_char (and c_char_p) creates unicode string instead of byte string. I attached a proposition (patch) to change this behaviour (use bytes for c_char).

So in next example, it will display 'bytes' and not 'str':
    from ctypes import c_buffer, c_char
    buf = c_buffer("abcdef")
    print (type(buf[0]))

Other behaviour changes:
 - repr(c_char) adds a "b"
   eg. repr(c_char('x')) is "c_char(b'x')" instead of "c_char('x')"
 - bytes is mutable whereas str is not:
   this may break some modules based on ctypes

Victor Stinner aka haypo
http://hachoir.org/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: py3k-struni-ctypes.diff Type: text/x-diff Size: 4992 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070808/d4bfd476/attachment.bin From guido at python.org Wed Aug 8 18:45:38 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 8 Aug 2007 09:45:38 -0700 Subject: [Python-3000] py3k-struni: proposition to fix ctypes bug, ctypes c_char creates bytes In-Reply-To: <200708081814.05073.victor.stinner@haypocalc.com> References: <200708081814.05073.victor.stinner@haypocalc.com> Message-ID: Thanks! Would you mind submitting to SF and assigning to Thomas Heller (theller I think)? And update the wiki (http://wiki.python.org/moin/Py3kStrUniTests) On 8/8/07, Victor Stinner wrote: > Hi, > > I hear Guido's request to fix last py3k-struni bugs. I downloaded subversion > trunk and started to work on ctypes tests. > > The problem is that ctypes c_char (and c_char_p) creates unicode string > instead of byte string. I attached a proposition (patch) to change this > behaviour (use bytes for c_char). > > So in next example, it will display 'bytes' and not 'str': > from ctypes import c_buffer, c_char > buf = c_buffer("abcdef") > print (type(buf[0])) > > Other behaviour changes: > - repr(c_char) adds a "b" > eg. 
repr(c_char('x')) is "c_char(b'x')" instead of "c_char('x')" > - bytes is mutable whereas str is not: > this may break some modules based on ctypes > > Victor Stinner aka haypo > http://hachoir.org/ > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From theller at ctypes.org Wed Aug 8 20:32:48 2007 From: theller at ctypes.org (Thomas Heller) Date: Wed, 08 Aug 2007 20:32:48 +0200 Subject: [Python-3000] py3k-struni: proposition to fix ctypes bug, ctypes c_char creates bytes In-Reply-To: <200708081814.05073.victor.stinner@haypocalc.com> References: <200708081814.05073.victor.stinner@haypocalc.com> Message-ID: Victor Stinner schrieb: > Hi, > > I hear Guido's request to fix last py3k-struni bugs. I downloaded subversion > trunk and started to work on ctypes tests. > > The problem is that ctypes c_char (and c_char_p) creates unicode string > instead of byte string. I attached a proposition (patch) to change this > behaviour (use bytes for c_char). > > So in next example, it will display 'bytes' and not 'str': > from ctypes import c_buffer, c_char > buf = c_buffer("abcdef") > print (type(buf[0])) > > Other behaviour changes: > - repr(c_char) adds a "b" > eg. repr(c_char('x')) is "c_char(b'x')" instead of "c_char('x')" > - bytes is mutable whereas str is not: > this may break some modules based on ctypes This patch looks correct. I will test it and then commit if all works well. The problem I had fixing this is that I was not sure whether the c_char types should 'contain' bytes objects or str8 objects. str8 will be going away, so it seems the decision is clear. OTOH, I'm a little bit confused about the bytes type. 
I think this behaviour is a little bit confusing, but maybe that's just me:

>>> b"abc"[:]
b'abc'
>>> b"abc"[:1]
b'a'
>>> b"abc"[1]
98
>>> b"abc"[1] = 42
>>> b"abc"[1] = "f"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object cannot be interpreted as an integer
>>> b"abc"[1] = b"f"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'bytes' object cannot be interpreted as an integer
>>>

Especially confusing is that the repr of a bytes object looks like a string, but bytes do not contain characters but integers instead.

Thomas

From guido at python.org Wed Aug 8 20:40:52 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 8 Aug 2007 11:40:52 -0700
Subject: [Python-3000] py3k-struni: proposition to fix ctypes bug, ctypes c_char creates bytes
In-Reply-To:
References: <200708081814.05073.victor.stinner@haypocalc.com>
Message-ID:

On 8/8/07, Thomas Heller wrote:
> OTOH, I'm a little bit confused about the bytes type. I think this behaviour
> is a little bit confusing, but maybe that's just me:
>
> >>> b"abc"[:]
> b'abc'
> >>> b"abc"[:1]
> b'a'
> >>> b"abc"[1]
> 98
> >>> b"abc"[1] = 42
> >>> b"abc"[1] = "f"
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: 'str' object cannot be interpreted as an integer
> >>> b"abc"[1] = b"f"
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: 'bytes' object cannot be interpreted as an integer
> >>>
>
> Especially confusing is that the repr of a bytes object looks like a string,
> but bytes do not contain characters but integers instead.

I hope you can get used to it. This design is a bit of a compromise -- conceptually, bytes really contain small unsigned integers in [0, 256), but in order to be maximally useful, the bytes literal (and hence the bytes repr()) shows printable ASCII characters as themselves (controls and non-ASCII are shown as \xXX). This is more compact too. PBP!
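The same behaviour can be reproduced as a short script. This is only a sketch against a current Python 3, where (an assumption relative to this 2007 branch) the mutable type is spelled bytearray and bytes itself is immutable:

```python
data = bytearray(b"abc")

piece = data[:1]     # slicing yields another bytes-like object
item = data[1]       # indexing yields a small integer

data[1] = 42         # item assignment takes an int in range(256)

# Assigning a one-character str (or a bytes object) is rejected.
try:
    data[0] = "f"
    accepted_str = True
except TypeError:
    accepted_str = False

# The repr shows printable ASCII as itself and everything else as \xXX.
shown = repr(bytes([97, 0, 200]))
```

The integer 42 happens to be the code for "*", which is why the buffer reads a*c afterwards.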
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Wed Aug 8 20:41:33 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 08 Aug 2007 20:41:33 +0200 Subject: [Python-3000] [Python-Dev] Regular expressions, Unicode etc. In-Reply-To: References: Message-ID: <46BA0E5D.60109@v.loewis.de> > My second one is about Unicode. I really, but REALLY regard it as > a serious defect that there is no escape for printing characters. > Any code that checks arbitrary text is likely to need them - yes, > I know why Perl and hence PCRE doesn't have that, but let's skip > that. That is easy to add, though choosing a letter is tricky. > Currently \c and \C, for 'character' (I would prefer 'text' or > 'printable', but \t is obviously insane and \P is asking for > incompatibility with Perl and Java). Before discussing the escape, I'd like to see a specification of it first - what characters precisely would classify as "printing"? > But attempting to rebuild the Unicode database hasn't worked. > Tools/unicode is, er, a trifle incomplete and out of date. The > only file I need to change is Objects/unicodetype_db.h, but the > init attempts to run Tools/unicode/makeunicodedata.py have not > been successful. > > I may be able to reverse engineer the mechanism enough to get > the files off the Unicode site and run it, but I don't want to > spend forever on it. Any clues? I see that you managed to do something here, so I'm not sure what kind of help you still need. Regards, Martin From mike.klaas at gmail.com Wed Aug 8 20:56:58 2007 From: mike.klaas at gmail.com (Mike Klaas) Date: Wed, 8 Aug 2007 11:56:58 -0700 Subject: [Python-3000] [Python-Dev] Regular expressions, Unicode etc. In-Reply-To: References: Message-ID: <9CBC9283-52BF-48AB-A39F-0DAE0E4EAFAE@gmail.com> On 8-Aug-07, at 2:28 AM, Nick Maclaren wrote: > I have needed to push my stack to teach REs (don't ask), and am > taking a look at the RE code. 
I may be able to extend it to support > RFE 694374 and (more importantly) atomic groups and possessive > quantifiers. While I regard such things as revolting beyond belief, > they make a HELL of a difference to the efficiency of recognising > things like HTML tags in a morass of mixed text. +1. I would use such a feature. > The other approach, which is to stick to true regular expressions, > and wholly or partially convert to DFAs, has already been rendered > impossible by even the limited Perl/PCRE extensions that Python > has adopted. Impossible? Surely, a sufficiently-competent re engine could detect when a DFA is possible to construct? -Mike From lists at cheimes.de Wed Aug 8 21:44:59 2007 From: lists at cheimes.de (Christian Heimes) Date: Wed, 08 Aug 2007 21:44:59 +0200 Subject: [Python-3000] Removal of cStringIO and StringIO module Message-ID: I've spent some free time today to work on a patch that removes cStringIO and StringIO from the py3k-struni branch. The patch is available at http://www.python.org/sf/1770008 It adds a deprecation warning to StringIO.py and a facade cStringIO.py. Both modules act as an alias for io.StringIO. You may remove the files. I didn't noticed that 2to3 has a fixer for cStringIO and StringIO. But the files may be useful because the fixer doesn't fix doc tests. Some unit tests are failing because I don't know how to handle StringIO(buffer()). Georg Brandl suggested to use io.BytesIO but that doesn't work correctly. Christian From victor.stinner at haypocalc.com Wed Aug 8 21:56:11 2007 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Wed, 8 Aug 2007 21:56:11 +0200 Subject: [Python-3000] py3k-struni: proposition to fix ctypes bug, ctypes c_char creates bytes In-Reply-To: References: <200708081814.05073.victor.stinner@haypocalc.com> Message-ID: <200708082156.11985.victor.stinner@haypocalc.com> On Wednesday 08 August 2007 18:45:38 you wrote: > Thanks! 
Would you mind submitting to SF and assigning to Thomas Heller > (theller I think)? > > And update the wiki (http://wiki.python.org/moin/Py3kStrUniTests) Thomas Heller did it. Thanks ;-) Victor Stinner aka haypo http://hachoir.org/ From brett at python.org Wed Aug 8 22:07:54 2007 From: brett at python.org (Brett Cannon) Date: Wed, 8 Aug 2007 13:07:54 -0700 Subject: [Python-3000] Removal of cStringIO and StringIO module In-Reply-To: References: Message-ID: On 8/8/07, Christian Heimes wrote: > I've spent some free time today to work on a patch that removes > cStringIO and StringIO from the py3k-struni branch. The patch is > available at http://www.python.org/sf/1770008 > Thanks for the work! And with Alexandre's Summer of Code project to have a C version of io.StringIO that is used transparently that should work out well! > It adds a deprecation warning to StringIO.py and a facade cStringIO.py. > Both modules act as an alias for io.StringIO. You may remove the files. > I didn't noticed that 2to3 has a fixer for cStringIO and StringIO. But > the files may be useful because the fixer doesn't fix doc tests. > Deprecation warnings for modules that are going away has not been handled yet. Stdlib stuff is on the table for after 3.0a1. Chances are the stdlib deprecations will be a 2.6 thing and 3.0 won't have any since we expect people to go 2.6 -> 3.0, not the other way around. There will be 2to3 fixers for the imports, but not until we tackle the stdlib and its cleanup. > Some unit tests are failing because I don't know how to handle > StringIO(buffer()). Georg Brandl suggested to use io.BytesIO but that > doesn't work correctly. Really? I did that in a couple of places and it worked for me. What's the problem specifically? 
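For readers following along, a small sketch of the text/bytes split behind this exchange. It assumes io.StringIO and io.BytesIO as found in a current Python 3, which may differ in detail from the 2007 branch:

```python
import io

# Text goes through StringIO, encoded data through BytesIO.
text_buf = io.StringIO("caf\u00e9")
data_buf = io.BytesIO("caf\u00e9".encode("utf-8"))

text = text_buf.read()
data = data_buf.read()

# StringIO refuses a bytes initial value, so byte-oriented tests
# have to be moved to BytesIO rather than wrapped.
try:
    io.StringIO(b"raw")
    accepts_bytes = True
except TypeError:
    accepts_bytes = False
```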
-Brett From jimjjewett at gmail.com Wed Aug 8 22:50:14 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 8 Aug 2007 16:50:14 -0400 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B637DD.7070905@v.loewis.de> <46B82B88.5000804@canterbury.ac.nz> <46B866A0.2040800@gmail.com> <46B89C5B.7090104@gmail.com> Message-ID: On 8/8/07, Georg Brandl wrote: > Jim Jewett schrieb: > > I'm not sure why you would need a literal for the mutable version. > > How often do you create a new buffer with initial values? (Note: not > > pointing to existing memory; creating a new one.) > The same reason that you might create empty lists or dicts: to fill them. Let me rephrase that -- how often do you create new non-empty buffers? The equivalent of a empty list or dict is buffer(). If you really want to save keystrokes, call it buf(). The question is whether we really need to abbreviate >>> mybuf = buffer("abcde") as >>> mybuf = b"abcde" I would say leave literal syntax for the immutable type. (And other than this nit, I also lend my support to Talin's suggestion.) -jJ From guido at python.org Wed Aug 8 23:19:24 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 8 Aug 2007 14:19:24 -0700 Subject: [Python-3000] Moving to a "py3k" branch soon Message-ID: I would like to move to a new branch soon for all Py3k development. I plan to name the branch "py3k". It will be branched from py3k-struni. I will do one last set of merges from the trunk via p3yk (note typo!) and py3k-struni, and then I will *delete* the old py3k and py3k-struni branches (you will still be able to access their last known good status by syncing back to a previous revision). I will temporarily shut up some unit tests to avoid getting endless spam from Neal's buildbot. After the switch, you should be able to switch your workspaces to the new branch using the "svn switch" command. 
If anyone is in the middle of something that would become painful due to this changeover, let me know ASAP and I'll delay.

I will send out another message when I start the move, and another when I finish it.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nnorwitz at gmail.com Wed Aug 8 23:52:58 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Wed, 8 Aug 2007 14:52:58 -0700
Subject: [Python-3000] C API cleanup str
In-Reply-To: <46B5FA11.5040404@v.loewis.de>
References: <46B2C8E0.8080409@canterbury.ac.nz> <46B5C47B.5090703@v.loewis.de> <46B5F136.4010502@v.loewis.de> <46B5FA11.5040404@v.loewis.de>
Message-ID:

On 8/5/07, "Martin v. Löwis" wrote:
> >> I agree. We should specify that somewhere, so we have a recorded
> >> guideline to use in case of doubt.
> >
> > But where? Time to start a PEP for the C API perhaps?
>
> I would put it into the API documentation. We can put a daily-generated
> version of the documentation online, just as the trunk documentation is
> updated daily.

That's already been done for a while: http://docs.python.org/dev/3.0/

It's even updated every 12 hours. :-)

n

From lists at cheimes.de Thu Aug 9 00:22:51 2007
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 09 Aug 2007 00:22:51 +0200
Subject: [Python-3000] tp_bytes and __bytes__ magic method
Message-ID:

Hey Pythonistas!

Victor Stinner just made a good point at #python. The py3k has no magic method and type slot for bytes. Python has magic methods like __int__ for int(ob) and __str__ for str(ob). Are you considering to add a __bytes__ method and tp_bytes?

I can think of a bunch of use cases for a magic method.
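A sketch of what such a hook could look like. In current Python 3, bytes() does consult a __bytes__ method, which was not the case in the branch discussed here; the Packet class and its fields are purely hypothetical.

```python
class Packet:
    """Hypothetical wire object: one kind byte followed by a payload."""

    def __init__(self, kind, payload):
        self.kind = kind          # int in range(256)
        self.payload = payload    # bytes

    def __str__(self):
        # Human-readable text form, used by str() and print().
        return "Packet(kind=%d, %d payload bytes)" % (self.kind, len(self.payload))

    def __bytes__(self):
        # Wire form, used by bytes().
        return bytes([self.kind]) + self.payload


packet = Packet(1, b"hi")
wire = bytes(packet)
label = str(packet)
```

An ordinary method such as packet.as_bytes() would serve the same purpose; the magic method only matters where the caller does not know what kind of object it holds.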
Christian

From victor.stinner at haypocalc.com Thu Aug 9 00:49:30 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 9 Aug 2007 00:49:30 +0200
Subject: [Python-3000] tp_bytes and __bytes__ magic method
In-Reply-To:
References:
Message-ID: <200708090049.30405.victor.stinner@haypocalc.com>

On Thursday 09 August 2007 00:22:51 Christian Heimes wrote:
> Hey Pythonistas!
>
> Victor Stinner just made a good point at #python. The py3k has no magic
> method and type slot for bytes.

And another problem: mix of __str__ and __unicode__ methods.

class A:
    def __str__(self): return '__str__'

class B:
    def __str__(self): return '__str__'
    def __unicode__(self): return '__unicode__'

print (repr(str( A() )))  # display '__str__'
print (repr(str( B() )))  # display '__unicode__'


Proposition:

  __str__() -> str (2.x) becomes __bytes__() -> bytes (3000)
  __unicode__() -> unicode (2.x) becomes __str__() -> str (3000)

Victor Stinner aka haypo

From guido at python.org Thu Aug 9 00:54:47 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 8 Aug 2007 15:54:47 -0700
Subject: [Python-3000] tp_bytes and __bytes__ magic method
In-Reply-To:
References:
Message-ID:

On 8/8/07, Christian Heimes wrote:
> Victor Stinner just made a good point at #python. The py3k has no magic
> method and type slot for bytes. Python has magic methods like __int__
> for int(ob) and __str__ for str(ob). Are you considering to add a
> __bytes__ method and tp_bytes?

Never occurred to me. The intention is that bytes() has a fixed signature. It's far less important than str(). __int__() is different, numeric types must be convertible.

> I can think of a bunch of use cases for a magic method.

Such as?
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Aug 9 00:55:40 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 8 Aug 2007 15:55:40 -0700 Subject: [Python-3000] tp_bytes and __bytes__ magic method In-Reply-To: <200708090049.30405.victor.stinner@haypocalc.com> References: <200708090049.30405.victor.stinner@haypocalc.com> Message-ID: The plan is to kill __unicode__ and only use __str__. But we're not quite there yet. On 8/8/07, Victor Stinner wrote: > On Thursday 09 August 2007 00:22:51 Christian Heimes wrote: > > Hey Pythonistas! > > > > Victor Stinner just made a good point at #python. The py3k has no magic > > method and type slot for bytes. > > And another problem: mix of __str__ and __unicode__ methods. > > class A: > def __str__(self): return '__str__' > > class B: > def __str__(self): return '__str__' > def __unicode__(self): return '__unicode__' > > print (repr(str( A() ))) # display '__str__' > print (repr(str( B() ))) # display '__unicode__' > > > Proposition: > > __str__() -> str (2.x) becomes __bytes__() -> bytes (3000) > __unicode__() -> unicode (2.x) becomes __str__() -> str (3000) > > Victor Stinner aka haypo > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From victor.stinner at haypocalc.com Thu Aug 9 01:13:35 2007 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Thu, 9 Aug 2007 01:13:35 +0200 Subject: [Python-3000] tp_bytes and __bytes__ magic method In-Reply-To: References: Message-ID: <200708090113.35900.victor.stinner@haypocalc.com> On Thursday 09 August 2007 00:54:47 Guido van Rossum wrote: > On 8/8/07, Christian Heimes wrote: > > Victor Stinner just made a good point at #python. 
The py3k has no magic
> > method and type slot for bytes (...)
> > I can think of a bunch of use cases for a magic method.
>
> Such as?

I'm working on the email module and I guess that some __str__ methods should return bytes instead of str (and so should be renamed to __bytes__). Maybe the one of the Message class (Lib/email/message.py).

Victor Stinner aka haypo
http://hachoir.org/

From lists at cheimes.de Thu Aug 9 01:49:35 2007
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 09 Aug 2007 01:49:35 +0200
Subject: [Python-3000] tp_bytes and __bytes__ magic method
In-Reply-To:
References:
Message-ID:

Guido van Rossum wrote:
>> I can think of a bunch of use cases for a magic method.
>
> Such as?

The __bytes__ method could be used to implement a byte representation of an arbitrary object. The byte representation can then be used to submit the object over the wire or dump it into a file. In Python 2.x I could override __str__ to send an object over a socket, but in Python 3k str() returns a unicode object that can't be transmitted over sockets. Sockets support bytes only.

Christian

From victor.stinner at haypocalc.com Thu Aug 9 01:59:46 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 9 Aug 2007 01:59:46 +0200
Subject: [Python-3000] fix email module for bytes/str
Message-ID: <200708090159.46301.victor.stinner@haypocalc.com>

Hi,

I started to work on the email module, but I have trouble understanding whether a function should return bytes or str (because I don't know the email module).

Header.encode() -> bytes?
Message.as_string() -> bytes?
decode_header() -> list of (bytes, str|None) or (str, str|None)?
base64MIME.encode() -> bytes?

message_from_string() <- bytes?

Message.get_payload() -> bytes or str?

A charset name type is str, right?

---------------

Things to change to get bytes:
 - replace StringIO with BytesIO
 - add 'b' prefix, eg. '' becomes b''
 - replace "%s=%s" % (x, y) with b''.join((x, b'=', y))
   => is it the best method to concatenate bytes?
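The concatenation question can be tried out directly. A minimal sketch, assuming the bytes and io.BytesIO APIs of a current Python 3 (the names key and value are illustrative only):

```python
import io

key, value = b"charset", b"utf-8"

# join() builds the result in one step instead of creating an
# intermediate object for every + in a chain.
joined = b"".join((key, b"=", value))

# For many pieces, io.BytesIO plays the role that StringIO played
# for 8-bit strings in Python 2.
out = io.BytesIO()
for piece in (key, b"=", value, b"\r\n"):
    out.write(piece)
header_line = out.getvalue()
```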
Problems (to port python 2.x code to 3000):
 - When obj.lower() is used, I expect obj to be str but it's bytes
 - obj.strip() doesn't work when obj is a bytes object; it requires an
   argument but I don't know the right value! Maybe b'\n\r\v\t '?
 - iterating over a bytes object gives numbers and not bytes objects, eg.
      for c in b"small text":
         if re.match("(\n|\r)", c): ...
   Is it possible to use a 'bytes' regex? re.compile(b"x") raises an exception

--
Victor Stinner aka haypo
http://hachoir.org/

From guido at python.org Thu Aug 9 02:00:44 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 8 Aug 2007 17:00:44 -0700
Subject: [Python-3000] tp_bytes and __bytes__ magic method
In-Reply-To:
References:
Message-ID:

> On Thursday 09 August 2007 00:54:47 Guido van Rossum wrote:
> > On 8/8/07, Christian Heimes wrote:
> > > Victor Stinner just made a good point at #python. The py3k has no magic
> > > method and type slot for bytes (...)
> > > I can think of a bunch of use cases for a magic method.
> >
> > Such as?

On 8/8/07, Victor Stinner wrote:
> I'm working on the email module and I guess that some __str__ methods should
> return bytes instead of str (and so should be renamed to __bytes__). Maybe
> the one of the Message class (Lib/email/message.py).

On 8/8/07, Christian Heimes wrote:
> The __bytes__ method could be used to implement a byte representation of
> an arbitrary object. The byte representation can then be used to submit
> the object over the wire or dump it into a file. In Python 2.x I could
> override __str__ to send an object over a socket, but in Python 3k str()
> returns a unicode object that can't be transmitted over sockets. Sockets
> support bytes only.

This could just as well be done using a method on that specific object. I don't think having to write x.as_bytes() is worse than bytes(x), *unless* there are contexts where it's important to convert something to bytes without knowing what kind of thing it is. For str(), such a context exists: print(). For bytes(), I'm not so sure.
The use cases given here seem to be either very specific to a certain class, or could be solved using other generic APIs like pickling. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Aug 9 02:01:14 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 8 Aug 2007 17:01:14 -0700 Subject: [Python-3000] fix email module for bytes/str In-Reply-To: <200708090159.46301.victor.stinner@haypocalc.com> References: <200708090159.46301.victor.stinner@haypocalc.com> Message-ID: You might want to send this to the email-sig. On 8/8/07, Victor Stinner wrote: > Hi, > > I started to work on email module, but I have trouble to understand if a > function should returns bytes or str (because I don't know email module). > > Header.encode() -> bytes? > Message.as_string() -> bytes? > decode_header() -> list of (bytes, str|None) or (str, str|None)? > base64MIME.encode() -> bytes? > > message_from_string() <- bytes? > > Message.get_payload() -> bytes or str? > > A charset name type is str, right? > > --------------- > > Things to change to get bytes: > - replace StringIO with BytesIO > - add 'b' prefix, eg. '' becomes b'' > - replace "%s=%s" % (x, y) with b''.join((x, b'=', y)) > => is it the best method to concatenate bytes? > > Problems (to port python 2.x code to 3000): > - When obj.lower() is used, I expect obj to be str but it's bytes > - obj.strip() doesn't work when obj is a byte, it requires an > argument but I don't know the right value! Maybe b'\n\r\v\t '? > - iterate on a bytes object gives number and not bytes object, eg. > for c in b"small text": > if re.match("(\n|\r)", c): ... > Is it possible to 'bytes' regex? 
re.compile(b"x") raise an exception > > -- > Victor Stinner aka haypo > http://hachoir.org/ > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From victor.stinner at haypocalc.com Thu Aug 9 04:27:19 2007 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Thu, 9 Aug 2007 04:27:19 +0200 Subject: [Python-3000] bytes regular expression? Message-ID: <200708090427.19830.victor.stinner@haypocalc.com> Hi, Since Python 3000 regular expressions are now Unicode by default, how can I use bytes regex? Very simplified example of my problem: import re print( re.sub("(.)", b"[\\1]", b'abc') ) This code fails with exception: File "(...)/py3k-struni/Lib/re.py", line 241, in _compile_repl p = _cache_repl.get(key) TypeError: unhashable type: 'bytes' Does "frozen bytes type" (immutable) exist to be able to use a cache? Victor Stinner aka haypo http://hachoir.org/ From skip at pobox.com Thu Aug 9 04:55:26 2007 From: skip at pobox.com (skip at pobox.com) Date: Wed, 8 Aug 2007 21:55:26 -0500 Subject: [Python-3000] C API cleanup str In-Reply-To: References: <46B2C8E0.8080409@canterbury.ac.nz> <46B5C47B.5090703@v.loewis.de> <46B5F136.4010502@v.loewis.de> <46B5FA11.5040404@v.loewis.de> Message-ID: <18106.33310.386717.634156@montanaro.dyndns.org> Neal> That's already been done for a while: Neal> http://docs.python.org/dev/3.0/ Cool. If the Google Sprint Chicago Edition becomes a reality (we've been discussing it on chicago at python.org) and I get to go I think I will probably devote much of my time to documentation. Skip From guido at python.org Thu Aug 9 06:07:12 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 8 Aug 2007 21:07:12 -0700 Subject: [Python-3000] bytes regular expression? 
In-Reply-To: <200708090427.19830.victor.stinner@haypocalc.com> References: <200708090427.19830.victor.stinner@haypocalc.com> Message-ID: A quick temporary hack is to use buffer(b'abc') instead. (buffer() is so incredibly broken that it lets you hash() even if the underlying object is broken. :-) The correct solution is to fix the re library to avoid using hash() directly on the underlying data type altogether; that never had sound semantics (as proven by the buffer() hack above). --Guido On 8/8/07, Victor Stinner wrote: > Hi, > > Since Python 3000 regular expressions are now Unicode by default, how can I > use bytes regex? Very simplified example of my problem: > import re > print( re.sub("(.)", b"[\\1]", b'abc') ) > > This code fails with exception: > File "(...)/py3k-struni/Lib/re.py", line 241, in _compile_repl > p = _cache_repl.get(key) > TypeError: unhashable type: 'bytes' > > Does "frozen bytes type" (immutable) exist to be able to use a cache? > > Victor Stinner aka haypo > http://hachoir.org/ > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From g.brandl at gmx.net Thu Aug 9 08:06:32 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 09 Aug 2007 08:06:32 +0200 Subject: [Python-3000] C API cleanup str In-Reply-To: <18106.33310.386717.634156@montanaro.dyndns.org> References: <46B2C8E0.8080409@canterbury.ac.nz> <46B5C47B.5090703@v.loewis.de> <46B5F136.4010502@v.loewis.de> <46B5FA11.5040404@v.loewis.de> <18106.33310.386717.634156@montanaro.dyndns.org> Message-ID: skip at pobox.com schrieb: > Neal> That's already been done for a while: > Neal> http://docs.python.org/dev/3.0/ > > Cool. 
If the Google Sprint Chicago Edition becomes a reality (we've been > discussing it on chicago at python.org) and I get to go I think I will probably > devote much of my time to documentation. When will that be? I think we should then switch over to the reST tree before you start. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From brett at python.org Thu Aug 9 08:31:08 2007 From: brett at python.org (Brett Cannon) Date: Wed, 8 Aug 2007 23:31:08 -0700 Subject: [Python-3000] C API cleanup str In-Reply-To: References: <46B5C47B.5090703@v.loewis.de> <46B5F136.4010502@v.loewis.de> <46B5FA11.5040404@v.loewis.de> <18106.33310.386717.634156@montanaro.dyndns.org> Message-ID: On 8/8/07, Georg Brandl wrote: > skip at pobox.com schrieb: > > Neal> That's already been done for a while: > > Neal> http://docs.python.org/dev/3.0/ > > > > Cool. If the Google Sprint Chicago Edition becomes a reality (we've been > > discussing it on chicago at python.org) and I get to go I think I will probably > > devote much of my time to documentation. > > When will that be? I think we should then switch over to the reST tree > before you start. Aug 22-25: http://wiki.python.org/moin/GoogleSprint?highlight=%28googlesprint%29 . -Brett From talex5 at gmail.com Wed Aug 8 20:27:56 2007 From: talex5 at gmail.com (Thomas Leonard) Date: Wed, 8 Aug 2007 18:27:56 +0000 (UTC) Subject: [Python-3000] Binary compatibility References: <46B7EC3E.3070802@v.loewis.de> <46B95895.20705@v.loewis.de> Message-ID: On Wed, 08 Aug 2007 07:45:57 +0200, Martin v. Löwis wrote: >>> Now, you seem to talk about different *Linux* systems. On Linux, >>> use UCS-4. >> >> Yes, that's what we want.
But Python 2.5 defaults to UCS-2 (at least >> last time I tried), while many distros have used UCS-4. If Linux >> always used UCS-4, that would be fine, but currently there's no >> guarantee of that. > > I see why a guarantee would help, but I don't think it's necessary. > Just provide UCS-4 binaries only on Linux, and when somebody complains, > tell them to recompile Python, or to recompile your software themselves. Won't recompiling Python break every other Python program on their system, though? (e.g. anything that itself uses a C Python library) Also, anything involving recompiling isn't exactly user friendly... we might give Linux a bad name! > The defaults in 2.5.x cannot be changed anymore. The defaults could > be changed for Linux in 2.6, but then the question is: why just for > Linux? Are there different Windows Python binaries around with different UCS-2/4 settings? If so, I'd imagine that would be a problem too, although as I say we don't have many Windows users for ROX. BTW, none of this is urgent. We experimented with Python/C hybrids in the past. It didn't work, so we carried on using pure C programs for anything that needed any part in C. So it's not causing actual problems for users right now. It would just be nice to have it sorted out one day, so we could use Python more in the future.
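For anyone wanting to check which kind of build they have before loading a binary extension, a minimal sketch follows. It relies only on sys.maxunicode; the helper name unicode_build is made up for illustration. On the 2.x builds discussed in this thread the value is 0xFFFF for a UCS-2 ("narrow") build and 0x10FFFF for a UCS-4 ("wide") build. Note that from Python 3.3 onward the distinction no longer exists and the value is always 0x10FFFF.

```python
import sys


def unicode_build():
    """Report the Unicode width of the running interpreter.

    Hypothetical helper, not part of any library: narrow (UCS-2) builds
    report sys.maxunicode == 0xFFFF, wide (UCS-4) builds report 0x10FFFF.
    """
    return "UCS-4" if sys.maxunicode > 0xFFFF else "UCS-2"


if __name__ == "__main__":
    print(unicode_build())
```

A launcher script could call this and refuse to import a compiled module built for the other width, rather than failing later with unresolved symbols.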
-- Dr Thomas Leonard http://rox.sourceforge.net GPG: 9242 9807 C985 3C07 44A6 8B9A AE07 8280 59A5 3CC1 From talin at acm.org Thu Aug 9 09:31:37 2007 From: talin at acm.org (Talin) Date: Thu, 09 Aug 2007 00:31:37 -0700 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B8D58E.5040501@ronadam.com> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> Message-ID: <46BAC2D9.2020902@acm.org> Ron Adam wrote: > Talin wrote: >> Ron Adam wrote: >>> Now here's the problem with all of this. As we add the widths back >>> into the format specifications, we are basically saying the idea of a >>> separate field width specifier is wrong. >>> >>> So maybe it's not really a separate independent thing after all, and >>> it just a convenient grouping for readability purposes only. >> >> I'm beginning to suspect that this is indeed the case. > > Yes, I believe so even more after experimenting last night with > specifier objects. > > for now I'm using ','s for separating *all* the terms. I don't intend > that should be used for a final version, but for now it makes parsing > the terms and getting the behavior right much easier. > > f,.3,>7 right justify in field width 7, with 3 decimal places. > > s,^10,w20 Center in feild 10, expands up to width 20. > > f,.3,% > > This allows me to just split on ',' and experiment with ordering and see > how some terms might need to interact with other terms and how to do > that without having to fight the syntax problem for now. 
> > Later the syntax can be compressed and tested with a fairly complete > doctest as a separate problem. When you get a chance, can you write down your current thinking in a single document? Right now, there are lots of suggestions scattered in a bunch of different messages, some of which have been superseded, and it's hard to sew them together. At this point, I think that as far as the mini-language goes, after wandering far afield from the original PEP we have arrived at a design that's not very far - at least semantically - from what we started with. In other words, other than the special case of 'repr', we find that pretty much everything can fit into a single specifier string; Attempts to break it up into two independent specifiers that are handled by two different entities run into the problem that the specifiers aren't independent and there are interactions between the two. Because the dividing line between "format specifier" and "alignment specifier" changes based on the type of data being formatted, trying to keep them separate results in redundancy and duplication, where we end up with more than one way to specify padding, alignment, or minimum width. So I'm tempted to just use what's in the PEP now as a starting point - perhaps re-arranging the order of attributes, as has been discussed, or perhaps not - and then handling 'repr' via a different prefix character other than ':'. The 'repr' flag does nothing more than call __repr__ on the object, and then call __format__ on the result using whatever conversion spec was specified. (There might be a similar flag that does a call to __str__, which has the effect of calling str.__format__ instead of the object's native __format__ function.) As far as requiring the different built-in versions of __format__ to have to parse the standard conversion specifier, that is not a problem in practice, as we'll have a little mini-parser that parses the conversion spec and fills in a C struct. 
There will also be a Python-accessible version of the same thing for people extending formatters in Python. So, the current action items are: 1) Get consensus on the syntax of the formatting mini-language. 2) Create a pure-python implementation of the global 'format' function, which will be a new standard library function that formats a single value, given a conversion spec: format(value, conversion) 3) Write implementations of str.__format__, int.__format__, float.__format__, decimal.__format__ and so on. 4) Create C implementations of the above. 5) Write the code for complex, multi-value formatting as specified in the PEP, and hook up to the built-in string class. -- Talin From martin at v.loewis.de Thu Aug 9 10:30:53 2007 From: martin at v.loewis.de (Martin v. Löwis) Date: Thu, 09 Aug 2007 10:30:53 +0200 Subject: [Python-3000] Binary compatibility In-Reply-To: References: <46B7EC3E.3070802@v.loewis.de> <46B95895.20705@v.loewis.de> Message-ID: <46BAD0BD.7040905@v.loewis.de> >> I see why a guarantee would help, but I don't think it's necessary. >> Just provide UCS-4 binaries only on Linux, and when somebody complains, >> tell them to recompile Python, or to recompile your software themselves. > > Won't recompiling Python break every other Python program on their system, > though? (e.g. anything that itself uses a C Python library) It depends. If they use their own Python installation, just tell them to use the vendor-supplied one instead - it likely is UCS-4. If the vendor-supplied one is UCS-2, talk to the vendor. For the user, tell them to make a separate installation (e.g. in /usr/local/bin). This won't interfere with the existing installation. > Also, anything involving recompiling isn't exactly user friendly... we > might give Linux a bad name! Hmm. Some think that Linux became worse when people stopped compiling the kernel themselves. >> The defaults in 2.5.x cannot be changed anymore.
The defaults could >> be changed for Linux in 2.6, but then the question is: why just for >> Linux? > > Are there different Windows python binaries around with different UCS-2/4 > settings? No. The definition of Py_UNICODE on Windows is mandated by the operating system, which has Unicode APIs that are two-bytes, and Python wants to use them. > BTW, none of this is urgent. We experimented with Python/C hybrids in the > past. It didn't work, so we carried on using pure C programs for anything > that needed any part in C. So it's not causing actual problems for users > right now. It would just be nice to have it sorted out one day, so we > could use Python more in the future. I can understand the concern, but I believe it is fairly theoretical. Most distributions do use UCS-4 these days, so the problem went away by de-facto standardization, not by de-jure standardization. Regards, Martin From rrr at ronadam.com Thu Aug 9 12:42:56 2007 From: rrr at ronadam.com (Ron Adam) Date: Thu, 09 Aug 2007 05:42:56 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BAC2D9.2020902@acm.org> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> Message-ID: <46BAEFB0.9050400@ronadam.com> Talin wrote: > Ron Adam wrote: >> Talin wrote: >>> Ron Adam wrote: >>>> Now here's the problem with all of this. As we add the widths back >>>> into the format specifications, we are basically saying the idea of >>>> a separate field width specifier is wrong. 
>>>> >>>> So maybe it's not really a separate independent thing after all, and >>>> it just a convenient grouping for readability purposes only. >>> >>> I'm beginning to suspect that this is indeed the case. >> >> Yes, I believe so even more after experimenting last night with >> specifier objects. >> >> for now I'm using ','s for separating *all* the terms. I don't intend >> that should be used for a final version, but for now it makes parsing >> the terms and getting the behavior right much easier. >> >> f,.3,>7 right justify in field width 7, with 3 decimal places. >> >> s,^10,w20 Center in feild 10, expands up to width 20. >> >> f,.3,% >> >> This allows me to just split on ',' and experiment with ordering and >> see how some terms might need to interact with other terms and how to >> do that without having to fight the syntax problem for now. >> >> Later the syntax can be compressed and tested with a fairly complete >> doctest as a separate problem. > > When you get a chance, can you write down your current thinking in a > single document? Right now, there are lots of suggestions scattered in a > bunch of different messages, some of which have been superseded, and > it's hard to sew them together. I'll see what I can come up with. But I think you pretty much covered it below. > At this point, I think that as far as the mini-language goes, after > wandering far afield from the original PEP we have arrived at a design > that's not very far - at least semantically - from what we started with. Yes, I agree. > In other words, other than the special case of 'repr', we find that > pretty much everything can fit into a single specifier string; Attempts > to break it up into two independent specifiers that are handled by two > different entities run into the problem that the specifiers aren't > independent and there are interactions between the two. 
Because the > dividing line between "format specifier" and "alignment specifier" > changes based on the type of data being formatted, trying to keep them > separate results in redundancy and duplication, where we end up with > more than one way to specify padding, alignment, or minimum width. Yes. Another deciding factor is whether or not users want a general formatting language that is very flexible, and allows them to combine and order instructions to do a wide variety of things. Some of which may not make much sense. (Just like you can create regular expressions that don't make sense.) Or do they want an option based system that limits what they can do to a set of well defined behaviors? It seems having well defined behaviors (limited to things that make sense) is preferred. (Although I prefer the former myself.) > So I'm tempted to just use what's in the PEP now as a starting point - > perhaps re-arranging the order of attributes, as has been discussed, or > perhaps not - and then handling 'repr' via a different prefix character > other than ':'. The 'repr' flag does nothing more than call __repr__ on > the object, and then call __format__ on the result using whatever > conversion spec was specified. (There might be a similar flag that does > a call to __str__, which has the effect of calling str.__format__ > instead of the object's native __format__ function.) The way to think of 'repr' and 'str' is that of a general "object" format type/specifier. That puts str and repr into the same context as the rest of the format types. This is really a point of view issue and not so much of a semantic one. I think {0:r} and {0:s} are to "object", as {0:d} and {0:e} are to "float" ... just another relationship relative to the value being formatted. So I don't understand the need to treat them differently.
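To make the two positions concrete, here is a small sketch using the syntax that str.format has in released Python, which is an assumption relative to this thread since the syntax was still open at this point. There, repr and str are selected with a separate '!' prefix (the "different prefix character" idea), and whatever follows ':' is then applied to the resulting string. The variable names are illustrative only.

```python
value = "spam"

# '!s' calls str() on the value first, then str.__format__ applies '>8'.
as_str = "{0!s:>8}".format(value)

# '!r' calls repr() first, so the quotes count toward the field width.
as_repr = "{0!r:>8}".format(value)

# A type-specific spec with no conversion flag goes to float.__format__.
as_float = "{0:.3f}".format(3.14159)

print(repr(as_str), repr(as_repr), repr(as_float))
```

With the {0:r} spelling proposed above, the conversion would instead be one more format type inside the spec, so each __format__ implementation (or a shared parser) would have to recognise it.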
> As far as requiring the different built-in versions of __format__ to > have to parse the standard conversion specifier, that is not a problem > in practice, as we'll have a little mini-parser that parses the > conversion spec and fills in a C struct. There will also be a > Python-accessible version of the same thing for people extending > formatters in Python. This is not too far from what I was thinking then. I'm not sure I can add much to that. My current experimental implementation, allows for pre-parsing a format string so the parsing step can be moved outside of a loop and doesn't have to be reparsed on each use, or it can be examined and possibly modified before applying it to arguments. I'm not sure how useful that is, but instead of iterating a string and handling each item sequentially, it parses the whole string and all the format fields at one time, then formats all the arguments, then does a list.join() operation to combine them. This may be faster in pure python, but probably slower in C. > So, the current action items are: > > 1) Get consensus the syntax of the formatting mini-language. Putting the syntax first can introduce side effects or limitations as a result of the syntax. So this might be better as a later step. By getting a consensus on the exact behaviors and then proceeding to the implementation, I think it will move things along faster. While this is for the most part is in the pep, I think any loose ends on the behavior side should be nailed down completely before the final syntax is worked out. Then we can find a syntax that works with the implementation, rather than try to make the implementation work with the syntax. > 2) Create a pure-python implementation of the global 'format' function, > which will be a new standard library function that formats a single > value, given a conversion spec: > > format(value, conversion) > > 3) Write implementations of str.__format__, int.__format__, > float.__format__, decimal.__format__ and so on. 
> > 4) Create C implementations of the above. > > 5) Write the code for complex, multi-value formatting as specified in > the PEP, and hook up to the built-in string class. I think finishing up #2 and #3 should come first with very extensive tests. (Using what ever syntax works for now.) I've been going over the tests in the sand box trying to get my experimental version to pass them. Once I get it to pass most of them I'll send you a copy. BTW.. I noticed str.center() has an odd behavior of alternating uneven padding widths on odd or even lengths strings. Is this intentional? >>> 'a'.center(2) 'a ' >>> 'aa'.center(3) ' aa' Cheers, Ron From nmm1 at cus.cam.ac.uk Wed Aug 8 11:28:16 2007 From: nmm1 at cus.cam.ac.uk (Nick Maclaren) Date: Wed, 08 Aug 2007 10:28:16 +0100 Subject: [Python-3000] Regular expressions, Unicode etc. Message-ID: I have needed to push my stack to teach REs (don't ask), and am taking a look at the RE code. I may be able to extend it to support RFE 694374 and (more importantly) atomic groups and possessive quantifiers. While I regard such things as revolting beyond belief, they make a HELL of a difference to the efficiency of recognising things like HTML tags in a morass of mixed text. The other approach, which is to stick to true regular expressions, and wholly or partially convert to DFAs, has already been rendered impossible by even the limited Perl/PCRE extensions that Python has adopted. My first question is whether this would clash with any ongoing work, including being superseded by any changes in Python 3000. Note that I am NOT proposing to do a fixed task, but will produce a proper proposal only when I know what I can achieve for a small amount of work. If the SRE engine turns out to be unsuitable to extend in these ways, I shall quietly abandon the project. My second one is about Unicode. I really, but REALLY regard it as a serious defect that there is no escape for printing characters. 
Any code that checks arbitrary text is likely to need them - yes, I know why Perl and hence PCRE doesn't have that, but let's skip that. That is easy to add, though choosing a letter is tricky. Currently \c and \C, for 'character' (I would prefer 'text' or 'printable', but \t is obviously insane and \P is asking for incompatibility with Perl and Java). But attempting to rebuild the Unicode database hasn't worked. Tools/unicode is, er, a trifle incomplete and out of date. The only file I need to change is Objects/unicodetype_db.h, but the init attempts to run Tools/unicode/makeunicodedata.py have not been successful. I may be able to reverse engineer the mechanism enough to get the files off the Unicode site and run it, but I don't want to spend forever on it. Any clues? Regards, Nick Maclaren, University of Cambridge Computing Service, New Museums Site, Pembroke Street, Cambridge CB2 3QH, England. Email: nmm1 at cam.ac.uk Tel.: +44 1223 334761 Fax: +44 1223 334679 From skip at pobox.com Thu Aug 9 13:14:07 2007 From: skip at pobox.com (skip at pobox.com) Date: Thu, 9 Aug 2007 06:14:07 -0500 Subject: [Python-3000] C API cleanup str In-Reply-To: References: <46B2C8E0.8080409@canterbury.ac.nz> <46B5C47B.5090703@v.loewis.de> <46B5F136.4010502@v.loewis.de> <46B5FA11.5040404@v.loewis.de> <18106.33310.386717.634156@montanaro.dyndns.org> Message-ID: <18106.63231.371228.836379@montanaro.dyndns.org> Georg> When will that be? I think we should then switch over to the reST Georg> tree before you start. Georg, Can you remind me how to get at your new doc tree? Skip From guido at python.org Thu Aug 9 15:57:58 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Aug 2007 06:57:58 -0700 Subject: [Python-3000] Moving to a "py3k" branch *NOW* Message-ID: I am starting now. Please, no more checkins to either p3yk or py3k-struni. On 8/8/07, Guido van Rossum wrote: > I would like to move to a new branch soon for all Py3k development. > > I plan to name the branch "py3k".
It will be branched from > py3k-struni. I will do one last set of merges from the trunk via p3yk > (note typo!) and py3k-struni, and then I will *delete* the old py3k > and py3k-struni branches (you will still be able to access their last > known good status by syncing back to a previous revision). I will > temporarily shut up some unit tests to avoid getting endless spam from > Neal's buildbot. > > After the switch, you should be able to switch your workspaces to the > new branch using the "svn switch" command. > > If anyone is in the middle of something that would become painful due > to this changeover, let me know ASAP and I'll delay. > > I will send out another message when I start the move, and another > when I finish it. > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Aug 9 16:43:41 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Aug 2007 07:43:41 -0700 Subject: [Python-3000] Move to a "py3k" branch *DONE* Message-ID: This is done. The new py3k branch is ready for business. If you currently have the py3k-struni branch checked out (at its top level), *don't update*, but issue the following commands: svn switch svn+ssh://pythondev at svn.python.org/python/branches/py3k svn update Only a small amount of activity should result (unless you didn't svn update for a long time). For the p3yk branch, the same instructions will work, but the svn update will update most of your tree. A "make clean" is recommended in this case. Left to do: - update the wikis - clean out the old branches - switch the buildbot and the doc builder to use the new branch (Neal) There are currently about 7 failing unit tests left: test_bsddb test_bsddb3 test_email test_email_codecs test_email_renamed test_sqlite test_urllib2_localnet See http://wiki.python.org/moin/Py3kStrUniTests for detailed status regarding these. 
--Guido On 8/9/07, Guido van Rossum wrote: > I am starting now. Please, no more checkins to either p3yk ot py3k-struni. > > On 8/8/07, Guido van Rossum wrote: > > I would like to move to a new branch soon for all Py3k development. > > > > I plan to name the branch "py3k". It will be branched from > > py3k-struni. I will do one last set of merges from the trunk via p3yk > > (note typo!) and py3k-struni, and then I will *delete* the old py3k > > and py3k-struni branches (you will still be able to access their last > > known good status by syncing back to a previous revision). I will > > temporarily shut up some unit tests to avoid getting endless spam from > > Neal's buildbot. > > > > After the switch, you should be able to switch your workspaces to the > > new branch using the "svn switch" command. > > > > If anyone is in the middle of something that would become painful due > > to this changeover, let me know ASAP and I'll delay. > > > > I will send out another message when I start the move, and another > > when I finish it. > > > > -- > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From victor.stinner at haypocalc.com Thu Aug 9 17:40:58 2007 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Thu, 9 Aug 2007 17:40:58 +0200 Subject: [Python-3000] bytes regular expression? In-Reply-To: References: <200708090427.19830.victor.stinner@haypocalc.com> Message-ID: <200708091740.59070.victor.stinner@haypocalc.com> Hi, On Thursday 09 August 2007 06:07:12 Guido van Rossum wrote: > A quick temporary hack is to use buffer(b'abc') instead. (buffer() is > so incredibly broken that it lets you hash() even if the underlying > object is broken. :-) I prefer str8 which looks to be a good candidate for "frozenbytes" type. 
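For readers following this thread later, a hedged sketch of how the same calls behave in released Python 3, which is not the py3k-struni branch being discussed: there bytes is immutable and hashable (the mutable variant became bytearray), so the re cache accepts bytes patterns as long as pattern, replacement and subject are all bytes. The variable names below are illustrative only.

```python
import re

# Pattern, replacement and subject must all be bytes; mixing str and bytes
# raises TypeError in released Python 3.
result = re.sub(b"(.)", b"[\\1]", b"abc")

# Iterating over bytes yields ints, so take one-byte slices when a bytes
# object is needed for matching.
newline = re.compile(b"(\n|\r)")
data = b"small\ntext"
hits = [i for i in range(len(data)) if newline.match(data[i:i + 1])]

print(result, hits)
```

The slice data[i:i + 1] is the usual way around the "iteration gives numbers" problem raised earlier in the thread.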
> The correct solution is to fix the re library to avoid using hash() > directly on the underlying data type altogether; that never had sound > semantics (as proven by the buffer() hack above). The re module uses a dictionary to store compiled expressions and the key is a tuple (pattern, flags) where pattern is a bytes (str8) or str and flags is an int. re module bugs: 1. _compile() doesn't support bytes 2. escape() doesn't support bytes My attached patch fixes both bugs: - convert bytes to str8 in _compile() to be able to hash it - add a special version of escape() for bytes I don't know the best method to create a bytes in a for loop. In Python 2.x, the best method is to use a list() and ''.join(). Since bytes is mutable I chose to use append() and concatenation (a += b). I also added a new unit test for the escape() function with a bytes argument. You may not want to apply my patch directly. I don't know Python 3000 very well nor Python coding style. But my patch should help to fix the bugs ;-) ----- Why does the re module have code for Python < 2.2 (optional finditer() function)? Since the code is now specific to Python 3000, we should use new types like set (use a set for _alphanum instead of a dictionary) and functions like enumerate (in _escape for str block). Victor Stinner http://hachoir.org/ -------------- next part -------------- A non-text attachment was scrubbed... Name: py3k-struni-re.diff Type: text/x-diff Size: 3440 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070809/23e53b6a/attachment-0001.bin From jason.orendorff at gmail.com Thu Aug 9 17:42:46 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Thu, 9 Aug 2007 11:42:46 -0400 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B7FACC.8030503@v.loewis.de> References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> Message-ID: On 8/7/07, "Martin v. Löwis" wrote: > My concern is that people need to access existing databases.
It's > all fine that the code accessing them breaks, and that they have > to actively port to Py3k. However, telling them that they have to > represent the keys in their dbm disk files in a different manner > might cause a revolt... Too true. Offhand, why not provide hooks for serializing and deserializing keys? The same for values, too. People porting to py3k could use those. Besides, this thread makes it sound like people usually write their own wrapper classes whenever they use *dbm. Hooks would help with that, or even eliminate the need altogether. -j From jjb5 at cornell.edu Thu Aug 9 18:11:16 2007 From: jjb5 at cornell.edu (Joel Bender) Date: Thu, 09 Aug 2007 12:11:16 -0400 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> Message-ID: <46BB3CA4.9010904@cornell.edu> Jason Orendorff wrote: > Hooks would help with that, or even eliminate the need altogether. IMHO, having a __bytes__ method would go a long way. Joel From kbk at shore.net Thu Aug 9 18:21:58 2007 From: kbk at shore.net (Kurt B. Kaiser) Date: Thu, 09 Aug 2007 12:21:58 -0400 Subject: [Python-3000] IDLE in new py3k Message-ID: <87y7gkel09.fsf@hydra.hampton.thirdcreek.com> After a clean checkout in py3k, IDLE fails even w/o subprocess... trader ~/PYDOTORG/projects/python/branches/py3k$ ./python Lib/idlelib/idle.py Fatal Python error: PyEval_SaveThread: NULL tstate Aborted trader ~/PYDOTORG/projects/python/branches/py3k$ ./python Python 3.0x (py3k:56858, Aug 9 2007, 12:09:06) [GCC 4.1.2 20061027 (prerelease)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> trader ~/PYDOTORG/projects/python/branches/py3k$ ./python Lib/idlelib/idle.py -n Fatal Python error: PyEval_SaveThread: NULL tstate Aborted Any ideas on where to look? 
-- KBK From jason.orendorff at gmail.com Thu Aug 9 18:34:11 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Thu, 9 Aug 2007 12:34:11 -0400 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46BB3CA4.9010904@cornell.edu> References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> <46BB3CA4.9010904@cornell.edu> Message-ID: On 8/9/07, Joel Bender wrote: > Jason Orendorff wrote: > > Hooks would help with that, or even eliminate the need altogether. > > IMHO, having a __bytes__ method would go a long way. Well, it would go halfway--you also need to deserialize. __bytes__ alone would be useless. Of course Python does have a library for serializing and deserializing practically anything: pickle. What I proposed is a generalization of shelve. http://docs.python.org/lib/module-shelve.html -j From guido at python.org Thu Aug 9 18:36:27 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Aug 2007 09:36:27 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46BB3CA4.9010904@cornell.edu> References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> <46BB3CA4.9010904@cornell.edu> Message-ID: On 8/9/07, Joel Bender wrote: > Jason Orendorff wrote: > > > Hooks would help with that, or even eliminate the need altogether. > > IMHO, having a __bytes__ method would go a long way. I've heard this before, but there are many different, equally attractive ways to serialize objects to bytes (e.g. marshal, pickle, repr + encode, etc.). Bytes are not the same as strings, and you need to think differently about them. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From victor.stinner at haypocalc.com Thu Aug 9 18:38:00 2007 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Thu, 9 Aug 2007 18:38:00 +0200 Subject: [Python-3000] bytes regular expression? 
In-Reply-To: <200708091740.59070.victor.stinner@haypocalc.com> References: <200708090427.19830.victor.stinner@haypocalc.com> <200708091740.59070.victor.stinner@haypocalc.com> Message-ID: <200708091838.00298.victor.stinner@haypocalc.com> On Thursday 09 August 2007 17:40:58 I wrote: > My attached patch fix both bugs: > - convert bytes to str8 in _compile() to be able to hash it > - add a special version of escape() for bytes My first try was buggy for this snippet code: import re assert type(re.sub(b'', b'', b'')) is bytes assert type(re.sub(b'(x)', b'[\\1]', b'x')) is bytes My first patch mix bytes and str8 and so re.sub fails in some cases. So here is a new patch using str8 in dictionary key and str in regex parsing (sre_parse.py) (and then reconvert to bytes for 'literals' variable). Victor Stinner http://hachoir.org/ -------------- next part -------------- A non-text attachment was scrubbed... Name: py3k-struni-re2.diff Type: text/x-diff Size: 5398 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070809/7ff5e743/attachment.bin From guido at python.org Thu Aug 9 18:44:58 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Aug 2007 09:44:58 -0700 Subject: [Python-3000] IDLE in new py3k In-Reply-To: <87y7gkel09.fsf@hydra.hampton.thirdcreek.com> References: <87y7gkel09.fsf@hydra.hampton.thirdcreek.com> Message-ID: On 8/9/07, Kurt B. Kaiser wrote: > > After a clean checkout in py3k, IDLE fails even w/o subprocess... > > trader ~/PYDOTORG/projects/python/branches/py3k$ ./python Lib/idlelib/idle.py > Fatal Python error: PyEval_SaveThread: NULL tstate > Aborted > > trader ~/PYDOTORG/projects/python/branches/py3k$ ./python > Python 3.0x (py3k:56858, Aug 9 2007, 12:09:06) > [GCC 4.1.2 20061027 (prerelease)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. 
> >>> > > trader ~/PYDOTORG/projects/python/branches/py3k$ ./python Lib/idlelib/idle.py -n > Fatal Python error: PyEval_SaveThread: NULL tstate > Aborted So it does. :-( > Any ideas on where to look? No, but I'll see if I can find anything with gdb. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jarausch at igpm.rwth-aachen.de Thu Aug 9 18:38:18 2007 From: jarausch at igpm.rwth-aachen.de (Helmut Jarausch) Date: Thu, 09 Aug 2007 18:38:18 +0200 (CEST) Subject: [Python-3000] idle3.0 - is is supposed to work? Message-ID: Hi, probably, I am too impatient. I've just installed py3k (the new branch). Trying idle3.0 I get Traceback (most recent call last): File "/usr/local/bin/idle3.0", line 3, in from idlelib.PyShell import main File "/usr/local/lib/python3.0/idlelib/PyShell.py", line 26, in from .EditorWindow import EditorWindow, fixwordbreaks File "/usr/local/lib/python3.0/idlelib/EditorWindow.py", line 16, in from . import GrepDialog File "/usr/local/lib/python3.0/idlelib/GrepDialog.py", line 5, in import SearchEngine ImportError: No module named SearchEngine Thanks all of you for improving Python even more, Helmut Jarausch Lehrstuhl fuer Numerische Mathematik RWTH - Aachen University D 52056 Aachen, Germany From theller at ctypes.org Thu Aug 9 19:21:27 2007 From: theller at ctypes.org (Thomas Heller) Date: Thu, 09 Aug 2007 19:21:27 +0200 Subject: [Python-3000] bytes regular expression? In-Reply-To: <200708091740.59070.victor.stinner@haypocalc.com> References: <200708090427.19830.victor.stinner@haypocalc.com> <200708091740.59070.victor.stinner@haypocalc.com> Message-ID: Victor Stinner schrieb: > > I prefer str8 which looks to be a good candidate for "frozenbytes" type. > I love this idea! Leave str8 as it is, maybe extend Python so that it understands the s"..." literals and we are done. Thomas From kbk at shore.net Thu Aug 9 19:22:07 2007 From: kbk at shore.net (Kurt B. 
Kaiser) Date: Thu, 09 Aug 2007 13:22:07 -0400 Subject: [Python-3000] idle3.0 - is is supposed to work? In-Reply-To: (Helmut Jarausch's message of "Thu, 09 Aug 2007 18:38:18 +0200 (CEST)") References: Message-ID: <87tzr8ei80.fsf@hydra.hampton.thirdcreek.com> Helmut Jarausch writes: > probably, I am too impatient. > I've just installed py3k (the new branch). > Trying > idle3.0 > > I get > > Traceback (most recent call last): > File "/usr/local/bin/idle3.0", line 3, in > from idlelib.PyShell import main > File "/usr/local/lib/python3.0/idlelib/PyShell.py", line 26, in > from .EditorWindow import EditorWindow, fixwordbreaks > File "/usr/local/lib/python3.0/idlelib/EditorWindow.py", line 16, in > from . import GrepDialog > File "/usr/local/lib/python3.0/idlelib/GrepDialog.py", line 5, in > import SearchEngine > ImportError: No module named SearchEngine I just checked in a fix - GrepDialog.py wasn't using relative imports. I'm not sure why you hit this exception and I didn't. Probably a sys.path difference. Try again. What platform are you using? On Linux trader 2.6.18-ARCH #1 SMP PREEMPT Sun Nov 19 09:14:35 CET 2006 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux (and whatever GvR is running) IDLE isn't starting at all in py3k. -- KBK From guido at python.org Thu Aug 9 19:31:18 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Aug 2007 10:31:18 -0700 Subject: [Python-3000] idle3.0 - is is supposed to work? In-Reply-To: <87tzr8ei80.fsf@hydra.hampton.thirdcreek.com> References: <87tzr8ei80.fsf@hydra.hampton.thirdcreek.com> Message-ID: On 8/9/07, Kurt B. Kaiser wrote: > What platform are you using? On > > Linux trader 2.6.18-ARCH #1 SMP PREEMPT Sun Nov 19 09:14:35 CET 2006 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux > > (and whatever GvR is running) IDLE isn't starting at all in py3k. I get the same failure on OSX (PPC) and on Linux (x86 Ubuntu).
It has to do with the Tcl/Tk wrapping code, in particular it's in the LEAVE_PYTHON macro on line 1995 on _tkinter.c. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From theller at ctypes.org Thu Aug 9 19:40:12 2007 From: theller at ctypes.org (Thomas Heller) Date: Thu, 09 Aug 2007 19:40:12 +0200 Subject: [Python-3000] Pleaswe help with the countdown to zero failing tests in the struni branch! In-Reply-To: <46B96B0D.6080605@v.loewis.de> References: <46B96B0D.6080605@v.loewis.de> Message-ID: Martin v. L?wis schrieb: >>> It's in Modules/timemodule.c, line 691: >>> PyModule_AddObject(m, "tzname", >>> Py_BuildValue("(zz)", tzname[0], tzname[1])); >>> >>> According to MSDN, tzname is a global variable; the contents is somehow >>> derived from the TZ environment variable (which is not set in my case). >> >> Is there anything from which you can guess the encoding (e.g. the >> filesystem encoding?). > > It's in the locale's encoding. On Windows, that will be "mbcs"; on other > systems, the timezone names are typically all in ASCII - this would > allow for a quick work-around. Using the filesytemencoding would also > work, although it would be an equal hack: it's *meant* to be used only > for file names (and on OSX at least, it deviates from the locale's > encoding - although I have no idea what tzname is encoded in on OSX). > >> These are all externally-provided strings. It will depend on the >> platform what the encoding is. >> >> I wonder if we need to add another format code to Py_BuildValue (and >> its friends) to designate "platform default encoding" instead of >> UTF-8. > > For symmetry with ParseTuple, there could be the 'e' versions > (es, ez, ...) which would take a codec name also. That would be great, imo. OTOH, I have not time to do this. Currently, I have to set TZ=GMT to be able to start python 3. 
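As a rough illustration of the quick work-around mentioned in the quoted text (decode the platform-provided timezone names with the locale's encoding, and fall back to ASCII if that fails), here is a small Python sketch. The helper name `decode_tzname` and the sample byte strings are assumptions made for illustration; this is not the code in Modules/timemodule.c.

```python
import locale


def decode_tzname(raw, encoding=None):
    """Decode a platform-provided timezone name (bytes) to str.

    Hypothetical helper: tries the given encoding (by default the
    locale's preferred encoding) and falls back to ASCII with
    replacement characters if decoding fails.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    try:
        return raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # Unknown codec name or undecodable bytes: degrade gracefully.
        return raw.decode("ascii", "replace")


# Sample data: an ASCII name and a Latin-1 encoded German name.
names = tuple(
    decode_tzname(n, "latin-1")
    for n in (b"GMT", b"Mitteleurop\xe4ische Zeit")
)
```

The fallback trades correctness of non-ASCII characters for the ability to start up at all, which is the same trade the TZ=GMT work-around makes by hand.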
Thomas From steven.bethard at gmail.com Thu Aug 9 19:39:50 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Thu, 9 Aug 2007 11:39:50 -0600 Subject: [Python-3000] bytes regular expression? In-Reply-To: <200708091740.59070.victor.stinner@haypocalc.com> References: <200708090427.19830.victor.stinner@haypocalc.com> <200708091740.59070.victor.stinner@haypocalc.com> Message-ID: On 8/9/07, Victor Stinner wrote: > re module uses a dictionary to store compiled expressions and the key is a > tuple (pattern, flags) where pattern is a bytes (str8) or str and flags is an > int. So why not just skip caching for anything that doesn't hash()? If you're really worried about efficiency, simply re.compile() the expression once and don't rely on the re module's internal cache. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From martin at v.loewis.de Thu Aug 9 23:04:46 2007 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 09 Aug 2007 23:04:46 +0200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> Message-ID: <46BB816E.1070507@v.loewis.de> Jason Orendorff schrieb: > On 8/7/07, "Martin v. L?wis" wrote: >> My concern is that people need to access existing databases. It's >> all fine that the code accessing them breaks, and that they have >> to actively port to Py3k. However, telling them that they have to >> represent the keys in their dbm disk files in a different manner >> might cause a revolt... > > Too true. Offhand, why not provide hooks for serializing and > deserializing keys? Perhaps YAGNI? We already support pickling values (dbshelve), and I added support for encoding/decoding strings as either keys or values (though in a limited manner). 
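To make the hook idea under discussion concrete, here is a hypothetical sketch, not an existing API: a thin wrapper around any bytes-keyed mapping (for example an object returned by dbm.open) that takes caller-supplied functions for encoding and decoding keys and values. The class name `HookedDB` and its parameter names are invented for illustration, and a plain dict stands in for the dbm file.

```python
import pickle


class HookedDB:
    """Wrap a bytes-to-bytes mapping with key/value (de)serialization hooks."""

    def __init__(self, db, key_encode, key_decode,
                 value_encode=pickle.dumps, value_decode=pickle.loads):
        self._db = db
        self._kenc, self._kdec = key_encode, key_decode
        self._venc, self._vdec = value_encode, value_decode

    def __getitem__(self, key):
        return self._vdec(self._db[self._kenc(key)])

    def __setitem__(self, key, value):
        self._db[self._kenc(key)] = self._venc(value)

    def __delitem__(self, key):
        del self._db[self._kenc(key)]

    def __contains__(self, key):
        return self._kenc(key) in self._db

    def keys(self):
        return [self._kdec(k) for k in self._db.keys()]


# A plain dict stands in for a dbm file whose keys are Latin-1 bytes.
store = {}
db = HookedDB(store,
              key_encode=lambda s: s.encode("latin-1"),
              key_decode=lambda b: b.decode("latin-1"))
db["caf\u00e9"] = [1, 2]
```

With the key hooks chosen to match the encoding already used on disk, an existing database would not need its keys rewritten; pickling the values by default is what makes this a generalization of shelve.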
In any case, somebody would have to make a specification for that, and then somebody would have to provide an implementation of it. Regards, Martin From guido at python.org Thu Aug 9 23:49:44 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Aug 2007 14:49:44 -0700 Subject: [Python-3000] IDLE in new py3k In-Reply-To: References: <87y7gkel09.fsf@hydra.hampton.thirdcreek.com> Message-ID: I've checked in a fix for the immediate cause of the fatal error: an error path in PythonCmd() was passign through the LEAVE_PYTHON macro twice. This bug was present even on the trunk, where I fixed it too (and probably in 2.5 as well, but I didn't check). But the reason we got here was that an AsString() call failed. Why? Here's the traceback: Traceback (most recent call last): File "/usr/local/google/home/guido/python/py3k/Lib/runpy.py", line 83, in run_module filename, loader, alter_sys) File "/usr/local/google/home/guido/python/py3k/Lib/runpy.py", line 50, in _run_module_code mod_name, mod_fname, mod_loader) File "/usr/local/google/home/guido/python/py3k/Lib/runpy.py", line 32, in _run_code exec(code, run_globals) File "/usr/local/google/home/guido/python/py3k/Lib/idlelib/idle.py", line 21, in idlelib.PyShell.main() File "/usr/local/google/home/guido/python/py3k/Lib/idlelib/PyShell.py", line 1385, in main shell = flist.open_shell() File "/usr/local/google/home/guido/python/py3k/Lib/idlelib/PyShell.py", line 272, in open_shell self.pyshell = PyShell(self) File "/usr/local/google/home/guido/python/py3k/Lib/idlelib/PyShell.py", line 795, in __init__ OutputWindow.__init__(self, flist, None, None) File "/usr/local/google/home/guido/python/py3k/Lib/idlelib/OutputWindow.py", line 16, in __init__ EditorWindow.__init__(self, *args) File "/usr/local/google/home/guido/python/py3k/Lib/idlelib/EditorWindow.py", line 231, in __init__ per.insertfilter(color) File "/usr/local/google/home/guido/python/py3k/Lib/idlelib/Percolator.py", line 35, in insertfilter 
filter.setdelegate(self.top) File "/usr/local/google/home/guido/python/py3k/Lib/idlelib/ColorDelegator.py", line 49, in setdelegate self.config_colors() File "/usr/local/google/home/guido/python/py3k/Lib/idlelib/ColorDelegator.py", line 56, in config_colors self.tag_configure(tag, **cnf) File "/usr/local/google/home/guido/python/py3k/Lib/lib-tk/Tkinter.py", line 3066, in tag_configure return self._configure(('tag', 'configure', tagName), cnf, kw) File "/usr/local/google/home/guido/python/py3k/Lib/lib-tk/Tkinter.py", line 1187, in _configure self.tk.call(_flatten((self._w, cmd)) + self._options(cnf)) _tkinter.TclError: unknown option "#000000" --Guido On 8/9/07, Guido van Rossum wrote: > On 8/9/07, Kurt B. Kaiser wrote: > > > > After a clean checkout in py3k, IDLE fails even w/o subprocess... > > > > trader ~/PYDOTORG/projects/python/branches/py3k$ ./python Lib/idlelib/idle.py > > Fatal Python error: PyEval_SaveThread: NULL tstate > > Aborted > > > > trader ~/PYDOTORG/projects/python/branches/py3k$ ./python > > Python 3.0x (py3k:56858, Aug 9 2007, 12:09:06) > > [GCC 4.1.2 20061027 (prerelease)] on linux2 > > Type "help", "copyright", "credits" or "license" for more information. > > >>> > > > > trader ~/PYDOTORG/projects/python/branches/py3k$ ./python Lib/idlelib/idle.py -n > > Fatal Python error: PyEval_SaveThread: NULL tstate > > Aborted > > So it does. :-( > > > Any ideas on where to look? > > No, but I'll see if I can find anything with gdb. > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jason.orendorff at gmail.com Fri Aug 10 00:00:58 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Thu, 9 Aug 2007 18:00:58 -0400 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46BB816E.1070507@v.loewis.de> References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> <46BB816E.1070507@v.loewis.de> Message-ID: On 8/9/07, "Martin v. 
L?wis" wrote: > > Too true. Offhand, why not provide hooks for serializing and > > deserializing keys? > > Perhaps YAGNI? We already support pickling values (dbshelve), > and I added support for encoding/decoding strings as either > keys or values (though in a limited manner). You don't need to go outside this thread to find a use case not covered by either of those. > In any case, somebody would have to make a specification > for that, and then somebody would have to provide an > implementation of it. It was just a suggestion. I wish this could occasionally go without saying. -j From martin at v.loewis.de Fri Aug 10 00:08:09 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 10 Aug 2007 00:08:09 +0200 Subject: [Python-3000] bytes regular expression? In-Reply-To: References: <200708090427.19830.victor.stinner@haypocalc.com> <200708091740.59070.victor.stinner@haypocalc.com> Message-ID: <46BB9049.7090406@v.loewis.de> >> I prefer str8 which looks to be a good candidate for "frozenbytes" type. >> > > I love this idea! Leave str8 as it is, maybe extend Python so that it understands > the s"..." literals and we are done. Please, no. Two string-like types with literals are enough. Regards, Martin From martin at v.loewis.de Fri Aug 10 00:16:35 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 10 Aug 2007 00:16:35 +0200 Subject: [Python-3000] IDLE in new py3k In-Reply-To: References: <87y7gkel09.fsf@hydra.hampton.thirdcreek.com> Message-ID: <46BB9243.60101@v.loewis.de> > But the reason we got here was that an AsString() call failed. Why? The only reason I can see is that PyObject_Str failed; that may happen if PyObject_Str fails. That, in turn, can happen for bytes objects if they are not UTF-8. I think _tkinter should get rid of AsString, and use the Tcl object API instead (not sure how to do that specifically, though) > Here's the traceback: Are you sure these are related? 
This traceback looks like a Tcl error - so what does that have to do with AsString? Regards, Martin From martin at v.loewis.de Fri Aug 10 00:36:15 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 10 Aug 2007 00:36:15 +0200 Subject: [Python-3000] idle3.0 - is is supposed to work? In-Reply-To: References: <87tzr8ei80.fsf@hydra.hampton.thirdcreek.com> Message-ID: <46BB96DF.5060305@v.loewis.de> > I get the same failure on OSX (PPC) and on Linux (x86 Ubuntu). It has > to do with the Tcl/Tk wrapping code, in particular it's in the > LEAVE_PYTHON macro on line 1995 on _tkinter.c. I'm not convinced. The actual failure is that "tag configure" is invoked with a None tagname (which then gets stripped through flatten, apparently). The ColorDelegator it originates from has these colors: [('COMMENT', {'foreground': '#dd0000', 'background': '#ffffff'}), ('DEFINITION', {'foreground': '#0000ff', 'background': '#ffffff'}), ('hit', {'foreground': '#ffffff', 'background': '#000000'}), ('STRING', {'foreground': '#00aa00', 'background': '#ffffff'}), ('KEYWORD', {'foreground': '#ff7700', 'background': '#ffffff'}), ('stdout', {'foreground': 'blue', 'background': '#ffffff'}), ('stdin', {'foreground': None, 'background': None}), ('SYNC', {'foreground': None, 'background': None}), ('BREAK', {'foreground': 'black', 'background': '#ffff55'}), ('BUILTIN', {'foreground': '#900090', 'background': '#ffffff'}), ('stderr', {'foreground': 'red', 'background': '#ffffff'}), ('ERROR', {'foreground': '#000000', 'background': '#ff7777'}), (None, {'foreground': '#000000', 'background': '#ffffff'}), ('console', {'foreground': '#770000', 'background': '#ffffff'}), ('TODO', {'foreground': None, 'background': None}) and invokes this code: for tag, cnf in self.tagdefs.items(): if cnf: self.tag_configure(tag, **cnf) so if None is a dictionary key (as it is), you get the error you see. 
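The loop above can be exercised without Tk. In this sketch `tag_configure` is a stand-in that only records its calls, and `tagdefs` is a shortened copy of the dictionary listed above; skipping the None key is one possible guard, shown here as an assumption and not necessarily the fix that IDLE should adopt.

```python
# Shortened copy of the tag definitions listed above, including the None key.
tagdefs = {
    "COMMENT": {"foreground": "#dd0000", "background": "#ffffff"},
    "SYNC": {"foreground": None, "background": None},
    None: {"foreground": "#000000", "background": "#ffffff"},
}

calls = []


def tag_configure(tag, **cnf):
    """Stand-in for the Tk text widget method; just records the call."""
    calls.append((tag, cnf))


for tag, cnf in tagdefs.items():
    if tag is None:
        # A None tag name cannot be passed to Tcl as a tag, so skip it.
        continue
    if cnf:
        tag_configure(tag, **cnf)

configured = [tag for tag, _ in calls]
```

Note that `if cnf:` only filters out empty dictionaries; an entry such as SYNC, whose values are all None, still reaches `tag_configure`.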
Regards, Martin From guido at python.org Fri Aug 10 00:37:23 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Aug 2007 15:37:23 -0700 Subject: [Python-3000] IDLE in new py3k In-Reply-To: <46BB9243.60101@v.loewis.de> References: <87y7gkel09.fsf@hydra.hampton.thirdcreek.com> <46BB9243.60101@v.loewis.de> Message-ID: On 8/9/07, "Martin v. Löwis" wrote: > > But the reason we got here was that an AsString() call failed. Why? > > The only reason I can see is that PyObject_Str failed; that may happen > if PyObject_Str fails. That, in turn, can happen for bytes objects > if they are not UTF-8. > > I think _tkinter should get rid of AsString, and use the Tcl object > API instead (not sure how to do that specifically, though) > > > Here's the traceback: > > Are you sure these are related? This traceback looks like a Tcl > error - so what does that have to do with AsString? My only evidence is that after I fixed the segfault, this traceback appeared. Possibly something in IDLE is catching the AsString() error at a later stage. There may also be timing dependencies since IDLE makes heavy use of after(). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Fri Aug 10 00:45:28 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Fri, 10 Aug 2007 00:45:28 +0200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> <46BB816E.1070507@v.loewis.de> Message-ID: <46BB9908.8000704@v.loewis.de> Jason Orendorff wrote: > On 8/9/07, "Martin v. Löwis" wrote: >>> Too true. Offhand, why not provide hooks for serializing and >>> deserializing keys? > > It was just a suggestion. I wish this could occasionally > go without saying. Perhaps using "I suggest" instead of asking "why not" would have clued me in; English is not my native language, and I take questions as literally asking something.
Normally, the answer to "why not do XYZ" is "because nobody has the time to do that", but too many people asking this specific question still haven't learned this, so I feel obliged to provide the obvious answer rather than ignoring the poster. This is in turn because I have heard "I posted this years ago, but nobody listened" too many times. In some cases, posters genuinely don't know that nobody else will work on it and that they need to become active if they want to see things happen. So telling them has a small chance of getting us more contributions than mere suggestions; this is well worth the time spent to tell people like you what they already knew. Regards, Martin From kbk at shore.net Fri Aug 10 00:44:35 2007 From: kbk at shore.net (Kurt B. Kaiser) Date: Thu, 09 Aug 2007 18:44:35 -0400 Subject: [Python-3000] idle3.0 - is is supposed to work? In-Reply-To: <46BB96DF.5060305@v.loewis.de> (Martin v. Löwis's message of "Fri, 10 Aug 2007 00:36:15 +0200") References: <87tzr8ei80.fsf@hydra.hampton.thirdcreek.com> <46BB96DF.5060305@v.loewis.de> Message-ID: <87ps1we3ak.fsf@hydra.hampton.thirdcreek.com> "Martin v. Löwis" writes: > I'm not convinced. The actual failure is that "tag configure" is invoked > with a None tagname (which then gets stripped through flatten, apparently). OTOH, IDLE ran w/o this error in p3yk... -- KBK From rrr at ronadam.com Fri Aug 10 00:50:50 2007 From: rrr at ronadam.com (Ron Adam) Date: Thu, 09 Aug 2007 17:50:50 -0500 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46BB816E.1070507@v.loewis.de> References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> <46BB816E.1070507@v.loewis.de> Message-ID: <46BB9A4A.3070201@ronadam.com> Martin v. Löwis wrote: > Jason Orendorff wrote: >> On 8/7/07, "Martin v. Löwis" wrote: >>> My concern is that people need to access existing databases.
It's >>> all fine that the code accessing them breaks, and that they have >>> to actively port to Py3k. However, telling them that they have to >>> represent the keys in their dbm disk files in a different manner >>> might cause a revolt... >> Too true. Offhand, why not provide hooks for serializing and >> deserializing keys? > > Perhaps YAGNI? We already support pickling values (dbshelve), > and I added support for encoding/decoding strings as either > keys or values (though in a limited manner). > > In any case, somebody would have to make a specification > for that, and then somebody would have to provide an > implementation of it. > > Regards, > Martin Just a thought... Would some sort of an indirect reference type help. Possibly an object_id_based_reference as keys instead of using or hashing the object itself? This wouldn't change if the object mutates between accesses and could be immutable. Ron From guido at python.org Fri Aug 10 00:58:58 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Aug 2007 15:58:58 -0700 Subject: [Python-3000] infinite recursion with python -v In-Reply-To: References: Message-ID: I've checked a band-aid fix for this (r56878). The band-aid works by pre-importing the latin-1 codec (and also the utf-8 codec, just to be sure) *before* setting sys.stdout and sys.stderr (this happens in site.py, in installnewio()). This is a horrible hack though, and only works because, as long as sys.stderr isn't set, the call to PyFile_WriteString() in mywrite() (in sysmodule.c) returns a quick error, causing mywrite() to write directly to C's stderr. I've also checked in a change to PyFile_WriteString() to call PyUnicode_FromString() instead of PyString_FromString(), but that doesn't appear to make any difference (r56879). FWIW, I've attached an edited version of the traceback mailed by Neal; the email mangled the formatting too much. Maybe someone else has a bright idea. 
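The band-aid can be pictured roughly as follows: make sure the codecs are already loaded before a stream that depends on them is installed, so that writing a message during an import never has to trigger another codec import. This is an illustrative sketch using codecs.lookup and an in-memory stream; it is not the code checked in as r56878.

```python
import codecs
import io

# Load the codecs up front, while nothing yet depends on them for output.
for name in ("latin-1", "utf-8"):
    codecs.lookup(name)

# Only now create a text stream that encodes through one of those codecs.
# newline="\n" keeps the output identical on every platform.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="latin-1", newline="\n")
stream.write("# caf\u00e9 matches\n")
stream.flush()
```

Because codec lookups are cached after the first call, the later write finds the latin-1 codec without importing anything.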
--Guido On 8/7/07, Neal Norwitz wrote: > The wiki seems to be done, so sorry for the spam. > > python -v crashes due to infinite recursion (well, it tried to be > infinite until it got a stack overflow :-) The problem seems to be > that Lib/encodings/latin_1.py is loaded, but it tries to be converted > to latin_1, so it tries to load the module, and ... Or something like > that. See below for a call stack. > > Minimal version: > > PyFile_WriteString (s= "# Lib/encodings/latin_1.pyc matches > Lib/encodings/latin_1.py\n", f=) at Objects/fileobject.c:184 > mywrite (name= "stderr", fp=, format= "# %s matches %s\n", va=) at > Python/sysmodule.c:1350 > PySys_WriteStderr (format= "# %s matches %s\n") at Python/sysmodule.c:1380 > check_compiled_module (pathname= "Lib/encodings/latin_1.py", mtime=, > cpathname= "Lib/encodings/latin_1.pyc") at Python/import.c:755 > load_source_module (name= "encodings.latin_1", pathname= > "Lib/encodings/latin_1.py", fp=) at Python/import.c:938 > load_module (name= "encodings.latin_1", fp=,buf= > "Lib/encodings/latin_1.py", type=1, loader=) at Python/import.c:1733 > import_submodule (mod=, subname= "latin_1",fullname= > "encodings.latin_1") at Python/import.c:2418 > load_next (mod=,altmod=, p_name=,buf= "encodings.latin_1", p_buflen=) > at Python/import.c:2213 > import_module_level (name=, globals=, locals=, fromlist=, level=0) at > Python/import.c:1992 > PyImport_ImportModuleLevel (name= "encodings.latin_1", globals=, > locals=, fromlist=, level=0) at Python/import.c:2056 > builtin___import__ () at Python/bltinmodule.c:151 > [...] 
> _PyCodec_Lookup (encoding= "latin-1") at Python/codecs.c:147 > codec_getitem (encoding= "latin-1",index=0) at Python/codecs.c:211 > PyCodec_Encoder (encoding= "latin-1") at Python/codecs.c:275 > PyCodec_Encode (object=,encoding= "latin-1", errors=) at Python/codecs.c:322 > PyString_AsEncodedObject (str=,encoding= "latin-1", errors=) at > Objects/stringobject.c:459 > string_encode () at Objects/stringobject.c:3138 > [...] > PyFile_WriteObject (v=, f=, flags=1) at Objects/fileobject.c:159 > PyFile_WriteString (s= "# Lib/encodings/latin_1.pyc matches > Lib/encodings/latin_1.py\n",f=) at Objects/fileobject.c:184 > > == Stack trace for python -v recursion (argument values are mostly trimmed) == > > PyFile_WriteString (s= "# Lib/encodings/latin_1.pyc matches > Lib/encodings/latin_1.py\n", f=) at Objects/fileobject.c:184 > mywrite (name= "stderr", fp=, format= "# %s matches %s\n", va=) at > Python/sysmodule.c:1350 > PySys_WriteStderr (format= "# %s matches %s\n") at Python/sysmodule.c:1380 > check_compiled_module (pathname= "Lib/encodings/latin_1.py", mtime=, > cpathname= "Lib/encodings/latin_1.pyc") at Python/import.c:755 > load_source_module (name= "encodings.latin_1", pathname= > "Lib/encodings/latin_1.py", fp=) at Python/import.c:938 > load_module (name= "encodings.latin_1", fp=,buf= > "Lib/encodings/latin_1.py", type=1, loader=) at Python/import.c:1733 > import_submodule (mod=, subname= "latin_1",fullname= > "encodings.latin_1") at Python/import.c:2418 > load_next (mod=,altmod=, p_name=,buf= "encodings.latin_1", p_buflen=) > at Python/import.c:2213 > import_module_level (name=, globals=, locals=, fromlist=, level=0) at > Python/import.c:1992 > PyImport_ImportModuleLevel (name= "encodings.latin_1", globals=, > locals=, fromlist=, level=0) at Python/import.c:2056 > builtin___import__ () at Python/bltinmodule.c:151 > PyCFunction_Call () at Objects/methodobject.c:77 > PyObject_Call () at Objects/abstract.c:1736 > do_call () at Python/ceval.c:3764 > call_function 
(pp_stack=, oparg=513) at Python/ceval.c:3574 > PyEval_EvalFrameEx (f=, throwflag=0) at Python/ceval.c:2216 > PyEval_EvalCodeEx () at Python/ceval.c:2835 > function_call () at Objects/funcobject.c:634 > PyObject_Call () at Objects/abstract.c:1736 > PyEval_CallObjectWithKeywords () at Python/ceval.c:3431 > _PyCodec_Lookup (encoding= "latin-1") at Python/codecs.c:147 > codec_getitem (encoding= "latin-1",index=0) at Python/codecs.c:211 > PyCodec_Encoder (encoding= "latin-1") at Python/codecs.c:275 > PyCodec_Encode (object=,encoding= "latin-1", errors=) at Python/codecs.c:322 > PyString_AsEncodedObject (str=,encoding= "latin-1", errors=) at > Objects/stringobject.c:459 > string_encode () at Objects/stringobject.c:3138 > PyCFunction_Call () at Objects/methodobject.c:73 > call_function () at Python/ceval.c:3551 > PyEval_EvalFrameEx (f=, throwflag=0) at Python/ceval.c:2216 > PyEval_EvalCodeEx () at Python/ceval.c:2835 > function_call () at Objects/funcobject.c:634 > PyObject_Call () at Objects/abstract.c:1736 > method_call () at Objects/classobject.c:397 > PyObject_Call () at Objects/abstract.c:1736 > PyEval_CallObjectWithKeywords () at Python/ceval.c:3431 > PyFile_WriteObject (v=, f=, flags=1) at Objects/fileobject.c:159 > PyFile_WriteString (s= "# Lib/encodings/latin_1.pyc matches > Lib/encodings/latin_1.py\n",f=) at Objects/fileobject.c:184 > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) -------------- next part -------------- short: PyFile_WriteString (s="# Lib/encodings/latin_1.pyc matches Lib/encodings/latin_1.py\n", f=) at Objects/fileobject.c:184 mywrite (name="stderr", fp=, format="# %s matches %s\n", va=) at Python/sysmodule.c:1350 PySys_WriteStderr (format="# %s matches %s\n") at Python/sysmodule.c:1380 
check_compiled_module (pathname="Lib/encodings/latin_1.py", mtime=, cpathname="Lib/encodings/latin_1.pyc") at Python/import.c:755 load_source_module (name="encodings.latin_1", pathname="Lib/encodings/latin_1.py", fp=) at Python/import.c:938 load_module (name="encodings.latin_1", fp=,buf="Lib/encodings/latin_1.py", type=1, loader=) at Python/import.c:1733 import_submodule (mod=, subname="latin_1",fullname="encodings.latin_1") at Python/import.c:2418 load_next (mod=,altmod=, p_name=,buf="encodings.latin_1", p_buflen=) at Python/import.c:2213 import_module_level (name=, globals=, locals=, fromlist=, level=0) at Python/import.c:1992 PyImport_ImportModuleLevel (name="encodings.latin_1", globals=, locals=, fromlist=, level=0) at Python/import.c:2056 builtin___import__ () at Python/bltinmodule.c:151 [...] _PyCodec_Lookup (encoding="latin-1") at Python/codecs.c:147 codec_getitem (encoding="latin-1",index=0) at Python/codecs.c:211 PyCodec_Encoder (encoding="latin-1") at Python/codecs.c:275 PyCodec_Encode (object=,encoding="latin-1", errors=) at Python/codecs.c:322 PyString_AsEncodedObject (str=,encoding="latin-1", errors=) at Objects/stringobject.c:459 string_encode () at Objects/stringobject.c:3138 [...] 
PyFile_WriteObject (v=, f=, flags=1) at Objects/fileobject.c:159 PyFile_WriteString (s="# Lib/encodings/latin_1.pyc matches Lib/encodings/latin_1.py\n",f=) at Objects/fileobject.c:184 long: PyFile_WriteString (s= "# Lib/encodings/latin_1.pyc matches Lib/encodings/latin_1.py\n", f=) at Objects/fileobject.c:184 mywrite (name= "stderr", fp=, format= "# %s matches %s\n", va=) at Python/sysmodule.c:1350 PySys_WriteStderr (format= "# %s matches %s\n") at Python/sysmodule.c:1380 check_compiled_module (pathname= "Lib/encodings/latin_1.py", mtime=, cpathname= "Lib/encodings/latin_1.pyc") at Python/import.c:755 load_source_module (name= "encodings.latin_1", pathname= "Lib/encodings/latin_1.py", fp=) at Python/import.c:938 load_module (name= "encodings.latin_1", fp=,buf= "Lib/encodings/latin_1.py", type=1, loader=) at Python/import.c:1733 import_submodule (mod=, subname= "latin_1",fullname= "encodings.latin_1") at Python/import.c:2418 load_next (mod=,altmod=, p_name=,buf= "encodings.latin_1", p_buflen=) at Python/import.c:2213 import_module_level (name=, globals=, locals=, fromlist=, level=0) at Python/import.c:1992 PyImport_ImportModuleLevel (name= "encodings.latin_1", globals=, locals=, fromlist=, level=0) at Python/import.c:2056 builtin___import__ () at Python/bltinmodule.c:151 PyCFunction_Call () at Objects/methodobject.c:77 PyObject_Call () at Objects/abstract.c:1736 do_call () at Python/ceval.c:3764 call_function (pp_stack=, oparg=513) at Python/ceval.c:3574 PyEval_EvalFrameEx (f=, throwflag=0) at Python/ceval.c:2216 PyEval_EvalCodeEx () at Python/ceval.c:2835 function_call () at Objects/funcobject.c:634 PyObject_Call () at Objects/abstract.c:1736 PyEval_CallObjectWithKeywords () at Python/ceval.c:3431 _PyCodec_Lookup (encoding= "latin-1") at Python/codecs.c:147 codec_getitem (encoding= "latin-1",index=0) at Python/codecs.c:211 PyCodec_Encoder (encoding= "latin-1") at Python/codecs.c:275 PyCodec_Encode (object=,encoding= "latin-1", errors=) at Python/codecs.c:322 
PyString_AsEncodedObject (str=,encoding= "latin-1", errors=) at Objects/stringobject.c:459 string_encode () at Objects/stringobject.c:3138 PyCFunction_Call () at Objects/methodobject.c:73 call_function () at Python/ceval.c:3551 PyEval_EvalFrameEx (f=, throwflag=0) at Python/ceval.c:2216 PyEval_EvalCodeEx () at Python/ceval.c:2835 function_call () at Objects/funcobject.c:634 PyObject_Call () at Objects/abstract.c:1736 method_call () at Objects/classobject.c:397 PyObject_Call () at Objects/abstract.c:1736 PyEval_CallObjectWithKeywords () at Python/ceval.c:3431 PyFile_WriteObject (v=, f=, flags=1) at Objects/fileobject.c:159 PyFile_WriteString (s= "# Lib/encodings/latin_1.pyc matches Lib/encodings/latin_1.py\n",f=) at Objects/fileobject.c:184 From guido at python.org Fri Aug 10 00:59:49 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Aug 2007 15:59:49 -0700 Subject: [Python-3000] idle3.0 - is is supposed to work? In-Reply-To: <87ps1we3ak.fsf@hydra.hampton.thirdcreek.com> References: <87tzr8ei80.fsf@hydra.hampton.thirdcreek.com> <46BB96DF.5060305@v.loewis.de> <87ps1we3ak.fsf@hydra.hampton.thirdcreek.com> Message-ID: On 8/9/07, Kurt B. Kaiser wrote: > "Martin v. Löwis" writes: > > > I'm not convinced. The actual failure is that "tag configure" is invoked > > with a None tagname (which then gets stripped through flatten, apparently). > > OTOH, IDLE ran w/o this error in p3yk... Yeah, in the new branch there will be more occurrences of PyUnicode, which causes _tkinter.c to take different paths in many cases. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Fri Aug 10 01:01:43 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Fri, 10 Aug 2007 01:01:43 +0200 Subject: [Python-3000] idle3.0 - is is supposed to work? 
In-Reply-To: <87ps1we3ak.fsf@hydra.hampton.thirdcreek.com> References: <87tzr8ei80.fsf@hydra.hampton.thirdcreek.com> <46BB96DF.5060305@v.loewis.de> <87ps1we3ak.fsf@hydra.hampton.thirdcreek.com> Message-ID: <46BB9CD7.2030301@v.loewis.de> >> I'm not convinced. The actual failure is that "tag configure" is invoked >> with a None tagname (which then gets stripped through flatten, apparently). > > OTOH, IDLE ran w/o this error in p3yk... Yes. Somebody would have to study what precisely the problem is: is it that there is a None key in that dictionary, and that you must not use None as a tag name? In that case: where does the None come from? Or else: is it that you can use None as a tagname in 2.x, but can't anymore in 3.0? If so: why not? Regards, Martin From greg.ewing at canterbury.ac.nz Fri Aug 10 01:24:13 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 10 Aug 2007 11:24:13 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BAEFB0.9050400@ronadam.com> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> Message-ID: <46BBA21D.9060403@canterbury.ac.nz> > Talin wrote: >>In other words, other than the special case of 'repr', we find that >>pretty much everything can fit into a single specifier string; I think there might still be merit in separating the field width and alignment spec, at least syntactically, since all format specs will have it and it should have a uniform syntax across all of them, and it would be good if it 
didn't have to be parsed by all the __format__ methods individually. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From greg.ewing at canterbury.ac.nz Fri Aug 10 02:15:51 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 10 Aug 2007 12:15:51 +1200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46BB9908.8000704@v.loewis.de> References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> <46BB816E.1070507@v.loewis.de> <46BB9908.8000704@v.loewis.de> Message-ID: <46BBAE37.8090600@canterbury.ac.nz> Martin v. Löwis wrote: > Perhaps using "I suggest" instead of asking "why not" would > have clued me; English is not my native language, and I take > questions as literally asking something. Well, it's really a suggestion and a question. If there's some reason the suggestion is a bad idea, there's nothing wrong with pointing that out in a reply. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From greg.ewing at canterbury.ac.nz Fri Aug 10 02:18:25 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 10 Aug 2007 12:18:25 +1200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46BB9A4A.3070201@ronadam.com> References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> <46BB816E.1070507@v.loewis.de> <46BB9A4A.3070201@ronadam.com> Message-ID: <46BBAED1.8090700@canterbury.ac.nz> Ron Adam wrote: > Would some sort of an indirect reference type help. Possibly an > object_id_based_reference as keys instead of using or hashing the object > itself? 
This wouldn't change if the object mutates between accesses and > could be immutable. But if the object did mutate, the cached re would be out of date, and this wouldn't be noticed. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From kbk at shore.net Fri Aug 10 02:26:49 2007 From: kbk at shore.net (Kurt B. Kaiser) Date: Thu, 09 Aug 2007 20:26:49 -0400 Subject: [Python-3000] idle3.0 - is is supposed to work? In-Reply-To: <46BB9CD7.2030301@v.loewis.de> (Martin v. Löwis's message of "Fri, 10 Aug 2007 01:01:43 +0200") References: <87tzr8ei80.fsf@hydra.hampton.thirdcreek.com> <46BB96DF.5060305@v.loewis.de> <87ps1we3ak.fsf@hydra.hampton.thirdcreek.com> <46BB9CD7.2030301@v.loewis.de> Message-ID: <87lkckdyk6.fsf@hydra.hampton.thirdcreek.com> "Martin v. Löwis" writes: >> OTOH, IDLE ran w/o this error in p3yk... > > Yes. Somebody would have to study what precisely the problem is: is it > that there is a None key in that dictionary, and that you must not use > None as a tag name? In that case: where does the None come from? > Or else: is it that you can use None as a tagname in 2.x, but can't > anymore in 3.0? If so: why not? OK, I'll start looking at it. 
-- KBK From greg.ewing at canterbury.ac.nz Fri Aug 10 02:30:38 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 10 Aug 2007 12:30:38 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BAEFB0.9050400@ronadam.com> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> Message-ID: <46BBB1AE.5010207@canterbury.ac.nz> Ron Adam wrote: > The way to think of 'repr' and 'str' is that of a general "object" format > type/specifier. That puts str and repr into the same context as the rest > of the format types. This is really a point of view issue and not so much > of a semantic one. I think {0:r} and {0:s} are to "object", as {0:d} and > {0:e} are to "float" ... just another relationship relative to the value > being formatted. So I don't understand the need to treat them differently. There's no need to treat 's' specially, but 'r' is different, at least if we want "{0:r}".format(x) to always mean the same thing as "{0:s}".format(repr(x)) To achieve that without requiring every __format__ method to recognise 'r' and handle it itself is going to require format() to intercept 'r' before calling the __format__ method, as far as I can see. It can't be done by str, since by the time str.__format__ gets called, the object has already been passed through str(), and it's too late to call repr() on it. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! 
| Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From eric+python-dev at trueblade.com Fri Aug 10 03:26:30 2007 From: eric+python-dev at trueblade.com (Eric V. Smith) Date: Thu, 09 Aug 2007 21:26:30 -0400 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46B5FBD9.4020301@acm.org> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org> Message-ID: <46BBBEC6.5030705@trueblade.com> I'm just getting back from vacation and trying to catch up. I think I've caught the sense of the discussion, but forgive me if I haven't. Talin wrote: > The reason is that for some types, the __format__ method can define its > own interpretation of the format string which may include the letters > 'rtgd' as part of its regular syntax. Basically, he wants no constraints > on what __format__ is allowed to do. Why would this not be true for all types? Why have int's interpret "f", or other things that don't apply to int's? If you want: x = 3 "{0:f}".format(x) then be explicit and write: "{0:f}".format(float(x)) I realize it's a little more verbose, but now the __format__ functions only need to worry about what applies to their own type, and we get out of the business of deciding how and when to convert between ints and floats and decimals and whatever other types are involved. And once you decide that the entire specifier is interpreted only by the type, you no longer need the default specifier (":f" in this case), and you could just write: "{0}".format(float(x)) That is, since we already know the type, we don't need to specify the type in the specifier. 
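To make the explicit-conversion spelling above concrete, here is a minimal sketch. It runs against str.format() as it exists in released Python 3, which is an assumption on my part and not necessarily the draft behaviour being debated in this thread. Note that in released Python the two spellings do not give the same text: the empty specifier uses the float's default formatting, not a fixed-point one.

```python
# Illustration only: behaviour is that of str.format() in released Python 3,
# which may differ from the PEP 3101 draft discussed in this thread.
x = 3

# Convert explicitly, then apply the float-specific 'f' specifier.
explicit_fixed = "{0:f}".format(float(x))

# No specifier at all: the float falls back to its default formatting.
default_float = "{0}".format(float(x))

print(explicit_fixed)
print(default_float)
```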
Now the "d", or "x", or whatever could just be used by int's (for example), and only as needed. I grant that repr() might be a different case, as Greg Ewing points out in a subsequent message. But maybe we use something other than a colon to get __repr__ called, like: "{`0`}".format(float(x)) I'm kidding with the back-ticks, of course. Find some syntax which can be disambiguated from all specifiers. Maybe: "{0#}".format(float(x)) Or something similar. Eric. From rrr at ronadam.com Fri Aug 10 06:35:34 2007 From: rrr at ronadam.com (Ron Adam) Date: Thu, 09 Aug 2007 23:35:34 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BBB1AE.5010207@canterbury.ac.nz> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> Message-ID: <46BBEB16.2040205@ronadam.com> Greg Ewing wrote: > Ron Adam wrote: >> The way to think of 'repr' and 'str' is that of a general "object" format >> type/specifier. That puts str and repr into the same context as the rest >> of the format types. This is really a point of view issue and not so much >> of a semantic one. I think {0:r} and {0:s} are to "object", as {0:d} and >> {0:e} are to "float" ... just another relationship relative to the value >> being formatted. So I don't understand the need to treat them differently. 
> > There's no need to treat 's' specially, but 'r' is different, > at least if we want > > "{0:r}".format(x) > > to always mean the same thing as > > "{0:s}".format(repr(x)) > > To achieve that without requiring every __format__ method > to recognise 'r' and handle it itself is going to require > format() to intercept 'r' before calling the __format__ > method, as far as I can see. It can't be done by str, > since by the time str.__format__ gets called, the object > has already been passed through str(), and it's too late > to call repr() on it. This doesn't require a different syntax to do. Lets start at the top... what will the str.format() method look like? Maybe an approximation might be: (only one possible variation) class str(object): ... def format(self, *args, **kwds): return format(self, *args, **kwds) #calls global function. ... And then for each format field, it will call the __format__ method of the matching position or named value. class object(): ... def __format__(self, value, format_spec): return value, format_spec ... It doesn't actually do anything because it's a pre-format hook so that users can override the default behavior. An overridden __format__ method can do one of the following... - handle the format spec on it's own and return a (string, None) [*] - alter the value and return (new_value, format_spec) - alter the format_spec and return (value, new_format_spec) - do logging of some values, and return the (value, format_spec) unchanged. - do something entirely different and return ('', None) [* None could indicate an already formatted value in this case.] Does this look ok, or would you do it a different way? If we do it this way, then the 'r' formatter isn't handled any different than any of the others. The exceptional case is the custom formatters in __format__ methods not the 'r' case. 
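The pre-format hook idea above can be sketched roughly as follows. This is only one possible reading of it, not an agreed design: the hook name __preformat__, the driver apply_preformat and the two example classes are all hypothetical, and the hook here takes only (self, format_spec) for simplicity. The built-in format() is used for the final step.

```python
def apply_preformat(value, format_spec):
    """Hypothetical driver for a (value, format_spec) pre-format hook."""
    # __preformat__ is an invented name for illustration, not a real protocol.
    hook = getattr(type(value), "__preformat__", None)
    if hook is not None:
        value, format_spec = hook(value, format_spec)
    if format_spec is None:
        # A None spec marks a value that is already a finished string.
        return value
    return format(value, format_spec)


class Celsius:
    """Example type that alters the value and passes the spec on unchanged."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __preformat__(self, format_spec):
        return self.degrees, format_spec


class Secret:
    """Example type that handles the spec entirely on its own."""

    def __preformat__(self, format_spec):
        return "<hidden>", None


print(apply_preformat(Celsius(21.5), ".1f"))
print(apply_preformat(Secret(), "x"))
print(apply_preformat(42, "x"))
```

Objects without the hook simply fall through to the ordinary formatting step, which is the "default behavior" case in the list above.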
Cheers, Ron From talin at acm.org Fri Aug 10 06:39:12 2007 From: talin at acm.org (Talin) Date: Thu, 09 Aug 2007 21:39:12 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46BB3CA4.9010904@cornell.edu> References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> <46BB3CA4.9010904@cornell.edu> Message-ID: <46BBEBF0.80508@acm.org> Joel Bender wrote: > Jason Orendorff wrote: > >> Hooks would help with that, or even eliminate the need altogether. > > IMHO, having a __bytes__ method would go a long way. This would be better done with generic functions once we have them. In general, I feel it's better not to embed knowledge of a particular serialization scheme in an object. Otherwise, you'd end up with every class having to know about 'pickle' and 'shelve' and 'marshal' and 'JSON' and 'serialize-to-XML' and every other weird serialization format that people come up with. Instead, this is exactly what GFs are good for, so that neither the object nor the serializer has to handle the N*M combinations of the two. Of course, this criticism also works against having a __str__ method, instead of simply defining 'str()' as a GF. And there is some validity to that point. But for historical reasons, we're not likely to change it. And there's also some validity to the argument that a 'printable' representation is the one universal converter that deserves special status. -- Talin From g.brandl at gmx.net Fri Aug 10 07:10:00 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 10 Aug 2007 07:10:00 +0200 Subject: [Python-3000] Console encoding detection broken Message-ID: Well, subject says it all. While 2.5 sets sys.std*.encoding correctly to UTF-8, 3k sets it to 'latin-1', breaking output of Unicode strings. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. 
Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From nnorwitz at gmail.com Fri Aug 10 07:37:48 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Thu, 9 Aug 2007 22:37:48 -0700 Subject: [Python-3000] Move to a "py3k" branch *DONE* In-Reply-To: References: Message-ID: On 8/9/07, Guido van Rossum wrote: > This is done. The new py3k branch is ready for business. > > Left to do: > > - switch the buildbot and the doc builder to use the new branch (Neal) I've updated to use the new branch. I got the docs building, but there are many more problems. I won't re-enable the cronjob until more things are working. > There are currently about 7 failing unit tests left: > > test_bsddb > test_bsddb3 > test_email > test_email_codecs > test_email_renamed > test_sqlite > test_urllib2_localnet Ok, I disabled these, so if only they fail, mail shouldn't be sent (when I enable the script). There are other problems: * had to kill test_poplib due to taking all cpu without progress * bunch of tests leak (./python ./Lib/test/regrtest.py -R 4:3: test_foo test_bar ...) * at least one test fails with a fatal error * make install fails Here are the details (probably best to update the wiki with status before people start working on these): I'm not sure what was happening with test_poplib. I had to kill test_poplib due to taking all cpu without progress. When I ran it by itself, it was fine. So there was some bad interaction with another test. 
Ref leaks and fatal error (see http://docs.python.org/dev/3.0/results/make-test-refleak.out): test_array leaked [11, 11, 11] references, sum=33 test_bytes leaked [4, 4, 4] references, sum=12 test_codeccallbacks leaked [21, 21, 21] references, sum=63 test_codecs leaked [260, 260, 260] references, sum=780 test_ctypes leaked [10, 10, 10] references, sum=30 Fatal Python error: /home/neal/python/py3k/Modules/datetimemodule.c:1175 object at 0xb60b19c8 has negative ref count -4 There are probably more, but I haven't had a chance to run more after test_datetime. This failure occurred while running with -R: test test_coding failed -- Traceback (most recent call last): File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py", line 12, in test_bad_coding2 self.verify_bad_module(module_name) File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py", line 20, in verify_bad_module text = fp.read() File "/tmp/python-test-3.0/local/lib/python3.0/io.py", line 1148, in read res += decoder.decode(self.buffer.read(), True) File "/tmp/python-test-3.0/local/lib/python3.0/encodings/ascii.py", line 26, in decode return codecs.ascii_decode(input, self.errors)[0] UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128) See http://docs.python.org/dev/3.0/results/make-install.out for this failure: Compiling /tmp/python-test-3.0/local/lib/python3.0/test/test_pep263.py ... 
Traceback (most recent call last): File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line 162, in exit_status = int(not main()) File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line 152, in main force, rx, quiet): File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line 89, in compile_dir if not compile_dir(fullname, maxlevels - 1, dfile, force, rx, quiet): File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line 65, in compile_dir ok = py_compile.compile(fullname, None, dfile, True) File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line 144, in compile py_exc = PyCompileError(err.__class__,err.args,dfile or file) File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line 49, in __init__ tbtext = ''.join(traceback.format_exception_only(exc_type, exc_value)) File "/tmp/python-test-3.0/local/lib/python3.0/traceback.py", line 179, in format_exception_only filename = value.filename or "" AttributeError: 'tuple' object has no attribute 'filename' I'm guessing this came from the change in exception args handling? File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line 144, in compile py_exc = PyCompileError(err.__class__,err.args,dfile or file) n From theller at ctypes.org Fri Aug 10 07:49:51 2007 From: theller at ctypes.org (Thomas Heller) Date: Fri, 10 Aug 2007 07:49:51 +0200 Subject: [Python-3000] Move to a "py3k" branch *DONE* In-Reply-To: References: Message-ID: Neal Norwitz schrieb: > On 8/9/07, Guido van Rossum wrote: >> This is done. The new py3k branch is ready for business. >> >> Left to do: >> >> - switch the buildbot and the doc builder to use the new branch (Neal) Shouldn't there be a py3k buildbot at http://www.python.org/dev/buildbot/ as well? 
Thomas From nnorwitz at gmail.com Fri Aug 10 08:15:43 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Thu, 9 Aug 2007 23:15:43 -0700 Subject: [Python-3000] Move to a "py3k" branch *DONE* In-Reply-To: References: Message-ID: On 8/9/07, Thomas Heller wrote: > Neal Norwitz wrote: > > On 8/9/07, Guido van Rossum wrote: > >> This is done. The new py3k branch is ready for business. > >> > >> Left to do: > >> > >> - switch the buildbot and the doc builder to use the new branch (Neal) > > Shouldn't there be a py3k buildbot at http://www.python.org/dev/buildbot/ > as well? I plan to add one, but things still need to settle down more first. As long as there are failing tests, it's not too worthwhile. Plus with all the other failures. My plan is to get things working really well on one platform first and then enable the buildbots. We still have quite a bit of work to do (see my previous mail in this thread). n From martin at v.loewis.de Fri Aug 10 08:17:32 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Fri, 10 Aug 2007 08:17:32 +0200 Subject: [Python-3000] Console encoding detection broken In-Reply-To: References: Message-ID: <46BC02FC.6080107@v.loewis.de> Georg Brandl wrote: > Well, subject says it all. While 2.5 sets sys.std*.encoding correctly to > UTF-8, 3k sets it to 'latin-1', breaking output of Unicode strings. And not surprisingly so: io.py says if encoding is None: # XXX This is questionable encoding = sys.getfilesystemencoding() or "latin-1" First, at the point where this call is made, sys.getfilesystemencoding() still returns None, plus the code is broken as getfilesystemencoding is not the correct value for sys.stdout.encoding. Instead, the way it should be computed is: 1. 
On Unix, use the same value that sys.getfilesystemencoding will get, namely the result of nl_langinfo(CODESET); if that is not available, fall back - to anything, but the most logical choices are UTF-8 (if you want output to always succeed) and ASCII (if you don't want to risk mojibake). 2. On Windows, if output is to a terminal, use GetConsoleOutputCP. Else fall back, probably to CP_ACP (ie. "mbcs") 3. On OSX, I don't know. If output is to a terminal, UTF-8 may be a good bet (although some people operate their Terminal.apps not in UTF-8; there is no way to find out). Otherwise, use the locale's encoding - not sure how to find out what that is. Regards, Martin From nnorwitz at gmail.com Fri Aug 10 08:31:13 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Thu, 9 Aug 2007 23:31:13 -0700 Subject: [Python-3000] Move to a "py3k" branch *DONE* In-Reply-To: References: Message-ID: I wonder if a lot of the refleaks may have the same cause as this one: b'\xff'.decode("utf8", "ignore") No leaks jumped out at me. Here is the rest of the leaks that have been reported so far. I don't know how many have the same cause. test_multibytecodec leaked [72, 72, 72] references, sum=216 test_parser leaked [5, 5, 5] references, sum=15 The other failures that occurred with -R: test test_collections failed -- errors occurred; run in verbose mode for details test test_gzip failed -- Traceback (most recent call last): File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in test_many_append ztxt = zgfile.read(8192) File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read self._read(readsize) File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read self._read_eof() File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof crc32 = read32(self.fileobj) File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32 return struct.unpack(" wrote: > On 8/9/07, Guido van Rossum wrote: > > This is done. The new py3k branch is ready for business. 
> > > > Left to do: > > > > - switch the buildbot and the doc builder to use the new branch (Neal) > > I've updated to use the new branch. I got the docs building, but > there are many more problems. I won't re-enable the cronjob until > more things are working. > > > There are currently about 7 failing unit tests left: > > > > test_bsddb > > test_bsddb3 > > test_email > > test_email_codecs > > test_email_renamed > > test_sqlite > > test_urllib2_localnet > > Ok, I disabled these, so if only they fail, mail shouldn't be sent > (when I enable the script). > > There are other problems: > * had to kill test_poplib due to taking all cpu without progress > * bunch of tests leak (./python ./Lib/test/regrtest.py -R 4:3: > test_foo test_bar ...) > * at least one test fails with a fatal error > * make install fails > > Here are the details (probably best to update the wiki with status > before people start working on these): > > I'm not sure what was happening with test_poplib. I had to kill > test_poplib due to taking all cpu without progress. When I ran it by > itself, it was fine. So there was some bad interaction with another > test. > > Ref leaks and fatal error (see > http://docs.python.org/dev/3.0/results/make-test-refleak.out): > test_array leaked [11, 11, 11] references, sum=33 > test_bytes leaked [4, 4, 4] references, sum=12 > test_codeccallbacks leaked [21, 21, 21] references, sum=63 > test_codecs leaked [260, 260, 260] references, sum=780 > test_ctypes leaked [10, 10, 10] references, sum=30 > Fatal Python error: > /home/neal/python/py3k/Modules/datetimemodule.c:1175 object at > 0xb60b19c8 has negative ref count -4 > > There are probably more, but I haven't had a chance to run more after > test_datetime. 
> > This failure occurred while running with -R: > > test test_coding failed -- Traceback (most recent call last): > File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py", > line 12, in test_bad_coding2 > self.verify_bad_module(module_name) > File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py", > line 20, in verify_bad_module > text = fp.read() > File "/tmp/python-test-3.0/local/lib/python3.0/io.py", line 1148, in read > res += decoder.decode(self.buffer.read(), True) > File "/tmp/python-test-3.0/local/lib/python3.0/encodings/ascii.py", > line 26, in decode > return codecs.ascii_decode(input, self.errors)[0] > UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position > 0: ordinal not in range(128) > > > See http://docs.python.org/dev/3.0/results/make-install.out for this failure: > > Compiling /tmp/python-test-3.0/local/lib/python3.0/test/test_pep263.py ... > Traceback (most recent call last): > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > 162, in > exit_status = int(not main()) > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > 152, in main > force, rx, quiet): > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > 89, in compile_dir > if not compile_dir(fullname, maxlevels - 1, dfile, force, rx, quiet): > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > 65, in compile_dir > ok = py_compile.compile(fullname, None, dfile, True) > File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line > 144, in compile > py_exc = PyCompileError(err.__class__,err.args,dfile or file) > File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line > 49, in __init__ > tbtext = ''.join(traceback.format_exception_only(exc_type, exc_value)) > File "/tmp/python-test-3.0/local/lib/python3.0/traceback.py", line > 179, in format_exception_only > filename = value.filename or "" > AttributeError: 'tuple' object has no attribute 'filename' > > I'm 
guessing this came from the change in exception args handling? > > File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line > 144, in compile > py_exc = PyCompileError(err.__class__,err.args,dfile or file) > > n > From martin at v.loewis.de Fri Aug 10 08:55:34 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 10 Aug 2007 08:55:34 +0200 Subject: [Python-3000] idle3.0 - is is supposed to work? In-Reply-To: <87lkckdyk6.fsf@hydra.hampton.thirdcreek.com> References: <87tzr8ei80.fsf@hydra.hampton.thirdcreek.com> <46BB96DF.5060305@v.loewis.de> <87ps1we3ak.fsf@hydra.hampton.thirdcreek.com> <46BB9CD7.2030301@v.loewis.de> <87lkckdyk6.fsf@hydra.hampton.thirdcreek.com> Message-ID: <46BC0BE6.90908@v.loewis.de> >>> OTOH, IDLE ran w/o this error in p3yk... >> Yes. Somebody would have to study what precisely the problem is: is it >> that there is a None key in that dictionary, and that you must not use >> None as a tag name? In that case: where does the None come from? >> Or else: is it that you can use None as a tagname in 2.x, but can't >> anymore in 3.0? If so: why not? > > OK, I'll start looking at it. So did I, somewhat. It looks like a genuine bug in IDLE to me: you can't use None as a tag name, AFAIU. I'm not quite sure why this doesn't cause an exception in 2.x; if I try to give a None tag separately (i.e. in a stand-alone program) in 2.5, it gives me the same exception. 
Regards, Martin From greg.ewing at canterbury.ac.nz Fri Aug 10 09:21:04 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 10 Aug 2007 19:21:04 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BBBEC6.5030705@trueblade.com> References: <46B13ADE.7080901@acm.org> <05A9B19D-8D18-411A-B881-3EC0852CAC9A@nicko.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org> <46BBBEC6.5030705@trueblade.com> Message-ID: <46BC11E0.9040201@canterbury.ac.nz> Eric V. Smith wrote: > If you want: > > x = 3 > "{0:f}".format(x) > > then be explicit and write: > > "{0:f}".format(float(x)) That would be quite inconvenient, I think. It's very common to use ints and floats interchangeably in contexts which are conceptually float. The rest of the language facilitates this, and it would be a nuisance if it didn't extend to formatting. So I think that ints and floats should know a little about each other's format specs, just enough to know when to delegate to each other. > And once you decide that the entire specifier is interpreted only by the > type, you no longer need the default specifier (":f" in this case), and > you could just write: > "{0}".format(float(x)) That can be done. > I grant that repr() might be a different case, as Greg Ewing points out > in a subsequent message. But maybe we use something other than a colon > to get __repr__ called, like: > "{`0`}".format(float(x)) I'm inclined to think the best thing is just to declare that 'r' is special and gets intercepted by format(). It means that __format__ methods don't *quite* get complete control, but I think it would be a practical solution. 
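A minimal sketch of that interception, assuming a trailing 'r' in the specifier is the marker and that the rest of the specifier is then applied as a string format. The helper name format_field is hypothetical and stands in for the per-field step of str.format(); it is not an existing API.

```python
def format_field(value, spec):
    """Hypothetical per-field step that intercepts 'r' before __format__ is used."""
    if spec.endswith("r"):
        # Take repr() first, then format the result as a string, so that
        # a spec of 'r' behaves like 's' applied to repr(value).
        return format(repr(value), spec[:-1] + "s")
    # Every other specifier is left entirely to the value's own __format__.
    return format(value, spec)


print(format_field("a", "r"))
print(format_field("a", ">5r"))
print(format_field(3, "d"))
```

The cost is the one already mentioned: a type whose own specifier syntax ends in 'r' would never see that specifier.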
Another way would be to ditch 'r' completely and just tell people to wrap repr() around their arguments if they want it. That might be seen as a backward step in terms of convenience, though. -- Greg From nnorwitz at gmail.com Fri Aug 10 09:31:29 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Fri, 10 Aug 2007 00:31:29 -0700 Subject: [Python-3000] Move to a "py3k" branch *DONE* In-Reply-To: References: Message-ID: Bah, who needs sleep anyways. This list of problems should be fairly complete when running with -R. (it skips the fatal error from test_datetime though) Code to trigger a leak: b'\xff'.decode("utf8", "ignore") Leaks: test_array leaked [11, 11, 11] references, sum=33 test_bytes leaked [4, 4, 4] references, sum=12 test_codeccallbacks leaked [21, 21, 21] references, sum=63 test_codecs leaked [260, 260, 260] references, sum=780 test_ctypes leaked [-22, 43, 10] references, sum=31 test_multibytecodec leaked [72, 72, 72] references, sum=216 test_parser leaked [5, 5, 5] references, sum=15 test_unicode leaked [4, 4, 4] references, sum=12 test_xml_etree leaked [128, 128, 128] references, sum=384 test_xml_etree_c leaked [128, 128, 128] references, sum=384 test_zipimport leaked [29, 29, 29] references, sum=87 Failures with -R: test test_collections failed -- errors occurred; run in verbose mode for details test test_gzip failed -- Traceback (most recent call last): File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in test_many_append ztxt = zgfile.read(8192) File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read self._read(readsize) File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read self._read_eof() File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof crc32 = read32(self.fileobj) File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32 return struct.unpack(" wrote: > I wonder if a lot of the refleaks may have the same cause as this one: > > b'\xff'.decode("utf8", "ignore") > > No leaks jumped 
out at me. Here is the rest of the leaks that have > been reported so far. I don't know how many have the same cause. > > test_multibytecodec leaked [72, 72, 72] references, sum=216 > test_parser leaked [5, 5, 5] references, sum=15 > > The other failures that occurred with -R: > > test test_collections failed -- errors occurred; run in verbose mode for details > > test test_gzip failed -- Traceback (most recent call last): > File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in > test_many_append > ztxt = zgfile.read(8192) > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read > self._read(readsize) > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read > self._read_eof() > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof > crc32 = read32(self.fileobj) > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32 > return struct.unpack(" File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack > return o.unpack(s) > struct.error: unpack requires a string argument of length 4 > > test test_runpy failed -- Traceback (most recent call last): > File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230, > in test_run_module > self._check_module(depth) > File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168, > in _check_module > d2 = run_module(mod_name) # Read from bytecode > File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module > raise ImportError("No module named %s" % mod_name) > ImportError: No module named runpy_test > > test_textwrap was the last test to complete. test_thread was still running. > > n > -- > On 8/9/07, Neal Norwitz wrote: > > On 8/9/07, Guido van Rossum wrote: > > > This is done. The new py3k branch is ready for business. > > > > > > Left to do: > > > > > > - switch the buildbot and the doc builder to use the new branch (Neal) > > > > I've updated to use the new branch. I got the docs building, but > > there are many more problems. 
I won't re-enable the cronjob until > > more things are working. > > > > > There are currently about 7 failing unit tests left: > > > > > > test_bsddb > > > test_bsddb3 > > > test_email > > > test_email_codecs > > > test_email_renamed > > > test_sqlite > > > test_urllib2_localnet > > > > Ok, I disabled these, so if only they fail, mail shouldn't be sent > > (when I enable the script). > > > > There are other problems: > > * had to kill test_poplib due to taking all cpu without progress > > * bunch of tests leak (./python ./Lib/test/regrtest.py -R 4:3: > > test_foo test_bar ...) > > * at least one test fails with a fatal error > > * make install fails > > > > Here are the details (probably best to update the wiki with status > > before people start working on these): > > > > I'm not sure what was happening with test_poplib. I had to kill > > test_poplib due to taking all cpu without progress. When I ran it by > > itself, it was fine. So there was some bad interaction with another > > test. > > > > Ref leaks and fatal error (see > > http://docs.python.org/dev/3.0/results/make-test-refleak.out): > > test_array leaked [11, 11, 11] references, sum=33 > > test_bytes leaked [4, 4, 4] references, sum=12 > > test_codeccallbacks leaked [21, 21, 21] references, sum=63 > > test_codecs leaked [260, 260, 260] references, sum=780 > > test_ctypes leaked [10, 10, 10] references, sum=30 > > Fatal Python error: > > /home/neal/python/py3k/Modules/datetimemodule.c:1175 object at > > 0xb60b19c8 has negative ref count -4 > > > > There are probably more, but I haven't had a chance to run more after > > test_datetime. 
> > > > This failure occurred while running with -R: > > > > test test_coding failed -- Traceback (most recent call last): > > File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py", > > line 12, in test_bad_coding2 > > self.verify_bad_module(module_name) > > File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py", > > line 20, in verify_bad_module > > text = fp.read() > > File "/tmp/python-test-3.0/local/lib/python3.0/io.py", line 1148, in read > > res += decoder.decode(self.buffer.read(), True) > > File "/tmp/python-test-3.0/local/lib/python3.0/encodings/ascii.py", > > line 26, in decode > > return codecs.ascii_decode(input, self.errors)[0] > > UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position > > 0: ordinal not in range(128) > > > > > > See http://docs.python.org/dev/3.0/results/make-install.out for this failure: > > > > Compiling /tmp/python-test-3.0/local/lib/python3.0/test/test_pep263.py ... > > Traceback (most recent call last): > > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > > 162, in > > exit_status = int(not main()) > > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > > 152, in main > > force, rx, quiet): > > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > > 89, in compile_dir > > if not compile_dir(fullname, maxlevels - 1, dfile, force, rx, quiet): > > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > > 65, in compile_dir > > ok = py_compile.compile(fullname, None, dfile, True) > > File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line > > 144, in compile > > py_exc = PyCompileError(err.__class__,err.args,dfile or file) > > File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line > > 49, in __init__ > > tbtext = ''.join(traceback.format_exception_only(exc_type, exc_value)) > > File "/tmp/python-test-3.0/local/lib/python3.0/traceback.py", line > > 179, in format_exception_only > > filename = 
value.filename or "" > > AttributeError: 'tuple' object has no attribute 'filename' > > > > I'm guessing this came from the change in exception args handling? > > > > File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line > > 144, in compile > > py_exc = PyCompileError(err.__class__,err.args,dfile or file) > > > > n > > > From greg.ewing at canterbury.ac.nz Fri Aug 10 09:32:09 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 10 Aug 2007 19:32:09 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BBEB16.2040205@ronadam.com> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> Message-ID: <46BC1479.30405@canterbury.ac.nz> Ron Adam wrote: > - alter the value and return (new_value, format_spec) > - alter the format_spec and return (value, new_format_spec) > - do logging of some values, and return the (value, format_spec) > unchanged. I would ditch all of these. They're not necessary, as the same effect can be achieved by explicitly calling another __format__ method, or one's own __format__ method with different args, and returning the result. > - do something entirely different and return ('', None) I don't understand. What is meant to happen in that case? > Does this look ok, or would you do it a different way? You haven't explained how this addresses the 'r' issue without requiring every __format__ method to recognise and deal with it. 
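The delegation suggested earlier in this reply (calling another __format__ method and returning its result) can be sketched roughly as follows. Celsius is a hypothetical class, and the default spec ".1f" and the " C" suffix are assumptions chosen only for illustration.

```python
class Celsius:
    """Hypothetical type that adjusts value and spec, then delegates."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # "Alter the format_spec": supply a default when none is given.
        if not spec:
            spec = ".1f"
        # "Alter the value": convert to float, then call float's own
        # __format__ explicitly and return its result directly.
        return float(self.degrees).__format__(spec) + " C"
```

Nothing needs to be handed back to the caller for further processing; the method returns the finished string.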
-- Greg From brett at python.org Fri Aug 10 09:45:33 2007 From: brett at python.org (Brett Cannon) Date: Fri, 10 Aug 2007 00:45:33 -0700 Subject: [Python-3000] Move to a "py3k" branch *DONE* In-Reply-To: References: Message-ID: On 8/9/07, Neal Norwitz wrote: [SNIP] > See http://docs.python.org/dev/3.0/results/make-install.out for this failure: > > Compiling /tmp/python-test-3.0/local/lib/python3.0/test/test_pep263.py ... > Traceback (most recent call last): > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > 162, in > exit_status = int(not main()) > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > 152, in main > force, rx, quiet): > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > 89, in compile_dir > if not compile_dir(fullname, maxlevels - 1, dfile, force, rx, quiet): > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > 65, in compile_dir > ok = py_compile.compile(fullname, None, dfile, True) > File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line > 144, in compile > py_exc = PyCompileError(err.__class__,err.args,dfile or file) > File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line > 49, in __init__ > tbtext = ''.join(traceback.format_exception_only(exc_type, exc_value)) > File "/tmp/python-test-3.0/local/lib/python3.0/traceback.py", line > 179, in format_exception_only > filename = value.filename or "" > AttributeError: 'tuple' object has no attribute 'filename' > > I'm guessing this came from the change in exception args handling? What change are you thinking of? 'args' was not changed, only the removal of 'message'. -Brett From lists at cheimes.de Fri Aug 10 10:57:26 2007 From: lists at cheimes.de (Christian Heimes) Date: Fri, 10 Aug 2007 10:57:26 +0200 Subject: [Python-3000] tp_bytes and __bytes__ magic method In-Reply-To: References: Message-ID: Guido van Rossum wrote: > This could just as well be done using a method on that specific > object. 
I don't think having to write x.as_bytes() is worse than > bytes(x), *unless* there are contexts where it's important to convert > something to bytes without knowing what kind of thing it is. For > str(), such a context exists: print(). For bytes(), I'm not so sure. > The use cases given here seem to be either very specific to a certain > class, or could be solved using other generic APIs like pickling. I see your point. Since nobody besides Victor and me is interested in __bytes__ I retract my proposal. Thanks for your time. Christian From lists at cheimes.de Fri Aug 10 11:20:23 2007 From: lists at cheimes.de (Christian Heimes) Date: Fri, 10 Aug 2007 11:20:23 +0200 Subject: [Python-3000] No (C) optimization flag Message-ID: Good morning py3k-dev! If I understand correctly, the new C optimization for io.py by Alexandre Vassalotti and possibly other optimizations for modules like pickle.py are going to be dropped in automatically. The Python implementation is a reference implementation and will be used as a fallback only. On the one hand it is an improvement. We are getting instant optimization without teaching people to use a cFoo module. But on the other hand it is going to make debugging with pdb much harder because pdb can't step into C code. I'd like to propose a --disable-optimization (-N for no optimization) flag for Python that disables the use of optimized implementations. The status of the flag can be set by either a command line argument or a C function call before Py_Initialize() and it can be queried by sys.getoptimization(). It's not possible to change the flag during runtime. That should make the code simple and straightforward. When the flag is set, modules like io and pickle must not use their optimized version and must fall back to their Python implementation. I'm willing to give it a try writing the necessary code myself. I think I'm familiar enough with the Python C API after my work on PythonNet for this simple task.
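A rough sketch of how a module might honour such a flag. The helper name load_implementation is hypothetical, and a plain boolean argument stands in for the proposed sys.getoptimization(), which does not exist.

```python
import importlib


def load_implementation(c_name, py_name, use_optimized=True):
    """Return the accelerated module if allowed and importable,
    otherwise the pure-Python reference implementation.

    use_optimized stands in for the proposed interpreter-wide flag.
    """
    if use_optimized:
        try:
            return importlib.import_module(c_name)
        except ImportError:
            # No accelerator built on this platform: fall through.
            pass
    return importlib.import_module(py_name)
```

Because the decision is made once at import time, the flag cannot meaningfully change afterwards, which matches the "not possible to change during runtime" constraint in the proposal.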
Christian From victor.stinner at haypocalc.com Fri Aug 10 12:09:30 2007 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 10 Aug 2007 12:09:30 +0200 Subject: [Python-3000] bytes regular expression? In-Reply-To: References: <200708090427.19830.victor.stinner@haypocalc.com> <200708091740.59070.victor.stinner@haypocalc.com> Message-ID: <200708101209.30281.victor.stinner@haypocalc.com> On Thursday 09 August 2007 19:21:27 Thomas Heller wrote: > Victor Stinner wrote: > > I prefer str8 which looks to be a good candidate for "frozenbytes" type. > > I love this idea! Leave str8 as it is, maybe extend Python so that it > understands the s"..." literals and we are done. Hmm, today str8 sits between the bytes and str types. str8 has more methods (e.g. lower()) than bytes, it behaves differently in comparisons (b'a' != 'a' but str8('a') == 'a') and issubclass(str8, basestring) is True. I think that a frozenbytes type is required for backward compatibility (in Python 2.x, "a" is immutable), e.g. to use bytes as a dictionary key (which looks to be needed in the re and dbm modules). Victor Stinner aka haypo http://hachoir.org/ From walter at livinglogic.de Fri Aug 10 12:12:44 2007 From: walter at livinglogic.de (Walter Dörwald) Date: Fri, 10 Aug 2007 12:12:44 +0200 Subject: [Python-3000] Move to a "py3k" branch *DONE* In-Reply-To: References: Message-ID: <46BC3A1C.205@livinglogic.de> Neal Norwitz wrote: > [...] > Code to trigger a leak: b'\xff'.decode("utf8", "ignore") This should be fixed in r56894. > [...]
Servus, Walter From jimjjewett at gmail.com Fri Aug 10 16:27:07 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 10 Aug 2007 10:27:07 -0400 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BBBEC6.5030705@trueblade.com> References: <46B13ADE.7080901@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org> <46BBBEC6.5030705@trueblade.com> Message-ID: On 8/9/07, Eric V. Smith wrote: > If you want: > > x = 3 > "{0:f}".format(x) > > then be explicit and write: > > "{0:f}".format(float(x)) Because then you can't really create formatting strings. Instead of >>> print("The high temperature at {place:s}, on {date:YYYY-MM-DD} was {temp:f}" % tempsdict) >>> print("{name} scored {score:f}" % locals()) You would have to write >>> _tempsdict_copy = dict(tempsdict) >>> _tempsdict_copy['place'] = str(_tempsdict_copy['place']) >>> _tempsdict_copy['date'] = ... datetime.date(_tempsdict_copy['date']).isoformat() >>> _tempsdict_copy['temp'] = float(_tempsdict_copy['temp']) >>> print("The high temperature at {place}, on {date} was {temp}" % _tempsdict_copy) >>> _f_score = float(score) >>> print("{name} scored {score}" % locals()) -jJ From jimjjewett at gmail.com Fri Aug 10 16:33:21 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 10 Aug 2007 10:33:21 -0400 Subject: [Python-3000] No (C) optimization flag In-Reply-To: References: Message-ID: On 8/10/07, Christian Heimes wrote: > I like to propose a --disable-optimization (-N for no optimization) flag > for Python that disables the usage of optimized implementation. The > status of the flag can be set by either a command line argument or a C > function call before Py_Initialize() and it can be queried by > sys.getoptimization(). It's not possible to chance the flag during > runtime. That should make the code simple and straight forward. 
So you want it global and not per-module? It strikes me as something that ought to be controllable at a finer-grained level, if only to ensure that regression tests continue to also test the python version automatically. -jJ From guido at python.org Fri Aug 10 16:44:14 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Aug 2007 07:44:14 -0700 Subject: [Python-3000] No (C) optimization flag In-Reply-To: References: Message-ID: If you really need to step through the Python code, you can just sabotage the loading of the non-Python version, e.g. remove or rename the .so or .dll file temporarily. I wonder about the usefulness of this debugging though -- if you're debugging something that requires you to step through the C code, how do you know that the same bug is present in the Python code you're stepping through instead? Otherwise (if you're debugging a bug in your own program) I'm not sure I see how stepping through the I/O library is helpful. Sounds like what you're really after is *understanding* how the I/O library works. For that, perhaps reading the docs and then reading the source code would be more effective. --Guido On 8/10/07, Christian Heimes wrote: > Good morning py3k-dev! > > If I understand correctly the new C optimization for io,py by Alexandre > Vassalotti and possible other optimization for modules likes pickle.py > are going to be dropped in automatically. The Python implementation is a > reference implementation and will be used as fall back only. > > On the one hand it is an improvement. We are getting instant > optimization without teaching people to use a cFoo module. But on the > other hand it is going to make debugging with pdb much harder because > pdb can't step into C code. > > I like to propose a --disable-optimization (-N for no optimization) flag > for Python that disables the usage of optimized implementation. 
The > status of the flag can be set by either a command line argument or a C > function call before Py_Initialize() and it can be queried by > sys.getoptimization(). It's not possible to chance the flag during > runtime. That should make the code simple and straight forward. > > When the flag is set modules like io and pickle must not use their > optimized version and fall back to their Python implementation. I'm > willing to give it a try writing the necessary code myself. I think I'm > familiar enough with the Python C API after my work on PythonNet for > this simple task. > > Christian > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From eric+python-dev at trueblade.com Fri Aug 10 17:26:55 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Fri, 10 Aug 2007 11:26:55 -0400 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: References: <46B13ADE.7080901@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org> <46BBBEC6.5030705@trueblade.com> Message-ID: <46BC83BF.3000407@trueblade.com> Jim Jewett wrote: > On 8/9/07, Eric V. Smith wrote: > >> If you want: >> >> x = 3 >> "{0:f}".format(x) >> >> then be explicit and write: >> >> "{0:f}".format(float(x)) > > Because then you can't really create formatting strings. 
Instead of > > >>> print("The high temperature at {place:s}, on {date:YYYY-MM-DD} > was {temp:f}" % tempsdict) > >>> print("{name} scored {score:f}" % locals()) > > You would have to write > > >>> _tempsdict_copy = dict(tempsdict) > >>> _tempsdict_copy['place'] = str(_tempsdict_copy['place']) > >>> _tempsdict_copy['date'] = > ... datetime.date(_tempsdict_copy['date']).isoformat() > >>> _tempsdict_copy['temp'] = float(_tempsdict_copy['temp']) > >>> print("The high temperature at {place}, on {date} was {temp}" > % _tempsdict_copy) > > >>> _f_score = float(score) > >>> print("{name} scored {score}" % locals()) > > -jJ > I concede your point that while using dictionaries it's convenient not to have to convert types manually. However, your date example wouldn't require conversion to a string, since YYYY-MM would just be passed to datetime.date.__format__(). And "{score}" in your second example would need to be changed to "{_f_score}". Anyway, if we're keeping conversions, I see two approaches: 1: "".format() (or Talin's format_field, actually) understands which types can be converted to other types, and does the conversions. This is how Patrick and I wrote the original PEP 3101 sandbox prototype. 2: each type's __format__ function understands how to convert to some subset of all types (int can convert to float and decimal, for example). I was going to argue for approach 2, but after describing it, it became too difficult to understand, and I think I'll instead argue for approach 1. The problem with approach 2 is that there's logic in int.__format__() that understands float.__format__() specifiers, and vice-versa. At least with approach 1, all of this logic is in one place. 
So I think format_field() has logic like:

def format_field(value, specifier):
    # handle special repr case
    if is_repr_specifier(specifier):
        return value.__repr__()

    # handle special string case
    if is_string_specifier(specifier):
        return str(value).__format__(specifier)

    # handle built-in conversions
    if (isinstance(value, (float, basestring))
            and is_int_specifier(specifier)):
        return int(value).__format__(specifier)
    if (isinstance(value, (int, basestring))
            and is_float_specifier(specifier)):
        return float(value).__format__(specifier)

    # handle all other cases
    return value.__format__(specifier)

This implies that string and repr specifiers are discernible across all types, and int and float specifiers are unique amongst themselves. The trick, of course, is what's in is_XXX_specifier. I don't know enough about decimal to know if it's possible or desirable to automatically convert it to other types, or other types to it. Eric. From nnorwitz at gmail.com Fri Aug 10 18:27:01 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Fri, 10 Aug 2007 09:27:01 -0700 Subject: [Python-3000] Move to a "py3k" branch *DONE* In-Reply-To: References: Message-ID: On 8/10/07, Brett Cannon wrote: > On 8/9/07, Neal Norwitz wrote: > [SNIP] > > See http://docs.python.org/dev/3.0/results/make-install.out for this failure: > > > > Compiling /tmp/python-test-3.0/local/lib/python3.0/test/test_pep263.py ...
> > Traceback (most recent call last): > > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > > 162, in > > exit_status = int(not main()) > > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > > 152, in main > > force, rx, quiet): > > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > > 89, in compile_dir > > if not compile_dir(fullname, maxlevels - 1, dfile, force, rx, quiet): > > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > > 65, in compile_dir > > ok = py_compile.compile(fullname, None, dfile, True) > > File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line > > 144, in compile > > py_exc = PyCompileError(err.__class__,err.args,dfile or file) > > File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line > > 49, in __init__ > > tbtext = ''.join(traceback.format_exception_only(exc_type, exc_value)) > > File "/tmp/python-test-3.0/local/lib/python3.0/traceback.py", line > > 179, in format_exception_only > > filename = value.filename or "" > > AttributeError: 'tuple' object has no attribute 'filename' > > > > I'm guessing this came from the change in exception args handling? > > What change are you thinking of? 'args' was not changed, only the > removal of 'message'. That was probably the change I was thinking of. Though wasn't there also a change with unpacking args when catching an exception? I didn't dig into this problem or the code, it was a guess so could be totally off. I was really more thinking out loud (hence the question). Hoping it might trigger some better ideas (or get people looking into the problem). 
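For reference, the quoted traceback shows value.filename being read from whatever is passed as the exception value, and py_compile passing err.args (a plain tuple) in that position. A minimal sketch of the call with the exception instance passed instead; describe_compile_error is a hypothetical helper, not code from py_compile.

```python
import traceback


def describe_compile_error(source, filename="<example>"):
    """Return the formatted SyntaxError for source, or '' if it compiles."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        # Pass the exception instance itself. err.args is a tuple and
        # has no .filename attribute, which is what the traceback
        # above complains about.
        return "".join(traceback.format_exception_only(type(err), err))
    return ""
```

Whether this is the cause of the make install failure is only a guess consistent with the traceback.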
n From ncoghlan at gmail.com Fri Aug 10 18:36:16 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 11 Aug 2007 02:36:16 +1000 Subject: [Python-3000] No (C) optimization flag In-Reply-To: References: Message-ID: <46BC9400.70803@gmail.com> Guido van Rossum wrote: > If you really need to step through the Python code, you can just > sabotage the loading of the non-Python version, e.g. remove or rename > the .so or .dll file temporarily. > > I wonder about the usefulness of this debugging though -- if you're > debugging something that requires you to step through the C code, how > do you know that the same bug is present in the Python code you're > stepping through instead? Otherwise (if you're debugging a bug in your > own program) I'm not sure I see how stepping through the I/O library > is helpful. > > Sounds like what you're really after is *understanding* how the I/O > library works. For that, perhaps reading the docs and then reading the > source code would be more effective. However we select between Python and native module versions, the build bots need to be set up to run the modules both ways (with and without C optimisation). Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From rhamph at gmail.com Fri Aug 10 19:02:43 2007 From: rhamph at gmail.com (Adam Olsen) Date: Fri, 10 Aug 2007 11:02:43 -0600 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BC83BF.3000407@trueblade.com> References: <46B13ADE.7080901@acm.org> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org> <46BBBEC6.5030705@trueblade.com> <46BC83BF.3000407@trueblade.com> Message-ID: On 8/10/07, Eric Smith wrote: > Anyway, if we're keeping conversions, I see two approaches: > > 1: "".format() (or Talin's format_field, actually) understands which > types can be converted to other types, and does the conversions. This > is how Patrick and I wrote the original PEP 3101 sandbox prototype. > > 2: each type's __format__ function understands how to convert to some > subset of all types (int can convert to float and decimal, for example). I feel I must be missing something obvious here, but could somebody explain the problem with __format__ returning NotImplemented to mean "use a fallback"? It seems like it'd have the advantages of both, ie repr, str, and several other formats are automatic, while it's still possible to override any format or create new ones. -- Adam Olsen, aka Rhamphoryncus From guido at python.org Fri Aug 10 19:26:13 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Aug 2007 10:26:13 -0700 Subject: [Python-3000] Console encoding detection broken In-Reply-To: <46BC02FC.6080107@v.loewis.de> References: <46BC02FC.6080107@v.loewis.de> Message-ID: On 8/9/07, "Martin v. L?wis" wrote: > Georg Brandl schrieb: > > Well, subject says it all. While 2.5 sets sys.std*.encoding correctly to > > UTF-8, 3k sets it to 'latin-1', breaking output of Unicode strings. 
> > And not surprisingly so: io.py says > > if encoding is None: > # XXX This is questionable > encoding = sys.getfilesystemencoding() or "latin-1" Guilty as charged. Alas, I don't know much about the machinery of console and filesystem encodings, so I need help! > First, at the point where this call is made, sys.getfilesystemencoding > is still None, What can we do about this? Set it earlier? It should really be set by the time site.py is imported (which sets sys.stdin/out/err), as this is the first time a lot of Python code is run that touches the filesystem (e.g. sys.path mangling). > plus the code is broken as getfilesystemencoding is not > the correct value for sys.stdout.encoding. Instead, the way it should > be computed is: > > 1. On Unix, use the same value that sys.getfilesystemencoding will get, > namely the result of nl_langinfo(CODESET); if that is not available, > fall back - to anything, but the most logical choices are UTF-8 > (if you want output to always succeed) and ASCII (if you don't want > to risk mojibake). > 2. On Windows, if output is to a terminal, use GetConsoleOutputCP. > Else fall back, probably to CP_ACP (ie. "mbcs") > 3. On OSX, I don't know. If output is to a terminal, UTF-8 may be > a good bet (although some people operate their Terminal.apps > not in UTF-8; there is no way to find out). Otherwise, use the > locale's encoding - not sure how to find out what that is. Feel free to add code that implements this. I suppose it would be a good idea to have a separate function io.guess_console_encoding(...) which takes some argument (perhaps a raw file?) and returns an encoding name, never None. This could then be implemented by switching on the platform into platform-specific functions and a default. 
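A minimal sketch of such a function, following the platform rules quoted above. The name guess_console_encoding comes from the suggestion itself; the signature, the "utf-8" default and treating OS X like any other Unix are assumptions, not settled behaviour.

```python
import locale
import sys


def guess_console_encoding(stream=None, default="utf-8"):
    """Return an encoding name for console I/O, never None."""
    is_tty = bool(stream is not None
                  and hasattr(stream, "isatty")
                  and stream.isatty())
    if sys.platform == "win32":
        if is_tty:
            try:
                import ctypes
                # Code page of the attached console; 0 means failure.
                codepage = ctypes.windll.kernel32.GetConsoleOutputCP()
                if codepage:
                    return "cp%d" % codepage
            except (ImportError, AttributeError, OSError):
                pass
        # Not a terminal, or the query failed: ANSI code page.
        return "mbcs"
    try:
        # Unix: the locale's declared character set.
        codeset = locale.nl_langinfo(locale.CODESET)
    except (AttributeError, ValueError, locale.Error):
        codeset = ""
    # An empty answer falls back to the caller-supplied default.
    return codeset or default
```

The default is a parameter because the quoted message leaves the fallback open: UTF-8 if output should always succeed, ASCII if mojibake is the bigger worry.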
--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Fri Aug 10 19:56:00 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 10 Aug 2007 10:56:00 -0700
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To:
References:
Message-ID:

On 8/9/07, Neal Norwitz wrote:
> > There are currently about 7 failing unit tests left:
> >
> > test_bsddb

Looks like this (trivial) test now passes, at least on the one box I
have where it isn't skipped (an ancient red hat 7.3 box that just
won't die :-).

> > test_bsddb3

FWIW, this one *hangs* for me on the only box where I have the right
environment to build bsddb, after running many tests successfully.

> There are other problems:
> * had to kill test_poplib due to taking all cpu without progress

I tried to reproduce this but couldn't (using a debug build). Is it
reproducible for you? What are the exact arguments you pass to Python
and regrtest and what platform did you use?

> * bunch of tests leak (./python ./Lib/test/regrtest.py -R 4:3:
> test_foo test_bar ...)

Did Walter's submit fix these? If not, can you provide more details of
the remaining leaks?

> * at least one test fails with a fatal error

I'll look into this.

> * make install fails

I think I fixed this. The traceback module no longer likes getting a
tuple instead of an Exception instance, so a small adjustment had to
be made to py_compile.py. The syntax error that py_compile was trying
to report is now no longer fatal.

> Here are the details (probably best to update the wiki with status
> before people start working on these):

I don't think it hurts if multiple people look into this. These are
all tough problems. Folks, do report back here as soon as you've got a
result (positive or negative).

> I'm not sure what was happening with test_poplib. I had to kill
> test_poplib due to taking all cpu without progress. When I ran it by
> itself, it was fine. So there was some bad interaction with another
> test.

And I can't reproduce it either. Can you?

> Ref leaks and fatal error (see
> http://docs.python.org/dev/3.0/results/make-test-refleak.out):
> test_array leaked [11, 11, 11] references, sum=33
> test_bytes leaked [4, 4, 4] references, sum=12
> test_codeccallbacks leaked [21, 21, 21] references, sum=63
> test_codecs leaked [260, 260, 260] references, sum=780
> test_ctypes leaked [10, 10, 10] references, sum=30
> Fatal Python error:
> /home/neal/python/py3k/Modules/datetimemodule.c:1175 object at
> 0xb60b19c8 has negative ref count -4
>
> There are probably more, but I haven't had a chance to run more after
> test_datetime.

I'm running regrtest.py -uall -R4:3: now.

> This failure occurred while running with -R:
>
> test test_coding failed -- Traceback (most recent call last):
>   File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py",
> line 12, in test_bad_coding2
>     self.verify_bad_module(module_name)
>   File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py",
> line 20, in verify_bad_module
>     text = fp.read()
>   File "/tmp/python-test-3.0/local/lib/python3.0/io.py", line 1148, in read
>     res += decoder.decode(self.buffer.read(), True)
>   File "/tmp/python-test-3.0/local/lib/python3.0/encodings/ascii.py",
> line 26, in decode
>     return codecs.ascii_decode(input, self.errors)[0]
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position
> 0: ordinal not in range(128)

This doesn't fail in isolation for me.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rrr at ronadam.com Fri Aug 10 20:09:13 2007
From: rrr at ronadam.com (Ron Adam)
Date: Fri, 10 Aug 2007 13:09:13 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BC1479.30405@canterbury.ac.nz>
References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz>
Message-ID: <46BCA9C9.1010306@ronadam.com>

Greg Ewing wrote:
> Ron Adam wrote:
>>    - alter the value and return (new_value, format_spec)
>>    - alter the format_spec and return (value, new_format_spec)
>>    - do logging of some values, and return the (value, format_spec)
>>      unchanged.
>
> I would ditch all of these. They're not necessary, as
> the same effect can be achieved by explicitly calling
> another __format__ method, or one's own __format__
> method with different args, and returning the result.

I'm not sure what you mean by "ditch all of these". Do you mean not
document them, or not have the format function do anything further on
the (value, format_spec) after it is returned from the __format__
method?

>>    - do something entirely different and return ('', None)
>
> I don't understand. What is meant to happen in that case?

The output in this case is a null string. Setting the format spec to
None tells the format() function not to do anything more to it.
Whatever else happens in the __format__ method is up to the programmer.
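The protocol described above can be sketched roughly as follows. This is only an illustration of the idea under discussion, not how the built-in format() works: the hook is given the hypothetical name proposed_format so it does not clash with the real __format__ method (which returns a string), and format_value, Shout and Silent are made-up names.

```python
class Shout:
    """Alters the value, then hands the spec back for standard handling."""

    def __init__(self, text):
        self.text = text

    def proposed_format(self, format_spec):  # hypothetical hook name
        return self.text.upper(), format_spec


class Silent:
    """Does something of its own and asks for no further processing."""

    def proposed_format(self, format_spec):  # hypothetical hook name
        return "", None


def format_value(value, format_spec):
    """Hypothetical driver: give the object first say, then finish up."""
    hook = getattr(value, "proposed_format", None)
    if hook is not None:
        value, format_spec = hook(format_spec)
        if format_spec is None:
            # A spec of None means "do nothing more to it".
            return value
    # Otherwise apply the standard formatting to whatever came back.
    return format(value, format_spec)
```

With this driver, an object without the hook is formatted as usual, an object returning (new_value, format_spec) has the standard formatting applied to the new value, and an object returning ('', None) produces an empty string.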
>> Does this look ok, or would you do it a different way?
>
> You haven't explained how this addresses the 'r' issue
> without requiring every __format__ method to recognise
> and deal with it.

The format function recognizes and deals with 'r' just as you suggest,
but it also recognizes and deals with all the other standard formatter
types in the same way.

The format function would first call the object's __format__ method
and give it a chance to have control, and depending on what is
returned, try to handle it or not.

If you want the 'r' specifier to always have precedence over even
custom __format__ methods, then you can do that too, but I don't see
the need.

Ron

From nas at arctrix.com Fri Aug 10 20:17:21 2007
From: nas at arctrix.com (Neil Schemenauer)
Date: Fri, 10 Aug 2007 18:17:21 +0000 (UTC)
Subject: [Python-3000] No (C) optimization flag
References: <46BC9400.70803@gmail.com>
Message-ID:

Nick Coghlan wrote:
> However we select between Python and native module versions, the build
> bots need be set up to run the modules both ways (with and without C
> optimisation).

If there is a way to explicitly import each module separately then I
think that meets both needs.

Neil

From guido at python.org Fri Aug 10 20:18:35 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 10 Aug 2007 11:18:35 -0700
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To:
References:
Message-ID:

Status update:

The following still leak (regrtest.py -R4:3:)

test_array leaked [11, 11, 11] references, sum=33
test_multibytecodec leaked [72, 72, 72] references, sum=216
test_parser leaked [5, 5, 5] references, sum=15
test_zipimport leaked [29, 29, 29] references, sum=87

I can't reproduce the test_shelve failure.

I *do* see the test_structmembers failure, will investigate.

I see a failure but no segfault in test_datetime; will investigate.

Regarding test_univnewlines, this is virgin territory. I've never met
anyone who used the newlines attribute on file objects.
I'll make a separate post to call it out. --Guido On 8/10/07, Neal Norwitz wrote: > Bah, who needs sleep anyways. This list of problems should be fairly > complete when running with -R. (it skips the fatal error from > test_datetime though) > > Code to trigger a leak: b'\xff'.decode("utf8", "ignore") > > Leaks: > test_array leaked [11, 11, 11] references, sum=33 > test_bytes leaked [4, 4, 4] references, sum=12 > test_codeccallbacks leaked [21, 21, 21] references, sum=63 > test_codecs leaked [260, 260, 260] references, sum=780 > test_ctypes leaked [-22, 43, 10] references, sum=31 > test_multibytecodec leaked [72, 72, 72] references, sum=216 > test_parser leaked [5, 5, 5] references, sum=15 > test_unicode leaked [4, 4, 4] references, sum=12 > test_xml_etree leaked [128, 128, 128] references, sum=384 > test_xml_etree_c leaked [128, 128, 128] references, sum=384 > test_zipimport leaked [29, 29, 29] references, sum=87 > > Failures with -R: > > test test_collections failed -- errors occurred; run in verbose mode for details > > test test_gzip failed -- Traceback (most recent call last): > File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in > test_many_append > ztxt = zgfile.read(8192) > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read > self._read(readsize) > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read > self._read_eof() > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof > crc32 = read32(self.fileobj) > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32 > return struct.unpack(" File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack > return o.unpack(s) > struct.error: unpack requires a string argument of length 4 > > test test_runpy failed -- Traceback (most recent call last): > File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230, > in test_run_module > self._check_module(depth) > File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168, > in 
_check_module > d2 = run_module(mod_name) # Read from bytecode > File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module > raise ImportError("No module named %s" % mod_name) > ImportError: No module named runpy_test > > test test_shelve failed -- errors occurred; run in verbose mode for details > > test test_structmembers failed -- errors occurred; run in verbose mode > for details > > test_univnewlines skipped -- This Python does not have universal newline support > > Traceback (most recent call last): > File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 222, in > handle_request > self.process_request(request, client_address) > File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 241, in > process_request > self.finish_request(request, client_address) > File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 254, in > finish_request > self.RequestHandlerClass(request, client_address, self) > File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 522, in __init__ > self.handle() > File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 316, in handle > self.handle_one_request() > File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 303, > in handle_one_request > if not self.parse_request(): # An error code has been sent, just exit > File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 281, > in parse_request > self.headers = self.MessageClass(self.rfile, 0) > File "/home/neal/python/dev/py3k/Lib/mimetools.py", line 16, in __init__ > rfc822.Message.__init__(self, fp, seekable) > File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 104, in __init__ > self.readheaders() > File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 172, in readheaders > headerseen = self.isheader(line) > File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 202, in isheader > return line[:i].lower() > AttributeError: 'bytes' object has no attribute 'lower' > > On 8/9/07, Neal Norwitz wrote: > > I wonder if a lot of the refleaks 
may have the same cause as this one: > > > > b'\xff'.decode("utf8", "ignore") > > > > No leaks jumped out at me. Here is the rest of the leaks that have > > been reported so far. I don't know how many have the same cause. > > > > test_multibytecodec leaked [72, 72, 72] references, sum=216 > > test_parser leaked [5, 5, 5] references, sum=15 > > > > The other failures that occurred with -R: > > > > test test_collections failed -- errors occurred; run in verbose mode for details > > > > test test_gzip failed -- Traceback (most recent call last): > > File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in > > test_many_append > > ztxt = zgfile.read(8192) > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read > > self._read(readsize) > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read > > self._read_eof() > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof > > crc32 = read32(self.fileobj) > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32 > > return struct.unpack(" > File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack > > return o.unpack(s) > > struct.error: unpack requires a string argument of length 4 > > > > test test_runpy failed -- Traceback (most recent call last): > > File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230, > > in test_run_module > > self._check_module(depth) > > File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168, > > in _check_module > > d2 = run_module(mod_name) # Read from bytecode > > File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module > > raise ImportError("No module named %s" % mod_name) > > ImportError: No module named runpy_test > > > > test_textwrap was the last test to complete. test_thread was still running. > > > > n > > -- > > On 8/9/07, Neal Norwitz wrote: > > > On 8/9/07, Guido van Rossum wrote: > > > > This is done. The new py3k branch is ready for business. 
> > > > > > > > Left to do: > > > > > > > > - switch the buildbot and the doc builder to use the new branch (Neal) > > > > > > I've updated to use the new branch. I got the docs building, but > > > there are many more problems. I won't re-enable the cronjob until > > > more things are working. > > > > > > > There are currently about 7 failing unit tests left: > > > > > > > > test_bsddb > > > > test_bsddb3 > > > > test_email > > > > test_email_codecs > > > > test_email_renamed > > > > test_sqlite > > > > test_urllib2_localnet > > > > > > Ok, I disabled these, so if only they fail, mail shouldn't be sent > > > (when I enable the script). > > > > > > There are other problems: > > > * had to kill test_poplib due to taking all cpu without progress > > > * bunch of tests leak (./python ./Lib/test/regrtest.py -R 4:3: > > > test_foo test_bar ...) > > > * at least one test fails with a fatal error > > > * make install fails > > > > > > Here are the details (probably best to update the wiki with status > > > before people start working on these): > > > > > > I'm not sure what was happening with test_poplib. I had to kill > > > test_poplib due to taking all cpu without progress. When I ran it by > > > itself, it was fine. So there was some bad interaction with another > > > test. > > > > > > Ref leaks and fatal error (see > > > http://docs.python.org/dev/3.0/results/make-test-refleak.out): > > > test_array leaked [11, 11, 11] references, sum=33 > > > test_bytes leaked [4, 4, 4] references, sum=12 > > > test_codeccallbacks leaked [21, 21, 21] references, sum=63 > > > test_codecs leaked [260, 260, 260] references, sum=780 > > > test_ctypes leaked [10, 10, 10] references, sum=30 > > > Fatal Python error: > > > /home/neal/python/py3k/Modules/datetimemodule.c:1175 object at > > > 0xb60b19c8 has negative ref count -4 > > > > > > There are probably more, but I haven't had a chance to run more after > > > test_datetime. 
> > >
> > > This failure occurred while running with -R:
> > >
> > > test test_coding failed -- Traceback (most recent call last):
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py",
> > > line 12, in test_bad_coding2
> > >     self.verify_bad_module(module_name)
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py",
> > > line 20, in verify_bad_module
> > >     text = fp.read()
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/io.py", line 1148, in read
> > >     res += decoder.decode(self.buffer.read(), True)
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/encodings/ascii.py",
> > > line 26, in decode
> > >     return codecs.ascii_decode(input, self.errors)[0]
> > > UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position
> > > 0: ordinal not in range(128)
> > >
> > >
> > > See http://docs.python.org/dev/3.0/results/make-install.out for this failure:
> > >
> > > Compiling /tmp/python-test-3.0/local/lib/python3.0/test/test_pep263.py ...
> > > Traceback (most recent call last):
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > 162, in
> > >     exit_status = int(not main())
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > 152, in main
> > >     force, rx, quiet):
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > 89, in compile_dir
> > >     if not compile_dir(fullname, maxlevels - 1, dfile, force, rx, quiet):
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line
> > > 65, in compile_dir
> > >     ok = py_compile.compile(fullname, None, dfile, True)
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > > 144, in compile
> > >     py_exc = PyCompileError(err.__class__,err.args,dfile or file)
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > > 49, in __init__
> > >     tbtext = ''.join(traceback.format_exception_only(exc_type, exc_value))
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/traceback.py", line
> > > 179, in format_exception_only
> > >     filename = value.filename or ""
> > > AttributeError: 'tuple' object has no attribute 'filename'
> > >
> > > I'm guessing this came from the change in exception args handling?
> > >
> > >   File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line
> > > 144, in compile
> > >     py_exc = PyCompileError(err.__class__,err.args,dfile or file)
> > >
> > > n
> > >
> > >

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Fri Aug 10 20:23:45 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 10 Aug 2007 11:23:45 -0700
Subject: [Python-3000] Universal newlines support in Python 3.0
Message-ID:

Python 3.0 currently has limited universal newlines support: by
default, \r\n is translated into \n for text files, but this can be
controlled by the newline= keyword parameter. For details on how, see
PEP 3116. The PEP prescribes that a lone \r must also be translated,
though this hasn't been implemented yet (any volunteers?).

However, the old universal newlines feature also set an attribute named
'newlines' on the file object to a tuple of up to three elements
giving the actual line endings that were observed on the file so far
(\r, \n, or \r\n). This feature is not in PEP 3116, and it is not
implemented. I'm tempted to kill it. Does anyone have a use case for
this? Has anyone even ever used this?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Fri Aug 10 20:28:05 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 10 Aug 2007 11:28:05 -0700
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To:
References:
Message-ID:

Um, Neal reported some more failures with -R earlier. I can reproduce
the failures for test_collections and test_runpy, but test_gzip passes
fine for me (standalone). I'll look into these. I'm still running the
full set as well; it'll take all day.
I can't reproduce Neal's problem with test_poplib (which was pegging the CPU for him).

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From stephen at xemacs.org Fri Aug 10 21:15:45 2007
From: stephen at xemacs.org (Stephen J.
Turnbull)
Date: Sat, 11 Aug 2007 04:15:45 +0900
Subject: [Python-3000] Universal newlines support in Python 3.0
In-Reply-To:
References:
Message-ID: <87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp>

Guido van Rossum writes:

> However, the old universal newlines feature also set an attribute named
> 'newlines' on the file object to a tuple of up to three elements
> giving the actual line endings that were observed on the file so far
> (\r, \n, or \r\n). This feature is not in PEP 3116, and it is not
> implemented. I'm tempted to kill it. Does anyone have a use case for
> this?

I have run into files that intentionally have more than one newline
convention used (mbox and Babyl mail folders, with messages received
from various platforms). However, most of the time multiple newline
conventions are a sign that the file is either corrupt or isn't text.
If so, then saving the file may corrupt it. The newlines attribute
could be used to check for this condition.

> Has anyone even ever used this?

Not I. When I care about such issues I prefer that the codec raise an
exception at the time of detection.

From brett at python.org Fri Aug 10 21:11:12 2007
From: brett at python.org (Brett Cannon)
Date: Fri, 10 Aug 2007 12:11:12 -0700
Subject: [Python-3000] No (C) optimization flag
In-Reply-To: <46BC9400.70803@gmail.com>
References: <46BC9400.70803@gmail.com>
Message-ID:

On 8/10/07, Nick Coghlan wrote:
> Guido van Rossum wrote:
> > If you really need to step through the Python code, you can just
> > sabotage the loading of the non-Python version, e.g. remove or rename
> > the .so or .dll file temporarily.
> >
> > I wonder about the usefulness of this debugging though -- if you're
> > debugging something that requires you to step through the C code, how
> > do you know that the same bug is present in the Python code you're
> > stepping through instead? Otherwise (if you're debugging a bug in your
> > own program) I'm not sure I see how stepping through the I/O library
> > is helpful.
> > > > Sounds like what you're really after is *understanding* how the I/O > > library works. For that, perhaps reading the docs and then reading the > > source code would be more effective. > > However we select between Python and native module versions, the build > bots need be set up to run the modules both ways (with and without C > optimisation). > Part of Alexandre's SoC work is to come up with a mechanism to do this. -Brett From guido at python.org Fri Aug 10 21:16:47 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Aug 2007 12:16:47 -0700 Subject: [Python-3000] Move to a "py3k" branch *DONE* In-Reply-To: References: Message-ID: I've updated the wiki page with the status for these. I've confirmed the test_datetime segfault, but I can only provoke it when run in sequence with all the others. I'm also experiencing a hang of test_asynchat when run in sequence. http://wiki.python.org/moin/Py3kStrUniTests --Guido On 8/10/07, Guido van Rossum wrote: > Um, Neal reported some more failures with -R earlier. I can reproduce > the failures for test_collections and test_runpy, but test_gzip passes > fine for me (standalone). I'll look into these. I'm still running the > full set as well, it'll take all day. I can't reproduce Neal's problem > with test_poplib (which was pegging the CPU for him). > > On 8/10/07, Guido van Rossum wrote: > > Status update: > > > > The following still leak (regrtest.py -R4:3:) > > > > test_array leaked [11, 11, 11] references, sum=33 > > test_multibytecodec leaked [72, 72, 72] references, sum=216 > > test_parser leaked [5, 5, 5] references, sum=15 > > test_zipimport leaked [29, 29, 29] references, sum=87 > > > > I can't reproduce the test_shelve failure. > > > > I *do* see the test_structmember failure, will investigate. > > > > I see a failure but no segfault in test_datetime; will investigate. > > > > Regarding test_univnewlines, this is virgin territory. 
I've never met > > anyone who used the newlines attribute on file objects. I'll make a > > separate post to call it out. > > > > --Guido > > > > On 8/10/07, Neal Norwitz wrote: > > > Bah, who needs sleep anyways. This list of problems should be fairly > > > complete when running with -R. (it skips the fatal error from > > > test_datetime though) > > > > > > Code to trigger a leak: b'\xff'.decode("utf8", "ignore") > > > > > > Leaks: > > > test_array leaked [11, 11, 11] references, sum=33 > > > test_bytes leaked [4, 4, 4] references, sum=12 > > > test_codeccallbacks leaked [21, 21, 21] references, sum=63 > > > test_codecs leaked [260, 260, 260] references, sum=780 > > > test_ctypes leaked [-22, 43, 10] references, sum=31 > > > test_multibytecodec leaked [72, 72, 72] references, sum=216 > > > test_parser leaked [5, 5, 5] references, sum=15 > > > test_unicode leaked [4, 4, 4] references, sum=12 > > > test_xml_etree leaked [128, 128, 128] references, sum=384 > > > test_xml_etree_c leaked [128, 128, 128] references, sum=384 > > > test_zipimport leaked [29, 29, 29] references, sum=87 > > > > > > Failures with -R: > > > > > > test test_collections failed -- errors occurred; run in verbose mode for details > > > > > > test test_gzip failed -- Traceback (most recent call last): > > > File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in > > > test_many_append > > > ztxt = zgfile.read(8192) > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read > > > self._read(readsize) > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read > > > self._read_eof() > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof > > > crc32 = read32(self.fileobj) > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32 > > > return struct.unpack(" > > File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack > > > return o.unpack(s) > > > struct.error: unpack requires a string argument of length 4 > > > > > > 
test test_runpy failed -- Traceback (most recent call last): > > > File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230, > > > in test_run_module > > > self._check_module(depth) > > > File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168, > > > in _check_module > > > d2 = run_module(mod_name) # Read from bytecode > > > File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module > > > raise ImportError("No module named %s" % mod_name) > > > ImportError: No module named runpy_test > > > > > > test test_shelve failed -- errors occurred; run in verbose mode for details > > > > > > test test_structmembers failed -- errors occurred; run in verbose mode > > > for details > > > > > > test_univnewlines skipped -- This Python does not have universal newline support > > > > > > Traceback (most recent call last): > > > File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 222, in > > > handle_request > > > self.process_request(request, client_address) > > > File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 241, in > > > process_request > > > self.finish_request(request, client_address) > > > File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 254, in > > > finish_request > > > self.RequestHandlerClass(request, client_address, self) > > > File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 522, in __init__ > > > self.handle() > > > File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 316, in handle > > > self.handle_one_request() > > > File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 303, > > > in handle_one_request > > > if not self.parse_request(): # An error code has been sent, just exit > > > File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 281, > > > in parse_request > > > self.headers = self.MessageClass(self.rfile, 0) > > > File "/home/neal/python/dev/py3k/Lib/mimetools.py", line 16, in __init__ > > > rfc822.Message.__init__(self, fp, seekable) > > > File 
"/home/neal/python/dev/py3k/Lib/rfc822.py", line 104, in __init__ > > > self.readheaders() > > > File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 172, in readheaders > > > headerseen = self.isheader(line) > > > File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 202, in isheader > > > return line[:i].lower() > > > AttributeError: 'bytes' object has no attribute 'lower' > > > > > > On 8/9/07, Neal Norwitz wrote: > > > > I wonder if a lot of the refleaks may have the same cause as this one: > > > > > > > > b'\xff'.decode("utf8", "ignore") > > > > > > > > No leaks jumped out at me. Here is the rest of the leaks that have > > > > been reported so far. I don't know how many have the same cause. > > > > > > > > test_multibytecodec leaked [72, 72, 72] references, sum=216 > > > > test_parser leaked [5, 5, 5] references, sum=15 > > > > > > > > The other failures that occurred with -R: > > > > > > > > test test_collections failed -- errors occurred; run in verbose mode for details > > > > > > > > test test_gzip failed -- Traceback (most recent call last): > > > > File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in > > > > test_many_append > > > > ztxt = zgfile.read(8192) > > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read > > > > self._read(readsize) > > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read > > > > self._read_eof() > > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof > > > > crc32 = read32(self.fileobj) > > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32 > > > > return struct.unpack(" > > > File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack > > > > return o.unpack(s) > > > > struct.error: unpack requires a string argument of length 4 > > > > > > > > test test_runpy failed -- Traceback (most recent call last): > > > > File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230, > > > > in test_run_module > > > > 
self._check_module(depth) > > > > File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168, > > > > in _check_module > > > > d2 = run_module(mod_name) # Read from bytecode > > > > File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module > > > > raise ImportError("No module named %s" % mod_name) > > > > ImportError: No module named runpy_test > > > > > > > > test_textwrap was the last test to complete. test_thread was still running. > > > > > > > > n > > > > -- > > > > On 8/9/07, Neal Norwitz wrote: > > > > > On 8/9/07, Guido van Rossum wrote: > > > > > > This is done. The new py3k branch is ready for business. > > > > > > > > > > > > Left to do: > > > > > > > > > > > > - switch the buildbot and the doc builder to use the new branch (Neal) > > > > > > > > > > I've updated to use the new branch. I got the docs building, but > > > > > there are many more problems. I won't re-enable the cronjob until > > > > > more things are working. > > > > > > > > > > > There are currently about 7 failing unit tests left: > > > > > > > > > > > > test_bsddb > > > > > > test_bsddb3 > > > > > > test_email > > > > > > test_email_codecs > > > > > > test_email_renamed > > > > > > test_sqlite > > > > > > test_urllib2_localnet > > > > > > > > > > Ok, I disabled these, so if only they fail, mail shouldn't be sent > > > > > (when I enable the script). > > > > > > > > > > There are other problems: > > > > > * had to kill test_poplib due to taking all cpu without progress > > > > > * bunch of tests leak (./python ./Lib/test/regrtest.py -R 4:3: > > > > > test_foo test_bar ...) > > > > > * at least one test fails with a fatal error > > > > > * make install fails > > > > > > > > > > Here are the details (probably best to update the wiki with status > > > > > before people start working on these): > > > > > > > > > > I'm not sure what was happening with test_poplib. I had to kill > > > > > test_poplib due to taking all cpu without progress. 
When I ran it by > > > > > itself, it was fine. So there was some bad interaction with another > > > > > test. > > > > > > > > > > Ref leaks and fatal error (see > > > > > http://docs.python.org/dev/3.0/results/make-test-refleak.out): > > > > > test_array leaked [11, 11, 11] references, sum=33 > > > > > test_bytes leaked [4, 4, 4] references, sum=12 > > > > > test_codeccallbacks leaked [21, 21, 21] references, sum=63 > > > > > test_codecs leaked [260, 260, 260] references, sum=780 > > > > > test_ctypes leaked [10, 10, 10] references, sum=30 > > > > > Fatal Python error: > > > > > /home/neal/python/py3k/Modules/datetimemodule.c:1175 object at > > > > > 0xb60b19c8 has negative ref count -4 > > > > > > > > > > There are probably more, but I haven't had a chance to run more after > > > > > test_datetime. > > > > > > > > > > This failure occurred while running with -R: > > > > > > > > > > test test_coding failed -- Traceback (most recent call last): > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py", > > > > > line 12, in test_bad_coding2 > > > > > self.verify_bad_module(module_name) > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py", > > > > > line 20, in verify_bad_module > > > > > text = fp.read() > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/io.py", line 1148, in read > > > > > res += decoder.decode(self.buffer.read(), True) > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/encodings/ascii.py", > > > > > line 26, in decode > > > > > return codecs.ascii_decode(input, self.errors)[0] > > > > > UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position > > > > > 0: ordinal not in range(128) > > > > > > > > > > > > > > > See http://docs.python.org/dev/3.0/results/make-install.out for this failure: > > > > > > > > > > Compiling /tmp/python-test-3.0/local/lib/python3.0/test/test_pep263.py ... 
> > > > > Traceback (most recent call last): > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > > > > > 162, in > > > > > exit_status = int(not main()) > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > > > > > 152, in main > > > > > force, rx, quiet): > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > > > > > 89, in compile_dir > > > > > if not compile_dir(fullname, maxlevels - 1, dfile, force, rx, quiet): > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > > > > > 65, in compile_dir > > > > > ok = py_compile.compile(fullname, None, dfile, True) > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line > > > > > 144, in compile > > > > > py_exc = PyCompileError(err.__class__,err.args,dfile or file) > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line > > > > > 49, in __init__ > > > > > tbtext = ''.join(traceback.format_exception_only(exc_type, exc_value)) > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/traceback.py", line > > > > > 179, in format_exception_only > > > > > filename = value.filename or "" > > > > > AttributeError: 'tuple' object has no attribute 'filename' > > > > > > > > > > I'm guessing this came from the change in exception args handling? 
> > > > > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line > > > > > 144, in compile > > > > > py_exc = PyCompileError(err.__class__,err.args,dfile or file) > > > > > > > > > > n > > > > > > > > > > > > > > > > > > -- > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Aug 10 21:19:48 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Aug 2007 12:19:48 -0700 Subject: [Python-3000] No (C) optimization flag In-Reply-To: References: <46BC9400.70803@gmail.com> Message-ID: On 8/10/07, Neil Schemenauer wrote: > Nick Coghlan wrote: > > However we select between Python and native module versions, the build > > bots need be set up to run the modules both ways (with and without C > > optimisation). > > If there is a way to explictly import each module separately then I > think that meets both needs. This sounds good. It may be as simple as moving the Python implementation into a separate module as well, and having the public module attempt to import first from the C code, then from the Python code. I think that if there's code for which no C equivalent exists (e.g. 
some stuff in heapq.py, presumably some stuff in io.py), it should be
in the public module, so the latter can do something like this:

    try:
        from _c_foo import *  # C version
    except ImportError:
        from _py_foo import *  # Py version

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tony at PageDNA.com Fri Aug 10 21:27:46 2007
From: tony at PageDNA.com (Tony Lownds)
Date: Fri, 10 Aug 2007 12:27:46 -0700
Subject: [Python-3000] Universal newlines support in Python 3.0
In-Reply-To:
References:
Message-ID:

On Aug 10, 2007, at 11:23 AM, Guido van Rossum wrote:

> Python 3.0 currently has limited universal newlines support: by
> default, \r\n is translated into \n for text files, but this can be
> controlled by the newline= keyword parameter. For details on how, see
> PEP 3116. The PEP prescribes that a lone \r must also be translated,
> though this hasn't been implemented yet (any volunteers?).

I'll give it a shot!

-Tony

From jeremy at alum.mit.edu Fri Aug 10 22:13:46 2007
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Fri, 10 Aug 2007 16:13:46 -0400
Subject: [Python-3000] Move to a "py3k" branch *DONE*
In-Reply-To:
References:
Message-ID:

I also see test_shelve failing because something is passing bytes as a
dictionary key. I've just started seeing it, but can't figure out
what caused the change.

Jeremy

On 8/10/07, Guido van Rossum wrote:
> I've updated the wiki page with the status for these. I've confirmed
> the test_datetime segfault, but I can only provoke it when run in
> sequence with all the others.
>
> I'm also experiencing a hang of test_asynchat when run in sequence.
>
> http://wiki.python.org/moin/Py3kStrUniTests
>
> --Guido
>
> On 8/10/07, Guido van Rossum wrote:
> > Um, Neal reported some more failures with -R earlier. I can reproduce
> > the failures for test_collections and test_runpy, but test_gzip passes
> > fine for me (standalone). I'll look into these. I'm still running the
> > full set as well, it'll take all day.
I can't reproduce Neal's problem > > with test_poplib (which was pegging the CPU for him). > > > > On 8/10/07, Guido van Rossum wrote: > > > Status update: > > > > > > The following still leak (regrtest.py -R4:3:) > > > > > > test_array leaked [11, 11, 11] references, sum=33 > > > test_multibytecodec leaked [72, 72, 72] references, sum=216 > > > test_parser leaked [5, 5, 5] references, sum=15 > > > test_zipimport leaked [29, 29, 29] references, sum=87 > > > > > > I can't reproduce the test_shelve failure. > > > > > > I *do* see the test_structmember failure, will investigate. > > > > > > I see a failure but no segfault in test_datetime; will investigate. > > > > > > Regarding test_univnewlines, this is virgin territory. I've never met > > > anyone who used the newlines attribute on file objects. I'll make a > > > separate post to call it out. > > > > > > --Guido > > > > > > On 8/10/07, Neal Norwitz wrote: > > > > Bah, who needs sleep anyways. This list of problems should be fairly > > > > complete when running with -R. 
(it skips the fatal error from > > > > test_datetime though) > > > > > > > > Code to trigger a leak: b'\xff'.decode("utf8", "ignore") > > > > > > > > Leaks: > > > > test_array leaked [11, 11, 11] references, sum=33 > > > > test_bytes leaked [4, 4, 4] references, sum=12 > > > > test_codeccallbacks leaked [21, 21, 21] references, sum=63 > > > > test_codecs leaked [260, 260, 260] references, sum=780 > > > > test_ctypes leaked [-22, 43, 10] references, sum=31 > > > > test_multibytecodec leaked [72, 72, 72] references, sum=216 > > > > test_parser leaked [5, 5, 5] references, sum=15 > > > > test_unicode leaked [4, 4, 4] references, sum=12 > > > > test_xml_etree leaked [128, 128, 128] references, sum=384 > > > > test_xml_etree_c leaked [128, 128, 128] references, sum=384 > > > > test_zipimport leaked [29, 29, 29] references, sum=87 > > > > > > > > Failures with -R: > > > > > > > > test test_collections failed -- errors occurred; run in verbose mode for details > > > > > > > > test test_gzip failed -- Traceback (most recent call last): > > > > File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in > > > > test_many_append > > > > ztxt = zgfile.read(8192) > > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read > > > > self._read(readsize) > > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read > > > > self._read_eof() > > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof > > > > crc32 = read32(self.fileobj) > > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32 > > > > return struct.unpack(" > > > File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack > > > > return o.unpack(s) > > > > struct.error: unpack requires a string argument of length 4 > > > > > > > > test test_runpy failed -- Traceback (most recent call last): > > > > File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230, > > > > in test_run_module > > > > self._check_module(depth) > > > > File 
"/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168, > > > > in _check_module > > > > d2 = run_module(mod_name) # Read from bytecode > > > > File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module > > > > raise ImportError("No module named %s" % mod_name) > > > > ImportError: No module named runpy_test > > > > > > > > test test_shelve failed -- errors occurred; run in verbose mode for details > > > > > > > > test test_structmembers failed -- errors occurred; run in verbose mode > > > > for details > > > > > > > > test_univnewlines skipped -- This Python does not have universal newline support > > > > > > > > Traceback (most recent call last): > > > > File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 222, in > > > > handle_request > > > > self.process_request(request, client_address) > > > > File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 241, in > > > > process_request > > > > self.finish_request(request, client_address) > > > > File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 254, in > > > > finish_request > > > > self.RequestHandlerClass(request, client_address, self) > > > > File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 522, in __init__ > > > > self.handle() > > > > File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 316, in handle > > > > self.handle_one_request() > > > > File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 303, > > > > in handle_one_request > > > > if not self.parse_request(): # An error code has been sent, just exit > > > > File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 281, > > > > in parse_request > > > > self.headers = self.MessageClass(self.rfile, 0) > > > > File "/home/neal/python/dev/py3k/Lib/mimetools.py", line 16, in __init__ > > > > rfc822.Message.__init__(self, fp, seekable) > > > > File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 104, in __init__ > > > > self.readheaders() > > > > File 
"/home/neal/python/dev/py3k/Lib/rfc822.py", line 172, in readheaders > > > > headerseen = self.isheader(line) > > > > File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 202, in isheader > > > > return line[:i].lower() > > > > AttributeError: 'bytes' object has no attribute 'lower' > > > > > > > > On 8/9/07, Neal Norwitz wrote: > > > > > I wonder if a lot of the refleaks may have the same cause as this one: > > > > > > > > > > b'\xff'.decode("utf8", "ignore") > > > > > > > > > > No leaks jumped out at me. Here is the rest of the leaks that have > > > > > been reported so far. I don't know how many have the same cause. > > > > > > > > > > test_multibytecodec leaked [72, 72, 72] references, sum=216 > > > > > test_parser leaked [5, 5, 5] references, sum=15 > > > > > > > > > > The other failures that occurred with -R: > > > > > > > > > > test test_collections failed -- errors occurred; run in verbose mode for details > > > > > > > > > > test test_gzip failed -- Traceback (most recent call last): > > > > > File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in > > > > > test_many_append > > > > > ztxt = zgfile.read(8192) > > > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read > > > > > self._read(readsize) > > > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read > > > > > self._read_eof() > > > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof > > > > > crc32 = read32(self.fileobj) > > > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32 > > > > > return struct.unpack(" > > > > File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack > > > > > return o.unpack(s) > > > > > struct.error: unpack requires a string argument of length 4 > > > > > > > > > > test test_runpy failed -- Traceback (most recent call last): > > > > > File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230, > > > > > in test_run_module > > > > > self._check_module(depth) > > > > > 
File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168, > > > > > in _check_module > > > > > d2 = run_module(mod_name) # Read from bytecode > > > > > File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module > > > > > raise ImportError("No module named %s" % mod_name) > > > > > ImportError: No module named runpy_test > > > > > > > > > > test_textwrap was the last test to complete. test_thread was still running. > > > > > > > > > > n > > > > > -- > > > > > On 8/9/07, Neal Norwitz wrote: > > > > > > On 8/9/07, Guido van Rossum wrote: > > > > > > > This is done. The new py3k branch is ready for business. > > > > > > > > > > > > > > Left to do: > > > > > > > > > > > > > > - switch the buildbot and the doc builder to use the new branch (Neal) > > > > > > > > > > > > I've updated to use the new branch. I got the docs building, but > > > > > > there are many more problems. I won't re-enable the cronjob until > > > > > > more things are working. > > > > > > > > > > > > > There are currently about 7 failing unit tests left: > > > > > > > > > > > > > > test_bsddb > > > > > > > test_bsddb3 > > > > > > > test_email > > > > > > > test_email_codecs > > > > > > > test_email_renamed > > > > > > > test_sqlite > > > > > > > test_urllib2_localnet > > > > > > > > > > > > Ok, I disabled these, so if only they fail, mail shouldn't be sent > > > > > > (when I enable the script). > > > > > > > > > > > > There are other problems: > > > > > > * had to kill test_poplib due to taking all cpu without progress > > > > > > * bunch of tests leak (./python ./Lib/test/regrtest.py -R 4:3: > > > > > > test_foo test_bar ...) > > > > > > * at least one test fails with a fatal error > > > > > > * make install fails > > > > > > > > > > > > Here are the details (probably best to update the wiki with status > > > > > > before people start working on these): > > > > > > > > > > > > I'm not sure what was happening with test_poplib. 
I had to kill > > > > > > test_poplib due to taking all cpu without progress. When I ran it by > > > > > > itself, it was fine. So there was some bad interaction with another > > > > > > test. > > > > > > > > > > > > Ref leaks and fatal error (see > > > > > > http://docs.python.org/dev/3.0/results/make-test-refleak.out): > > > > > > test_array leaked [11, 11, 11] references, sum=33 > > > > > > test_bytes leaked [4, 4, 4] references, sum=12 > > > > > > test_codeccallbacks leaked [21, 21, 21] references, sum=63 > > > > > > test_codecs leaked [260, 260, 260] references, sum=780 > > > > > > test_ctypes leaked [10, 10, 10] references, sum=30 > > > > > > Fatal Python error: > > > > > > /home/neal/python/py3k/Modules/datetimemodule.c:1175 object at > > > > > > 0xb60b19c8 has negative ref count -4 > > > > > > > > > > > > There are probably more, but I haven't had a chance to run more after > > > > > > test_datetime. > > > > > > > > > > > > This failure occurred while running with -R: > > > > > > > > > > > > test test_coding failed -- Traceback (most recent call last): > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py", > > > > > > line 12, in test_bad_coding2 > > > > > > self.verify_bad_module(module_name) > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py", > > > > > > line 20, in verify_bad_module > > > > > > text = fp.read() > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/io.py", line 1148, in read > > > > > > res += decoder.decode(self.buffer.read(), True) > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/encodings/ascii.py", > > > > > > line 26, in decode > > > > > > return codecs.ascii_decode(input, self.errors)[0] > > > > > > UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position > > > > > > 0: ordinal not in range(128) > > > > > > > > > > > > > > > > > > See http://docs.python.org/dev/3.0/results/make-install.out for this failure: > > > > > > > > > > > > Compiling 
/tmp/python-test-3.0/local/lib/python3.0/test/test_pep263.py ... > > > > > > Traceback (most recent call last): > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > > > > > > 162, in > > > > > > exit_status = int(not main()) > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > > > > > > 152, in main > > > > > > force, rx, quiet): > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > > > > > > 89, in compile_dir > > > > > > if not compile_dir(fullname, maxlevels - 1, dfile, force, rx, quiet): > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > > > > > > 65, in compile_dir > > > > > > ok = py_compile.compile(fullname, None, dfile, True) > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line > > > > > > 144, in compile > > > > > > py_exc = PyCompileError(err.__class__,err.args,dfile or file) > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line > > > > > > 49, in __init__ > > > > > > tbtext = ''.join(traceback.format_exception_only(exc_type, exc_value)) > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/traceback.py", line > > > > > > 179, in format_exception_only > > > > > > filename = value.filename or "" > > > > > > AttributeError: 'tuple' object has no attribute 'filename' > > > > > > > > > > > > I'm guessing this came from the change in exception args handling? 
> > > > > > > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line > > > > > > 144, in compile > > > > > > py_exc = PyCompileError(err.__class__,err.args,dfile or file) > > > > > > > > > > > > n > > > > > > > > > > > > > > > > > > > > > > > > -- > > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > > > > > > -- > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/jeremy%40alum.mit.edu > From guido at python.org Fri Aug 10 23:04:28 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Aug 2007 14:04:28 -0700 Subject: [Python-3000] Move to a "py3k" branch *DONE* In-Reply-To: References: Message-ID: I tried test_shelve on three boxes, with the following results: Ubuntu, using gdbm: pass OSX, using dbm: pass Red Hat 7.3, using bsddb: fail So this seems to be a lingering bsddb failure. (I think that's the "simple" bsddb module, not the full bsddb3 package.) --Guido On 8/10/07, Jeremy Hylton wrote: > I also see test_shelve failing because something is passing bytes as a > dictionary key. I've just started seeing it, but can't figure out > what caused the change. > > Jeremy > > On 8/10/07, Guido van Rossum wrote: > > I've updated the wiki page with the status for these. I've confirmed > > the test_datetime segfault, but I can only provoke it when run in > > sequence with all the others. > > > > I'm also experiencing a hang of test_asynchat when run in sequence. > > > > http://wiki.python.org/moin/Py3kStrUniTests > > > > --Guido > > > > On 8/10/07, Guido van Rossum wrote: > > > Um, Neal reported some more failures with -R earlier. 
I can reproduce > > > the failures for test_collections and test_runpy, but test_gzip passes > > > fine for me (standalone). I'll look into these. I'm still running the > > > full set as well, it'll take all day. I can't reproduce Neal's problem > > > with test_poplib (which was pegging the CPU for him). > > > > > > On 8/10/07, Guido van Rossum wrote: > > > > Status update: > > > > > > > > The following still leak (regrtest.py -R4:3:) > > > > > > > > test_array leaked [11, 11, 11] references, sum=33 > > > > test_multibytecodec leaked [72, 72, 72] references, sum=216 > > > > test_parser leaked [5, 5, 5] references, sum=15 > > > > test_zipimport leaked [29, 29, 29] references, sum=87 > > > > > > > > I can't reproduce the test_shelve failure. > > > > > > > > I *do* see the test_structmember failure, will investigate. > > > > > > > > I see a failure but no segfault in test_datetime; will investigate. > > > > > > > > Regarding test_univnewlines, this is virgin territory. I've never met > > > > anyone who used the newlines attribute on file objects. I'll make a > > > > separate post to call it out. > > > > > > > > --Guido > > > > > > > > On 8/10/07, Neal Norwitz wrote: > > > > > Bah, who needs sleep anyways. This list of problems should be fairly > > > > > complete when running with -R. 
(it skips the fatal error from > > > > > test_datetime though) > > > > > > > > > > Code to trigger a leak: b'\xff'.decode("utf8", "ignore") > > > > > > > > > > Leaks: > > > > > test_array leaked [11, 11, 11] references, sum=33 > > > > > test_bytes leaked [4, 4, 4] references, sum=12 > > > > > test_codeccallbacks leaked [21, 21, 21] references, sum=63 > > > > > test_codecs leaked [260, 260, 260] references, sum=780 > > > > > test_ctypes leaked [-22, 43, 10] references, sum=31 > > > > > test_multibytecodec leaked [72, 72, 72] references, sum=216 > > > > > test_parser leaked [5, 5, 5] references, sum=15 > > > > > test_unicode leaked [4, 4, 4] references, sum=12 > > > > > test_xml_etree leaked [128, 128, 128] references, sum=384 > > > > > test_xml_etree_c leaked [128, 128, 128] references, sum=384 > > > > > test_zipimport leaked [29, 29, 29] references, sum=87 > > > > > > > > > > Failures with -R: > > > > > > > > > > test test_collections failed -- errors occurred; run in verbose mode for details > > > > > > > > > > test test_gzip failed -- Traceback (most recent call last): > > > > > File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in > > > > > test_many_append > > > > > ztxt = zgfile.read(8192) > > > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read > > > > > self._read(readsize) > > > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read > > > > > self._read_eof() > > > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof > > > > > crc32 = read32(self.fileobj) > > > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32 > > > > > return struct.unpack(" > > > > File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack > > > > > return o.unpack(s) > > > > > struct.error: unpack requires a string argument of length 4 > > > > > > > > > > test test_runpy failed -- Traceback (most recent call last): > > > > > File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230, 
> > > > > in test_run_module > > > > > self._check_module(depth) > > > > > File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168, > > > > > in _check_module > > > > > d2 = run_module(mod_name) # Read from bytecode > > > > > File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module > > > > > raise ImportError("No module named %s" % mod_name) > > > > > ImportError: No module named runpy_test > > > > > > > > > > test test_shelve failed -- errors occurred; run in verbose mode for details > > > > > > > > > > test test_structmembers failed -- errors occurred; run in verbose mode > > > > > for details > > > > > > > > > > test_univnewlines skipped -- This Python does not have universal newline support > > > > > > > > > > Traceback (most recent call last): > > > > > File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 222, in > > > > > handle_request > > > > > self.process_request(request, client_address) > > > > > File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 241, in > > > > > process_request > > > > > self.finish_request(request, client_address) > > > > > File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 254, in > > > > > finish_request > > > > > self.RequestHandlerClass(request, client_address, self) > > > > > File "/home/neal/python/dev/py3k/Lib/SocketServer.py", line 522, in __init__ > > > > > self.handle() > > > > > File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 316, in handle > > > > > self.handle_one_request() > > > > > File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 303, > > > > > in handle_one_request > > > > > if not self.parse_request(): # An error code has been sent, just exit > > > > > File "/home/neal/python/dev/py3k/Lib/BaseHTTPServer.py", line 281, > > > > > in parse_request > > > > > self.headers = self.MessageClass(self.rfile, 0) > > > > > File "/home/neal/python/dev/py3k/Lib/mimetools.py", line 16, in __init__ > > > > > rfc822.Message.__init__(self, fp, seekable) > > > 
> > File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 104, in __init__ > > > > > self.readheaders() > > > > > File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 172, in readheaders > > > > > headerseen = self.isheader(line) > > > > > File "/home/neal/python/dev/py3k/Lib/rfc822.py", line 202, in isheader > > > > > return line[:i].lower() > > > > > AttributeError: 'bytes' object has no attribute 'lower' > > > > > > > > > > On 8/9/07, Neal Norwitz wrote: > > > > > > I wonder if a lot of the refleaks may have the same cause as this one: > > > > > > > > > > > > b'\xff'.decode("utf8", "ignore") > > > > > > > > > > > > No leaks jumped out at me. Here is the rest of the leaks that have > > > > > > been reported so far. I don't know how many have the same cause. > > > > > > > > > > > > test_multibytecodec leaked [72, 72, 72] references, sum=216 > > > > > > test_parser leaked [5, 5, 5] references, sum=15 > > > > > > > > > > > > The other failures that occurred with -R: > > > > > > > > > > > > test test_collections failed -- errors occurred; run in verbose mode for details > > > > > > > > > > > > test test_gzip failed -- Traceback (most recent call last): > > > > > > File "/home/neal/python/dev/py3k/Lib/test/test_gzip.py", line 77, in > > > > > > test_many_append > > > > > > ztxt = zgfile.read(8192) > > > > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 236, in read > > > > > > self._read(readsize) > > > > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 301, in _read > > > > > > self._read_eof() > > > > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 317, in _read_eof > > > > > > crc32 = read32(self.fileobj) > > > > > > File "/home/neal/python/dev/py3k/Lib/gzip.py", line 40, in read32 > > > > > > return struct.unpack(" > > > > > File "/home/neal/python/dev/py3k/Lib/struct.py", line 97, in unpack > > > > > > return o.unpack(s) > > > > > > struct.error: unpack requires a string argument of length 4 > > > > > > > > > > > > test test_runpy failed 
-- Traceback (most recent call last): > > > > > > File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 230, > > > > > > in test_run_module > > > > > > self._check_module(depth) > > > > > > File "/home/neal/python/dev/py3k/Lib/test/test_runpy.py", line 168, > > > > > > in _check_module > > > > > > d2 = run_module(mod_name) # Read from bytecode > > > > > > File "/home/neal/python/dev/py3k/Lib/runpy.py", line 72, in run_module > > > > > > raise ImportError("No module named %s" % mod_name) > > > > > > ImportError: No module named runpy_test > > > > > > > > > > > > test_textwrap was the last test to complete. test_thread was still running. > > > > > > > > > > > > n > > > > > > -- > > > > > > On 8/9/07, Neal Norwitz wrote: > > > > > > > On 8/9/07, Guido van Rossum wrote: > > > > > > > > This is done. The new py3k branch is ready for business. > > > > > > > > > > > > > > > > Left to do: > > > > > > > > > > > > > > > > - switch the buildbot and the doc builder to use the new branch (Neal) > > > > > > > > > > > > > > I've updated to use the new branch. I got the docs building, but > > > > > > > there are many more problems. I won't re-enable the cronjob until > > > > > > > more things are working. > > > > > > > > > > > > > > > There are currently about 7 failing unit tests left: > > > > > > > > > > > > > > > > test_bsddb > > > > > > > > test_bsddb3 > > > > > > > > test_email > > > > > > > > test_email_codecs > > > > > > > > test_email_renamed > > > > > > > > test_sqlite > > > > > > > > test_urllib2_localnet > > > > > > > > > > > > > > Ok, I disabled these, so if only they fail, mail shouldn't be sent > > > > > > > (when I enable the script). > > > > > > > > > > > > > > There are other problems: > > > > > > > * had to kill test_poplib due to taking all cpu without progress > > > > > > > * bunch of tests leak (./python ./Lib/test/regrtest.py -R 4:3: > > > > > > > test_foo test_bar ...) 
> > > > > > > * at least one test fails with a fatal error > > > > > > > * make install fails > > > > > > > > > > > > > > Here are the details (probably best to update the wiki with status > > > > > > > before people start working on these): > > > > > > > > > > > > > > I'm not sure what was happening with test_poplib. I had to kill > > > > > > > test_poplib due to taking all cpu without progress. When I ran it by > > > > > > > itself, it was fine. So there was some bad interaction with another > > > > > > > test. > > > > > > > > > > > > > > Ref leaks and fatal error (see > > > > > > > http://docs.python.org/dev/3.0/results/make-test-refleak.out): > > > > > > > test_array leaked [11, 11, 11] references, sum=33 > > > > > > > test_bytes leaked [4, 4, 4] references, sum=12 > > > > > > > test_codeccallbacks leaked [21, 21, 21] references, sum=63 > > > > > > > test_codecs leaked [260, 260, 260] references, sum=780 > > > > > > > test_ctypes leaked [10, 10, 10] references, sum=30 > > > > > > > Fatal Python error: > > > > > > > /home/neal/python/py3k/Modules/datetimemodule.c:1175 object at > > > > > > > 0xb60b19c8 has negative ref count -4 > > > > > > > > > > > > > > There are probably more, but I haven't had a chance to run more after > > > > > > > test_datetime. 
> > > > > > > > > > > > > > This failure occurred while running with -R: > > > > > > > > > > > > > > test test_coding failed -- Traceback (most recent call last): > > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py", > > > > > > > line 12, in test_bad_coding2 > > > > > > > self.verify_bad_module(module_name) > > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/test/test_coding.py", > > > > > > > line 20, in verify_bad_module > > > > > > > text = fp.read() > > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/io.py", line 1148, in read > > > > > > > res += decoder.decode(self.buffer.read(), True) > > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/encodings/ascii.py", > > > > > > > line 26, in decode > > > > > > > return codecs.ascii_decode(input, self.errors)[0] > > > > > > > UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position > > > > > > > 0: ordinal not in range(128) > > > > > > > > > > > > > > > > > > > > > See http://docs.python.org/dev/3.0/results/make-install.out for this failure: > > > > > > > > > > > > > > Compiling /tmp/python-test-3.0/local/lib/python3.0/test/test_pep263.py ... 
> > > > > > > Traceback (most recent call last): > > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > > > > > > > 162, in > > > > > > > exit_status = int(not main()) > > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > > > > > > > 152, in main > > > > > > > force, rx, quiet): > > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > > > > > > > 89, in compile_dir > > > > > > > if not compile_dir(fullname, maxlevels - 1, dfile, force, rx, quiet): > > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/compileall.py", line > > > > > > > 65, in compile_dir > > > > > > > ok = py_compile.compile(fullname, None, dfile, True) > > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line > > > > > > > 144, in compile > > > > > > > py_exc = PyCompileError(err.__class__,err.args,dfile or file) > > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line > > > > > > > 49, in __init__ > > > > > > > tbtext = ''.join(traceback.format_exception_only(exc_type, exc_value)) > > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/traceback.py", line > > > > > > > 179, in format_exception_only > > > > > > > filename = value.filename or "" > > > > > > > AttributeError: 'tuple' object has no attribute 'filename' > > > > > > > > > > > > > > I'm guessing this came from the change in exception args handling? 
> > > > > > > > > > > > > > File "/tmp/python-test-3.0/local/lib/python3.0/py_compile.py", line > > > > > > > 144, in compile > > > > > > > py_exc = PyCompileError(err.__class__,err.args,dfile or file) > > > > > > > > > > > > > > n > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > > > > > > > > > > -- > > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > > > > > > -- > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > _______________________________________________ > > Python-3000 mailing list > > Python-3000 at python.org > > http://mail.python.org/mailman/listinfo/python-3000 > > Unsubscribe: http://mail.python.org/mailman/options/python-3000/jeremy%40alum.mit.edu > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From alexandre at peadrop.com Fri Aug 10 23:20:21 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Fri, 10 Aug 2007 17:20:21 -0400 Subject: [Python-3000] No (C) optimization flag In-Reply-To: <46BC9400.70803@gmail.com> References: <46BC9400.70803@gmail.com> Message-ID: On 8/10/07, Nick Coghlan wrote: > However we select between Python and native module versions, the build > bots need be set up to run the modules both ways (with and without C > optimisation). That is trivial to do without any runtime flags. For example for testing both the C and Python implementations of StringIO (and BytesIO), I define the Python implementation with a leading underscore and rename it if the C implementation is available: class _StringIO(TextIOWrapper): ... # Use the faster implementation of StringIO if available try: from _stringio import StringIO except ImportError: StringIO = _StringIO With this way, the Python implementation remains available for testing (or debugging). 
For testing the modules, I first check whether the C implementation is available, then define the tests accordingly -- see for yourself: http://svn.python.org/view/python/branches/cpy_merge/Lib/test/test_memoryio.py?rev=56445&view=markup

-- Alexandre

From victor.stinner at haypocalc.com Sat Aug 11 01:49:10 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Sat, 11 Aug 2007 01:49:10 +0200
Subject: [Python-3000] [Email-SIG] fix email module for python 3000 (bytes/str)
In-Reply-To: <200708090241.08369.victor.stinner@haypocalc.com>
References: <200708090241.08369.victor.stinner@haypocalc.com>
Message-ID: <200708110149.10939.victor.stinner@haypocalc.com>

Hi,

On Thursday 09 August 2007 02:41:08 Victor Stinner wrote:
> I started to work on the email module to port it to Python 3000, but I have
> trouble understanding whether a function should return bytes or str (because I
> don't know the email module).

It's really hard to convert the email module to Python 3000 because it mixes byte strings and (unicode) character strings...

I wrote some notes about bytes/str to help people migrate Python 2.x code to Python 3000, or at least to explain the difference between the Python 2.x "str" type and the Python 3000 "bytes" type: http://wiki.python.org/moin/BytesStr

About the email module, some deductions:
- test_email.py: openfile() must use 'rb' file mode for all tests
- base64MIME.decode() and base64MIME.encode() should accept bytes and str
- base64MIME.decode() result type is bytes
- base64MIME.encode() result type should be... bytes or str, no idea

Other decode() and encode() functions should follow the same rules about types. The Python modules binascii and base64 chose the bytes type for the encode result.
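As a small illustration of the last point, here is how the standard `base64` and `binascii` modules behave in released Python 3 (which may differ from the 2007 branch under discussion): encoders take bytes and return bytes, and any conversion to str is an explicit decoding step.

```python
import base64
import binascii

payload = b"hello"

# Encoding takes bytes and returns bytes, not str.
encoded = base64.b64encode(payload)
print(type(encoded).__name__, encoded)

# Decoding returns bytes as well.
decoded = base64.b64decode(encoded)
print(decoded == payload)

# binascii follows the same rule.
hexed = binascii.hexlify(b"\xff")
print(hexed)

# Getting a str out is a separate, explicit step.
as_text = encoded.decode("ascii")
print(type(as_text).__name__)
```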
Victor Stinner aka haypo
http://hachoir.org/

From victor.stinner at haypocalc.com Sat Aug 11 02:25:27 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Sat, 11 Aug 2007 02:25:27 +0200
Subject: [Python-3000] bytes: compare bytes to integer
Message-ID: <200708110225.28056.victor.stinner@haypocalc.com>

Hi,

I don't like the behaviour of Python 3000 when we compare a bytes string with length=1:

>>> b'xyz'[0] == b'x'
False

The code can be seen as:

>>> ord(b'x') == b'x'
False

or also:

>>> 120 == b'x'
False

Two solutions:
1. b'xyz'[0] returns a new bytes object (b'x' instead of 120), like b'xyz'[0:1] does
2. allow comparing a bytes string of 1 byte with an integer

I prefer (2) since (1) is wrong: bytes contains integers and not bytes!

Victor Stinner aka haypo
http://hachoir.org/

From victor.stinner at haypocalc.com Sat Aug 11 02:35:43 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Sat, 11 Aug 2007 02:35:43 +0200
Subject: [Python-3000] Fix imghdr module for bytes
Message-ID: <200708110235.43664.victor.stinner@haypocalc.com>

Hi,

I just saw that the function what() of the imghdr module requires the str type for argument h, which is totally wrong! An image file is composed of bytes and not characters.

The attached patch should fix it. Notes:
- I used .startswith() instead of h[:len(s)] == s
- I used h[0] == ord(b'P') instead of h[0] == b'P' because the second syntax doesn't work (see my other email "bytes: compare bytes to integer")
- str is allowed but doesn't work: what() always returns None

I dislike "h[0] == ord(b'P')"; in Python 2.x it's simply "h[0] == 'P'". A shorter syntax would be "h[0] == 80" but I prefer an explicit test. Maybe that's stupid: we manipulate bytes and not characters, so "h[0] == 80" is acceptable... maybe with a comment?

Is imghdr included in the unit tests?

Victor Stinner
http://hachoir.org/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: py3k-imghdr.patch Type: text/x-diff Size: 2512 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070811/de5fac37/attachment.bin From rhamph at gmail.com Sat Aug 11 02:45:33 2007 From: rhamph at gmail.com (Adam Olsen) Date: Fri, 10 Aug 2007 18:45:33 -0600 Subject: [Python-3000] Fix imghdr module for bytes In-Reply-To: <200708110235.43664.victor.stinner@haypocalc.com> References: <200708110235.43664.victor.stinner@haypocalc.com> Message-ID: On 8/10/07, Victor Stinner wrote: > Hi, > > I just see that function what() of imghdr module requires str type for > argument h which is totally wrong! An image file is composed of bytes and not > characters. > > Attached patch should fix it. Notes: > - I used .startswith() instead of h[:len(s)] == s > - I used h[0] == ord(b'P') instead of h[0] == b'P' because the second syntax > doesn't work (see my other email "bytes: compare bytes to integer") > - str is allowed but doesn't work: what() always returns None > > I dislike "h[0] == ord(b'P')", in Python 2.x it's simply "h[0] == 'P'". A > shorter syntax would be "h[0] == 80" but I prefer explicit test. It's maybe > stupid, we manipulate bytes and not character, so "h[0] == 80" is > acceptable... maybe with a comment? Try h[0:1] == b'P'. Slicing will ensure it stays as a bytes object, rather than just giving the integer it contains. -- Adam Olsen, aka Rhamphoryncus From brotchie at gmail.com Sat Aug 11 02:51:15 2007 From: brotchie at gmail.com (James Brotchie) Date: Sat, 11 Aug 2007 10:51:15 +1000 Subject: [Python-3000] idle3.0 - is is supposed to work? In-Reply-To: <46BC0BE6.90908@v.loewis.de> References: <87tzr8ei80.fsf@hydra.hampton.thirdcreek.com> <46BB96DF.5060305@v.loewis.de> <87ps1we3ak.fsf@hydra.hampton.thirdcreek.com> <46BB9CD7.2030301@v.loewis.de> <87lkckdyk6.fsf@hydra.hampton.thirdcreek.com> <46BC0BE6.90908@v.loewis.de> Message-ID: <8e766a670708101751l5f1f3e7fh2e7e614520b2f7f0@mail.gmail.com> On 8/10/07, "Martin v. 
Löwis" wrote:
>
> >>> OTOH, IDLE ran w/o this error in p3yk...
> >> Yes. Somebody would have to study what precisely the problem is: is it
> >> that there is a None key in that dictionary, and that you must not use
> >> None as a tag name? In that case: where does the None come from?
> >> Or else: is it that you can use None as a tagname in 2.x, but can't
> >> anymore in 3.0? If so: why not?
> >
> > OK, I'll start looking at it.
> So did I, somewhat. It looks like a genuine bug in IDLE to me: you
> can't use None as a tag name, AFAIU. I'm not quite sure why this
> doesn't cause an exception in 2.x; if I try to give a None tag
> separately (i.e. in a stand-alone program) in 2.5,
> it gives me the same exception.

In 2.x the 'tag configure None' call does indeed raise a TclError; this can be seen by trapping calls to Tkinter_Error in _tkinter.c and outputting the Tkapp_Result.

For some reason on py3k this TclError doesn't get caught anywhere, whilst in 2.x it either gets caught or just disappears.

This behaviour can be demonstrated with:

    def config_colors(self):
        for tag, cnf in self.tagdefs.items():
            if cnf:
                try:
                    self.tag_configure(tag, **cnf)
                except:
                    sys.exit(1)
        self.tag_raise('sel')

On py3k the exception is caught and execution stops.
On 2.x no exception is caught and execution continues (however, Tkinter_Error is still called during Tkinter_Call execution!?)

tag_configure doesn't behave this way when used in a trivial stand-alone program, so it must be some obscurity within IDLE.

I'm confused....

James
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20070811/c62dff31/attachment.htm

From victor.stinner at haypocalc.com Sat Aug 11 03:09:00 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Sat, 11 Aug 2007 03:09:00 +0200
Subject: [Python-3000] Fix sndhdr module for bytes
Message-ID: <200708110309.01014.victor.stinner@haypocalc.com>

Hi,

As with imghdr, the sndhdr tests were still based on Unicode strings instead of bytes. The attached patch should fix the module. I'm very sorry, I was unable to test it.

Note: I replaced aifc.openfp with aifc.open since it's the new public function.

sndhdr requires some cleanup: it doesn't check for division by zero in the functions test_hcom and test_voc. I think that division by zero means that the file is invalid. I didn't want to fix these bugs in the same patch, so first I'm waiting for your comments about this one :-)

Victor Stinner
http://hachoir.org/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: py3k-sndhdr.patch
Type: text/x-diff
Size: 3258 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070811/25d513cc/attachment-0001.bin

From greg.ewing at canterbury.ac.nz Sat Aug 11 03:31:49 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 11 Aug 2007 13:31:49 +1200
Subject: [Python-3000] No (C) optimization flag
In-Reply-To:
References:
Message-ID: <46BD1185.2080702@canterbury.ac.nz>

Christian Heimes wrote:
> But on the
> other hand it is going to make debugging with pdb much harder because
> pdb can't step into C code.

But wouldn't the only reason you want to step into, e.g. pickle be if there were a bug in pickle itself? And if this happens when you're using the C version of pickle, you need to debug the C version. Debugging the Python version instead isn't going to help you.
-- Greg From greg.ewing at canterbury.ac.nz Sat Aug 11 03:48:11 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 11 Aug 2007 13:48:11 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BC83BF.3000407@trueblade.com> References: <46B13ADE.7080901@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org> <46BBBEC6.5030705@trueblade.com> <46BC83BF.3000407@trueblade.com> Message-ID: <46BD155B.2010202@canterbury.ac.nz> Eric Smith wrote: > 1: "".format() ... understands which > types can be converted to other types, and does the conversions. > > 2: each type's __format__ function understands how to convert to some > subset of all types (int can convert to float and decimal, for example). > > The problem with approach 2 is that there's logic in > int.__format__() that understands float.__format__() specifiers, and > vice-versa. At least with approach 1, all of this logic is in one place. Whereas the problem with approach 1 is that it's not extensible. You can't add new types with new format specifiers that can be interconverted. I don't think the logic needs to be complicated. As long as the format spec syntaxes are chosen sensibly, it's not necessary for e.g. int.__format__ to be able to parse float's format specs, only to recognise when it's got one. That could be as simple as if spec[:1] in 'efg': return float(self).__format__(spec) > This implies that string and repr specifiers are discernible across all > types, and int and float specifiers are unique amongst themselves. Another advantage of letting the __format__ methods handle it is that a given type *can* handle another type's format spec itself if it wants. E.g. if float has some way of handling the 'd' format that's considered better than converting to int first, then it can do that. 
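A minimal sketch of the delegation idea above, under two stated assumptions: `Count` is a hypothetical integer-like class invented for illustration, and the check looks at the last character of the spec (`spec[-1:]`) because that is where the type code sits in the format mini-language as eventually released, whereas the message tests `spec[:1]` for the syntax proposed at the time.

```python
class Count:
    """Hypothetical integer-like type used only for this illustration."""

    def __init__(self, n):
        self.n = n

    def __format__(self, spec):
        # Recognise a float-style spec by its type code without parsing
        # the rest of it, then hand the whole spec to float.
        if spec[-1:] in ("e", "f", "g"):
            return format(float(self.n), spec)
        # Anything else is treated as an integer spec.
        return format(self.n, spec)


print(format(Count(3), ".2f"))
print(format(Count(3), "5d"))
print("{:.1e}".format(Count(1500)))
```

The type never needs to understand precision or width for floats; it only has to recognise that the spec belongs to float and pass it along unchanged.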
-- Greg From greg.ewing at canterbury.ac.nz Sat Aug 11 03:57:42 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 11 Aug 2007 13:57:42 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BCA9C9.1010306@ronadam.com> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> Message-ID: <46BD1796.3000904@canterbury.ac.nz> Ron Adam wrote: > > I'm not sure what you mean by "ditch all of these". I was guessing that what's meant by returning a (value, format_spec) tuple is to re-try the formatting using the new value and spec. That's what I thought was unnecessary, since the method can do that itself if it wants. > The output in this case is a null string. Setting the format spec to > None tell the format() function not to do anything more to it. Then why not just return an empty string? > The format function would first call the objects __format__ method and > give it a chance to have control, and depending on what is returned, try > to handle it or not. Okay, I see now -- your format function has more smarts in it than mine. But as was suggested earlier, returning NotImplemented ought to be enough to signal a fallback to a different strategy. 
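The NotImplemented fallback mentioned above could be sketched as follows. This is an illustration only: the hook name `try_format`, the helper `format_value`, and the `Celsius` class are all hypothetical, and a separate hook is used because the built-in format() in released Python rejects a `__format__` result that is not a string.

```python
class Celsius:
    """Hypothetical type that understands exactly one custom spec, 'c'."""

    def __init__(self, degrees):
        self.degrees = degrees

    def try_format(self, spec):
        if spec == "c":
            return f"{self.degrees:.1f} C"
        # Signal "I do not know this spec" instead of raising.
        return NotImplemented

    def __str__(self):
        return f"{self.degrees} degrees"


def format_value(value, spec):
    """Ask the value first; fall back to formatting str(value)."""
    hook = getattr(value, "try_format", None)
    result = hook(spec) if hook is not None else NotImplemented
    if result is NotImplemented:
        return format(str(value), spec)
    return result


print(format_value(Celsius(21.5), "c"))
print(repr(format_value(Celsius(21.5), ">14")))
print(repr(format_value(42, "5")))
```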
-- Greg From greg.ewing at canterbury.ac.nz Sat Aug 11 04:03:39 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 11 Aug 2007 14:03:39 +1200 Subject: [Python-3000] Universal newlines support in Python 3.0 In-Reply-To: References: Message-ID: <46BD18FB.5030901@canterbury.ac.nz> Guido van Rossum wrote: > However, the old universal newlines feature also set an attibute named > 'newlines' on the file object to a tuple of up to three elements > giving the actual line endings that were observed on the file so far > (\r, \n, or \r\n). I've never used it, but I can see how it could be useful, e.g. if you're implementing a text editor that wants to be able to save the file back in the same format it had before. But such a specialised use could just as well be provided by a library facility, such as a wrapper class around a raw I/O stream. -- Greg From eric+python-dev at trueblade.com Sat Aug 11 05:30:33 2007 From: eric+python-dev at trueblade.com (Eric V. Smith) Date: Fri, 10 Aug 2007 23:30:33 -0400 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BD155B.2010202@canterbury.ac.nz> References: <46B13ADE.7080901@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org> <46BBBEC6.5030705@trueblade.com> <46BC83BF.3000407@trueblade.com> <46BD155B.2010202@canterbury.ac.nz> Message-ID: <46BD2D59.1040209@trueblade.com> Greg Ewing wrote: > Eric Smith wrote: >> 1: "".format() ... understands which >> types can be converted to other types, and does the conversions. >> >> 2: each type's __format__ function understands how to convert to some >> subset of all types (int can convert to float and decimal, for example). >> >> The problem with approach 2 is that there's logic in >> int.__format__() that understands float.__format__() specifiers, and >> vice-versa. 
At least with approach 1, all of this logic is in one place. > > Whereas the problem with approach 1 is that it's not > extensible. You can't add new types with new format > specifiers that can be interconverted. Granted. > I don't think the logic needs to be complicated. As > long as the format spec syntaxes are chosen sensibly, > it's not necessary for e.g. int.__format__ to be able > to parse float's format specs, only to recognise when > it's got one. That could be as simple as > > if spec[:1] in 'efg': > return float(self).__format__(spec) Right. Your "if" test is my is_float_specifier function. The problem is that this needs to be shared between int and float and string, and anything else (maybe decimal?) that can be converted to a float. Maybe we should make is_float_specifier a classmethod of float[1], so that int's __format__ (and also string's __format__) could say: if float.is_float_specifier(spec): return float(self).__format__(spec) And float's __format__ function could do all of the specifier testing, for types it knows to convert itself to, and then say: if not float.is_float_specifier(spec): return NotImplemented else: # do the actual formatting And then presumably the top-level "".format() could check for NotImplemented and convert the value to a string and use the specifier on that: result = value.__format__(spec) if result is NotImplemented: return str(value).__format__(spec) else: return result Then we could take my approach number 1 above, but have the code that does the specifier testing be centralized. I agree that central to all of this is choosing specifiers sensibly, for those types that we expect to supply interconversions (great word!). For types that won't be participating in any conversions, such as date or user defined types, no such sensible specifiers are needed. > Another advantage of letting the __format__ methods > handle it is that a given type *can* handle another > type's format spec itself if it wants. E.g. 
if float > has some way of handling the 'd' format that's > considered better than converting to int first, then > it can do that. But then float's int formatting code would have to fully implement the int formatter. You couldn't add functionality to the int formatter (and its specifiers) without updating the code in 2 places: both int and float. Eric. [1]: If we make it a class method, it could just be is_specifier(), or maybe __is_specifier__. This name would be implemented by all types that participate in the interconversions we're describing. From rrr at ronadam.com Sat Aug 11 05:50:02 2007 From: rrr at ronadam.com (Ron Adam) Date: Fri, 10 Aug 2007 22:50:02 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BD1796.3000904@canterbury.ac.nz> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD1796.3000904@canterbury.ac.nz> Message-ID: <46BD31EA.6040507@ronadam.com> Greg Ewing wrote: > Ron Adam wrote: >> I'm not sure what you mean by "ditch all of these". > > I was guessing that what's meant by returning a > (value, format_spec) tuple is to re-try the > formatting using the new value and spec. That's > what I thought was unnecessary, since the method > can do that itself if it wants. It's not retrying because it hasn't tried yet. As you noted below I think. 
It lets the __format__ method do its thing first and then, depending on what it gets back, it (the format function) may or may not do any formatting. It's handy to pass both the format specifier and the value both times, as the __format__ function may alter only one or the other.

>> The output in this case is a null string. Setting the format spec to
>> None tells the format() function not to do anything more to it.
>
> Then why not just return an empty string?

Because an empty string is a valid string. It can be expanded to a minimum width, which we may not want to do.

Now if returning a single value was equivalent to returning ('', None) or ('', ''), then that would work. The format function could check for that case.

>> The format function would first call the object's __format__ method and
>> give it a chance to have control, and depending on what is returned, try
>> to handle it or not.
>
> Okay, I see now -- your format function has more smarts
> in it than mine.

Yes, enough so that you can either *not have* a __format__ method as the default, or supply "object" with a very simple generic one, which is a very easy way to give all objects their __format__ methods.

In the case of not having one, format would check for it. In the case of a very simple one that doesn't do anything, it just gets back what it sent, or possibly a NotImplemented exception.

Whichever strategy is faster should probably be the one chosen here.

> But as was suggested earlier, returning NotImplemented
> ought to be enough to signal a fallback to a different
> strategy.

That isn't the case we're referring to; it's the case where we want to suppress a fallback choice that's available.
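One way to picture the (value, format_spec) protocol discussed above is the sketch below. It is only an interpretation of the proposal, not code from the PEP: the hook name `preformat`, the helper `outer_format`, and the classes `Hidden` and `Money` are hypothetical. Returning a spec of None suppresses any further formatting, including width padding.

```python
def outer_format(value, spec):
    """Let the object adjust value and/or spec, then finish the job."""
    hook = getattr(value, "preformat", None)
    if hook is not None:
        value, spec = hook(spec)
    if spec is None:
        # The object asked for no further formatting at all.
        return str(value)
    return format(value, spec)


class Hidden:
    """Hypothetical type that always renders as nothing, ignoring width."""

    def preformat(self, spec):
        return "", None


class Money:
    """Hypothetical type that swaps in a float and a default spec."""

    def __init__(self, cents):
        self.cents = cents

    def preformat(self, spec):
        return self.cents / 100, spec or ".2f"


print(repr(outer_format(Hidden(), "10")))
print(outer_format(Money(1250), ""))
print(repr(outer_format(Money(1250), "8.1f")))
print(outer_format(7, "03d"))
```

Here `Hidden` shows the case argued for in the message: an empty string that must not be padded out to the requested width.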
Cheers, Ron From greg.ewing at canterbury.ac.nz Sat Aug 11 07:08:29 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 11 Aug 2007 17:08:29 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BD31EA.6040507@ronadam.com> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD1796.3000904@canterbury.ac.nz> <46BD31EA.6040507@ronadam.com> Message-ID: <46BD444D.1060907@canterbury.ac.nz> Ron Adam wrote: > > Greg Ewing wrote: > > > Then why not just return an empty string? > > Because an empty string is a valid string. It can be expanded to a > minimum width which me may not want to do. I'm not seeing a use case for this. If the user says he wants his field a certain minimum width, what business does the type have overriding that? 
--
Greg

From talin at acm.org Sat Aug 11 08:48:56 2007
From: talin at acm.org (Talin)
Date: Fri, 10 Aug 2007 23:48:56 -0700
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BCA9C9.1010306@ronadam.com>
References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com>
Message-ID: <46BD5BD8.7030706@acm.org>

I'm going to address several issues of the discussion, in hopes of short-cutting through some of the debate. I may not be responding to the correct person in all cases.

Ron Adam wrote:
> If you want the 'r' specifier to always have precedence over even custom
> __format__ methods, then you can do that too, but I don't see the need.

In my conversation with Guido, he felt pretty strongly that he wanted 'repr' to be able to override the type-specific __format__ function.

I'm assuming, therefore, that this is a non-negotiable design constraint. Unfortunately, it's one that completely destroys the nice neat orthogonal design that you've proposed.

I often find that the best way to reason about irreconcilable design constraints is to reduce them to a set of contradictory logical propositions:

a) A __format__ method must be able to redefine the meaning of a format specifier.
b) The 'repr' option must be able to take precedence over the __format__ method.
The only possible resolution to this dilemma is that the 'repr' option must not be part of the format specifier, but rather must be part of something else.

Assuming that we continue with the assumption that we want to delegate as much as possible to the __format__ methods of individual types, this means that we are pretty much forced to divide the format string into two pieces, which are:

1) The part that __format__ is allowed to reinterpret.
2) The part that __format__ is required to implement without reinterpreting.

----

Now, as far as delegating formatting between types: We don't need a hyper-extensible system for delegating to different formatters.

For all Python types except the numeric types, the operation of __format__ is pretty simple: the format specifier is passed to __format__ and that's it. If the __format__ method can't handle the specifier, that's an error, end of story.

Numeric types are special because they are silently inter-convertible. For example, you can add a float and an int, and Python will just do the right thing without complaining. It means that a Python programmer may not always know the exact numeric type they are dealing with - and this is a feature IMHO.

Therefore, it's my strong belief that you should be able to format any numeric type without knowing exactly what type it is. Which means that all numeric types need to be able to handle, in some way, all valid number format strings. IMHO.

Fortunately, the set of number types is small and fixed, and is not likely to increase any time soon. And this requirement does *not* apply to any data type other than numbers.

----

As to the issue of how flexible the system should be:

My belief is that one of the primary design criteria for the format specifier mini-language is that it doesn't detract from the readability of the format string.
So for example, if I have a string "Total: {0:d} Tax: {1:d}", it's fairly easy for me to mentally filter out the "{0:d}" part and replace it with a number, and this in turn lets me imagine how the string might look when printed.

(The older syntax, 'Total: %d', was even better in this regard, but when you start to use named/numbered fields it actually is worse. And implicit ordering is brittle.)

My design goal here is relatively simple: For the most common use cases, the format field shouldn't be much longer (if at all) than the value to be printed would be. For uncommon cases, where the programmer is invoking additional options, the format field can be longer, but it should still be kept concise.

----

One final thing I wanted to mention, which Guido reminded me, is that we're getting short on time. This PEP has not yet been officially accepted, and the reason is the lack of an implementation. I don't want to miss the boat. (The boat in this case being Alpha 1.)

-- Talin

From stephen at xemacs.org Sat Aug 11 09:02:07 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 11 Aug 2007 16:02:07 +0900
Subject: [Python-3000] Universal newlines support in Python 3.0
In-Reply-To: <46BD18FB.5030901@canterbury.ac.nz>
References: <46BD18FB.5030901@canterbury.ac.nz>
Message-ID: <874pj67dw0.fsf@uwakimon.sk.tsukuba.ac.jp>

Greg Ewing writes:
 > Guido van Rossum wrote:
 > > However, the old universal newlines feature also set an attribute named
 > > 'newlines' on the file object to a tuple of up to three elements
 > > giving the actual line endings that were observed on the file so far
 > > (\r, \n, or \r\n).
 >
 > I've never used it, but I can see how it could be
 > useful, e.g. if you're implementing a text editor
 > that wants to be able to save the file back in
 > the same format it had before.

But if there's more than one line ending used, that's not good enough.
Universal newlines is a wonderful convenience for most text usage, but if you really need to be able to preserve format, it's not going to be enough. I think it's best for universal newlines to be simple. Let fancy facilities be provided by a library wrapping raw IO, as you suggest. From nnorwitz at gmail.com Sat Aug 11 09:16:28 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Sat, 11 Aug 2007 00:16:28 -0700 Subject: [Python-3000] release plans (was: More PEP 3101 changes incoming) Message-ID: On 8/10/07, Talin wrote: > > One final thing I wanted to mention, which Guido reminded me, is that > we're getting short on time. This PEP has not yet been officially > accepted, and the reason is because of the lack of an implementation. I > don't want to miss the boat. (The boat in this case being Alpha 1.) Alpha 1 is a few *weeks* away. The release will hopefully come shortly after the sprint at Google which is Aug 22-25. http://wiki.python.org/moin/GoogleSprint It's time to get stuff done! There's still a ton of work to do: http://wiki.python.org/moin/Py3kToDo One thing not mentioned on the wiki is finding and removing all the cruft in the code and docs that has been deprecated. If you know of any of these things, please add a note to the wiki. Some common strings to look for are: deprecated, compatibility, b/w, backward, and obsolete. 
n From rrr at ronadam.com Sat Aug 11 10:48:27 2007 From: rrr at ronadam.com (Ron Adam) Date: Sat, 11 Aug 2007 03:48:27 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BD444D.1060907@canterbury.ac.nz> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD1796.3000904@canterbury.ac.nz> <46BD31EA.6040507@ronadam.com> <46BD444D.1060907@canterbury.ac.nz> Message-ID: <46BD77DB.60405@ronadam.com> Greg Ewing wrote: > Ron Adam wrote: >> Greg Ewing wrote: > > >>> Then why not just return an empty string? >> Because an empty string is a valid string. It can be expanded to a >> minimum width which me may not want to do. > > I'm not seeing a use case for this. If the user says > he wants his field a certain minimum width, what > business does the type have overriding that? Just pointing out *one* capability that is possible, Maybe it was just a poor choice as an example. 
Cheers,
   Ron

From talin at acm.org Sat Aug 11 10:57:16 2007
From: talin at acm.org (Talin)
Date: Sat, 11 Aug 2007 01:57:16 -0700
Subject: [Python-3000] Format specifier proposal
Message-ID: <46BD79EC.1020301@acm.org>

Taking some ideas from the various threads, here's what I'd like to propose:

(Assume that brackets [] means 'optional field')

   [:[type][align][sign][[0]minwidth][.precision]][/fill][!r]

Examples:

   :f         # Floating point number of natural width
   :f10       # Floating point number, width at least 10
   :f010      # Floating point number, width at least 10, leading zeros
   :f.2       # Floating point number with two decimal digits
   :8         # Minimum width 8, type defaults to natural type
   :d+2       # Integer number, 2 digits, sign always shown
   !r         # repr() format
   :10!r      # Field width 10, repr() format
   :s10       # String right-aligned within field of minimum width
              # of 10 chars.
   :s10.10    # String right-aligned within field of minimum width
              # of 10 chars, maximum width 10.
   :s<10      # String left-aligned in 10 char (min) field.
   :d^15      # Integer centered in 15 character field
   :>15/.     # Right align and pad with '.' chars
   :f<+015.5  # Floating point, left aligned, always show sign,
              # leading zeros, field width 15 (min), 5 decimal places.

Notes:

  -- Leading zeros is different than fill character, although the two are mutually exclusive. (Leading zeros always go between the sign and the number, padding does not.)
  -- For strings, precision is used as maximum field width.
  -- __format__ functions are not allowed to re-interpret '!r'.

I realize that the grouping of things is a little odd - for example, it would be nice to put minwidth, padding and alignment in their own little group so that they could be processed independently from __format__.

However:

  -- Since minwidth is the most common option, I wanted it to have no special prefix char.
  -- I wanted precision to come after minwidth, since the 'm.n' format feels intuitive and traditional.
  -- I wanted type to come first, since it affects how some attributes are interpreted.
  -- Putting the sign right before the width field also feels right.

The regex for interpreting this, BTW, is something like the following:

   "(?:\:([a-z])?(<|>|\^)?(\+|-)?(\d+)?(\.\d+)?)?(/.)?(!r)?"

(Although it may make more sense to allow the fill and repr fields to appear in any order. In other words, any field that is identified by a unique prefix char can be specified in any order.)

-- Talin

From rrr at ronadam.com Sat Aug 11 11:48:22 2007
From: rrr at ronadam.com (Ron Adam)
Date: Sat, 11 Aug 2007 04:48:22 -0500
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46BD5BD8.7030706@acm.org>
References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org>
Message-ID: <46BD85E6.1030005@ronadam.com>

Talin wrote:
> I'm going to address several issues of the discussion, in hopes of
> short-cutting through some of the debate. I may not be responding to the
> correct person in all cases.
>
> Ron Adam wrote:
>> If you want the 'r' specifier to always have precedence over even
>> custom __format__ methods, then you can do that too, but I don't see
>> the need.
>
> In my conversation with Guido, he felt pretty strongly that he wanted
> 'repr' to be able to override the type-specific __format__ function.
>
> I'm assuming, therefore, that this is a non-negotiable design
> constraint. Unfortunately, it's one that completely destroys the nice
> neat orthogonal design that you've proposed.

It doesn't completely destroy it, but it does create one exception that needs to be remembered. As I said, it doesn't have to be a special syntax. {0:r} is just fine.

> I often find that the best way to reason about irreconcilable design
> constraints is to reduce them to a set of contradictory logical
> propositions:
>
> a) A __format__ method must be able to redefine the meaning of a format
> specifier.
> b) The 'repr' option must be able to take precedence over the __format__
> method.
>
> The only possible resolution to this dilemma is that the 'repr' option
> must not be part of the format specifier, but rather must be part of
> something else.

You are either going to have one or the other, but never both, so there isn't any conflict.

    'object: {0:r}'.format(obj)

or

    'object: {0:s}'.format(repr(obj))

These don't collide in any way. The only question is whether the 'r' specifier also allows for other options like width and alignment.

> Assuming that we continue with the assumption that we want to delegate
> as much as possible to the __format__ methods of individual types, this
> means that we are pretty much forced to divide the format string into
> two pieces, which are:
>
> 1) The part that __format__ is allowed to reinterpret.
> 2) The part that __format__ is required to implement without
> reinterpreting.

What should not be allowed? And why?

> Now, as far as delegating formatting between types: We don't need a
> hyper-extensible system for delegating to different formatters.
>
> For all Python types except the numeric types, the operation of
> __format__ is pretty simple: the format specifier is passed to
> __format__ and that's it. If the __format__ method can't handle the
> specifier, that's an error, end of story.
>
> Numeric types are special because of the fact that they are silently
> inter-convertible to each other. For example, you can add a float and an
> int, and Python will just do the right thing without complaining. It
> means that a Python programmer may not always know the exact numeric
> type they are dealing with - and this is a feature IMHO.
>
> Therefore, it's my strong belief that you should be able to format any
> numeric type without knowing exactly what type it is. Which means that
> all numeric types need to be able to handle, in some way, all valid
> number format strings. IMHO.
>
> Fortunately, the set of number types is small and fixed, and is not
> likely to increase any time soon. And this requirement does *not* apply
> to any data type other than numbers.

If this is the direction you and Guido want, I'll try to help make it work. It seems it's also what Greg is thinking of.

I think to get the Python implementation moving, we need to implement the whole event chain, not just a function. So starting with an fstr() type and then possibly fint() and ffloat() etc., where each subclasses its respective type and adds format and __format__ methods. (For now and for testing.) Then working out the rest will be more productive, I think.

> ----
>
> As to the issue of how flexible the system should be:

Umm... I think you got sidetracked here. The paragraphs below are about readability and syntax, not about how flexible the underlying system is.

The question of flexibility has more to do with what things can be allowed to be overridden, and what things should not. For example, just how much control does a __format__ method have?

I thought the idea was that the entire format specifier is sent to the __format__ method, and then it can do what it wants with it. Possibly replacing the specifier altogether and invoking another type's __format__ method with the substituted format specifier. Which sounds just fine to me and is what I want too.
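
[Editorial sketch] For readers following along today, here is a small illustration of the "whole specifier goes to __format__, which may substitute another specifier and hand off to another type" idea, written against str.format as Python eventually shipped it rather than the draft under discussion. The Celsius class and its private trailing "K" code are invented for this example; only format(), str.format and the __format__ hook are real API.

```python
class Celsius:
    """Toy type: handles one private format code and delegates everything else."""

    def __init__(self, degrees):
        self.degrees = float(degrees)

    def __format__(self, spec):
        if spec.endswith("K"):
            # Private code: convert to kelvin, then substitute a float
            # specifier ("...f") and let float's formatter do the work.
            kelvin = self.degrees + 273.15
            return format(kelvin, spec[:-1] + "f") + " K"
        # Anything else is passed through unchanged to float's formatter.
        return format(self.degrees, spec)


room = Celsius(20)
as_kelvin = "{0:.2K}".format(room)    # reinterpreted by Celsius.__format__
as_float = "{0:8.1f}".format(room)    # delegated untouched to float
```

The type never parses width or precision itself; it only rewrites the part of the specifier it owns and leaves the rest to the numeric formatter, which is the division of labour being argued for above.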
So far we have established that the repr formatter cannot be overridden, but we have not addressed anything else, or how to do that specifically. (We've addressed it in general terms, yes, but we haven't gotten into the details.)

> My belief is that one of the primary design criteria for the format
> specifier mini-language is that it doesn't detract from the readability
> of the format string.
>
> So for example, if I have a string "Total: {0:d} Tax: {1:d}", it's
> fairly easy for me to mentally filter out the "{0:d}" part and replace
> it with a number, and this in turn lets me imagine how the string might
> look when printed.
>
> (The older syntax, 'Total: %d', was even better in this regard, but when
> you start to use named/numbered fields it actually is worse. And
> implicit ordering is brittle.)
>
> My design goal here is relatively simple: For the most common use cases,
> the format field shouldn't be much longer (if at all) than the value to
> be printed would be. For uncommon cases, where the programmer is
> invoking additional options, the format field can be longer, but it
> should still be kept concise.
>
> ----
>
> One final thing I wanted to mention, which Guido reminded me, is that
> we're getting short on time. This PEP has not yet been officially
> accepted, and the reason is because of the lack of an implementation. I
> don't want to miss the boat. (The boat in this case being Alpha 1.)

I'll forward what I've been playing with so you can take a look. It's not the preferred way, but it may have some things in it you can use. And it's rather incomplete still.
Cheers, Ron From benji at benjiyork.com Sat Aug 11 14:32:03 2007 From: benji at benjiyork.com (Benji York) Date: Sat, 11 Aug 2007 08:32:03 -0400 Subject: [Python-3000] No (C) optimization flag In-Reply-To: <46BD1185.2080702@canterbury.ac.nz> References: <46BD1185.2080702@canterbury.ac.nz> Message-ID: <46BDAC43.3050904@benjiyork.com> Greg Ewing wrote: > Christian Heimes wrote: >> But on the >> other hand it is going to make debugging with pdb much harder because >> pdb can't step into C code. > > But wouldn't the only reason you want to step into, > e.g. pickle be if there were a bug in pickle itself? I believe he's talking about a situation where pickle calls back into Python. -- Benji York http://benjiyork.com From martin at v.loewis.de Sat Aug 11 16:25:42 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 11 Aug 2007 16:25:42 +0200 Subject: [Python-3000] Move to a "py3k" branch *DONE* In-Reply-To: References: Message-ID: <46BDC6E6.1080601@v.loewis.de> > test_array leaked [11, 11, 11] references, sum=33 I fixed that in r56924 Regards, Martin From martin at v.loewis.de Sat Aug 11 16:46:29 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 11 Aug 2007 16:46:29 +0200 Subject: [Python-3000] Console encoding detection broken In-Reply-To: References: <46BC02FC.6080107@v.loewis.de> Message-ID: <46BDCBC5.2060503@v.loewis.de> > Feel free to add code that implements this. I suppose it would be a > good idea to have a separate function io.guess_console_encoding(...) > which takes some argument (perhaps a raw file?) and returns an > encoding name, never None. This could then be implemented by switching > on the platform into platform-specific functions and a default. I've added os.device_encoding, which returns the terminal's encoding if possible. If the device is not a terminal, it falls back to locale.getpreferredencoding(). 
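
[Editorial sketch] A helper along the lines Guido asked for ("returns an encoding name, never None") can be written on top of os.device_encoding. The function name guess_console_encoding is taken from his suggestion and is not a real io function. One caveat for anyone trying this on a released Python: there, os.device_encoding() returns None when the descriptor is not a terminal, so the locale fallback described above is spelled out by hand in this sketch.

```python
import locale
import os


def guess_console_encoding(fd):
    """Return an encoding name for file descriptor fd, never None (hypothetical helper)."""
    encoding = os.device_encoding(fd)  # terminal encoding, or None if fd is not a tty
    if encoding is None:
        # Not a terminal (pipe, regular file, ...): use the locale's preference.
        encoding = locale.getpreferredencoding(False)
    return encoding
```

Called with a pipe or regular file descriptor, this takes the fallback branch; called with a terminal descriptor, it reports whatever the platform says the terminal uses.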
Regards, Martin From tony at PageDNA.com Sat Aug 11 18:45:37 2007 From: tony at PageDNA.com (Tony Lownds) Date: Sat, 11 Aug 2007 09:45:37 -0700 Subject: [Python-3000] Universal newlines support in Python 3.0 In-Reply-To: References: Message-ID: On Aug 10, 2007, at 11:23 AM, Guido van Rossum wrote: > Python 3.0 currently has limited universal newlines support: by > default, \r\n is translated into \n for text files, but this can be > controlled by the newline= keyword parameter. For details on how, see > PEP 3116. The PEP prescribes that a lone \r must also be translated, > though this hasn't been implemented yet (any volunteers?). > I'm working on this, but now I'm not sure how the file is supposed to be read when the newline parameter is \r or \r\n. Here's the PEP language: buffer is a reference to the BufferedIOBase object to be wrapped with the TextIOWrapper. encoding refers to an encoding to be used for translating between the byte-representation and character-representation. If it is None, then the system's locale setting will be used as the default. newline can be None, '\n', '\r', or '\r\n' (all other values are illegal); it indicates the translation for '\n' characters written. If None, a system-specific default is chosen, i.e., '\r\n' on Windows and '\n' on Unix/Linux. Setting newline='\n' on input means that no CRLF translation is done; lines ending in '\r\n' will be returned as '\r\n'. ('\r' support is still needed for some OSX applications that produce files using '\r' line endings; Excel (when exporting to text) and Adobe Illustrator EPS files are the most common examples. Is this ok: when newline='\r\n' or newline='\r' is passed, only that string is used to determine the end of lines. No translation to '\n' is done. > However, the old universal newlines feature also set an attibute named > 'newlines' on the file object to a tuple of up to three elements > giving the actual line endings that were observed on the file so far > (\r, \n, or \r\n). 
This feature is not in PEP 3116, and it is not > implemented. I'm tempted to kill it. Does anyone have a use case for > this? Has anyone even ever used this? > This strikes me as a pragmatic feature, making it easy to read a file and write back the same line ending. I can include in patch. http://www.google.com/codesearch?hl=en&q=+lang:python+%22.newlines%22 +show:cz2Fhijwr3s:yutdXigOmYY:YDns9IyEkLQ&sa=N&cd=12&ct=rc&cs_p=http://f tp.gnome.org/pub/gnome/sources/meld/1.0/ meld-1.0.0.tar.bz2&cs_f=meld-1.0.0/filediff.py#a0 http://www.google.com/codesearch?hl=en&q=+lang:python+%22.newlines%22 +show:SLyZnjuFadw:kOTmKU8aU2I:VX_dFr3mrWw&sa=N&cd=37&ct=rc&cs_p=http://s vn.python.org/projects/ctypes/trunk&cs_f=ctypeslib/ctypeslib/ dynamic_module.py#a0 Thanks -Tony From guido at python.org Sat Aug 11 19:29:38 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 11 Aug 2007 10:29:38 -0700 Subject: [Python-3000] Universal newlines support in Python 3.0 In-Reply-To: References: Message-ID: On 8/11/07, Tony Lownds wrote: > > On Aug 10, 2007, at 11:23 AM, Guido van Rossum wrote: > > > Python 3.0 currently has limited universal newlines support: by > > default, \r\n is translated into \n for text files, but this can be > > controlled by the newline= keyword parameter. For details on how, see > > PEP 3116. The PEP prescribes that a lone \r must also be translated, > > though this hasn't been implemented yet (any volunteers?). > > > > I'm working on this, but now I'm not sure how the file is supposed to > be read when > the newline parameter is \r or \r\n. Here's the PEP language: > > buffer is a reference to the BufferedIOBase object to be wrapped > with the TextIOWrapper. > encoding refers to an encoding to be used for translating between > the byte-representation > and character-representation. If it is None, then the system's > locale setting will be used > as the default. 
newline can be None, '\n', '\r', or '\r\n' (all > other values are illegal); > it indicates the translation for '\n' characters written. If None, > a system-specific default > is chosen, i.e., '\r\n' on Windows and '\n' on Unix/Linux. Setting > newline='\n' on input > means that no CRLF translation is done; lines ending in '\r\n' > will be returned as '\r\n'. > ('\r' support is still needed for some OSX applications that > produce files using '\r' line > endings; Excel (when exporting to text) and Adobe Illustrator EPS > files are the most common examples. > > Is this ok: when newline='\r\n' or newline='\r' is passed, only that > string is used to determine > the end of lines. No translation to '\n' is done. I *think* it would be more useful if it always returned lines ending in \n (not \r\n or \r). Wouldn't it? Although this is not how it currently behaves; when you set newline='\r\n', it returns the \r\n unchanged, so it would make sense to do this too when newline='\r'. Caveat user I guess. > > However, the old universal newlines feature also set an attibute named > > 'newlines' on the file object to a tuple of up to three elements > > giving the actual line endings that were observed on the file so far > > (\r, \n, or \r\n). This feature is not in PEP 3116, and it is not > > implemented. I'm tempted to kill it. Does anyone have a use case for > > this? Has anyone even ever used this? > > > > This strikes me as a pragmatic feature, making it easy to read a file > and write back the same line ending. I can include in patch. OK, if you think you can, that's good. It's not always sufficient (not if there was a mix of line endings) but it's a start. 
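
[Editorial sketch] The two behaviours under discussion can be tried out with the io module as it exists in current Python 3 releases, which may differ from the state of the py3k branch at the time of this thread. The helper names and sample bytes below are made up for the illustration; io.TextIOWrapper, its newline= parameter and its newlines attribute are the real API.

```python
import io

RAW = b"one\r\ntwo\nthree\rfour"  # deliberately mixes all three line endings


def read_universal(raw):
    """Read with newline=None: endings are translated to '\\n' and recorded."""
    text = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None)
    data = text.read()
    return data, text.newlines  # newlines lists the endings seen so far


def read_lines_verbatim(raw, newline):
    """Read with an explicit newline: only that string ends a line, untranslated."""
    text = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=newline)
    return text.readlines()


translated, seen = read_universal(RAW)
crlf_lines = read_lines_verbatim(RAW, "\r\n")
cr_lines = read_lines_verbatim(RAW, "\r")
```

As Guido notes, the recorded endings are only enough to write a file back faithfully when a single ending was seen; with mixed endings, as in this sample, the tuple says which endings occurred but not where.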
> http://www.google.com/codesearch?hl=en&q=+lang:python+%22.newlines%22
> +show:cz2Fhijwr3s:yutdXigOmYY:YDns9IyEkLQ&sa=N&cd=12&ct=rc&cs_p=http://f
> tp.gnome.org/pub/gnome/sources/meld/1.0/
> meld-1.0.0.tar.bz2&cs_f=meld-1.0.0/filediff.py#a0
>
> http://www.google.com/codesearch?hl=en&q=+lang:python+%22.newlines%22
> +show:SLyZnjuFadw:kOTmKU8aU2I:VX_dFr3mrWw&sa=N&cd=37&ct=rc&cs_p=http://s
> vn.python.org/projects/ctypes/trunk&cs_f=ctypeslib/ctypeslib/
> dynamic_module.py#a0

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Sat Aug 11 19:53:02 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 11 Aug 2007 10:53:02 -0700
Subject: [Python-3000] Four new failing tests
Message-ID:

I see four tests fail that passed yesterday:

< test_csv
< test_shelve
< test_threaded_import
< test_wsgiref

Details:

test_csv: one error
======================================================================
ERROR: test_char_write (__main__.TestArrayWrites)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Lib/test/test_csv.py", line 648, in test_char_write
    a = array.array('u', string.letters)
ValueError: string length not a multiple of item size

test_shelve: 9 error, last:
======================================================================
ERROR: test_write (__main__.TestProto2FileShelve)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/google/home/guido/python/py3k/Lib/test/mapping_tests.py", line 118, in test_write
    self.failIf(knownkey in d)
  File "/usr/local/google/home/guido/python/py3k/Lib/shelve.py", line 92, in __contains__
    return key.encode(self.keyencoding) in self.dict
TypeError: gdbm key must be string, not bytes

test_threaded_import:
Trying 20 threads ... OK.
Trying 50 threads ... OK.
Trying 20 threads ... OK.
Trying 50 threads ... OK.
Trying 20 threads ... OK.
Trying 50 threads ... OK.
Traceback (most recent call last):
  File "Lib/test/test_threaded_import.py", line 75, in <module>
    test_main()
  File "Lib/test/test_threaded_import.py", line 72, in test_main
    test_import_hangers()
  File "Lib/test/test_threaded_import.py", line 36, in test_import_hangers
    raise TestFailed(test.threaded_import_hangers.errors)
test.test_support.TestFailed: ['tempfile.TemporaryFile appeared to hang']
testing import hangers ... [54531 refs]

test_wsgiref: 3 errors, last:
======================================================================
ERROR: test_validated_hello (__main__.IntegrationTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Lib/test/test_wsgiref.py", line 145, in test_validated_hello
    out, err = run_amock(validator(hello_app))
  File "Lib/test/test_wsgiref.py", line 58, in run_amock
    server.finish_request((inp, out), ("127.0.0.1",8888))
  File "/usr/local/google/home/guido/python/py3k/Lib/SocketServer.py", line 254, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/local/google/home/guido/python/py3k/Lib/SocketServer.py", line 522, in __init__
    self.handle()
  File "/usr/local/google/home/guido/python/py3k/Lib/wsgiref/simple_server.py", line 131, in handle
    if not self.parse_request(): # An error code has been sent, just exit
  File "/usr/local/google/home/guido/python/py3k/Lib/BaseHTTPServer.py", line 283, in parse_request
    text = io.TextIOWrapper(self.rfile)
  File "/usr/local/google/home/guido/python/py3k/Lib/io.py", line 975, in __init__
    encoding = os.device_encoding(buffer.fileno())
  File "/usr/local/google/home/guido/python/py3k/Lib/io.py", line 576, in fileno
    return self.raw.fileno()
  File "/usr/local/google/home/guido/python/py3k/Lib/io.py", line 299, in fileno
    self._unsupported("fileno")
  File "/usr/local/google/home/guido/python/py3k/Lib/io.py", line 185, in _unsupported
    (self.__class__.__name__, name))
io.UnsupportedOperation: BytesIO.fileno() not supported

--
--Guido van
Rossum (home page: http://www.python.org/~guido/) From nnorwitz at gmail.com Sat Aug 11 20:39:10 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Sat, 11 Aug 2007 11:39:10 -0700 Subject: [Python-3000] Four new failing tests In-Reply-To: References: Message-ID: On 8/11/07, Guido van Rossum wrote: > I see four tests fail that passed yesterday: > > < test_csv > < test_shelve > < test_threaded_import > < test_wsgiref The only failure I could reproduce was test_wsgiref. That problem was fixed in 56932. I had updated the previous revision and built from make clean. I wonder if there are some subtle stability problems given the various intermittent problems we've seen. If anyone has time, it would be interesting to use valgrind or purify on 3k. n From tony at pagedna.com Sat Aug 11 20:41:08 2007 From: tony at pagedna.com (Tony Lownds) Date: Sat, 11 Aug 2007 11:41:08 -0700 Subject: [Python-3000] Universal newlines support in Python 3.0 In-Reply-To: References: Message-ID: On Aug 11, 2007, at 10:29 AM, Guido van Rossum wrote: >> Is this ok: when newline='\r\n' or newline='\r' is passed, only that >> string is used to determine >> the end of lines. No translation to '\n' is done. > > I *think* it would be more useful if it always returned lines ending > in \n (not \r\n or \r). Wouldn't it? Although this is not how it > currently behaves; when you set newline='\r\n', it returns the \r\n > unchanged, so it would make sense to do this too when newline='\r'. > Caveat user I guess. Because there's an easy way to translate, having the option to not translate apply to all valid newline values is probably more useful. I do think it's easier to define the behavior this way. > OK, if you think you can, that's good. It's not always sufficient (not > if there was a mix of line endings) but it's a start. 
Right -Tony From martin at v.loewis.de Sat Aug 11 21:29:31 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 11 Aug 2007 21:29:31 +0200 Subject: [Python-3000] Four new failing tests In-Reply-To: References: Message-ID: <46BE0E1B.3050000@v.loewis.de> > test_csv: one error > ====================================================================== > ERROR: test_char_write (__main__.TestArrayWrites) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "Lib/test/test_csv.py", line 648, in test_char_write > a = array.array('u', string.letters) > ValueError: string length not a multiple of item size Please try again. gdbm wasn't using bytes properly. Regards, Martin From martin at v.loewis.de Sat Aug 11 21:39:46 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 11 Aug 2007 21:39:46 +0200 Subject: [Python-3000] Four new failing tests In-Reply-To: References: Message-ID: <46BE1082.40300@v.loewis.de> > ====================================================================== > ERROR: test_char_write (__main__.TestArrayWrites) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "Lib/test/test_csv.py", line 648, in test_char_write > a = array.array('u', string.letters) > ValueError: string length not a multiple of item size I think some decision should be made wrt. string.letters. Clearly, string.letters cannot reasonably contain *all* letters (i.e. all characters of categories Ll, Lu, Lt, Lo). Or can it? Traditionally, string.letters contained everything that is a letter in the current locale. Still, computing this string might be expensive assuming you have to go through all Unicode code points and determine whether they are letters in the current locale. So I see the following options: 1. remove it entirely. Keep string.ascii_letters instead 2. 
remove string.ascii_letters, and make string.letters to be ASCII only. 3. Make string.letters contain all letters in the current locale. 4. Make string.letters truly contain everything that is classified as a letter in the Unicode database. Which one should happen? Regards, Martin From rhamph at gmail.com Sat Aug 11 22:46:07 2007 From: rhamph at gmail.com (Adam Olsen) Date: Sat, 11 Aug 2007 14:46:07 -0600 Subject: [Python-3000] Four new failing tests In-Reply-To: <46BE1082.40300@v.loewis.de> References: <46BE1082.40300@v.loewis.de> Message-ID: On 8/11/07, "Martin v. Löwis" wrote: > > ====================================================================== > > ERROR: test_char_write (__main__.TestArrayWrites) > > ---------------------------------------------------------------------- > > Traceback (most recent call last): > > File "Lib/test/test_csv.py", line 648, in test_char_write > > a = array.array('u', string.letters) > > ValueError: string length not a multiple of item size > > I think some decision should be made wrt. string.letters. > > Clearly, string.letters cannot reasonably contain *all* letters > (i.e. all characters of categories Ll, Lu, Lt, Lo). Or can it? > > Traditionally, string.letters contained everything that is a letter > in the current locale. Still, computing this string might be expensive > assuming you have to go through all Unicode code points and determine > whether they are letters in the current locale. > > So I see the following options: > 1. remove it entirely. Keep string.ascii_letters instead > 2. remove string.ascii_letters, and make string.letters to be > ASCII only. > 3. Make string.letters contain all letters in the current locale. > 4. Make string.letters truly contain everything that is classified > as a letter in the Unicode database. Wasn't unicodedata.ascii_letters suggested at one point (to eliminate the string module), or was that my imagination? 
IMO, if there is a need for unicode or locale letters, we should provide a function to generate them as needed. It can be passed directly to set or whatever datastructure is actually needed. We shouldn't burden the startup cost with such a large datastructure unless absolutely necessary (nor should we use a property to load it when first needed; expensive to compute attribute and all that). -- Adam Olsen, aka Rhamphoryncus From greg.ewing at canterbury.ac.nz Sun Aug 12 03:52:11 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 12 Aug 2007 13:52:11 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BD5BD8.7030706@acm.org> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> Message-ID: <46BE67CB.9010101@canterbury.ac.nz> Talin wrote: > we are pretty much forced to divide the format string into > two pieces, which are: > > 1) The part that __format__ is allowed to reinterpret. > 2) The part that __format__ is required to implement without > reinterpreting. or 2) the part that format() interprets itself without involving the __format__ method. The trouble is, treating 'r' this way means inventing a whole new part of the format string whose *only* use is to provide a way of specifying 'r'. 
Furthermore, whenever this part is used, the form of the regular format spec that goes with it will be highly constrained, as it doesn't make sense to use anything other than an 's'-type format along with 'r'. Given the extreme non-orthogonality that these two parts of the format spec would have, separating them doesn't seem like a good idea to me. It looks like excessive purity at the expense of practicality. > This PEP has not yet been officially > accepted, and the reason is because of the lack of an implementation. I > don't want to miss the boat. Although it wouldn't be good to rush things and end up committed to something that wasn't the best. Py3k is supposed to be removing warts, not introducing new ones. -- Greg From greg.ewing at canterbury.ac.nz Sun Aug 12 03:54:44 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 12 Aug 2007 13:54:44 +1200 Subject: [Python-3000] Universal newlines support in Python 3.0 In-Reply-To: <874pj67dw0.fsf@uwakimon.sk.tsukuba.ac.jp> References: <46BD18FB.5030901@canterbury.ac.nz> <874pj67dw0.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <46BE6864.9020103@canterbury.ac.nz> Stephen J. Turnbull wrote: > But if there's more than one line ending used, that's not good > enough. If there's more than one, then you're in trouble anyway. In the usual case where there is only one, it provides a way of finding out what it is. -- Greg From kbk at shore.net Sun Aug 12 04:01:09 2007 From: kbk at shore.net (Kurt B. Kaiser) Date: Sat, 11 Aug 2007 22:01:09 -0400 Subject: [Python-3000] idle3.0 - is is supposed to work? In-Reply-To: <46BC0BE6.90908@v.loewis.de> (Martin v. 
=?iso-8859-1?Q?L=F6wi?= =?iso-8859-1?Q?s's?= message of "Fri, 10 Aug 2007 08:55:34 +0200") References: <87tzr8ei80.fsf@hydra.hampton.thirdcreek.com> <46BB96DF.5060305@v.loewis.de> <87ps1we3ak.fsf@hydra.hampton.thirdcreek.com> <46BB9CD7.2030301@v.loewis.de> <87lkckdyk6.fsf@hydra.hampton.thirdcreek.com> <46BC0BE6.90908@v.loewis.de> Message-ID: <87absxldei.fsf@hydra.hampton.thirdcreek.com> "Martin v. Löwis" writes: >>>> OTOH, IDLE ran w/o this error in p3yk... >>> Yes. Somebody would have to study what precisely the problem is: is it >>> that there is a None key in that dictionary, and that you must not use >>> None as a tag name? In that case: where does the None come from? >>> Or else: is it that you can use None as a tagname in 2.x, but can't >>> anymore in 3.0? If so: why not? >> >> OK, I'll start looking at it. > > So did I, somewhat. It looks like a genuine bug in IDLE to me: you > can't use None as a tag name, AFAIU. I'm not quite sure why this > doesn't cause an exception in 2.x; if I try to give a None tag > separately (i.e. in a stand-alone program) in 2.5, > it gives me the same exception. I've commented out the None tag. It appears to be inoperative in any case. That plus initializing 'iomark' correctly got me to the point where IDLE is producing encoding errors. See following message. -- KBK From kbk at shore.net Sun Aug 12 04:13:38 2007 From: kbk at shore.net (Kurt B. Kaiser) Date: Sat, 11 Aug 2007 22:13:38 -0400 Subject: [Python-3000] IDLE encoding setup Message-ID: <87643llctp.fsf@hydra.hampton.thirdcreek.com> I've checked in a version of PyShell.py which directs exceptions to the terminal instead of to IDLE's shell since the latter isn't working right now. 
There also is apparently an encoding issue with the subprocess setup which I'm ignoring for now by starting IDLE w/o the subprocess: cd Lib/idlelib ../../python ./idle.py -n Traceback (most recent call last): File "./idle.py", line 21, in idlelib.PyShell.main() File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/idlelib/PyShell.py", line 1389, in main shell = flist.open_shell() File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/idlelib/PyShell.py", line 274, in open_shell if not self.pyshell.begin(): File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/idlelib/PyShell.py", line 976, in begin self.firewallmessage, idlever.IDLE_VERSION, nosub)) File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/idlelib/PyShell.py", line 1214, in write OutputWindow.write(self, s, tags, "iomark") File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/idlelib/OutputWindow.py", line 42, in write s = str(s, IOBinding.encoding) TypeError: decoding Unicode is not supported Hopefully MvL has a few minutes to revisit the IOBinding.py code which is setting IDLE's encoding. I'm not sure how it should be configured. 
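For reference, the TypeError in the traceback above can be reproduced outside IDLE. What follows is a minimal sketch, assuming Python 3 semantics as later released (the exact wording of the error message may differ from the py3k branch of the time). The name to_text is a hypothetical helper made up for illustration; it is not IDLE code and not the fix that was eventually applied.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding only when it is actually bytes-like.

    to_text is a hypothetical helper for illustration; it is not IDLE code.
    """
    if isinstance(value, (bytes, bytearray)):
        # Only bytes-like objects can be decoded with an explicit encoding.
        return str(value, encoding)
    # A str is already text; passing an encoding here would raise TypeError.
    return str(value)


def decoding_str_fails():
    """Report whether str(<str>, encoding) raises TypeError."""
    try:
        str("abc", "utf-8")
    except TypeError:
        return True
    return False


print(to_text(b"abc"), to_text("abc"), decoding_str_fails())
```

The point of the guard is only that a call which made sense when s was an 8-bit string stops making sense once s is already text.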
-- KBK From greg.ewing at canterbury.ac.nz Sun Aug 12 04:16:38 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 12 Aug 2007 14:16:38 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BD85E6.1030005@ronadam.com> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BD85E6.1030005@ronadam.com> Message-ID: <46BE6D86.6040205@canterbury.ac.nz> Ron Adam wrote: > > The only question is whether the 'r' > specifier also allows for other options like width and alignment. I'd say it should have exactly the same options as 's'. -- Greg From collinw at gmail.com Sun Aug 12 04:27:51 2007 From: collinw at gmail.com (Collin Winter) Date: Sat, 11 Aug 2007 21:27:51 -0500 Subject: [Python-3000] Untested py3k regressions Message-ID: <43aa6ff70708111927q5a1d924cx14f73517c0143ff4@mail.gmail.com> Hi all, I've started a wiki page to catalog known regressions in the py3k branch that aren't covered by the test suite: http://wiki.python.org/moin/Py3kRegressions. First up: dir() doesn't work on traceback objects (it now produces an empty list). A patch for this is up at http://python.org/sf/1772489. Collin Winter From eric+python-dev at trueblade.com Sun Aug 12 04:34:14 2007 From: eric+python-dev at trueblade.com (Eric V. 
Smith) Date: Sat, 11 Aug 2007 22:34:14 -0400 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BE6D86.6040205@canterbury.ac.nz> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BD85E6.1030005@ronadam.com> <46BE6D86.6040205@canterbury.ac.nz> Message-ID: <46BE71A6.9020609@trueblade.com> Greg Ewing wrote: > Ron Adam wrote: >> The only question is whether the 'r' >> specifier also allows for other options like width and alignment. > > I'd say it should have exactly the same options as 's'. Agreed. From nnorwitz at gmail.com Sun Aug 12 04:40:40 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Sat, 11 Aug 2007 19:40:40 -0700 Subject: [Python-3000] Untested py3k regressions In-Reply-To: <43aa6ff70708111927q5a1d924cx14f73517c0143ff4@mail.gmail.com> References: <43aa6ff70708111927q5a1d924cx14f73517c0143ff4@mail.gmail.com> Message-ID: On 8/11/07, Collin Winter wrote: > Hi all, > > I've started a wiki page to catalog known regressions in the py3k > branch that aren't covered by the test suite: > http://wiki.python.org/moin/Py3kRegressions. > > First up: dir() doesn't work on traceback objects (it now produces an > empty list). A patch for this is up at http://python.org/sf/1772489. 
I've moved the other documented regression (using PYTHONDUMPREFS env't var) from the Py3kStrUniTests page to the new page. I expect there are a bunch of options that have problems, since those don't get great testing. I've also noticed that since io is now in Python, if you catch the control-C just right, you can get strange error messages where the code assumed an error meant something specific. In my case, I typed control-C while doing an execfile (before I removed it) and got two different errors: SyntaxError and some error related to BOM IIRC. n From greg.ewing at canterbury.ac.nz Sun Aug 12 04:42:26 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 12 Aug 2007 14:42:26 +1200 Subject: [Python-3000] Four new failing tests In-Reply-To: <46BE1082.40300@v.loewis.de> References: <46BE1082.40300@v.loewis.de> Message-ID: <46BE7392.7090807@canterbury.ac.nz> Martin v. Löwis wrote: > So I see the following options: > 1. remove it entirely. Keep string.ascii_letters instead I'd vote for this one. The only major use case for string.letters I can see is testing whether something is a letter using 'c in letters'. This obviously doesn't scale when there can be thousands of letters, and a function for testing letterness covers that use case just as well. The only other thing you might want to do is iterate over all the possible letters, and that doesn't scale either. -- Greg From nnorwitz at gmail.com Sun Aug 12 04:49:12 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Sat, 11 Aug 2007 19:49:12 -0700 Subject: [Python-3000] IDLE encoding setup In-Reply-To: <87643llctp.fsf@hydra.hampton.thirdcreek.com> References: <87643llctp.fsf@hydra.hampton.thirdcreek.com> Message-ID: On 8/11/07, Kurt B. Kaiser wrote: > I've checked in a version of PyShell.py which directs exceptions to the > terminal instead of to IDLE's shell since the latter isn't working right now. 
> > There also is apparently an encoding issue with the subprocess setup > which I'm ignoring for now by starting IDLE w/o the subprocess: > > cd Lib/idlelib > ../../python ./idle.py -n > > Traceback (most recent call last): > File "./idle.py", line 21, in > idlelib.PyShell.main() > File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/idlelib/PyShell.py", line 1389, in main > shell = flist.open_shell() > File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/idlelib/PyShell.py", line 274, in open_shell > if not self.pyshell.begin(): > File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/idlelib/PyShell.py", line 976, in begin > self.firewallmessage, idlever.IDLE_VERSION, nosub)) > File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/idlelib/PyShell.py", line 1214, in write > OutputWindow.write(self, s, tags, "iomark") > File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/idlelib/OutputWindow.py", line 42, in write > s = str(s, IOBinding.encoding) > TypeError: decoding Unicode is not supported I can't reproduce this problem in idle. Here's how the error seems to be caused: >>> str('abc', 'utf-8') Traceback (most recent call last): File "", line 1, in TypeError: decoding Unicode is not supported Also: >>> str(str('abc', 'utf-8')) Traceback (most recent call last): File "", line 1, in TypeError: decoding Unicode is not supported This hack might work to get you farther: s = str(s.encode('utf-8'), IOBinding.encoding) (ie, add the encode() part) I don't know what should be done to really fix it though. n From oliphant.travis at ieee.org Sun Aug 12 06:24:14 2007 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Sat, 11 Aug 2007 22:24:14 -0600 Subject: [Python-3000] Need help compiling py3k-buffer branch Message-ID: Hi everyone, I apologize for my quietness on this list (I'm actually in the middle of a move), but I recently implemented most of PEP 3118 in the py3k-buffer branch (it's implemented but not tested...) 
However, I'm running into trouble getting it to link. The compilation step proceeds fine but then I get a segmentation fault during the link stage. It might be my platform (I've been having trouble with my 7-year old computer, as of late) or my installation of gcc. I wondered if somebody would be willing to check out the py3k-buffer branch and try to compile it to see if there is some other problem that I'm not able to detect. Thanks so much for any help. -Travis Oliphant From nnorwitz at gmail.com Sun Aug 12 06:51:49 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Sat, 11 Aug 2007 21:51:49 -0700 Subject: [Python-3000] Need help compiling py3k-buffer branch In-Reply-To: References: Message-ID: On 8/11/07, Travis E. Oliphant wrote: > > However, I'm running into trouble getting it to link. The compilation > step proceeds fine but then I get a segmentation fault during the link > stage. The problem is that python is crashing when trying to run setup.py. I fixed the immediate problem which was that the type wasn't initialized properly. It usually starts up now. I also fixed a 64-bit problem with a mismatch between an int and Py_ssize_t. > It might be my platform (I've been having trouble with my 7-year old > computer, as of late) or my installation of gcc. I was seeing crashes from dereferencing null pointers sometimes on startup, sometimes on shutdown. Good luck! n From oliphant.travis at ieee.org Sun Aug 12 07:32:50 2007 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Sat, 11 Aug 2007 23:32:50 -0600 Subject: [Python-3000] Need help compiling py3k-buffer branch In-Reply-To: References: Message-ID: Neal Norwitz wrote: > On 8/11/07, Travis E. Oliphant wrote: >> However, I'm running into trouble getting it to link. The compilation >> step proceeds fine but then I get a segmentation fault during the link >> stage. > > The problem is that python is crashing when trying to run setup.py. 
I > fixed the immediate problem which was that the type wasn't initialized > properly. It usually starts up now. > > I also fixed a 64-bit problem with a mismatch between an int and Py_ssize_t. > >> It might be my platform (I've been having trouble with my 7-year old >> computer, as of late) or my installation of gcc. > > I was seeing crashes from dereferencing null pointers sometimes on > startup, sometimes on shutdown. > Thanks for the quick fix. That will definitely help me make more progress. Thanks, -Travis From martin at v.loewis.de Sun Aug 12 09:08:04 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 12 Aug 2007 09:08:04 +0200 Subject: [Python-3000] IDLE encoding setup In-Reply-To: <87643llctp.fsf@hydra.hampton.thirdcreek.com> References: <87643llctp.fsf@hydra.hampton.thirdcreek.com> Message-ID: <46BEB1D4.5000307@v.loewis.de> > s = str(s, IOBinding.encoding) > TypeError: decoding Unicode is not supported > > Hopefully MvL has a few minutes to revisit the IOBinding.py code which is > setting IDLE's encoding. I'm not sure how it should be configured. This code was now bogus. In 2.x, the line read s = unicode(s, IOBinding.encoding) Then unicode got systematically replaced by str, but so did the type of s, and this entire block of code was now obsolete; I removed it in 56951. I now get an IDLE window which crashes as soon as I type something. Regards, Martin From oliphant.travis at ieee.org Sat Aug 11 23:47:05 2007 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Sat, 11 Aug 2007 15:47:05 -0600 Subject: [Python-3000] Help with compiling the py3k-buffer branch Message-ID: <46BE2E59.2090008@ieee.org> Hi everyone, I apologize for my quietness on this list (I'm actually in the middle of a move), but I recently implemented most of PEP 3118 in the py3k-buffer branch (it's implemented but not tested...) However, I'm running into trouble getting it to link. 
The compilation step proceeds fine but then I get a segmentation fault during the link stage. It might be my platform (I've been having trouble with my 7-year old computer, as of late) or my installation of gcc. I wondered if somebody would be willing to check out the py3k-buffer branch and try to compile it to see if there is some other problem that I'm not able to detect. Thanks so much for any help. -Travis Oliphant From barry at python.org Sun Aug 12 16:50:05 2007 From: barry at python.org (Barry Warsaw) Date: Sun, 12 Aug 2007 09:50:05 -0500 Subject: [Python-3000] [Email-SIG] fix email module for python 3000 (bytes/str) In-Reply-To: <200708110149.10939.victor.stinner@haypocalc.com> References: <200708090241.08369.victor.stinner@haypocalc.com> <200708110149.10939.victor.stinner@haypocalc.com> Message-ID: <8B640CF2-EB88-45A5-A85F-1267AF24749E@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 10, 2007, at 6:49 PM, Victor Stinner wrote: > It's really hard to convert email module to Python 3000 because it > does mix > byte strings and (unicode) character strings... Indeed, but I'm making progress. Just a very quick follow up now, with hopefully more detail soon. I'm cross posting this one on purpose because of a couple of more general py3k issues involved. In r56957 I committed changes to sndhdr.py and imghdr.py so that they compare what they read out of the files against proper byte literals. AFAICT, neither module has a unittest, and if you run them from the command line, you'll see that they're completely broken (without my fix). The email package uses these to guess content type subparts for the MIMEAudio and MIMEImage subclasses. I didn't add unittests, just some judicious 'b' prefixes, and a quick command line test seems to make the situation better. This also makes a bunch of email unittests pass. Another general Python thing that bit me was when an exception gets raised with a non-ascii message, e.g. 
>>> raise RuntimeError('oops') Traceback (most recent call last): File "", line 1, in RuntimeError: oops >>> raise RuntimeError('oo\xfcps') Traceback (most recent call last): File "", line 1, in >>> Um, what? (I'm using a XEmacs shell buffer on OS X, but you get something similar in an iTerm and Terminal window.). In the email unittests, I was getting one unexpected exception that had a non- ascii character in it, but this crashed the unittest harness because when it tried to print the exception message out, you'd instead get an exception in io.py and the test run would exit. Okay, that all makes sense, but IWBNI py3k could do better . Fixing other simple issues (not checked in yet), I'm down to 20 failures, 13 errors out of 247 tests. I'm running test_email_renamed.py only because test_email.py will go away (we should remove the old module names and bump the email pkg version number too). As for the other questions Victor raises, we definitely need to answer them, but that should be for another reply. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRr8eHXEjvBPtnXfVAQIrJgQAoWGaoN82/KFLggu0IIM0BSghIQppiFVv 9weB+Kq6oAcgN95XKGSCZmPwA8jHkeUAWRpm8gZn7k44N2fJuZw11Klajy0tzUPW Y4b5y8jPVU85phOKinynmHb9suXroyb35ZgMSp+WipL4L5PkOMv/x9q59Rs6ldjZ cQu3Sssai9I= =QG9j -----END PGP SIGNATURE----- From eric+python-dev at trueblade.com Sun Aug 12 17:10:16 2007 From: eric+python-dev at trueblade.com (Eric V. 
Smith) Date: Sun, 12 Aug 2007 11:10:16 -0400 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BD5BD8.7030706@acm.org> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> Message-ID: <46BF22D8.2090309@trueblade.com> Talin wrote: > One final thing I wanted to mention, which Guido reminded me, is that > we're getting short on time. This PEP has not yet been officially > accepted, and the reason is because of the lack of an implementation. I > don't want to miss the boat. (The boat in this case being Alpha 1.) I have hooked up the existing PEP 3101 sandbox implementation into the py3k branch as unicode.format(). It implements the earlier PEP syntax for specifiers. I'm going to work on removing some cruft, adding tests, and then slowly change it over to use the new proposed specifier syntax. Eric. From p.f.moore at gmail.com Sun Aug 12 18:58:44 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 12 Aug 2007 17:58:44 +0100 Subject: [Python-3000] Universal newlines support in Python 3.0 In-Reply-To: References: Message-ID: <79990c6b0708120958p588aabd1ic6dadf2f65de86d3@mail.gmail.com> On 11/08/07, Guido van Rossum wrote: > On 8/11/07, Tony Lownds wrote: > > Is this ok: when newline='\r\n' or newline='\r' is passed, only that > > string is used to determine > > the end of lines. 
No translation to '\n' is done. > > I *think* it would be more useful if it always returned lines ending > in \n (not \r\n or \r). Wouldn't it? Although this is not how it > currently behaves; when you set newline='\r\n', it returns the \r\n > unchanged, so it would make sense to do this too when newline='\r'. > Caveat user I guess. Neither this wording, nor the PEP are clear to me, but I'm assuming/hoping that there will be a way to spell the current behaviour for universal newlines on input[1], namely that files can have *either* bare \n, *or* the combination \r\n, to delimit lines. Whichever is used (I have no need for mixed-style files) gets translated to \n so that the program sees the same data regardless. [1] ... at least the bit I care about :-) This behaviour is immensely useful for uniform treatment of Windows text files, which are an inconsistent mess of \n-only and \r\n conventions. Specifically, I'm looking to replicate this behaviour: >xxd crlf 0000000: 610d 0a62 0d0a a..b.. >xxd lf 0000000: 610a 620a a.b. >python Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> open('crlf').read() 'a\nb\n' >>> open('lf').read() 'a\nb\n' >>> As demonstrated, this is the default in Python 2.5. I'd hope it was so in 3.0 as well. Sorry I can't test this for myself - I don't have the time/toolset to build my own Py3k on Windows... Paul. 
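The behaviour asked for above can be sketched in a few lines. This is a minimal example, assuming the newline parameter of open() as it was eventually released in Python 3 (the semantics were still under discussion in this thread); read_text is a hypothetical helper name used only for illustration.

```python
import os
import tempfile


def read_text(data, newline=None):
    """Write raw bytes to a temporary file, then read them back in text mode.

    read_text is a hypothetical helper; newline is passed straight to open().
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        with open(path, "r", encoding="ascii", newline=newline) as f:
            return f.read()
    finally:
        os.remove(path)


# Default (newline=None): both conventions are translated to '\n' on input.
crlf = read_text(b"a\r\nb\r\n")
lf = read_text(b"a\nb\n")

# newline='': line endings are recognised but returned untranslated.
untranslated = read_text(b"a\r\nb\r\n", newline="")

print(repr(crlf), repr(lf), repr(untranslated))
```

With the default setting, the crlf and lf files read back as the same string, which matches the Python 2.5 session shown above.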
From talin at acm.org Sun Aug 12 19:00:55 2007 From: talin at acm.org (Talin) Date: Sun, 12 Aug 2007 10:00:55 -0700 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BF22D8.2090309@trueblade.com> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com> Message-ID: <46BF3CC7.6010405@acm.org> Eric V. Smith wrote: > Talin wrote: >> One final thing I wanted to mention, which Guido reminded me, is that >> we're getting short on time. This PEP has not yet been officially >> accepted, and the reason is because of the lack of an implementation. >> I don't want to miss the boat. (The boat in this case being Alpha 1.) > > I have hooked up the existing PEP 3101 sandbox implementation into the > py3k branch as unicode.format(). It implements the earlier PEP syntax > for specifiers. Woo hoo! Thanks Eric. This is great news. At some point I'd like to build that branch myself, I might send you questions later. > I'm going to work on removing some cruft, adding tests, and then slowly > change it over to use the new proposed specifier syntax. I'm not sure that I'm happy with my own syntax proposal just yet. I want to sit down with Guido and talk it over before I commit to anything. 
I think part of the difficulty that I am having is as follows: In writing this PEP, it was never my intention to invent something brand new, but rather to find a way to cleanly merge together all of the various threads and requests and wishes that people had expressed on the subject. (Although I will admit that there are a few minor innovations of my own, but they aren't fundamental to the PEP.) That's why I originally chose a format specifier syntax which was as close to the original % formatting syntax as I could manage. The idea was that people could simply transfer over the knowledge from the old system to the new one. The old system is quite surprising in how many features it packs into just a few characters worth of specifiers. (For example, I don't know if anyone has noticed the option for putting a space in front of positive numbers, so that negative and positive numbers line up correctly when using fixed-width fields.) And some of the suggested additions, like centering using the '^' character, seemed to fit in with the old scheme quite well. However, the old syntax doesn't fit very well with the new requirements: the desire to have the 'repr' option take precedence over the type-specific formatter, and the desire to split the format specifier into two parts, one which is handled by the type-specific formatter, and one which is handled by the general formatter. So I find that I'm having to invent something brand new, and as I'm writing stuff down I'm continually asking myself the question "OK, so how is the typical Python programmer going to see this?" Normally, I don't have a problem with this, because usually when one is doing an architectural design, if you look hard enough, eventually you'll find some obviously superior configuration of design elements that is clearly the simplest and best way to do it. And so you can assume that everyone who uses your design will look at it and see that, yes indeed, this is the right design. 
But with these format strings, it seems (to me, anyway) that the design choices are a lot more arbitrary and driven by aesthetics. Almost any combination of specifiers will work, the question is how to arrange them in a way that is easy to memorize. And I find I'm having trouble envisioning what a typical Python programmer will or won't find intuitive; And moreover, for some reason all of the syntax proposals, including my own, seem kind of "line-noisy" to me aesthetically, for all but the simplest cases. This is made more challenging by the fact that the old syntax allowed so many options to be crammed into such a small space; I didn't want to have the new system be significantly less capable than the old, and yet I find it's rather difficult to shoehorn all of those capabilities into a new syntax without making something that is either complex, or too verbose (although I admit I have a fairly strict definition of verbosity.) Guido's been suggesting that I model the format specifiers after the .Net numeric formatting strings, but this system is significantly less capable than %-style format specifiers. Yes, you can do fancy things like "(###)###-####", but there's no provision for centering or for a custom fill character. This would be easier if I was sitting in a room with other Python programmers so that I could show them various suggestions and see what their emotional reactions are. I'm having a hard time doing this in isolation. That's kind of why I want to meet with Guido on this, as he's good at cutting through this kind of crap. 
-- Talin From janssen at parc.com Sun Aug 12 19:09:26 2007 From: janssen at parc.com (Bill Janssen) Date: Sun, 12 Aug 2007 10:09:26 PDT Subject: [Python-3000] [Email-SIG] fix email module for python 3000 (bytes/str) In-Reply-To: <200708110149.10939.victor.stinner@haypocalc.com> References: <200708090241.08369.victor.stinner@haypocalc.com> <200708110149.10939.victor.stinner@haypocalc.com> Message-ID: <07Aug12.100928pdt."57996"@synergy1.parc.xerox.com> > base64MIME.decode() and base64MIME.encode() should accept bytes and str > base64MIME.decode() result type is bytes > base64MIME.encode() result type should be... bytes or str, no idea > > Other decode() and encode() functions should use same rules about types. Victor, Here's my take on this: base64MIME.decode converts string to bytes base64MIME.encode converts bytes to string Pretty straightforward. Bill From janssen at parc.com Sun Aug 12 19:11:18 2007 From: janssen at parc.com (Bill Janssen) Date: Sun, 12 Aug 2007 10:11:18 PDT Subject: [Python-3000] bytes: compare bytes to integer In-Reply-To: <200708110225.28056.victor.stinner@haypocalc.com> References: <200708110225.28056.victor.stinner@haypocalc.com> Message-ID: <07Aug12.101123pdt."57996"@synergy1.parc.xerox.com> > I don't like the behaviour of Python 3000 when we compare a bytes strings > with length=1: > >>> b'xyz'[0] == b'x' > False > > The code can be see as: > >>> ord(b'x') == b'x' > False > > or also: > >>> 120 == b'x' > False > > Two solutions: > 1. b'xyz'[0] returns a new bytes object (b'x' instead of 120) > like b'xyz'[0:1] does > 2. allow to compare a bytes string of 1 byte with an integer > > I prefer (2) since (1) is wrong: bytes contains integers and not bytes! Why not just write b'xyz'[0:1] == b'x' in the first place? Let's not start adding "special" cases. 
Bill From martin at v.loewis.de Sun Aug 12 19:24:42 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 12 Aug 2007 19:24:42 +0200 Subject: [Python-3000] Four new failing tests In-Reply-To: References: <46BE1082.40300@v.loewis.de> Message-ID: <46BF425A.3070100@v.loewis.de> > Wasn't unicodedata.ascii_letters suggested at one point (to eliminate > the string module), or was that my imagination? Not sure - I don't recall such a proposal. > IMO, if there is a need for unicode or locale letters, we should > provide a function to generate them as needed. It can be passed > directly to set or whatever datastructure is actually needed. We > shouldn't burden the startup cost with such a large datastructure > unless absolutely necessary (nor should we use a property to load it > when first needed; expensive to compute attribute and all that). Exactly my feelings. Still, people seem to like string.letters a lot, and I'm unsure as to why that is. Regards, Martin From g.brandl at gmx.net Sun Aug 12 19:41:40 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 12 Aug 2007 19:41:40 +0200 Subject: [Python-3000] bytes: compare bytes to integer In-Reply-To: <07Aug12.101123pdt."57996"@synergy1.parc.xerox.com> References: <200708110225.28056.victor.stinner@haypocalc.com> <07Aug12.101123pdt."57996"@synergy1.parc.xerox.com> Message-ID: Bill Janssen schrieb: >> I don't like the behaviour of Python 3000 when we compare a bytes strings >> with length=1: >> >>> b'xyz'[0] == b'x' >> False >> >> The code can be see as: >> >>> ord(b'x') == b'x' >> False >> >> or also: >> >>> 120 == b'x' >> False >> >> Two solutions: >> 1. b'xyz'[0] returns a new bytes object (b'x' instead of 120) >> like b'xyz'[0:1] does >> 2. allow to compare a bytes string of 1 byte with an integer >> >> I prefer (2) since (1) is wrong: bytes contains integers and not bytes! > > Why not just write > > b'xyz'[0:1] == b'x' > > in the first place? Let's not start adding "special" cases. 
Hm... I have a feeling that this will be one of the first entries in a hypothetical "Python 3.0 Gotchas" list. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From foom at fuhm.net Sun Aug 12 20:28:10 2007 From: foom at fuhm.net (James Y Knight) Date: Sun, 12 Aug 2007 14:28:10 -0400 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BE67CB.9010101@canterbury.ac.nz> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BE67CB.9010101@canterbury.ac.nz> Message-ID: <44C1B6D6-A42B-4BE0-AF29-4403A8C01784@fuhm.net> I've been skimming a lot of the discussion about how to special case various bits and pieces of formatting, but it really seems to me as if this is really just asking for the use of a generic function. I'm not sure how exactly one spells the requirement that the first argument be equal to a certain object (e.g. 'r') rather than a subtype of it, so I'll just gloss over that issue. 
But anyways, it seems like it might look something like this:

# Default behaviors
@overload
def __format__(format_char:'r', obj, format_spec):
    return __format__('s', repr(obj), format_spec)

@overload
def __format__(format_char:'s', obj, format_spec):
    return __format__('s', str(obj), format_spec)

@overload
def __format__(format_char:'f', obj, format_spec):
    return __format__('s', float(obj), format_spec)

@overload
def __format__(format_char:'d', obj, format_spec):
    return __format__('s', int(obj), format_spec)

# Type specific behaviors
@overload
def __format__(format_char:'s', obj:str, format_spec):
    ...string formatting...

@overload
def __format__(format_char:'f', obj:float, format_spec):
    ...float formatting...

@overload
def __format__(format_char:'d', obj:int, format_spec):
    ...integer formatting...

James

From p.f.moore at gmail.com Sun Aug 12 21:12:29 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 12 Aug 2007 20:12:29 +0100
Subject: [Python-3000] [Python-Dev] Universal newlines support in Python 3.0
In-Reply-To:
References: <79990c6b0708120958p588aabd1ic6dadf2f65de86d3@mail.gmail.com>
Message-ID: <79990c6b0708121212m2490d6f0tb151c3c1d5aa1ea3@mail.gmail.com>

On 12/08/07, Georg Brandl wrote:
> Note that Python does nothing special in the above case. For non-Windows
> platforms, you'd get two different results -- the conversion from \r\n to
> \n is done by the Windows C runtime since the default open() mode is text mode.
>
> Only with mode 'U' does Python use its own universal newline mode.

Pah. You're right - I almost used 'U' and then "discovered" that I didn't need it (and got bitten by a portability bug as a result :-()

> With Python 3.0, the C library is not used and Python uses universal newline
> mode by default.

That's what I expected, but I was surprised to find that the PEP is pretty unclear on this. The phrase "universal newlines" is mentioned only once, and never defined.
Knowing the meaning, I can see how the PEP is intended to say that universal newlines on input is the default (and you set the newline argument to specify a *specific*, non-universal, newline value) - but I missed it on first reading.

Thanks for the clarification.

Paul.

From p.f.moore at gmail.com Sun Aug 12 21:19:23 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 12 Aug 2007 20:19:23 +0100
Subject: [Python-3000] Fix imghdr module for bytes
In-Reply-To:
References: <200708110235.43664.victor.stinner@haypocalc.com>
Message-ID: <79990c6b0708121219x3aecef78hc58443b592c0a13d@mail.gmail.com>

On 11/08/07, Adam Olsen wrote:
> Try h[0:1] == b'P'. Slicing will ensure it stays as a bytes object,
> rather than just giving the integer it contains.

Ugh. Alternatively, h[0] == ord('P') should work.

Unless you're writing source in EBCDIC (is that allowed?).

All of the alternatives seem somewhat ugly. While I agree with the idea that the bytes should be kept clean & simple, we seem to be finding a few non-optimal corner cases. It would be a shame if the bytes type turned into a Python 3.0 wart from day 1...

Would it be worth keeping a wiki page of the bytes type "idioms" that are needed, as people discover them? Putting them all in one place might give a better feel as to whether there is a real problem to address.

Paul.

From rhamph at gmail.com Sun Aug 12 21:39:23 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Sun, 12 Aug 2007 13:39:23 -0600
Subject: [Python-3000] Four new failing tests
In-Reply-To: <46BF425A.3070100@v.loewis.de>
References: <46BE1082.40300@v.loewis.de> <46BF425A.3070100@v.loewis.de>
Message-ID:

On 8/12/07, "Martin v. Löwis" wrote:
> > Wasn't unicodedata.ascii_letters suggested at one point (to eliminate
> > the string module), or was that my imagination?
>
> Not sure - I don't recall such a proposal.
>
> > IMO, if there is a need for unicode or locale letters, we should
> > provide a function to generate them as needed.
It can be passed > > directly to set or whatever datastructure is actually needed. We > > shouldn't burden the startup cost with such a large datastructure > > unless absolutely necessary (nor should we use a property to load it > > when first needed; expensive to compute attribute and all that). > > Exactly my feelings. Still, people seem to like string.letters a lot, > and I'm unsure as to why that is. I think because it feels like the most direct, least obscured approach. Calling ord() feels like a hack, re is overkill and maligned for many reasons, and c.isalpha() would behave differently if passed unicode instead of str. Perhaps we should have a .isasciialpha() and document that as the preferred alternative. Looking over google codesearch results, I don't find myself enamored with the existing string.letters usages. Most can be easily converted to .isalpha/isalnum/isasciialpha/etc. What can't easily be converted could be done using something else, and I don't think warrant use of string.letters given its regular misusage. What's really frightening is the tendency to use string.letters to build regular expressions. -- Adam Olsen, aka Rhamphoryncus From eric+python-dev at trueblade.com Sun Aug 12 21:49:51 2007 From: eric+python-dev at trueblade.com (Eric V. 
Smith) Date: Sun, 12 Aug 2007 15:49:51 -0400 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BF3CC7.6010405@acm.org> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org> Message-ID: <46BF645F.3000808@trueblade.com> Talin wrote: > Eric V. Smith wrote: >> I have hooked up the existing PEP 3101 sandbox implementation into the >> py3k branch as unicode.format(). It implements the earlier PEP syntax >> for specifiers. > > Woo hoo! Thanks Eric. This is great news. > > At some point I'd like to build that branch myself, I might send you > questions later. I'm currently just developing in the py3k branch. I know this is sub-optimal, since I can't check in there until the PEP is accepted. The original PEP 3101 sample implementation was in sandbox/pep3101, and was never a branch, because it was built as an external module. Maybe I should create a py3k-pep3101 (or py3k-format) branch, so I can check my stuff in? > I'm not sure that I'm happy with my own syntax proposal just yet. I want > to sit down with Guido and talk it over before I commit to anything. I'll have some comments in the thread with your proposal. 
> This would be easier if I was sitting in a room with other Python > programmers so that I could show them various suggestions and see what > their emotional reactions are. I'm having a hard time doing this in > isolation. That's kind of why I want to meet with Guido on this, as he's > good at cutting through this kind of crap. Good luck! I'm open to private email, or chatting on the phone, if you want someone to bounce ideas off of. Eric. From rhamph at gmail.com Sun Aug 12 21:53:27 2007 From: rhamph at gmail.com (Adam Olsen) Date: Sun, 12 Aug 2007 13:53:27 -0600 Subject: [Python-3000] Fix imghdr module for bytes In-Reply-To: <79990c6b0708121219x3aecef78hc58443b592c0a13d@mail.gmail.com> References: <200708110235.43664.victor.stinner@haypocalc.com> <79990c6b0708121219x3aecef78hc58443b592c0a13d@mail.gmail.com> Message-ID: On 8/12/07, Paul Moore wrote: > On 11/08/07, Adam Olsen wrote: > > Try h[0:1] == b'P'. Slicing will ensure it stays as a bytes object, > > rather than just giving the integer it contains. > > Ugh. Alternatively, h[0] == ord('P') should work. > > Unless you're writing source in EBCDIC (is that allowed?). I doubt it, but if it was it should be translated to unicode upon loading and have no effect on the semantics. > All of the alternatives seem somewhat ugly. While I agree with the > idea that the bytes should be kept clean & simple, we seem to be > finding a few non-optimal corner cases. It would be a shame if the > bytes type turned into a Python 3.0 wart from day 1... I don't think this behaviour change is a problem. It's just a bit surprising and something that has to be learned when you switch to 3.0. It matches list behaviour and in the end will reduce the concepts needed to use the language. > Would it be worth keeping a wiki page of the bytes type "idioms" that > are needed, as people discover them? Putting them all in one place > might give a better feel as to whether there is a real problem to > address. 
-- Adam Olsen, aka Rhamphoryncus

From martin at v.loewis.de Sun Aug 12 23:54:36 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 12 Aug 2007 23:54:36 +0200
Subject: [Python-3000] Four new failing tests
In-Reply-To:
References: <46BE1082.40300@v.loewis.de> <46BF425A.3070100@v.loewis.de>
Message-ID: <46BF819C.8040107@v.loewis.de>

>> Exactly my feelings. Still, people seem to like string.letters a lot,
>> and I'm unsure as to why that is.
>
> I think because it feels like the most direct, least obscured
> approach. Calling ord() feels like a hack, re is overkill and
> maligned for many reasons, and c.isalpha() would behave differently if
> passed unicode instead of str.

I think the first ones might apply, but the last one surely doesn't. When people use string.letters, they don't consider issues such as character set. If they did, they would know that string.letters may vary with locale.

> What's really frightening
> is the tendency to use string.letters to build regular expressions.

Indeed. However, if string.letters is removed, I trust that people will start listing all characters explicitly in the regex, and curse python-dev for removing such a useful facility.

Regards,
Martin

From kbk at shore.net Mon Aug 13 00:31:44 2007
From: kbk at shore.net (Kurt B. Kaiser)
Date: Sun, 12 Aug 2007 18:31:44 -0400
Subject: [Python-3000] IDLE encoding setup
In-Reply-To: <46BEB1D4.5000307@v.loewis.de> (Martin v. =?iso-8859-1?Q?L=F6?= =?iso-8859-1?Q?wis's?= message of "Sun, 12 Aug 2007 09:08:04 +0200")
References: <87643llctp.fsf@hydra.hampton.thirdcreek.com> <46BEB1D4.5000307@v.loewis.de> <87sl6ojsfj.fsf@hydra.hampton.thirdcreek.com>
Message-ID: <87sl6ojsfj.fsf@hydra.hampton.thirdcreek.com>

"Martin v. Löwis" writes:
>> Hopefully MvL has a few minutes to revisit the IOBinding.py code which is
>> setting IDLE's encoding. I'm not sure how it should be configured.
>
> This code was now bogus.
In 2.x, the line read > > s = unicode(s, IOBinding.encoding) > > Then unicode got systematically replaced by str, but so did the type of > s, and this entire block of code was now obsolete; I removed it in > 56951. OK, thanks. Is the code which sets IOBinding.encoding still correct? That value is used in several places in IDLE, including setting the encoding for std{in,err,out}. Same question for IOBinding.py:IOBinding.{encode(),decode()} ! > > I now get an IDLE window which crashes as soon as I type something. Yes, something like File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/lib-tk/Tkinter.py", line 1022, in mainloop self.tk.mainloop(n) TypeError: expected string, bytes found I can duplicate this using just WidgetRedirector.main() (no IDLE), but I haven't figured out the problem as yet. That's a very interesting module ::-P -- KBK From martin at v.loewis.de Mon Aug 13 00:51:47 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 13 Aug 2007 00:51:47 +0200 Subject: [Python-3000] IDLE encoding setup In-Reply-To: <87sl6ojsfj.fsf@hydra.hampton.thirdcreek.com> References: <87643llctp.fsf@hydra.hampton.thirdcreek.com> <46BEB1D4.5000307@v.loewis.de> <87sl6ojsfj.fsf@hydra.hampton.thirdcreek.com> Message-ID: <46BF8F03.9030705@v.loewis.de> > Is the code which sets IOBinding.encoding still correct? That value is > used in several places in IDLE, including setting the encoding for > std{in,err,out}. I think so, yes. The conditions in which it needs to be used will have to change, though: Python 3 defaults to UTF-8 as the source encoding, so there is no need to use a computed encoding when there is no declared one, anymore. What encoding IDLE should use for sys.stdout is as debatable as it always was (i.e. should it use a fixed on, independent of installation, or a variable one, depending on the user's locale) > Same question for IOBinding.py:IOBinding.{encode(),decode()} ! 
This is still mostly correct, except that it should encode as UTF-8 in the absence of any declared encoding (see above). I'll fix that when I find some time. Regards, Martin From greg.ewing at canterbury.ac.nz Mon Aug 13 01:46:20 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 13 Aug 2007 11:46:20 +1200 Subject: [Python-3000] Format specifier proposal In-Reply-To: <46BD79EC.1020301@acm.org> References: <46BD79EC.1020301@acm.org> Message-ID: <46BF9BCC.7010505@canterbury.ac.nz> Talin wrote: > :s10 # String right-aligned within field of minimum width > # of 10 chars. I'm wondering whether the default alignment for strings should be left instead of right. The C way is all very consistent and all, but it's not a very practical default. How often do you want a right-aligned string? -- Greg From greg.ewing at canterbury.ac.nz Mon Aug 13 01:53:06 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 13 Aug 2007 11:53:06 +1200 Subject: [Python-3000] bytes: compare bytes to integer In-Reply-To: References: <200708110225.28056.victor.stinner@haypocalc.com> <07Aug12.101123pdt.57996@synergy1.parc.xerox.com> Message-ID: <46BF9D62.4080707@canterbury.ac.nz> Georg Brandl wrote: > Hm... I have a feeling that this will be one of the first entries in a > hypothetical "Python 3.0 Gotchas" list. And probably it's exacerbated by calling them byte "strings", when they're really a kind of array rather than a kind of string, and the use of b"..." as a constructor syntax. 
-- Greg From greg.ewing at canterbury.ac.nz Mon Aug 13 01:58:30 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 13 Aug 2007 11:58:30 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <44C1B6D6-A42B-4BE0-AF29-4403A8C01784@fuhm.net> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BE67CB.9010101@canterbury.ac.nz> <44C1B6D6-A42B-4BE0-AF29-4403A8C01784@fuhm.net> Message-ID: <46BF9EA6.10706@canterbury.ac.nz> James Y Knight wrote: > I've been skimming a lot of the discussion about how to special case > various bits and pieces of formatting, but it really seems to me as > if this is really just asking for the use of a generic function. I was afraid someone would suggest that. I think it would be a bad idea to use something like that in such a fundamental part of the core until GFs are much better tried and tested. 
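
For readers who have not met the idea, here is a rough sketch of the two-level dispatch James describes, written against present-day Python rather than the py3k branch of the time. It is only an approximation under stated assumptions: functools.singledispatch stands in for the type half of the @overload decorator, a plain dict handles the format-character half, and the names format_value and format_typed are made up for this illustration. It is not the mechanism that was proposed in the thread.

```python
from functools import singledispatch

# Format-character half: map each character to the conversion that the
# "default behaviors" in the thread would apply before formatting.
_CONVERSIONS = {"r": repr, "s": str, "f": float, "d": int}


@singledispatch
def format_typed(obj, format_spec):
    """Type half: no handler registered for this type (hypothetical name)."""
    raise TypeError("no formatter for " + type(obj).__name__)


@format_typed.register(str)
def _(obj, format_spec):
    return format(obj, format_spec)


@format_typed.register(float)
def _(obj, format_spec):
    # Append the presentation type expected by the built-in mini-language.
    return format(obj, format_spec + "f")


@format_typed.register(int)
def _(obj, format_spec):
    return format(obj, format_spec + "d")


def format_value(format_char, obj, format_spec=""):
    """Convert by format character, then dispatch on the resulting type."""
    converted = _CONVERSIONS[format_char](obj)
    return format_typed(converted, format_spec)
```

With this arrangement, format_value("d", 3.9, "4") first converts to int and only then reaches the int handler, which mirrors the "default behavior, then type-specific behavior" split in the quoted message.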
-- Greg From greg.ewing at canterbury.ac.nz Mon Aug 13 02:03:41 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 13 Aug 2007 12:03:41 +1200 Subject: [Python-3000] [Python-Dev] Universal newlines support in Python 3.0 In-Reply-To: <79990c6b0708121212m2490d6f0tb151c3c1d5aa1ea3@mail.gmail.com> References: <79990c6b0708120958p588aabd1ic6dadf2f65de86d3@mail.gmail.com> <79990c6b0708121212m2490d6f0tb151c3c1d5aa1ea3@mail.gmail.com> Message-ID: <46BF9FDD.7090402@canterbury.ac.nz> Paul Moore wrote: > and you set the newline argument to specify a *specific*, > non-universal, newline value It still seems wrong to not translate the newlines, though, since it's still a *text* mode, and the standard Python representation of text has \n line endings. -- Greg From greg.ewing at canterbury.ac.nz Mon Aug 13 02:08:28 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 13 Aug 2007 12:08:28 +1200 Subject: [Python-3000] Fix imghdr module for bytes In-Reply-To: <79990c6b0708121219x3aecef78hc58443b592c0a13d@mail.gmail.com> References: <200708110235.43664.victor.stinner@haypocalc.com> <79990c6b0708121219x3aecef78hc58443b592c0a13d@mail.gmail.com> Message-ID: <46BFA0FC.2060707@canterbury.ac.nz> Paul Moore wrote: > Ugh. Alternatively, h[0] == ord('P') should work. I'm wondering whether we want a "byte character literal" to go along with "byte string literals": h[0] == c"P" After all, if it makes sense to write an array of bytes as though they were ASCII characters, it must make sense to write a single byte that way as well. -- Greg From greg.ewing at canterbury.ac.nz Mon Aug 13 02:12:23 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 13 Aug 2007 12:12:23 +1200 Subject: [Python-3000] Four new failing tests In-Reply-To: <46BF819C.8040107@v.loewis.de> References: <46BE1082.40300@v.loewis.de> <46BF425A.3070100@v.loewis.de> <46BF819C.8040107@v.loewis.de> Message-ID: <46BFA1E7.6070300@canterbury.ac.nz> Martin v. 
Löwis wrote:
> However, if string.letters is removed, I trust that people
> start listing all characters explicitly in the regex, and curse
> python-dev for removing such a useful facility.

On the other hand, if it's kept, but turns into something tens of kilobytes long, what effect will *that* have on people's regular expressions?

-- Greg

From victor.stinner at haypocalc.com Mon Aug 13 02:19:56 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 13 Aug 2007 02:19:56 +0200
Subject: [Python-3000] bytes: compare bytes to integer
In-Reply-To: <07Aug12.101123pdt."57996"@synergy1.parc.xerox.com>
References: <200708110225.28056.victor.stinner@haypocalc.com> <07Aug12.101123pdt."57996"@synergy1.parc.xerox.com>
Message-ID: <200708130219.57043.victor.stinner@haypocalc.com>

On Sunday 12 August 2007 19:11:18 Bill Janssen wrote:
> Why not just write
>
> b'xyz'[0:1] == b'x'

It's just strange to write:
   'abc'[0] == 'a'
for a character string and:
   b'abc'[0:1] == b'a'
for a byte string.

The problem in my brain is that str is a special case, since a str item is also a string, whereas a bytes item is an integer.

It's clear that "[5, 9, 10][0] == [5]" is wrong, but for bytes and str it's not intuitive because of the b'...' syntax. If I had to write [120, 121, 122] instead of b'xyz', it would be easier to understand that the first value is an integer and not the *letter* X or the *string* X.

I dislike b'xyz'[0:1] == b'x' since I want to check the first item and not to compare substrings.
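
The behaviour under discussion can be checked directly. The snippet below is a small illustration, assuming present-day Python 3 semantics (which match what the thread describes for indexing and slicing); the variable names are only for this example.

```python
data = b"xyz"

first_item = data[0]       # indexing yields an int (the byte value of "x")
first_slice = data[0:1]    # slicing yields a bytes object of length 1

item_equals_bytes = (first_item == b"x")     # int compared with bytes
slice_equals_bytes = (first_slice == b"x")   # bytes compared with bytes
item_equals_ord = (first_item == ord("x"))   # int compared with int
prefix_matches = data.startswith(b"x")       # no indexing needed at all
```

Only the first comparison is false, which is the surprise Victor reports; the other three are the idioms suggested in this thread and in the imghdr thread.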
Victor Stinner aka haypo
http://hachoir.org/

From eric+python-dev at trueblade.com Mon Aug 13 02:22:27 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Sun, 12 Aug 2007 20:22:27 -0400
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <46BD79EC.1020301@acm.org>
References: <46BD79EC.1020301@acm.org>
Message-ID: <46BFA443.2000005@trueblade.com>

Talin wrote:
> Taking some ideas from the various threads, here's what I'd like to propose:
>
> (Assume that brackets [] means 'optional field')
>
> [:[type][align][sign][[0]minwidth][.precision]][/fill][!r]
>
> Examples:
>
> :f         # Floating point number of natural width
> :f10       # Floating point number, width at least 10
> :f010      # Floating point number, width at least 10, leading zeros
> :f.2       # Floating point number with two decimal digits
> :8         # Minimum width 8, type defaults to natural type
> :d+2       # Integer number, 2 digits, sign always shown
> !r         # repr() format
> :10!r      # Field width 10, repr() format
> :s10       # String right-aligned within field of minimum width
>            # of 10 chars.
> :s10.10    # String right-aligned within field of minimum width
>            # of 10 chars, maximum width 10.
> :s<10      # String left-aligned in 10 char (min) field.
> :d^15      # Integer centered in 15 character field
> :>15/.     # Right align and pad with '.' chars
> :f<+015.5  # Floating point, left aligned, always show sign,
>            # leading zeros, field width 15 (min), 5 decimal places.

For those cases where we're going to special case either conversions or repr, it would be convenient if the character were always first. And since repr and string formatting are so similar, it would be convenient if they were the same, except for the "r" part. But the "!" (or something similar) is needed, otherwise no format string could ever begin with an "r". So, how about "!r" be leftmost for repr formatting.
The similarities would be:

"!r"    # default repr formatting
":s"    # default string formatting
"!r10"  # repr right aligned, minimum 10 chars width
":s10"  # convert to string, right aligned, minimum 10 chars width

Admittedly the "r" is now superfluous, but I think it's clearer with the "r" present than without it. And it would allow for future expansion of such top-level functionality to bypass __format__.

Eric.

From victor.stinner at haypocalc.com Mon Aug 13 02:26:03 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 13 Aug 2007 02:26:03 +0200
Subject: [Python-3000] [Email-SIG] fix email module for python 3000 (bytes/str)
In-Reply-To: <8B640CF2-EB88-45A5-A85F-1267AF24749E@python.org>
References: <200708090241.08369.victor.stinner@haypocalc.com> <200708110149.10939.victor.stinner@haypocalc.com> <8B640CF2-EB88-45A5-A85F-1267AF24749E@python.org>
Message-ID: <200708130226.03670.victor.stinner@haypocalc.com>

On Sunday 12 August 2007 16:50:05 Barry Warsaw wrote:
> In r56957 I committed changes to sndhdr.py and imghdr.py so that they
> compare what they read out of the files against proper byte
> literals.

So nobody read my patches? :-( See my emails "[Python-3000] Fix imghdr module for bytes" and "[Python-3000] Fix sndhdr module for bytes" from last Saturday.

But well, my patches look similar. Barry's patch is incomplete: test_voc() is wrong.

I attached a new patch:
 - fix "h[sbseek] == b'\1'" and "ratecode = ord(h[sbseek+4])" in test_voc()
 - avoid division by zero
 - use startswith method: replace h[:2] == b'BM' by h.startswith(b'BM')
 - use aifc.open() instead of old aifc.openfp()
 - use ord(b'P') instead of ord('P')

Victor Stinner aka haypo
http://hachoir.org/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: py3k-imgsnd-hdr.patch Type: text/x-diff Size: 5326 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070813/081c76b4/attachment-0001.bin From victor.stinner at haypocalc.com Mon Aug 13 02:32:29 2007 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Mon, 13 Aug 2007 02:32:29 +0200 Subject: [Python-3000] bytes regular expression? In-Reply-To: References: <200708090427.19830.victor.stinner@haypocalc.com> <200708091740.59070.victor.stinner@haypocalc.com> Message-ID: <200708130232.30033.victor.stinner@haypocalc.com> On Thursday 09 August 2007 19:39:50 you wrote: > So why not just skip caching for anything that doesn't hash()? If > you're really worried about efficiency, simply re.compile() the > expression once and don't rely on the re module's internal cache. I tried to keep backward compatibility. Why character string are "optimized" (cached) but not byte string? Since regex parsing is slow, it's a good idea to avoid recomputation in re.compile(). Regular expression for bytes are useful for file, network, picture, etc. manipulation. 
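
The advice quoted above (compile the expression once instead of relying on the re module's internal cache) looks roughly like this with a bytes pattern. This is a sketch assuming present-day Python 3, where the re module accepts bytes patterns; at the time of the thread that support was still being worked out. The pattern and the name sniff are invented for the example.

```python
import re

# Compiled once at import time; every call below reuses this object, so
# nothing depends on the re module's internal pattern cache.
IMAGE_MAGIC = re.compile(rb"(BM|GIF8[79]a)")


def sniff(header):
    """Return the matched magic bytes at the start of header, or None."""
    match = IMAGE_MAGIC.match(header)
    return match.group(1) if match else None
```

A bytes pattern must be applied to bytes input; passing a str to IMAGE_MAGIC.match raises TypeError.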
Victor Stinner aka haypo http://hachoir.org/ From talin at acm.org Mon Aug 13 03:01:03 2007 From: talin at acm.org (Talin) Date: Sun, 12 Aug 2007 18:01:03 -0700 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <44C1B6D6-A42B-4BE0-AF29-4403A8C01784@fuhm.net> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BE67CB.9010101@canterbury.ac.nz> <44C1B6D6-A42B-4BE0-AF29-4403A8C01784@fuhm.net> Message-ID: <46BFAD4F.3000700@acm.org> James Y Knight wrote: > I've been skimming a lot of the discussion about how to special case > various bits and pieces of formatting, but it really seems to me as > if this is really just asking for the use of a generic function. I'm > not sure how exactly one spells the requirement that the first > argument be equal to a certain object (e.g. 'r') rather than a > subtype of it, so I'll just gloss over that issue. The plan is to eventually use generic functions once we actually have an implementation of them. That probably won't happen in Alpha 1. -- Talin From steven.bethard at gmail.com Mon Aug 13 04:22:46 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Sun, 12 Aug 2007 20:22:46 -0600 Subject: [Python-3000] bytes regular expression? 
In-Reply-To: <200708130232.30033.victor.stinner@haypocalc.com> References: <200708090427.19830.victor.stinner@haypocalc.com> <200708091740.59070.victor.stinner@haypocalc.com> <200708130232.30033.victor.stinner@haypocalc.com> Message-ID: On 8/12/07, Victor Stinner wrote: > On Thursday 09 August 2007 19:39:50 you wrote: > > So why not just skip caching for anything that doesn't hash()? If > > you're really worried about efficiency, simply re.compile() the > > expression once and don't rely on the re module's internal cache. > > I tried to keep backward compatibility. It's not actually backwards incompatible -- the re docs don't promise anywhere to do any caching for you. I'd rather wait and see whether the caching is really necessary for bytes than keep str8 around if we don't have to. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From eric+python-dev at trueblade.com Mon Aug 13 04:37:54 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Sun, 12 Aug 2007 22:37:54 -0400 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BD2D59.1040209@trueblade.com> References: <46B13ADE.7080901@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org> <46BBBEC6.5030705@trueblade.com> <46BC83BF.3000407@trueblade.com> <46BD155B.2010202@canterbury.ac.nz> <46BD2D59.1040209@trueblade.com> Message-ID: <46BFC402.6060804@trueblade.com> Eric V. Smith wrote: > Right. Your "if" test is my is_float_specifier function. The problem > is that this needs to be shared between int and float and string, and > anything else (maybe decimal?) that can be converted to a float. 
Maybe > we should make is_float_specifier a classmethod of float[1], so that > int's __format__ (and also string's __format__) could say: > > if float.is_float_specifier(spec): > return float(self).__format__(spec) > > And float's __format__ function could do all of the specifier testing, > for types it knows to convert itself to, and then say: As I've begun implementing this, I think we really do need these is_XXX_specifier functions. Say I create a new int-like class, not derived from int, named MyInt. And I want to use it like an int, maybe to print it as a hex number: i = MyInt() "{0:x}".format(i) In order for me to write the __format__ function in MyInt, I have to know if the specifier is in fact an int specifier. Rather than put this specifier checking logic into every class that wants to convert itself to an int, we could centralize it in a class method int.is_int_specifier (or maybe int.is_specifier): class MyInt: def __format__(self, spec): if int.is_int_specifier(spec): return int(self).__format__(spec) return "MyInt instance with custom specifier " + spec def __int__(self): return The problem with this logic is that every class that implements __int__ would probably want to contain this same logic. Maybe we want to move this into unicode.format, and say that any class that implements __int__ automatically will participate in a conversion for a specifier that looks like an int specifier. Of course the same logic would exist for float and maybe string. Then we wouldn't need a public int.is_int_specifier. The disadvantage of this approach is that if you do implement __int__, you're restricted in what format specifiers your __format__ method will ever be called with. You're restricted from using a specifier that starts with d, x, etc. That argues for making every __format__ method implement this test itself, only if it wants to. Which means we would want to have int.is_int_specifier. Thoughts? Eric. 
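[Editor's sketch, not part of the original message.] The delegation pattern Eric describes can be tried out roughly as follows. This is only an illustration under stated assumptions: int.is_int_specifier was a proposal in this thread and does not exist in Python, so a module-level helper named is_int_specifier stands in for it, and the set of type codes it accepts (INT_TYPE_CODES) is a guess rather than anything the thread settled on. The constructor argument of MyInt is likewise made up for the demo.

```python
# Hypothetical stand-in for the proposed int.is_int_specifier class method.
# The exact set of "int-looking" type codes is an assumption of this sketch.
INT_TYPE_CODES = set("bcdoxXn")


def is_int_specifier(spec):
    """Return True if spec ends in a type code that int's formatter accepts."""
    return bool(spec) and spec[-1] in INT_TYPE_CODES


class MyInt:
    """An int-like class that is deliberately not derived from int."""

    def __init__(self, value):
        self._value = value

    def __int__(self):
        return self._value

    def __format__(self, spec):
        # Delegate int-looking specifiers to int's own formatter.
        if is_int_specifier(spec):
            return format(int(self), spec)
        # Anything else is treated as a class-specific specifier.
        return "MyInt instance with custom specifier " + spec


print("{0:x}".format(MyInt(255)))      # ff
print("{0:fancy}".format(MyInt(1)))    # MyInt instance with custom specifier fancy
```

Note how the sketch also shows the drawback raised in the message: every int-like class has to repeat the same test in its own __format__.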
From rhamph at gmail.com Mon Aug 13 06:05:56 2007 From: rhamph at gmail.com (Adam Olsen) Date: Sun, 12 Aug 2007 22:05:56 -0600 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BFC402.6060804@trueblade.com> References: <46B13ADE.7080901@acm.org> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org> <46BBBEC6.5030705@trueblade.com> <46BC83BF.3000407@trueblade.com> <46BD155B.2010202@canterbury.ac.nz> <46BD2D59.1040209@trueblade.com> <46BFC402.6060804@trueblade.com> Message-ID: On 8/12/07, Eric Smith wrote: > Eric V. Smith wrote: > > Right. Your "if" test is my is_float_specifier function. The problem > > is that this needs to be shared between int and float and string, and > > anything else (maybe decimal?) that can be converted to a float. Maybe > > we should make is_float_specifier a classmethod of float[1], so that > > int's __format__ (and also string's __format__) could say: > > > > if float.is_float_specifier(spec): > > return float(self).__format__(spec) > > > > And float's __format__ function could do all of the specifier testing, > > for types it knows to convert itself to, and then say: > > As I've begun implementing this, I think we really do need these > is_XXX_specifier functions. > > Say I create a new int-like class, not derived from int, named MyInt. > And I want to use it like an int, maybe to print it as a hex number: > > i = MyInt() > "{0:x}".format(i) > > In order for me to write the __format__ function in MyInt, I have to > know if the specifier is in fact an int specifier. 
Rather than put this > specifier checking logic into every class that wants to convert itself > to an int, we could centralize it in a class method int.is_int_specifier > (or maybe int.is_specifier): > > class MyInt: > def __format__(self, spec): > if int.is_int_specifier(spec): > return int(self).__format__(spec) > return "MyInt instance with custom specifier " + spec > def __int__(self): > return My proposal was to flip this logic: __format__ should check for its own specifiers first, and only if it doesn't match will it return NotImplemented (triggering a call to __int__, or maybe __index__). class MyInt: def __format__(self, spec): if is_custom_spec(spec): return "MyInt instance with custom specifier " + spec return NotImplemented def __int__(self): return This avoids the need for a public is_int_specifier. unicode.format would still have the logic, but since it's called after you're not restricted from starting with d, x, etc. > The problem with this logic is that every class that implements __int__ > would probably want to contains this same logic. > > Maybe we want to move this into unicode.format, and say that any class > that implements __int__ automatically will participate in a conversion > for a specifier that looks like an int specifier. Of course the same > logic would exist for float and maybe string. Then we wouldn't need a > public int.is_int_specifier. > > The disadvantage of this approach is that if you do implement __int__, > you're restricted in what format specifiers your __format__ method will > ever be called with. You're restricted from using a specifier that > starts with d, x, etc. That argues for making every __format__ method > implement this test itself, only if it wants to. Which means we would > want to have int.is_int_specifier. > > Thoughts? > > Eric. 
> > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/rhamph%40gmail.com > -- Adam Olsen, aka Rhamphoryncus From eric+python-dev at trueblade.com Mon Aug 13 06:30:01 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Mon, 13 Aug 2007 00:30:01 -0400 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: References: <46B13ADE.7080901@acm.org> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org> <46BBBEC6.5030705@trueblade.com> <46BC83BF.3000407@trueblade.com> <46BD155B.2010202@canterbury.ac.nz> <46BD2D59.1040209@trueblade.com> <46BFC402.6060804@trueblade.com> Message-ID: <46BFDE49.7090404@trueblade.com> Adam Olsen wrote: > My proposal was to flip this logic: __format__ should check for its > own specifiers first, and only if it doesn't match will it return > NotImplemented (triggering a call to __int__, or maybe __index__). > > class MyInt: > def __format__(self, spec): > if is_custom_spec(spec): > return "MyInt instance with custom specifier " + spec > return NotImplemented > def __int__(self): > return > > This avoids the need for a public is_int_specifier. unicode.format > would still have the logic, but since it's called after you're not > restricted from starting with d, x, etc. That makes sense, since the object would have first crack at the spec, but needn't implement the conversions itself. Let me see where that gets me. Now I see what you were getting at with your earlier posts on the subject. It wasn't clear to me that the "use a fallback" would include "convert based on the spec, if possible". If accepted, this should go into the PEP, of course. It's not clear to me if __int__ or __index__ is correct, here. I think it's __int__, since float won't have __index__, and we want to be able to convert float to int (right?). Thanks! Eric. 
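[Editor's sketch, not part of the original message.] Adam's flipped ordering, where the object gets first crack at the specifier and the formatter falls back to __int__ only on NotImplemented, might look like the following. Everything here is an assumption for illustration: format_value is a hypothetical stand-in for the dispatch logic that would live inside unicode.format, looks_like_int_spec and its set of type codes are guesses, and the "dozen" specifier is invented to show a custom specifier that starts with d. In released Python, returning NotImplemented from __format__ is an error, so the fallback is done by hand.

```python
# Assumed set of type codes that the fallback treats as "int specifiers".
INT_TYPE_CODES = set("bcdoxXn")


def looks_like_int_spec(spec):
    return bool(spec) and spec[-1] in INT_TYPE_CODES


def format_value(obj, spec):
    """Hypothetical dispatcher: ask the object first, then fall back to int."""
    result = type(obj).__format__(obj, spec)
    if result is not NotImplemented:
        return result
    # The object declined; try the generic int conversion if it applies.
    if looks_like_int_spec(spec) and hasattr(obj, "__int__"):
        return format(int(obj), spec)
    raise ValueError("unsupported format specifier: %r" % (spec,))


class MyInt:
    def __init__(self, value):
        self._value = value

    def __int__(self):
        return self._value

    def __format__(self, spec):
        # A custom specifier that happens to start with 'd' is fine here,
        # because this method is consulted before the int fallback.
        if spec == "dozen":
            return "%d dozen" % (int(self) // 12)
        return NotImplemented


print(format_value(MyInt(24), "dozen"))   # 2 dozen
print(format_value(MyInt(255), "x"))      # ff
```

With this ordering no public is_int_specifier is needed by the class author; only the dispatcher has to recognise int-looking specifiers.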
From pc at gafol.net Mon Aug 13 11:51:35 2007 From: pc at gafol.net (Paul Colomiets) Date: Mon, 13 Aug 2007 12:51:35 +0300 Subject: [Python-3000] Four new failing tests In-Reply-To: References: Message-ID: <46C029A7.8080604@gafol.net> Guido van Rossum wrote: > I see four tests fail that passed yesterday: > [...] > < test_threaded_import Patch attached. Need any comments? -------------- next part -------------- A non-text attachment was scrubbed... Name: py3k_tempfile.diff Type: text/x-patch Size: 486 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070813/1da6249e/attachment.bin From p.f.moore at gmail.com Mon Aug 13 14:50:12 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 13 Aug 2007 13:50:12 +0100 Subject: [Python-3000] [Python-Dev] Universal newlines support in Python 3.0 In-Reply-To: <46BF9FDD.7090402@canterbury.ac.nz> References: <79990c6b0708120958p588aabd1ic6dadf2f65de86d3@mail.gmail.com> <79990c6b0708121212m2490d6f0tb151c3c1d5aa1ea3@mail.gmail.com> <46BF9FDD.7090402@canterbury.ac.nz> Message-ID: <79990c6b0708130550y22ddb47crb406f46376c31233@mail.gmail.com> On 13/08/07, Greg Ewing wrote: > Paul Moore wrote: > > and you set the newline argument to specify a *specific*, > > non-universal, newline value > > It still seems wrong to not translate the newlines, though, > since it's still a *text* mode, and the standard Python > representation of text has \n line endings. Yes, I'd agree with that (it's not a case I particularly care about myself, but I agree with your logic). Paul. From kbk at shore.net Mon Aug 13 16:22:15 2007 From: kbk at shore.net (Kurt B. Kaiser) Date: Mon, 13 Aug 2007 10:22:15 -0400 Subject: [Python-3000] IDLE encoding setup In-Reply-To: <87sl6ojsfj.fsf@hydra.hampton.thirdcreek.com> (Kurt B. 
Kaiser's message of "Sun, 12 Aug 2007 18:31:44 -0400") References: <87643llctp.fsf@hydra.hampton.thirdcreek.com> <46BEB1D4.5000307@v.loewis.de> <87sl6ojsfj.fsf@hydra.hampton.thirdcreek.com> Message-ID: <874pj35xbc.fsf@hydra.hampton.thirdcreek.com> "Kurt B. Kaiser" writes: >> I now get an IDLE window which crashes as soon as I type something. > > Yes, something like > > File "/home/kbk/PYDOTORG/projects/python/branches/py3k/Lib/lib-tk/Tkinter.py", line 1022, in mainloop > self.tk.mainloop(n) > TypeError: expected string, bytes found > > I can duplicate this using just WidgetRedirector.main() (no IDLE), but I > haven't figured out the problem as yet. That's a very interesting module ::-P The changes you checked in to _tkinter.c fixed WidgetRedirector, and those and the other changes you made in idlelib have IDLE working without the subprocess. Thanks very much for working on this! -- KBK From martin at v.loewis.de Mon Aug 13 17:48:19 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 13 Aug 2007 17:48:19 +0200 Subject: [Python-3000] IDLE encoding setup In-Reply-To: <874pj35xbc.fsf@hydra.hampton.thirdcreek.com> References: <87643llctp.fsf@hydra.hampton.thirdcreek.com> <46BEB1D4.5000307@v.loewis.de> <87sl6ojsfj.fsf@hydra.hampton.thirdcreek.com> <874pj35xbc.fsf@hydra.hampton.thirdcreek.com> Message-ID: <46C07D43.7080601@v.loewis.de> > The changes you checked in to _tkinter.c fixed WidgetRedirector, and > those and the other changes you made in idlelib have IDLE working > without the subprocess. > > Thanks very much for working on this! I doubt that all is working yet, though. So some thorough testing would probably be necessary - plus getting the subprocess case to work, of course. 
Regards, Martin From skip at pobox.com Mon Aug 13 18:55:26 2007 From: skip at pobox.com (skip at pobox.com) Date: Mon, 13 Aug 2007 11:55:26 -0500 Subject: [Python-3000] [Python-Dev] Universal newlines support in Python 3.0 In-Reply-To: <79990c6b0708120958p588aabd1ic6dadf2f65de86d3@mail.gmail.com> References: <79990c6b0708120958p588aabd1ic6dadf2f65de86d3@mail.gmail.com> Message-ID: <18112.36094.979628.85609@montanaro.dyndns.org> Paul> ... that files can have *either* bare \n, *or* the combination Paul> \r\n, to delimit lines. As someone else pointed out, \r needs to be supported as well. Many Mac applications (Excel comes to mind) still emit text files with \r as the line terminator. Skip From janssen at parc.com Mon Aug 13 19:10:23 2007 From: janssen at parc.com (Bill Janssen) Date: Mon, 13 Aug 2007 10:10:23 PDT Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BFC402.6060804@trueblade.com> References: <46B13ADE.7080901@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org> <46BBBEC6.5030705@trueblade.com> <46BC83BF.3000407@trueblade.com> <46BD155B.2010202@canterbury.ac.nz> <46BD2D59.1040209@trueblade.com> <46BFC402.6060804@trueblade.com> Message-ID: <07Aug13.101032pdt."57996"@synergy1.parc.xerox.com> > Say I create a new int-like class, not derived from int, named MyInt. > And I want to use it like an int, maybe to print it as a hex number: Then derive that class from "int", for heaven's sake! Bill From guido at python.org Mon Aug 13 19:25:41 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Aug 2007 10:25:41 -0700 Subject: [Python-3000] Python 3000 Sprint @ Google Message-ID: It's official! The second annual Python Sprint @ Google is happening again: August 22-25 (Wed-Sat). 
We're sprinting at two locations, this time Google headquarters in Mountain View and the Google office in Chicago (thanks to Brian Fitzpatrick). We'll connect the two sprints with full-screen videoconferencing. The event is *free* and includes Google's *free gourmet food*. Anyone with a reasonable Python experience is invited to attend. The primary goal is to work on Python 3000, to polish off the first alpha release; other ideas are welcome too. Experienced Python core developers will be available for mentoring. (The goal is not to learn Python; it is to learn *contributing* to Python.) For more information and to sign up, please see the wiki page on python.org: http://wiki.python.org/moin/GoogleSprint Sign-up via the wiki page is strongly recommended to avoid lines getting badges. Please read the whole wiki page to make sure you're prepared. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rrr at ronadam.com Mon Aug 13 19:33:03 2007 From: rrr at ronadam.com (Ron Adam) Date: Mon, 13 Aug 2007 12:33:03 -0500 Subject: [Python-3000] Format specifier proposal In-Reply-To: <46BD79EC.1020301@acm.org> References: <46BD79EC.1020301@acm.org> Message-ID: <46C095CF.2060507@ronadam.com> > :f<+015.5 # Floating point, left aligned, always show sign, > # leading zeros, field width 15 (min), 5 decimal places. Which has precedence... left alignment or zero padding? Or should this be an error? Ron From rowen at cesmail.net Mon Aug 13 19:46:08 2007 From: rowen at cesmail.net (Russell E Owen) Date: Mon, 13 Aug 2007 10:46:08 -0700 Subject: [Python-3000] Universal newlines support in Python 3.0 References: <87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: In article <87wsw3p5em.fsf at uwakimon.sk.tsukuba.ac.jp>, "Stephen J. 
Turnbull" wrote: > Guido van Rossum writes: > > > However, the old universal newlines feature also set an attibute named > > 'newlines' on the file object to a tuple of up to three elements > > giving the actual line endings that were observed on the file so far > > (\r, \n, or \r\n). This feature is not in PEP 3116, and it is not > > implemented. I'm tempted to kill it. Does anyone have a use case for > > this? > > I have run into files that intentionally have more than one newline > convention used (mbox and Babyl mail folders, with messages received > from various platforms). However, most of the time multiple newline > conventions is a sign that the file is either corrupt or isn't text. > If so, then saving the file may corrupt it. The newlines attribute > could be used to check for this condition. There is at least one Mac source code editor (SubEthaEdit) that is all too happy to add one kind of newline to a file that started out with a different line ending character. As a result I have seen a fair number of text files with mixed line endings. I don't see as many these days, though; perhaps because the current version of SubEthaEdit handles things a bit better. So perhaps it won't matter much for Python 3000. -- Russell From guido at python.org Mon Aug 13 19:51:18 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Aug 2007 10:51:18 -0700 Subject: [Python-3000] [Email-SIG] fix email module for python 3000 (bytes/str) In-Reply-To: <200708130226.03670.victor.stinner@haypocalc.com> References: <200708090241.08369.victor.stinner@haypocalc.com> <200708110149.10939.victor.stinner@haypocalc.com> <8B640CF2-EB88-45A5-A85F-1267AF24749E@python.org> <200708130226.03670.victor.stinner@haypocalc.com> Message-ID: Checked in. But next time please do use SF to submit patches (and feel free to assign them to me and mail the list about it). 
On 8/12/07, Victor Stinner wrote: > On Sunday 12 August 2007 16:50:05 Barry Warsaw wrote: > > In r56957 I committed changes to sndhdr.py and imghdr.py so that they > > compare what they read out of the files against proper byte > > literals. > > So nobody read my patches? :-( See my emails "[Python-3000] Fix imghdr module > for bytes" and "[Python-3000] Fix sndhdr module for bytes" from last > saturday. But well, my patches look similar. > > Barry's patch is incomplete: test_voc() is wrong. > > I attached a new patch: > - fix "h[sbseek] == b'\1'" and "ratecode = ord(h[sbseek+4])" in test_voc() > - avoid division by zero > - use startswith method: replace h[:2] == b'BM' by h.startswith(b'BM') > - use aifc.open() instead of old aifc.openfp() > - use ord(b'P') instead of ord('P') This latter one is questionable. If you really want to compare to bytes, perhaps write h[:1] == b'P' instead of b[0] == ord(b'P')? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Aug 13 20:57:28 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Aug 2007 11:57:28 -0700 Subject: [Python-3000] Four new failing tests In-Reply-To: <46BE1082.40300@v.loewis.de> References: <46BE1082.40300@v.loewis.de> Message-ID: On 8/11/07, "Martin v. L?wis" wrote: > > ====================================================================== > > ERROR: test_char_write (__main__.TestArrayWrites) > > ---------------------------------------------------------------------- > > Traceback (most recent call last): > > File "Lib/test/test_csv.py", line 648, in test_char_write > > a = array.array('u', string.letters) > > ValueError: string length not a multiple of item size I fixed this by removing the code from _locale.c that changes string.letters. > I think some decision should be made wrt. string.letters. > > Clearly, string.letters cannot reasonably contain *all* letters > (i.e. all characters of categories Ll, Lu, Lt, Lo). Or can it? 
> > Traditionally, string.letters contained everything that is a letter > in the current locale. Still, computing this string might be expensive > assuming you have to go through all Unicode code points and determine > whether they are letters in the current locale. > > So I see the following options: > 1. remove it entirely. Keep string.ascii_letters instead > 2. remove string.ascii_letters, and make string.letters to be > ASCII only. > 3. Make string.letters contain all letters in the current locale. > 4. Make string.letters truly contain everything that is classified > as a letter in the Unicode database. > > Which one should happen? First I'd like to rule out 3 and 4. I don't like 3 because in our new all-unicode world, using the locale for deciding what letters are makes no sense -- one should use isalpha() etc. I think 4 is not at all what people who use string.letters expect, and it's too large. I think 2 is unnecessarily punishing people who use string.ascii_letters -- they have already declared they don't care about Unicode and we shouldn't break their code. So that leaves 1. There are (I think) two categories of users who use string.letters: (a) People who have never encountered a non-English locale and for whom there is no difference between string.ascii_letters and string.letters. Their code may or may not work in other locales. We're doing them a favor by flagging this in their code by removing string.letters. (b) People who want locale-specific behavior. Their code will probably break anyway, since they are apparently processing text using 8-bit characters encoded in a fixed-width encoding (e.g. the various Latin-N encodings). They ought to convert their code to Unicode. Once they are processing Unicode strings, they can just use isalpha() etc. 
If they really want to know the set of letters that can be encoded in their locale's encoding, they can use locale.getpreferredencoding() and deduce it from there, e.g.: enc = locale.getpreferredencoding() letters = [c for c in bytes(range(256)).decode(enc) if c.isalpha()] This won't work for multi-byte encodings of course -- but their code never worked in that case anyway. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Aug 13 21:07:21 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Aug 2007 12:07:21 -0700 Subject: [Python-3000] Four new failing tests In-Reply-To: <46C029A7.8080604@gafol.net> References: <46C029A7.8080604@gafol.net> Message-ID: On 8/13/07, Paul Colomiets wrote: > Guido van Rossum wrote: > > I see four tests fail that passed yesterday: > > [...] > > < test_threaded_import > Patch attached. > Need any comments? Thanks! The patch as-is didn't help, but after changing the write() line to b'blat' it works. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Aug 13 22:15:03 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Aug 2007 13:15:03 -0700 Subject: [Python-3000] [Python-Dev] Universal newlines support in Python 3.0 In-Reply-To: References: <87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 8/13/07, Russell E Owen wrote: > In article <87wsw3p5em.fsf at uwakimon.sk.tsukuba.ac.jp>, > "Stephen J. Turnbull" wrote: > > > Guido van Rossum writes: > > > > > However, the old universal newlines feature also set an attibute named > > > 'newlines' on the file object to a tuple of up to three elements > > > giving the actual line endings that were observed on the file so far > > > (\r, \n, or \r\n). This feature is not in PEP 3116, and it is not > > > implemented. I'm tempted to kill it. Does anyone have a use case for > > > this? 
> > > > I have run into files that intentionally have more than one newline > > convention used (mbox and Babyl mail folders, with messages received > > from various platforms). However, most of the time multiple newline > > conventions is a sign that the file is either corrupt or isn't text. > > If so, then saving the file may corrupt it. The newlines attribute > > could be used to check for this condition. > > There is at least one Mac source code editor (SubEthaEdit) that is all > too happy to add one kind of newline to a file that started out with a > different line ending character. As a result I have seen a fair number > of text files with mixed line endings. I don't see as many these days, > though; perhaps because the current version of SubEthaEdit handles > things a bit better. So perhaps it won't matter much for Python 3000. I've seen similar behavior in MS VC++ (long ago, dunno what it does these days). It would read files with \r\n and \n line endings, and whenever you edited a line, that line also got a \r\n ending. But unchanged lines that started out with \n-only endings would keep the \n only. And there was no way for the end user to see or control this. To emulate this behavior in Python you'd have to read the file in binary mode *or* we'd have to have an additional flag specifying to return line endings as encountered in the file. The newlines attribute (as defined in 2.x) doesn't help, because it doesn't tell which lines used which line ending. I think the newline feature in PEP 3116 falls short too; it seems mostly there to override the line ending *written* (from the default os.sep). I think we may need different flags for input and for output. For input, we'd need two things: (a) which are acceptable line endings; (b) whether to translate acceptable line endings to \n or not. For output, we need two things again: (c) whether to translate line endings at all; (d) which line endings to translate. 
I guess we could map (c) to (b) and (d) to (a) for a signature that's the same for input and output (and makes sense for read+write files as well). The default would be (a)=={'\n', '\r\n', '\r'} and (b)==True. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Mon Aug 13 22:22:56 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 13 Aug 2007 22:22:56 +0200 Subject: [Python-3000] Four new failing tests In-Reply-To: References: <46BE1082.40300@v.loewis.de> Message-ID: <46C0BDA0.3090207@v.loewis.de> > So that leaves 1. Ok. So several people have spoken in favor of removing string.letters; I'll work on removing it. Regards, Martin From brett at python.org Tue Aug 14 00:13:10 2007 From: brett at python.org (Brett Cannon) Date: Mon, 13 Aug 2007 15:13:10 -0700 Subject: [Python-3000] Python 3000 Sprint @ Google In-Reply-To: References: Message-ID: On 8/13/07, Guido van Rossum wrote: > It's official! The second annual Python Sprint @ Google is happening > again: August 22-25 (Wed-Sat). I can't attend this year (damn doctor's appt.), but I will try to be on Google Talk (username of bcannon) in case I can help out somehow remotely. -Brett From chris.monsanto at gmail.com Tue Aug 14 01:52:09 2007 From: chris.monsanto at gmail.com (Chris Monsanto) Date: Mon, 13 Aug 2007 19:52:09 -0400 Subject: [Python-3000] 100% backwards compatible parenless function call statements Message-ID: <799316b70708131652y32d77ee0kee84d1d3fd0ad065@mail.gmail.com> Since Python makes such a distinction between statements and expressions, I am proposing that function calls as statements should be allowed to omit parentheses. What I am proposing is 100% compatible with Python 2.x's behavior of function calls; so those uncomfortable with this (basic) idea can continue to use parens in their function calls. Expressions still require parens because of ambiguity and clarity issues. 
--Some examples:-- print "Parenless function call!", file=my_file print(".. but this is still allowed") # We still need parens for calls to functions where the sole argument is a tuple # But you'd have to do this anyway in Python 2.x... nothing lost. print((1, 2)) # Need parens if the function call isnt the only thing in the statement cos(3) + 4 # Need parens if function call isnt a statement, otherwise how would we get the function itself? x = cos(3) # Make a the value of my_func... my_func2 = my_func my_func2 # call other function my_func2() # call it again # Method call? f = open("myfile") f.close # Chained method obj.something().somethinganother().yeah --Notes:-- A lot of other things in Python 2.x/Python 3k at the moment have this same behavior... # No parens required x, y = b, a # But sometimes they are func((1, 2)) # Generator expressions sometimes don't need parens func(i for i in list) # But sometimes they do func(a, (i for i in list)) --Pros:-- 1) Removes unnecessary verbosity for the majority of situations. 2) Python 2.x code works the same unmodified. 3) No weird stuff with non-first class objects, ala Ruby meth.call(). Functions still remain assignable to other values without other trickery. 4) Because it's completely backwards compatible, you could even have something like from __future__ import parenless in Python 2.6 for a transition. --Cons:-- 1) Can't type "func" bare in interpreter to get its repr. I think this is a non-issue; I personally never do this, and with parenless calls you can just type "repr func" anyway. Specifically I think this shouldn't be considered because in scripts doing something like "f.close" does absolutely nothing and giving it some functionality would be nice. It also solves one of the Python gotchas found here: http://www.ferg.org/projects/python_gotchas.html(specifically #5) I'm willing to write up a proper PEP if anyone is interested in the idea. I figured I'd poll around first. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070813/21dbb247/attachment.htm From rrr at ronadam.com Tue Aug 14 01:52:31 2007 From: rrr at ronadam.com (Ron Adam) Date: Mon, 13 Aug 2007 18:52:31 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BF3CC7.6010405@acm.org> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org> Message-ID: <46C0EEBF.3010206@ronadam.com> Talin wrote: > I'm not sure that I'm happy with my own syntax proposal just yet. I want > to sit down with Guido and talk it over before I commit to anything. > > I think part of the difficulty that I am having is as follows: > However, the old syntax doesn't fit very well with the new requirements: > the desire to have the 'repr' option take precedence over the > type-specific formatter, and the desire to split the format specifier > into two parts, one which is handled by the type-specific formatter, and > one which is handled by the general formatter. > But with these format strings, it seems (to me, anyway) that the design > choices are a lot more arbitrary and driven by aesthetics. 
Almost any > combination of specifiers will work, the question is how to arrange them > in a way that is easy to memorize. They seem arbitrary because intuition says to put things together that "look" like they belong together, but they may really be two different things. Or because we try to use one thing for two different purposes. This is also what makes it hard to understand as well. I reconsidered the split term forms a bit more and I think I've come up with a better way to think about them. Sometimes a slight conceptual shift can make a difference. The basic form is: {name[:][type][alignment_term][,content_modifying_term]} TYPE: The specifier type. One of 'deifsrx'. (and any others I left off) No type defaults to 's' (this is the safest default, but it could be more flexible if need be.) ALIGNMENT TERM: [direction]width[/fill] direction: is one of '<^>' width: a positive integer /fill: a character * This pattern is always the same and makes up an alignment term. By keeping this part consistent, it makes it easy to remember and understand. CONTENT MODIFYING TERMS: * Strings and numbers are handled differently, You would never use both of these at the same time. STRINGS: [string_width] string_width: Positive or negative integer to clip a long string. * works like slice, s[:n] or s[-n:]) NUMBERS: [sign][0][digits][.decimals][%] sign: '-', '+', '(', ' ', '' * The trailing ')' is optional. 0: use leading zeros digits: number of digits, or number before decimal. .decimal: number of decimal places %: multiplies by 100 and add ' %' to end. Some differences are, alignment terms never have '+-' in them, or any of the number formatting symbols. They are consistent. The digits value are number of digits before the decimal. This doesn't include the other symbols used in the field so it isn't the same as a field width. (I believe this is one of the points of confusion. Or it is for me.) 
It bothered me that to figure out the number of digits before the decimal I had to subtract all the other parts. You can think of this as shorter form of the # syntax. ######.### -> 6.3 Surprisingly this does not have a big impact on the latest proposed syntax or the number of characters used unless someone wants to both specify a numeric formatting term with a larger sized alignment term. So here's what how it compares with an actual doctest that passes. Note: fstr() and ffloat() are versions of str and float with the needed methods to work. Examples from python3000 list: (With only a few changes where it makes sense or to make it work.) >>> floatvalue = ffloat(123.456) :f # Floating point number of natural width >>> fstr('{0:f}').format(floatvalue) '123.456' :f10 # Floating point number, width at least 10 ## behavior changed >>> fstr('{0:f10}').format(floatvalue) ' 123.456' >>> fstr('{0:f>10}').format(floatvalue) ' 123.456' :f010 # Floating point number, width at least 10, leading zeros >>> fstr('{0:f010}').format(floatvalue) '0000000123.456' :f.2 # Floating point number with two decimal digits >>> fstr('{0:f.2}').format(floatvalue) '123.46' :8 # Minimum width 8, type defaults to natural type ## defualt is string type, no type is guessed. >>> fstr('{0:8}').format(floatvalue) '123.456 ' :d+2 # Integer number, 2 digits, sign always shown ## (minor change to show padded digits.) >>> fstr('{0:d+5}').format(floatvalue) '+ 123' :!r # repr() format ## ALTERED, not special case syntax. ## the 'r' is special even so. >>> fstr('{0:r}').format(floatvalue) '123.456' :10!r # Field width 10, repr() format ## ALTERED, see above. >>> fstr('{0:r10}').format(floatvalue) '123.456 ' :s10 # String right-aligned within field of minimum width # of 10 chars. >>> fstr('{0:s10}').format(floatvalue) '123.456 ' :s10.10 # String right-aligned within field of minimum width # of 10 chars, maximum width 10. ## ALTERED, comma instead of period. 
>>> fstr('{0:s10,10}').format(floatvalue) '123.456 ' :s<10 # String left-aligned in 10 char (min) field. >>> fstr('{0:s<10}').format(floatvalue) '123.456 ' :d^15 # Integer centered in 15 character field >>> fstr('{0:d^15}').format(floatvalue) ' 123 ' :>15/. # Right align and pad with '.' chars >>> fstr('{0:>15/.}').format(floatvalue) '........123.456' :f<+015.5 # Floating point, left aligned, always show sign, # leading zeros, field width 15 (min), 5 decimal places. ## ALTERED: '<' not needed, size of digits reduced. >>> fstr('{0:f010.5}').format(floatvalue) '0000000123.45600' So very little actually changed in most of these syntax wise. Some behavioral changes to number formatting. But I think these are plus's. - I haven't special cases the '!' syntax. - The behavior of digits in the numeric format term is changed. So if the terms have the following patterns they can easily be identified. alignment term With strings type only ... A clipping term <0> <.> <%> A numeric format term And example of using these together... s>25,-25 right align short strings, clip long strings from the end. (clips the beginning off s = s[-25:]) f^30/_,(010.3%) Centers a zero padded number with 3 decimal, and with parentheses around negative numbers, and spaces around positive numbers, in a field 30 characters wide, with underscore padding. Yes, this example is a bit long, but it does a lot! > Guido's been suggesting that I model the format specifiers after the > .Net numeric formatting strings, but this system is significantly less > capable than %-style format specifiers. Yes, you can do fancy things > like "(###)###-####", but there's no provision for centering or for a > custom fill character. Usually when you use this type of formatting it's very specific and doesn't need any sort of aligning. Maybe we can try to add this in later after the rest is figured out and working? It would fit naturally as an alternative content modifying term for strings. 
s^30,(###)###-#### Center phone numbers in a 30 character column. The numbers signs are enough to identify the type here. > This would be easier if I was sitting in a room with other Python > programmers so that I could show them various suggestions and see what > their emotional reactions are. I'm having a hard time doing this in > isolation. That's kind of why I want to meet with Guido on this, as he's > good at cutting through this kind of crap. I agree, it would be easier. Cheers, Ron From greg.ewing at canterbury.ac.nz Tue Aug 14 02:58:27 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 14 Aug 2007 12:58:27 +1200 Subject: [Python-3000] Format specifier proposal In-Reply-To: <46BFA443.2000005@trueblade.com> References: <46BD79EC.1020301@acm.org> <46BFA443.2000005@trueblade.com> Message-ID: <46C0FE33.1010708@canterbury.ac.nz> Eric Smith wrote: > But the "!" (or > something similar) is needed, otherwise no format string could ever > begin with an "r". My current preference is for 'r' to always mean the same thing for all types. That means if you're designing a new format string, you just have to choose something other than 'r'. I don't see that as a big problem. 
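For comparison, here is a minimal sketch of the "!" alternative Eric describes, written against the str.format that Python eventually shipped, where repr is requested with "!r" outside the specifier. The Tag class and its "raw" specifier are invented purely for illustration and appear nowhere in this thread; the point is only that a type-specific specifier can then begin with "r" without clashing with repr.

```python
class Tag:
    """Toy type with its own format specifier; purely illustrative."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "Tag(%r)" % self.name

    def __str__(self):
        return "<%s>" % self.name

    def __format__(self, spec):
        # "raw" is a made-up, type-specific specifier that happens to
        # begin with the letter "r".
        if spec == "raw":
            return self.name
        # Anything else is handed on to str's own formatter.
        return format(str(self), spec)


t = Tag("b")
raw_text = "{0:raw}".format(t)       # handled by Tag.__format__
repr_text = "{0!r}".format(t)        # repr() applied before formatting
padded_repr = "{0!r:>12}".format(t)  # repr() first, then the str spec ">12"
print(raw_text, repr_text, padded_repr)
```

Under Greg's preference the "raw" specifier would simply have to be spelled with a different first letter.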
-- Greg From greg.ewing at canterbury.ac.nz Tue Aug 14 03:10:42 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 14 Aug 2007 13:10:42 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BFC402.6060804@trueblade.com> References: <46B13ADE.7080901@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org> <46BBBEC6.5030705@trueblade.com> <46BC83BF.3000407@trueblade.com> <46BD155B.2010202@canterbury.ac.nz> <46BD2D59.1040209@trueblade.com> <46BFC402.6060804@trueblade.com> Message-ID: <46C10112.4040506@canterbury.ac.nz> Eric Smith wrote: > In order for me to write the __format__ function in MyInt, I have to > know if the specifier is in fact an int specifier. > > class MyInt: > def __format__(self, spec): > if int.is_int_specifier(spec): > return int(self).__format__(spec) > return "MyInt instance with custom specifier " + spec I would do this the other way around, i.e. first look to see whether the spec is one that MyInt wants to handle specially, and if not, *assume* that it's an int specifier. E.g. 
if MyInt defines a new "m" format: def __format__(self, spec): if spec.startswith("m"): return self.do_my_formatting(spec) else: return int(self).__format__(spec) -- Greg From greg.ewing at canterbury.ac.nz Tue Aug 14 03:36:48 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 14 Aug 2007 13:36:48 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46BFDE49.7090404@trueblade.com> References: <46B13ADE.7080901@acm.org> <46B54F51.40705@acm.org> <46B59F05.3070200@ronadam.com> <46B5FBD9.4020301@acm.org> <46BBBEC6.5030705@trueblade.com> <46BC83BF.3000407@trueblade.com> <46BD155B.2010202@canterbury.ac.nz> <46BD2D59.1040209@trueblade.com> <46BFC402.6060804@trueblade.com> <46BFDE49.7090404@trueblade.com> Message-ID: <46C10730.5010702@canterbury.ac.nz> Eric Smith wrote: > It's not clear to me if __int__ or __index__ is correct, here. I think > it's __int__, since float won't have __index__, and we want to be able > to convert float to int (right?). This issue doesn't arise if the object itself does the fallback conversion, as in the example I posted, rather than leave it to generic code in format(). -- Greg From bwinton at latte.ca Tue Aug 14 03:48:41 2007 From: bwinton at latte.ca (Blake Winton) Date: Mon, 13 Aug 2007 21:48:41 -0400 Subject: [Python-3000] 100% backwards compatible parenless function call statements In-Reply-To: <799316b70708131652y32d77ee0kee84d1d3fd0ad065@mail.gmail.com> References: <799316b70708131652y32d77ee0kee84d1d3fd0ad065@mail.gmail.com> Message-ID: <46C109F9.1040503@latte.ca> Chris Monsanto wrote: > so those uncomfortable with > this (basic) idea can continue to use parens in their function calls. But we would have to read people's code who didn't use them. > my_func2 # call other function > my_func2() # call it again So, those two are the same, but these two are different? print my_func2 print my_func2() What about these two? x.y().z x.y().z() Would this apply to anything which implements callable? 
> # Method call? > f = open("myfile") > f.close What happens in for x in dir(f): x ? If some things are functions, do they get called and the other things don't? > --Pros:-- > 1) Removes unnecessary verbosity for the majority of situations. "unnecessary verbosity" is kind of stretching it. Two whole characters in some situations is hardly a huge burden. > I'm willing to write up a proper PEP if anyone is interested in the > idea. I figured I'd poll around first. I vote "AAAAAAaaaahhhh! Dear god, no!". ;) Seriously, knowing at a glance the difference between function references and function invocations is one of the reasons I like Python (and dislike Ruby). Your proposal would severely compromise that functionality. Later, Blake. From bwinton at latte.ca Tue Aug 14 03:51:45 2007 From: bwinton at latte.ca (Blake Winton) Date: Mon, 13 Aug 2007 21:51:45 -0400 Subject: [Python-3000] [Python-Dev] Universal newlines support in Python 3.0 In-Reply-To: References: <87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <46C10AB1.4020708@latte.ca> Guido van Rossum wrote: > On 8/13/07, Russell E Owen wrote: >> In article <87wsw3p5em.fsf at uwakimon.sk.tsukuba.ac.jp>, >> "Stephen J. Turnbull" wrote: >>> I have run into files that intentionally have more than one newline >>> convention used (mbox and Babyl mail folders, with messages received >>> from various platforms). However, most of the time multiple newline >>> conventions is a sign that the file is either corrupt or isn't text. >> There is at least one Mac source code editor (SubEthaEdit) that is all >> too happy to add one kind of newline to a file that started out with a >> different line ending character. > I've seen similar behavior in MS VC++ (long ago, dunno what it does > these days). It would read files with \r\n and \n line endings, and > whenever you edited a line, that line also got a \r\n ending. But > unchanged lines that started out with \n-only endings would keep the > \n only. 
And there was no way for the end user to see or control this. I've seen it in Scite (an editor based around Scintilla) just yesterday. It was rather annoying, since it messed up my diffs something awful, and was invisible to the naked eye. (But it lets you "Show Line Endings", which quickly made the problem apparent.) Later, Blake. From victor.stinner at haypocalc.com Tue Aug 14 03:52:45 2007 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 14 Aug 2007 03:52:45 +0200 Subject: [Python-3000] [Email-SIG] fix email module for python 3000 (bytes/str) In-Reply-To: References: <200708090241.08369.victor.stinner@haypocalc.com> <200708130226.03670.victor.stinner@haypocalc.com> Message-ID: <200708140352.45989.victor.stinner@haypocalc.com> Hi, On Monday 13 August 2007 19:51:18 Guido van Rossum wrote: > Checked in. But next time please do use SF to submit patches (and feel > free to assign them to me and mail the list about it). Ah yes, you already asked to use SF. I will use it next time. > On 8/12/07, Victor Stinner wrote: > > On Sunday 12 August 2007 16:50:05 Barry Warsaw wrote: > > > In r56957 I committed changes to sndhdr.py and imghdr.py so that they > > > compare what they read out of the files against proper byte > > > literals. > > > > So nobody read my patches? > > (...) > > I attached a new patch > > (...) > > - use ord(b'P') instead of ord('P') > > This latter one is questionable. If you really want to compare to > bytes, perhaps write h[:1] == b'P' instead of b[0] == ord(b'P')? Someone proposed c'P' syntax for ord(b'P') which is like an alias for 80. I prefer letters to numbers when letters make sense. I also think (I may be wrong) that b'xyz'[0] == 80 is faster than b'xyz'[:1] == b'x' since b'xyz'[:1] creates a new object. If we keep the speed argument, b'xyz'[0] == ord(b'P') may be slower than b'xyz'[:1] == b'x' since ord(b'P') is recomputed each time (is that right?).
But well, speed argument is stupid since it's a micro-optimization :-) Victor Stinner aka haypo http://hachoir.org/ From greg.ewing at canterbury.ac.nz Tue Aug 14 04:02:21 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 14 Aug 2007 14:02:21 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46C0EEBF.3010206@ronadam.com> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com> Message-ID: <46C10D2D.60705@canterbury.ac.nz> Ron Adam wrote: > The digits value are number of digits before the decimal. This doesn't > include the other symbols used in the field so it isn't the same as a field > width. How does this work with formats where the number of digits before the decimal can vary, but before+after is constant? Also, my feeling about the whole of this is that it's too complicated. It seems like you can have at least three numbers in a format, and at first glance it's quite confusing as to what they all mean. 
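One reading of the question above is "a fixed number of significant digits". As a point of comparison only (this is the format syntax that later shipped with Python, not Ron's proposal), the 'g' presentation type takes a precision that counts digits before and after the point together, while the leading number in a spec such as '10.3f' is the width of the whole field. A minimal sketch, with sample values chosen arbitrarily:

```python
# Precision with 'g' counts significant digits, so the split between
# digits before and after the decimal point moves with the value.
values = [3.14159, 31.4159, 314.159]
significant = [format(value, ".4g") for value in values]
print(significant)

# With 'f', the first number is the minimum width of the entire field
# and the number after '.' is the count of decimal places.
fixed = format(3.14159, "10.3f")
print(repr(fixed))
```

That is two numbers per spec rather than three, at the cost of the width including the sign and the decimal point.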
-- Greg From skip at pobox.com Tue Aug 14 04:15:39 2007 From: skip at pobox.com (skip at pobox.com) Date: Mon, 13 Aug 2007 21:15:39 -0500 Subject: [Python-3000] os.extsep & RISCOS support removal Message-ID: <18113.4171.231630.187319@montanaro.dyndns.org> I'm working my way through RISCOS code removal. I came across this note in Misc/HISTORY: - os.extsep -- a new variable needed by the RISCOS support. It is the separator used by extensions, and is '.' on all platforms except RISCOS, where it is '/'. There is no need to use this variable unless you have a masochistic desire to port your code to RISCOS. If RISCOS is going away should os.extsep as well? Skip From victor.stinner at haypocalc.com Tue Aug 14 04:22:36 2007 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 14 Aug 2007 04:22:36 +0200 Subject: [Python-3000] Questions about email bytes/str (python 3000) Message-ID: <200708140422.36818.victor.stinner@haypocalc.com> Hi, After many tests, I'm unable to convert the email module to Python 3000. I'm also unable to decide on the best type for some content. (1) Should email parts be stored as byte or character strings? Related methods: Generator class, Message.get_payload(), Message.as_string(). Let's take an example: multipart (MIME) email with latin-1 and base64 (ascii) sections. Mix latin-1 and ascii => mix bytes. So the best type should be bytes. => bytes (2) Parsing file (raw string): use bytes or str in parsing? The parser uses methods related to str like splitlines(), lower(), strip(). But it should be easy to rewrite/avoid these methods. I think that low-level parsing should be done on bytes. At the end, or when we know the charset, we can convert to str. => bytes About base64, I agree with Bill Janssen: - base64MIME.decode converts string to bytes - base64MIME.encode converts bytes to string But decode may accept bytes as input (as the base64 module does): use str(value, 'ascii', 'ignore') or str(value, 'ascii', 'strict').
I wrote four different (non-working) patches. So if you want to work on the email module and Python 3000, please contact me first. When I have a better patch, I will submit it. Victor Stinner aka haypo http://hachoir.org/ From guido at python.org Tue Aug 14 04:24:20 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Aug 2007 19:24:20 -0700 Subject: [Python-3000] 100% backwards compatible parenless function call statements In-Reply-To: <799316b70708131652y32d77ee0kee84d1d3fd0ad065@mail.gmail.com> References: <799316b70708131652y32d77ee0kee84d1d3fd0ad065@mail.gmail.com> Message-ID: This is a topic for python-ideas, not python-3000. To be absolutely brutally honest, it doesn't look like you understand parsing well enough to be able to write a PEP. E.g. why is cos(3)+4 not interpreted as cos((3)+4) in your proposal? Python's predecessor had something like this, and they *did* do it properly. The result was that if you wanted the other interpretation you'd have to write (cos 3) + 4. Similarly in Haskell, I believe. In any case, I don't believe the claim from the subject, especially if you don't distinguish between f.close and f.close(). How would you even know that 'close' is a method and not an attribute? E.g. how do you avoid interpreting f.closed as f.closed() (which would be a TypeError)? Skeptically, --Guido On 8/13/07, Chris Monsanto wrote: > Since Python makes such a distinction between statements and expressions, I > am proposing that function calls as statements should be allowed to omit > parentheses. What I am proposing is 100% compatible with Python 2.x's > behavior of function calls; so those uncomfortable with this (basic) idea > can continue to use parens in their function calls. Expressions still > require parens because of ambiguity and clarity issues. > > --Some examples:-- > > print "Parenless function call!", file=my_file > > print(".. 
but this is still allowed") > > # We still need parens for calls to functions where the sole argument is a > tuple > # But you'd have to do this anyway in Python 2.x... nothing lost. > print((1, 2)) > > # Need parens if the function call isnt the only thing in the statement > cos(3) + 4 > > # Need parens if function call isnt a statement, otherwise how would we get > the function itself? > x = cos(3) > > # Make a the value of my_func... > my_func2 = my_func > my_func2 # call other function > my_func2() # call it again > > # Method call? > f = open("myfile") > f.close > > # Chained method > obj.something().somethinganother().yeah > > --Notes:-- > > A lot of other things in Python 2.x/Python 3k at the moment have this same > behavior... > > # No parens required > x, y = b, a > > # But sometimes they are > func((1, 2)) > > # Generator expressions sometimes don't need parens > func(i for i in list) > > # But sometimes they do > func(a, (i for i in list)) > > --Pros:-- > > 1) Removes unnecessary verbosity for the majority of situations. > 2) Python 2.x code works the same unmodified. > 3) No weird stuff with non-first class objects, ala Ruby meth.call(). > Functions still remain assignable to other values without other trickery. > 4) Because it's completely backwards compatible, you could even have > something like from __future__ import parenless in Python 2.6 for a > transition. > > --Cons:-- > > 1) Can't type "func" bare in interpreter to get its repr. I think this is a > non-issue; I personally never do this, and with parenless calls you can just > type "repr func" anyway. Specifically I think this shouldn't be considered > because in scripts doing something like " f.close" does absolutely nothing > and giving it some functionality would be nice. It also solves one of the > Python gotchas found here: > http://www.ferg.org/projects/python_gotchas.html > (specifically #5) > > I'm willing to write up a proper PEP if anyone is interested in the idea. 
I > figured I'd poll around first. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Aug 14 04:25:51 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Aug 2007 19:25:51 -0700 Subject: [Python-3000] os.extsep & RISCOS support removal In-Reply-To: <18113.4171.231630.187319@montanaro.dyndns.org> References: <18113.4171.231630.187319@montanaro.dyndns.org> Message-ID: On 8/13/07, skip at pobox.com wrote: > I'm working my way through RISCOS code removal. I came across this note in > Misc/HISTORY: > > - os.extsep -- a new variable needed by the RISCOS support. It is the > separator used by extensions, and is '.' on all platforms except > RISCOS, where it is '/'. There is no need to use this variable > unless you have a masochistic desire to port your code to RISCOS. > > If RISCOS is going away should os.extsep as well? Yes please. It just causes more code for no good reason. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Tue Aug 14 04:38:01 2007 From: skip at pobox.com (skip at pobox.com) Date: Mon, 13 Aug 2007 21:38:01 -0500 Subject: [Python-3000] os.extsep & RISCOS support removal In-Reply-To: References: <18113.4171.231630.187319@montanaro.dyndns.org> Message-ID: <18113.5513.764384.130318@montanaro.dyndns.org> >> If RISCOS is going away should os.extsep as well? Guido> Yes please. It just causes more code for no good reason. Good. I already removed it in my sandbox. ;-) While I'm thinking about it, should I be identifying tasks I'm working on somewhere to avoid duplication of effort? There was something on the py3kstruni wiki page where people marked the failing tests they were working on.
I'm not aware of a similar page for more general tasks. Skip From talin at acm.org Tue Aug 14 05:13:35 2007 From: talin at acm.org (Talin) Date: Mon, 13 Aug 2007 20:13:35 -0700 Subject: [Python-3000] Format specifier proposal In-Reply-To: <46C095CF.2060507@ronadam.com> References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com> Message-ID: <46C11DDF.2080607@acm.org> Ron Adam wrote: > >> :f<+015.5 # Floating point, left aligned, always show sign, >> # leading zeros, field width 15 (min), 5 decimal places. > > Which has precedence... left alignment or zero padding? > > Or should this be an error? The answer is: Just ignore that proposal entirely :) ------ So I sat down with Guido and as I expected he has simplified my thoughts greatly. Based on the conversation we had, I think we both agree on what should be done: 1) There will be a new built-in function "format" that formats a single field. This function takes two arguments, a value to format, and a format specifier string. The "format" function does exactly the following: def format(value, spec): return value.__format__(spec) (I believe this even works if value is 'None'.) In other words, any type conversion or fallbacks must be done by __format__; any interpretation or parsing of the format specifier is also done by __format__. "format" does not, however, handle the "!r" specifier. That is done by the caller of this function (usually the Formatter class.) 2) The various type-specific __format__ methods are allowed to know about other types - so 'int' knows about 'float' and so on. Note that other than the special case of int <--> float, this knowledge is one way only, meaning that the dependency graph is acyclic. For most types, if they see a type letter that they don't recognize, they should coerce to their nearest built-in type (int, float, etc.) and re-invoke __format__.
3) In addition to int.__format__, float.__format__, and str.__format__, there will also be object.__format__, which simply coerces the object to a string, and calls __format__ on the result. class object: def __format__(self, spec): return str(self).__format__(spec) So in other words, all objects are formattable if they can be converted to a string. 4) Explicit type coercion is a separate field from the format spec: {name[:format_spec][!coercion]} Where 'coercion' can be 'r' (to convert to repr()) or 's' (to convert to string). Other letters may be added later based on need. The coercion field causes the formatter class to attempt to coerce the value to the specified type before calling format(value, format_spec). 5) Mini-language for format specifiers: So I do like your (Ron's) latest proposal, and I am thinking about it quite a bit. Guido suggested (and I am favorable to the idea) that we simply keep the 2.5 format syntax, or the slightly more advanced variation that's in the PEP now. This has a couple of advantages: -- It means that Python programmers won't have to learn a new syntax. -- It makes the 2to3 conversion of format strings trivial. (Although there are some other difficulties with automatic conversion of '%', but they are unrelated to format specifiers.) Originally I liked the idea of putting the type letter at the front, instead of at the back like it is in 2.5. However, when you think about it, it actually makes sense to have it at the back. Because the type letter is now optional, it won't need to be there most of the time. The type letter is really just an optional modifier flag, not a "type" at all. Two features of your proposal that aren't supported in the old syntax are: -- Arbitrary fill characters, as opposed to just '0' and ' '. -- Taking the string value from the left or right. I'm not sure how much we need the first. The second sounds kind of useful though.
I'm thinking that we might be able to take your ideas and simply extend the old 2.5 syntax, so that it would be backwards compatible. On the other hand, it seems to me that once we have a *real* implementation (which we will soon), it will be relatively easy for people to experiment with new features and syntactical innovations. 6) Finally, Guido stressed that he wants to make sure that the implementation supports fields within fields, such as: {0:{1}.{2}} Fortunately, the 'format' function doesn't have to handle this (it only formats a single value.) This would be done by the higher-level code. -- Talin From guido at python.org Tue Aug 14 05:15:25 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Aug 2007 20:15:25 -0700 Subject: [Python-3000] os.extsep & RISCOS support removal In-Reply-To: <18113.5513.764384.130318@montanaro.dyndns.org> References: <18113.4171.231630.187319@montanaro.dyndns.org> <18113.5513.764384.130318@montanaro.dyndns.org> Message-ID: Use this page: http://wiki.python.org/moin/Py3kToDo The "master" Python 3000 in the wiki links to these and other resources: http://wiki.python.org/moin/Python3000 --Guido On 8/13/07, skip at pobox.com wrote: > > >> If RISCOS is going away should os.extsep as well? > > Guido> Yes please. It just causes more code for no good reason. > > Good. I already removed it in my sandbox. ;-) > > While I'm thinking about it, should I be identifying tasks I'm working on > somewhere to avoid duplication of effort? There was something on the > py3kstruni wiki page where people marked the failing tests they were working > on. I'm not aware of a similar page for more general tasks. 
> > Skip > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rrr at ronadam.com Tue Aug 14 05:44:38 2007 From: rrr at ronadam.com (Ron Adam) Date: Mon, 13 Aug 2007 22:44:38 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46C10D2D.60705@canterbury.ac.nz> References: <46B13ADE.7080901@acm.org> <46B2D147.90606@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com> <46C10D2D.60705@canterbury.ac.nz> Message-ID: <46C12526.8040807@ronadam.com> Greg Ewing wrote: > Ron Adam wrote: > >> The digits value are number of digits before the decimal. This doesn't >> include the other symbols used in the field so it isn't the same as a field >> width. > > How does this work with formats where the number of > digits before the decimal can vary, but before+after > is constant? I think this is what you're looking for. f>15,.3 #15 field width, 3 decimal places, right aligned. In this case the sign will be right before the most significant digit. Or you could use... f 10.3 # total width = 15 In this one, the sign would be to the far left of the field. So they are not the same thing. The space is used here to make positives numbers the same width as negatives values. > Also, my feeling about the whole of this is that > it's too complicated. 
It seems like you can have > at least three numbers in a format, and at first > glance it's quite confusing as to what they all > mean. Well, at first glance so is everything else that's been suggested; it's because we are doing a lot in a very little space. In this case we are adding just a touch of complexity to the syntax in order to use grouping to remove complexity in understanding the expression. These are all field width terms: >10 right align in field 10 ^15/_ center in field 15, pad with underscores 20/* left align in field 20, pad with * They are easy to identify because other terms do not contain '<^>/'. And since they are separate from other format terms, once you get it, you've got it. Nothing more to remember here. It doesn't make sense to put signs in front of field widths because the signs have no relation to the field width at all. These are all number formats: +10.4 (10.4) .6 ' 9.3' Quoted so you can see the space. 10. Here, we don't use alignment symbols. Alignments have no meaning in the context of number of digits. So these taken as a smaller chunk of the whole will also be easier to remember. There are no complex interactions between field alignment terms and number terms this way. That makes it simpler to understand and learn. Let's take apart the alternative syntax. f<+15.2 f fixed point # of decimals is specified < align left (field attribute) + sign (number attribute) 15 width (field attribute) .2 decimals (number attribute) So what you have is some of them apply to the field width and some of them affect how the number is displayed. But they alternate. (Does anyone else find that kind of odd?) The specifier syntax described here groups related items together. f<15,+.2 f fixed point < left align 15 field width + sign .2 decimals Yes, we can get rid of one number by just using the field width in place of a digits width. But it's a trade off. I think it complicates the concept in exchange for simplifying the syntax.
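The grouping argued for above can be sketched as a small parser that separates the alignment term from the content-modifying term. This is only an illustration of the proposed grammar [type][direction]width[/fill][,content]; the regular expression and the split_spec name are assumptions made for the sketch, not code from the thread, and the content term is returned unparsed.

```python
import re

# Hypothetical pattern for the proposed layout:
#   [type][direction]width[/fill][,content_modifying_term]
SPEC = re.compile(
    r"""
    ^(?P<type>[deifsrx])?      # optional type letter
    (?P<direction>[<^>])?      # optional alignment direction
    (?P<width>\d+)?            # field width
    (?:/(?P<fill>.))?          # optional fill character after '/'
    (?:,(?P<content>.*))?      # content-modifying term after ','
    $
    """,
    re.VERBOSE,
)


def split_spec(spec):
    """Split a specifier into its alignment and content parts."""
    match = SPEC.match(spec)
    if match is None:
        raise ValueError("not a recognised specifier: %r" % spec)
    return match.groupdict()


for example in ("f<15,+.2", "f^30/_,(010.3%)", "s>25,-25", ">15/."):
    print(example, split_spec(example))
```

Because the alignment characters '<^>/' never occur before the comma in any other role, the field part can be recognised without looking at the number part at all, which is the property being claimed for the grouped form.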
Regards, Ron From guido at python.org Tue Aug 14 05:53:26 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Aug 2007 20:53:26 -0700 Subject: [Python-3000] Format specifier proposal In-Reply-To: <46C11DDF.2080607@acm.org> References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com> <46C11DDF.2080607@acm.org> Message-ID: On 8/13/07, Talin wrote: > So I sat down with Guido and as I expected he has simplified my thoughts > greatly. Based on the conversation we had, I think we both agree on what > should be done: > > > 1) There will be a new built-in function "format" that formats a single > field. This function takes two arguments, a value to format, and a > format specifier string. > > The "format" function does exactly the following: > > def format(value, spec): > return value.__format__(spec) > > (I believe this even works if value is 'None'.) Yes, assuming the definition of object.__format__ you give later. > In other words, any type conversion or fallbacks must be done by > __format__; Any interpretation or parsing of the format specifier is > also done by __format__. > > "format" does not, however, handle the "!r" specifier. That is done by > the caller of this function (usually the Formatter class.) > > > 2) The various type-specific __format__ methods are allowed to know > about other types - so 'int' knows about 'float' and so on. > > Note that other than the special case of int <--> float, this knowledge > is one way only, meaning that the dependency graph is a acyclic. Though we don't necessarily care (witness the exception for int<->float -- other types could know about each other too, if it's useful). > For most types, if they see a type letter that they don't recognize, > they should coerce to their nearest built-in type (int, float, etc.) and > re-invoke __format__. Make that "for numeric types". 
One of my favorite examples of non-numeric types are the date, time and
datetime types from the datetime module; here I propose that their
__format__ be defined like this:

    def __format__(self, spec):
        return self.strftime(spec)

> 3) In addition to int.__format__, float.__format__, and str.__format__,
> there will also be object.__format__, which simply coerces the object to
> a string, and calls __format__ on the result.
>
>     class object:
>         def __format__(self, spec):
>             return str(self).__format__(spec)
>
> So in other words, all objects are formattable if they can be converted
> to a string.
>
>
> 4) Explicit type coercion is a separate field from the format spec:
>
>     {name[:format_spec][!coercion]}

Over lunch we discussed putting !coercion first. IMO {foo!r:20} reads
more naturally from left to right: take foo, call repr() on it, then
call format(_, '20') on the resulting string.

> Where 'coercion' can be 'r' (to convert to repr()), 's' (to convert to
> string.) Other letters may be added later based on need.
>
> The coercion field causes the formatter class to attempt to coerce the
> value to the specified type before calling format(value, format_spec)
>
>
> 5) Mini-language for format specifiers:
>
> So I do like your (Ron's) latest proposal, and I am thinking about it
> quite a bit.
>
> Guido suggested (and I am favorable to the idea) that we simply keep the
> 2.5 format syntax, or the slightly more advanced variation that's in the
> PEP now.
>
> This has a couple of advantages:
>
> -- It means that Python programmers won't have to learn a new syntax.
> -- It makes the 2to3 conversion of format strings trivial. (Although
> there are some other difficulties with automatic conversion of '%', but
> they are unrelated to format specifiers.)
>
> Originally I liked the idea of putting the type letter at the front,
> instead of at the back like it is in 2.5. However, when you think about
> it, it actually makes sense to have it at the back.
Because the type > letter is now optional, it won't need to be there most of the time. The > type letter is really just an optional modifier flag, not a "type" at all. > > Two features of your proposal that aren't supported in the old syntax are: > > -- Arbitrary fill characters, as opposed to just '0' and ' '. > -- Taking the string value from the left or right. > > I'm not sure how much we need the first. The second sounds kind of > useful though. The second could be added to the mini-language for strings (str.__format__); I don't see how it would make sense for numbers. (If you want the last N digits of an int x, by all means use x%10**N.) > I'm thinking that we might be able to take your ideas and simply extend > the old 2.5 syntax, so that it would be backwards compatible. On the > other hand, it seems to me that once we have a *real* implementation > (which we will soon), it will be relatively easy for people to > experiment with new features and syntactical innovations. > > > 6) Finally, Guido stressed that he wants to make sure that the > implementation supports fields within fields, such as: > > {0:{1}.{2}} > > Fortunately, the 'format' function doesn't have to handle this (it only > formats a single value.) This would be done by the higher-level code. Yup. Great summary overall! -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rrr at ronadam.com Tue Aug 14 07:49:11 2007 From: rrr at ronadam.com (Ron Adam) Date: Tue, 14 Aug 2007 00:49:11 -0500 Subject: [Python-3000] Format specifier proposal In-Reply-To: <46C11DDF.2080607@acm.org> References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com> <46C11DDF.2080607@acm.org> Message-ID: <46C14257.6090902@ronadam.com> Talin wrote: > Ron Adam wrote: >> >>> :f<+015.5 # Floating point, left aligned, always show sign, >>> # leading zeros, field width 15 (min), 5 decimal places. >> >> Which has precedence... left alignment or zero padding? >> >> Or should this be an error? 
> > The answer is: Just ignore that proposal entirely :) Ok :) > ------ > > So I sat down with Guido and as I expected he has simplified my thoughts > greatly. Based on the conversation we had, I think we both agree on what > should be done: > > > 1) There will be a new built-in function "format" that formats a single > field. This function takes two arguments, a value to format, and a > format specifier string. > > The "format" function does exactly the following: > > def format(value, spec): > return value.__format__(spec) > > (I believe this even works if value is 'None'.) > > In other words, any type conversion or fallbacks must be done by > __format__; Any interpretation or parsing of the format specifier is > also done by __format__. > > "format" does not, however, handle the "!r" specifier. That is done by > the caller of this function (usually the Formatter class.) > > > 2) The various type-specific __format__ methods are allowed to know > about other types - so 'int' knows about 'float' and so on. > > Note that other than the special case of int <--> float, this knowledge > is one way only, meaning that the dependency graph is a acyclic. > > For most types, if they see a type letter that they don't recognize, > they should coerce to their nearest built-in type (int, float, etc.) and > re-invoke __format__. If it coerces the value first, it can then just call format(value, spec). > 3) In addition to int.__format__, float.__format__, and str.__format__, > there will also be object.__format__, which simply coerces the object to > a string, and calls __format__ on the result. > > class object: > def __format__(self, spec): > return str(self).__format__(spec) > > So in other words, all objects are formattable if they can be converted > to a string. > > > 4) Explicit type coercion is a separate field from the format spec: > > {name[:format_spec][!coercion]} > > Where 'coercion' can be 'r' (to convert to repr()), 's' (to convert to > string.) 
Other letters may be added later based on need. > > The coercion field cases the formatter class to attempt to coerce the > value to the specified type before calling format(value, format_spec) So the !letters refer to actual types, where the format specifier letters are output format designators mean what ever the object interprets them as. Hmmm... ok, I see why Guido leans towards putting it before the colon. In a way it's more like a function call and not related to the format specifier type at all. {repr(name):format_spec} Heck, it could even be first... {r!name:format_spec} Or maybe because it's closer to name.__repr__ he prefers the name!r ordering? A wilder idea I was thinking about somewhat related to this was to be able to chain format specifiers, but I haven't worked out the details yet. > 5) Mini-language for format specifiers: > > So I do like your (Ron's) latest proposal, and I am thinking about it > quite a bit. I'm actually testing them before I post them. That filters out most of the really bad ideas. ;-) Although I'd also like to see a few more people agree with it before committing to something new. > Guido suggested (and I am favorable to the idea) that we simply keep the > 2.5 format syntax, or the slightly more advanced variation that's in the > PEP now. > > This has a couple of advantages: > > -- It means that Python programmers won't have to learn a new syntax. > -- It makes the 2to3 conversion of format strings trivial. (Although > there are some other difficulties with automatic conversion of '%', but > they are unrelated to format specifiers.) Yes, the 2 to 3 conversion will be a challenge with a new syntax, but as long as the new syntax is richer than the old one, it shouldn't be that much trouble. If we remove things we could do before, then it gets much harder. > Originally I liked the idea of putting the type letter at the front, > instead of at the back like it is in 2.5. 
However, when you think about > it, it actually makes sense to have it at the back. Because the type > letter is now optional, it won't need to be there most of the time. The > type letter is really just an optional modifier flag, not a "type" at all. The reason it's in the back for % formatting is it serves as the closing bracket. With the {}'s we can put it anywhere it makes that makes the most sense. > Two features of your proposal that aren't supported in the old syntax are: > > -- Arbitrary fill characters, as opposed to just '0' and ' '. > -- Taking the string value from the left or right. > > I'm not sure how much we need the first. The second sounds kind of > useful though. The fill characters are already implemented in the strings rjust, ljust, and center methods. | center(...) | S.center(width[, fillchar]) -> string | | Return S centered in a string of length width. Padding is | done using the specified fill character (default is a space) So adding it, is just a matter of calling these with the fillchar. And as Guido also pointed out... the taking of string values from the left and right should work on strings and not numbers. > I'm thinking that we might be able to take your ideas and simply extend > the old 2.5 syntax, so that it would be backwards compatible. On the > other hand, it seems to me that once we have a *real* implementation > (which we will soon), it will be relatively easy for people to > experiment with new features and syntactical innovations. I'm looking forward to that. :-) > 6) Finally, Guido stressed that he wants to make sure that the > implementation supports fields within fields, such as: > > {0:{1}.{2}} I've been thinking about this also for the use of dynamically formatting strings. Is that the use case he is after? "{0:{1},{2}}".format(value, '^40', 'f(20.2)') Which would first insert {1} and {2} into the string before formatting 0. {0:^40,f(20.2)} Use your favorite syntax of course. 
;-)

Items 1 and 2 would probably not be string literals in this case, but
come from a data source associated with the value. And of course what
actually gets inserted in inner fields can be anything.

> Fortunately, the 'format' function doesn't have to handle this (it only
> formats a single value.) This would be done by the higher-level code.

Looks like this is moving along nicely now. :-)

Cheers,
   Ron

From andrew.j.wade at gmail.com Tue Aug 14 08:28:05 2007
From: andrew.j.wade at gmail.com (Andrew James Wade)
Date: Tue, 14 Aug 2007 02:28:05 -0400
Subject: [Python-3000] Format specifier proposal
In-Reply-To:
References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com> <46C11DDF.2080607@acm.org>
Message-ID: <20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net>

On Mon, 13 Aug 2007 20:53:26 -0700
"Guido van Rossum" wrote:

...
> One of my favorite examples of non-numeric types are the date, time
> and datetime types from the datetime module; here I propose that their
> __format__ be defined like this:
>
>     def __format__(self, spec):
>         return self.strftime(spec)

You lose the ability to align the field then. What about:

    def __format__(self, align_spec, spec="%Y-%m-%d %H:%M:%S"):
        return format(self.strftime(spec), align_spec)

with

    def format(value, spec):
        if "," in spec:
            align_spec, custom_spec = spec.split(",",1)
            return value.__format__(align_spec, custom_spec)
        else:
            return value.__format__(spec)

":,%Y-%m-%d" may be slightly more gross than ":%Y-%m-%d", but on the plus
side ":30" would mean the same thing across all types.
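The comma-split idea above can be tried out with a small self-contained sketch. The names format_field and Stamp are hypothetical and exist only for this illustration; they are not part of the datetime module or of any proposal text, and the two-argument __format__ is the variant suggested in this message, not the signature Python ended up with.

```python
import datetime


def format_field(value, spec):
    """Hypothetical stand-in for the proposed format(): split off an
    optional alignment part before the first comma."""
    if "," in spec:
        align_spec, custom_spec = spec.split(",", 1)
        return value.__format__(align_spec, custom_spec)
    return value.__format__(spec)


class Stamp:
    """Hypothetical date wrapper using the two-argument __format__."""

    def __init__(self, when):
        self.when = when

    def __format__(self, align_spec, spec="%Y-%m-%d %H:%M:%S"):
        text = self.when.strftime(spec)
        # Alignment is delegated to str, so "30" pads like any other string.
        return format_field(text, align_spec)


stamp = Stamp(datetime.datetime(2007, 8, 14, 2, 28, 5))
print(repr(format_field(stamp, "30")))            # default layout, width 30
print(repr(format_field(stamp, ">12,%Y-%m-%d")))  # right-aligned date only
print(repr(format_field(stamp, ",%Y-%m-%d")))     # no alignment part
```

With this split, a bare width such as "30" never reaches strftime, which is the property the message is after.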
-- Andrew From walter at livinglogic.de Tue Aug 14 09:55:35 2007 From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Tue, 14 Aug 2007 09:55:35 +0200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46C12526.8040807@ronadam.com> References: <46B13ADE.7080901@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com> <46C10D2D.60705@canterbury.ac.nz> <46C12526.8040807@ronadam.com> Message-ID: <46C15FF7.8020106@livinglogic.de> Ron Adam wrote: > > Greg Ewing wrote: >> Ron Adam wrote: >> >>> The digits value are number of digits before the decimal. This doesn't >>> include the other symbols used in the field so it isn't the same as a field >>> width. >> How does this work with formats where the number of >> digits before the decimal can vary, but before+after >> is constant? > > I think this is what you're looking for. > > f>15,.3 #15 field width, 3 decimal places, right aligned. > > In this case the sign will be right before the most significant digit. > > Or you could use... > > f 10.3 # total width = 15 > > In this one, the sign would be to the far left of the field. So they are > not the same thing. The space is used here to make positives numbers the > same width as negatives values. 
>
>
>> Also, my feeling about the whole of this is that
>> it's too complicated. It seems like you can have
>> at least three numbers in a format, and at first
>> glance it's quite confusing as to what they all
>> mean.
>
> Well, at first glance so is everything else that's been suggested, it's
> because we are doing a lot in a very little space. In this case we are
> adding just a touch of complexity to the syntax in order to use grouping to
> remove complexity in understanding the expression.
>
> These are all field width terms:
>
>     >10       right align in field 10
>     ^15/_     center in field 15, pad with underscores
>     20/*      left align in field 20, pad with *
>
> They are easy to identify because other terms do not contain '<^>/'. And
> since they are separate from other format terms, once you get it, you've
> got it. Nothing more to remember here.
>
> It doesn't make sense to put signs in front of field widths because the
> signs have no relation to the field width at all.
>
>
> These are all number formats:
>
>     +10.4
>     (10.4)
>     .6
>     ' 9.3'    Quoted so you can see the space.
>     10.
>
> Here, we don't use alignment symbols. Alignments have no meaning in the
> context of number of digits. So these taken as a smaller chunk of the
> whole will also be easier to remember. There are no complex interactions
> between field alignment terms and number terms this way. That makes it
> simpler to understand and learn.
>
>
> Let's take apart the alternative syntax.
>
>     f<+15.2
>
>     f     fixed point   # of decimals is specified
>
>     <     align left    (field attribute)
>
>     +     sign          (number attribute)
>
>     15    width         (field attribute)
>
>     .2    decimals      (number attribute)

Then why not have something more readable like

    al;s+;w15;d2

This is longer than <+15.2, but IMHO much more readable, because it's
clear where each specifier ends and begins.
Servus,
   Walter

From p.f.moore at gmail.com Tue Aug 14 12:52:48 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 14 Aug 2007 11:52:48 +0100
Subject: [Python-3000] More PEP 3101 changes incoming
In-Reply-To: <46C15FF7.8020106@livinglogic.de>
References: <46B13ADE.7080901@acm.org> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com> <46C10D2D.60705@canterbury.ac.nz> <46C12526.8040807@ronadam.com> <46C15FF7.8020106@livinglogic.de>
Message-ID: <79990c6b0708140352n7df1a758h6ed2fb37138ea930@mail.gmail.com>

On 14/08/07, Walter Dörwald wrote:
> Then why not have something more readable like
>
>     al;s+;w15;d2

A brief sanity check from someone who is not reading this thread, but
happened to see this post (and it's *not* a dig at Walter, just a
general comment):

If that's *more* readable, I'd hate to see what it's more readable *than*.

I'd suggest that someone take a step back and think about how people
will use these things in practice. I'd probably refuse to accept
something like that in a code review without a comment. And I'd
certainly swear if I had to deal with it in maintenance...

Paul.
From barry at python.org Tue Aug 14 15:30:58 2007 From: barry at python.org (Barry Warsaw) Date: Tue, 14 Aug 2007 09:30:58 -0400 Subject: [Python-3000] [Email-SIG] fix email module for python 3000 (bytes/str) In-Reply-To: <200708130226.03670.victor.stinner@haypocalc.com> References: <200708090241.08369.victor.stinner@haypocalc.com> <200708110149.10939.victor.stinner@haypocalc.com> <8B640CF2-EB88-45A5-A85F-1267AF24749E@python.org> <200708130226.03670.victor.stinner@haypocalc.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 12, 2007, at 8:26 PM, Victor Stinner wrote: > On Sunday 12 August 2007 16:50:05 Barry Warsaw wrote: >> In r56957 I committed changes to sndhdr.py and imghdr.py so that they >> compare what they read out of the files against proper byte >> literals. > > So nobody read my patches? :-( See my emails "[Python-3000] Fix > imghdr module > for bytes" and "[Python-3000] Fix sndhdr module for bytes" from last > saturday. But well, my patches look similar. Victor, sorry but my email was very spotty and I definitely missed your original patches. Sorry for duplicating work and thanks for fixing the last few things in these modules. Glad Guido got these committed. I'll follow up on email package more in a bit. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRsGuknEjvBPtnXfVAQLbfgQAqfiBeaVwIN35nXn9D7DZXItkzoZSd+1V f/a4PnzBHTdvFZgggisK/7o5b1uULOaHILLSmiQMFp0W/zV2JFCvKI7kc1/SkjSo UgIXK3o9WtmljH3aj1njc6fgy3VCVfa09NDKf89/rCy15AaSxF21YinIDIqF/yGN Sn2RQJqvNPc= =KpZC -----END PGP SIGNATURE----- From skip at pobox.com Tue Aug 14 15:34:20 2007 From: skip at pobox.com (skip at pobox.com) Date: Tue, 14 Aug 2007 08:34:20 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <79990c6b0708140352n7df1a758h6ed2fb37138ea930@mail.gmail.com> References: <46B13ADE.7080901@acm.org> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com> <46C10D2D.60705@canterbury.ac.nz> <46C12526.8040807@ronadam.com> <46C15FF7.8020106@livinglogic.de> <79990c6b0708140352n7df1a758h6ed2fb37138ea930@mail.gmail.com> Message-ID: <18113.44892.399786.404146@montanaro.dyndns.org> Paul> If that's *more* readable, I'd hate to see what it's more readable Paul> *than*. Paul> I'd suggest that someone take a step back and think about how Paul> people will use these things in practice. I'd probably refuse to Paul> accept something like that in a code review without a comment. And Paul> I'd certainly swear if I had to deal with it in maintenance... I'm with Paul. When I first saw the examples of the proposed notation I thought, "Exactly how is this better than the current printf-style format strings?" Since then I have basically ignored the discussion. Before you go too much farther I would suggest feeding the proposal to the wolves^H^H^H^H^H^Hfolks on comp.lang.python to see what they think. 
Skip From barry at python.org Tue Aug 14 15:45:45 2007 From: barry at python.org (Barry Warsaw) Date: Tue, 14 Aug 2007 09:45:45 -0400 Subject: [Python-3000] bytes: compare bytes to integer In-Reply-To: References: <200708110225.28056.victor.stinner@haypocalc.com> <07Aug12.101123pdt."57996"@synergy1.parc.xerox.com> Message-ID: <719423F4-779C-4596-8045-0E60603A9F92@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 12, 2007, at 1:41 PM, Georg Brandl wrote: > Bill Janssen schrieb: >>> I don't like the behaviour of Python 3000 when we compare a bytes >>> strings >>> with length=1: >>>>>> b'xyz'[0] == b'x' >>> False >>> >>> The code can be see as: >>>>>> ord(b'x') == b'x' >>> False >>> >>> or also: >>>>>> 120 == b'x' >>> False >>> >>> Two solutions: >>> 1. b'xyz'[0] returns a new bytes object (b'x' instead of 120) >>> like b'xyz'[0:1] does >>> 2. allow to compare a bytes string of 1 byte with an integer >>> >>> I prefer (2) since (1) is wrong: bytes contains integers and not >>> bytes! >> >> Why not just write >> >> b'xyz'[0:1] == b'x' >> >> in the first place? Let's not start adding "special" cases. > > Hm... I have a feeling that this will be one of the first entries in a > hypothetical "Python 3.0 Gotchas" list. Yes, it will because the b-prefix tricks you by being just similar enough to 8-bit strings for you to want them to act the same way. I'm not advocating getting rid of bytes literals though (they are just too handy), but if you were forced to spell it bytes('xyz') I don't think you'd get as much confusion. Any tutorial on bytes should include the following example: >>> a = list('xyz') >>> a[0] 'x' >>> a[0:1] ['x'] >>> b = bytes('xyz') >>> b[0] 120 >>> b[0:1] b'x' >>> b == b'xyz' True That makes it pretty clear, IMO. 
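The same point can be checked against Python 3 as it was eventually released. This is a minimal sketch, not part of the original message; note that in released Python 3 bytes('xyz') needs an explicit encoding, so the literal form is used here.

```python
data = b"xyz"

first = data[0]      # indexing a bytes object yields an int
head = data[0:1]     # slicing yields a bytes object of length 1

print(first)                 # the integer value of the first byte
print(head)
print(first == b"x")         # an int never compares equal to a bytes object
print(head == b"x")          # slice-to-bytes comparison works as expected
print(first == ord("x"))     # compare against the integer code instead

# Building bytes from a str requires naming the encoding.
same = bytes("xyz", "ascii")
```

Writing the comparison as a one-element slice, as suggested earlier in the thread, avoids the surprise without any special case.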
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRsGyC3EjvBPtnXfVAQJEtAQAtMUk8fVAFeMHYam6iNg4G3+NwmPWVXp4 YJSh8ZBEICSNlyJSNk8ntE0vKkqLSFMnI24RtoFDJJ2lKrbPtBoH2OyWuXHgfCzd VG/LBMjMRV0IMQjkl2EtpD2atBBfDhQ6IPZtqaZJQ7HM10IUZtEq3gf/Q2Alttm4 nr4W46Pny3s= =1rz/ -----END PGP SIGNATURE----- From barry at python.org Tue Aug 14 15:58:32 2007 From: barry at python.org (Barry Warsaw) Date: Tue, 14 Aug 2007 09:58:32 -0400 Subject: [Python-3000] [Python-Dev] Universal newlines support in Python 3.0 In-Reply-To: References: <87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 13, 2007, at 4:15 PM, Guido van Rossum wrote: > I've seen similar behavior in MS VC++ (long ago, dunno what it does > these days). It would read files with \r\n and \n line endings, and > whenever you edited a line, that line also got a \r\n ending. But > unchanged lines that started out with \n-only endings would keep the > \n only. And there was no way for the end user to see or control this. > > To emulate this behavior in Python you'd have to read the file in > binary mode *or* we'd have to have an additional flag specifying to > return line endings as encountered in the file. The newlines attribute > (as defined in 2.x) doesn't help, because it doesn't tell which lines > used which line ending. I think the newline feature in PEP 3116 falls > short too; it seems mostly there to override the line ending *written* > (from the default os.sep). > > I think we may need different flags for input and for output. > > For input, we'd need two things: (a) which are acceptable line > endings; (b) whether to translate acceptable line endings to \n or > not. For output, we need two things again: (c) whether to translate > line endings at all; (d) which line endings to translate. I guess we > could map (c) to (b) and (d) to (a) for a signature that's the same > for input and output (and makes sense for read+write files as well). 
> The default would be (a)=={'\n', '\r\n', '\r'} and (b)==True. I haven't thought about the output side of the equation, but I've already hit a situation where I'd like to see the input side (b) option implemented. I'm still sussing out the email package changes (down to 7F/9E of 247 tests!) but in trying to fix things I found myself wanting to open files in text mode so that I got strings out of the file instead of bytes. This was all fine except that some of the tests started failing because of the EOL translation that happens unconditionally now. The file contained \r\n and the test was ensuring these EOLs were preserved in the parsed text. I switched back to opening the file in binary mode, and doing a crufty conversion of bytes to strings (which I suspect is error prone but gets me farther along). It would have been perfect, I think, if I could have opened the file in text mode so that read() gave me strings, with universal newlines and preservation of line endings (i.e. no translation to \n). 
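For reference, the behaviour asked for here (text mode, universal newline recognition, no translation) is what released Python 3 provides through the newline argument of open(). The following is a small sketch under that assumption; the file name sample.txt and the sample data are made up for the illustration.

```python
import os
import tempfile

raw = b"one\r\ntwo\nthree\rfour"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # Default text mode: every recognized line ending is translated to "\n".
    with open(path, "r", encoding="ascii") as f:
        translated = f.readlines()

    # newline="": lines are still split on \n, \r\n and \r,
    # but the endings are returned untranslated.
    with open(path, "r", encoding="ascii", newline="") as f:
        preserved = f.readlines()

print(translated)
print(preserved)
```

The second read returns str objects with the original line endings intact, so no bytes-to-string conversion step is needed.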
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRsG1CXEjvBPtnXfVAQKF3AP/X+/E44KI2EB3w0i3N5cGBCajJbMV93fk j2S/lfQf4tjBH3ZFEhUnybcJxsNukYY65T4MdzKh+IgJHV5s0rQtl2Hzr85e7Y0O i5Z3N4TAKc11PjSIk6vKrkgwPCEMzvwIQ5DFxeQBF5kOF6cZuXKaeDzB6z/GBYNv YiJEnOeZkW8= =u6OL -----END PGP SIGNATURE----- From rrr at ronadam.com Tue Aug 14 16:20:26 2007 From: rrr at ronadam.com (Ron Adam) Date: Tue, 14 Aug 2007 09:20:26 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46C15FF7.8020106@livinglogic.de> References: <46B13ADE.7080901@acm.org> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com> <46C10D2D.60705@canterbury.ac.nz> <46C12526.8040807@ronadam.com> <46C15FF7.8020106@livinglogic.de> Message-ID: <46C1BA2A.9050303@ronadam.com> Walter D?rwald wrote: >> Lets take apart the alternative syntax. >> >> f<+15.2 >> >> f fixed point # of decimals is specified >> >> < align left (field attribute) >> >> + sign (number attribute) >> >> 15 width (field attribute) >> >> .2 decimals (number attribute) > > Then why not have something more readable like > > al;s+;w15;d2 > > This is longer that <+15.2, but IMHO much more readable, because it's > clear where each specifier ends and begins. 
>
> Servus,
>    Walter

Well, depending on what it's for, that might very well be appropriate.
It's order independent and has other benefits, but it's probably the
other extreme in the case of string formatting. It's been expressed here
quite a few times that compactness is also desirable.

By dividing it into two terms, you still get the compactness in the most
common cases, and you get terms that are easier to read and understand in
the more complex cases. And if you format fields dynamically by inserting
components, it makes things a bit easier.

BTW, the order of the grouping is flexible...

    value_spec = "f.2"
    "{0:{1},^30}".format(value, value_spec)

Or...

    field_spec = "^30"
    "{0:f.2,{1}}".format(value, field_spec)

So it breaks it up into logical parts as well.

Cheers,
   Ron

From barry at python.org Tue Aug 14 17:39:29 2007
From: barry at python.org (Barry Warsaw)
Date: Tue, 14 Aug 2007 11:39:29 -0400
Subject: [Python-3000] Questions about email bytes/str (python 3000)
In-Reply-To: <200708140422.36818.victor.stinner@haypocalc.com>
References: <200708140422.36818.victor.stinner@haypocalc.com>
Message-ID:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 13, 2007, at 10:22 PM, Victor Stinner wrote:

> After many tests, I'm unable to convert email module to Python
> 3000. I'm also unable to take decision of the best type for some
> contents.

I made a lot of progress on the email package while I was traveling,
though I haven't checked things in yet. I probably will very soon, even
if I haven't yet fixed the last few remaining problems. I'm down to 7
failures, 9 errors of 247 tests.

> (1) Email parts should be stored as byte or character string?

Strings. Email messages are conceptually strings so I think it makes
sense to represent them internally as such. The FeedParser should expect
strings and the Generator should output strings.
One place where I think bytes should show up would be in decoded payloads, but in that case I really want to make an API change so that .get_payload (decoded=True) is deprecated in favor of a separate method. I'm proposing other API changes to make things work better, a few of which are in my current patch, but others I want to defer if they don't directly contribute to getting these tests to pass. > Related methods: Generator class, Message.get_payload(), > Message.as_string(). > > Let's take an example: multipart (MIME) email with latin-1 and > base64 (ascii) > sections. Mix latin-1 and ascii => mix bytes. So the best type > should be > bytes. > > => bytes Except that by the time they're parsed into an email message, they must be ascii, either encoded as base64 or quoted-printable. We also have to know at that point the charset being used, so I think it makes sense to keep everything as strings. > (2) Parsing file (raw string): use bytes or str in parsing? > > The parser use methods related to str like splitlines(), lower(), > strip(). But > it should be easy to rewrite/avoid these methods. I think that low- > level > parsing should be done on bytes. At the end, or when we know the > charset, we > can convert to str. > > => bytes Maybe, though I'm not totally convinced. It's certainly easier to get the tests to pass if we stick with parsing strings. email.message_from_string() should continue to accept strings, otherwise obviously it would have to be renamed, but also because it's primary use case is turning a triple quoted string literal into an email message. I alluded to the one crufty part of this in a separate thread. In order to accept universal newlines but preserve end-of-line characters, you currently have to open files in binary mode. Then, because my parser works on strings you have to convert those bytes to strings, which I am successfully doing now, but which I suspect is ultimately error prone. 
I would like to see a flag to preserve line endings on files opened in text + universal newlines mode, and then I think the hack for Parser.parse() would go away. We'd define how files passed to this method must be opened. Besides, I think it is much more common to be parsing strings into email messages anyway. > About base64, I agree with Bill Janssen: > - base64MIME.decode converts string to bytes > - base64MIME.encode converts bytes to string I agree. > But decode may accept bytes as input (as base64 modules does): use > str(value, 'ascii', 'ignore') or str(value, 'ascii', 'strict'). Hmm, I'm not sure about this, but I think that .encode() may have to accept strings. > I wrote 4 differents (non-working) patches. So I you want to work > on email > module and Python 3000, please first contact me. When I will get a > better > patch, I will submit it. Like I said, I also have an extensive patch that gets me most of the way there. I don't want to having dueling patches, so I think what I'll do is put a branch in the sandbox and apply my changes there for now. Then we will have real code to discuss. A few other things from my notes and diff: Do we need email.message_from_bytes() and Message.as_bytes()? While I'm (currently ) pretty well convinced that email messages should be strings, the use case for bytes includes reading them directly to or from sockets, though in this case because the RFCs generally require ascii with encodings and charsets clearly described, I think a bytes-to-string wrapper may suffice. Charset class: How do we do conversions from input charset to output charset? This is required by e.g. Japanese to go from euc-jp to iso-2022-jp IIUC. Currently I have to use a crufty string-to-bytes converter like so: >>> bytes(ord(c) for c in s) rather than just bytes(s). I'm sure there's a better way I haven't found yet. Generator._write_headers() and the _is8bitstring() test aren't really appropriate or correct now that everything's a unicode. 
This affected quite a few tests because long headers that previously were getting split were now not getting split. I ended up ditching the _is8bitstring() test, but that lead me into an API change for Message.__str__() and Message.as_string(), which I've long wanted to do anyway. First Message.__str__() no longer includes the Unix-From header, but more importantly, .as_string() takes the maxheaderlen as an argument and defaults to no header wrapping. By changing various related tests to call .as_string(maxheaderlen=78), these split header tests can be made to pass again. I think these changes make str (some_message) saner and more explicit (because it does not split headers) but these may be controversial in the email-sig. You asked earlier about decode_header(). This should definitely return a list of tuples of (bytes, charset|None). Header is going to need some significant revision First, there's the whole mess of .encode() vs. __str__() vs. __unicode__() to sort out. It's insane that the latter two had different semantics w.r.t. whitespace preservation between encoded words, so let's fix that. Also, if the common use case is to do something like this: >>> msg['subject'] = 'a subject string' then I wonder if we shouldn't be doing more sanity checking on the header value. For example, if the value had a non-ascii character in it, then what should we do? One way would be to throw an exception, requiring the use of something like: >>> msg['subject'] = Header('a \xfc subject', 'utf-8') or we could do the most obvious thing and try to convert to 'ascii' then 'utf-8' if no charset is given explicitly. I thought about always turning headers into Header instances, but I think that might break some common use cases. It might be possible to define equality and other operations on Header instances so that these common cases continue to work. The email-sig can address that later. 
However, if all Header instances are unicode and have a valid charset, I wonder if the splittable tests are still relevant, and whether we can simplify header splitting. I have to think about this some more. As for the remaining failures and errors, they come down to simplifying the splittable logic, dealing with Message.__str__() vs. Message.__unicode__(), verifying that the UnicodeErrors some tests expect to get raised don't make sense any more, and fixing a couple of other small issues I haven't gotten to yet. I will create a sandbox branch and apply my changes later today so we have something concrete to look at. Cheers, -Barry From adam at hupp.org Tue Aug 14 18:16:21 2007 From: adam at hupp.org (Adam Hupp) Date: Tue, 14 Aug 2007 11:16:21 -0500 Subject: [Python-3000] [Python-Dev] Universal newlines support in Python 3.0 In-Reply-To: References: <87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20070814161621.GA26420@mouth.upl.cs.wisc.edu> On Tue, Aug 14, 2007 at 09:58:32AM -0400, Barry Warsaw wrote: > This was all fine except that some of the tests started > failing because of the EOL translation that happens unconditionally > now. The file contained \r\n and the test was ensuring these EOLs > were preserved in the parsed text. I switched back to opening the > file in binary mode, and doing a crufty conversion of bytes to > strings (which I suspect is error prone but gets me farther along). > > It would have been perfect, I think, if I could have opened the file > in text mode so that read() gave me strings, with universal newlines > and preservation of line endings (i.e. no translation to \n).
FWIW this same issue (and solution) came up while fixing the csv tests. -- Adam Hupp | http://hupp.org/adam/ From martin at v.loewis.de Tue Aug 14 18:35:56 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Tue, 14 Aug 2007 18:35:56 +0200 Subject: [Python-3000] PEP 3131: Identifier syntax Message-ID: <46C1D9EC.70902@v.loewis.de> I'm trying to finalize PEP 3131, and want to collect proposals on modifications of the identifier syntax. I will ignore any proposals that suggest that different versions of the syntax should be used depending on various conditions; I'm only asking for modifications to the current proposed syntax. So far, I recall two specific suggestions which I have now incorporated into the PEP: use of NFKC instead of NFC, and use of XID_Start and XID_Continue instead of ID_Start and ID_Continue (although I'm still uncertain on how precisely these properties are defined). What other changes should be implemented? Regards, Martin From guido at python.org Tue Aug 14 18:41:48 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 14 Aug 2007 09:41:48 -0700 Subject: [Python-3000] Format specifier proposal In-Reply-To: <20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net> References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com> <46C11DDF.2080607@acm.org> <20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net> Message-ID: On 8/13/07, Andrew James Wade wrote: > On Mon, 13 Aug 2007 20:53:26 -0700 > "Guido van Rossum" wrote: > > ... > > > One of my favorite examples of non-numeric types are the date, time > > and datetime types from the datetime module; here I propose that their > > __format__ be defined like this: > > > > def __format__(self, spec): > > return self.strftime(spec) > > You loose the ability to align the field then.
What about: > > def __format__(self, align_spec, spec="%Y-%m-%d %H:%M:%S"): > return format(self.strftime(spec), align_spec) > > with > > def format(value, spec): > if "," in spec: > align_spec, custom_spec = spec.split(",",1) > return value.__format__(align_spec, custom_spec) > else: > return value.__format__(spec) > > ":,%Y-%m-%d" may be slightly more gross than ":%Y-%m-%d", but on the plus > side ":30" would mean the same thing across all types. Sorry, I really don't like imposing *any* syntactic constraints on the spec apart from !r and !s. You can get the default format with a custom size by using !s:30. If you want a custom format *and* padding, just add extra spaces to the spec. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Aug 14 18:52:47 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 14 Aug 2007 09:52:47 -0700 Subject: [Python-3000] [Python-Dev] Universal newlines support in Python 3.0 In-Reply-To: References: <87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 8/14/07, Barry Warsaw wrote: > It would have been perfect, I think, if I could have opened the file > in text mode so that read() gave me strings, with universal newlines > and preservation of line endings (i.e. no translation to \n). You can do that already, by passing newline="\n" to the open() function when using text mode. Try this script for a demo: f = open("@", "wb") f.write("bare nl\n" "crlf\r\n" "bare nl\n" "crlf\r\n") f.close() f = open("@", "r") # default, universal newlines mode print(f.readlines()) f.close() f = open("@", "r", newline="\n") # recognize only \n as newline print(f.readlines()) f.close() This outputs: ['bare nl\n', 'crlf\n', 'bare nl\n', 'crlf\n'] ['bare nl\n', 'crlf\r\n', 'bare nl\n', 'crlf\r\n'] Now, this doesn't support bare \r as line terminator, but I doubt you care much about that (unless you want to port the email package to Mac OS 9 :-). 
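For anyone trying the demo above on a released Python 3, here is a minimal sketch of the same experiment. It assumes the final open() signature, where a file opened with "wb" takes bytes rather than a str literal; the temporary-file handling is scaffolding added here and is not part of the original script.

```python
import os
import tempfile

# Mixed line endings, written as bytes because the file is opened in binary mode.
data = b"bare nl\n" b"crlf\r\n" b"bare nl\n" b"crlf\r\n"

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(data)

try:
    # Default text mode: universal newlines, every ending translated to "\n".
    with open(path, "r") as f:
        translated = f.readlines()
    # newline="\n": only "\n" ends a line and nothing is translated.
    with open(path, "r", newline="\n") as f:
        preserved = f.readlines()
finally:
    os.remove(path)

print(translated)
print(preserved)
```

The second list keeps the \r\n pairs intact, which is the behaviour the email tests were checking for.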
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From lists at cheimes.de Tue Aug 14 19:22:28 2007 From: lists at cheimes.de (Christian Heimes) Date: Tue, 14 Aug 2007 19:22:28 +0200 Subject: [Python-3000] No (C) optimization flag In-Reply-To: <46BDAC43.3050904@benjiyork.com> References: <46BD1185.2080702@canterbury.ac.nz> <46BDAC43.3050904@benjiyork.com> Message-ID: Benji York wrote: >> But wouldn't the only reason you want to step into, >> e.g. pickle be if there were a bug in pickle itself? > > I believe he's talking about a situation where pickle calls back into > Python. Yes, Benji is right. In the past I run into trouble with pickles two or times. I was successfully able to debug and resolve my problem with the pickle module and pdb. I like to keep the option in the Python 3.0 series. In my opinion it is very useful to step through Python code to see how the code is suppose to work. I'm trying to get involve in the Python core development process. It seems that I'm not ready yet to contribute new ideas because I'm missing the big picture. On the other hand I don't know how I can contribute to existing sub projects for Py3k. I find it difficult to get in. :/ Christian From brett at python.org Tue Aug 14 20:16:25 2007 From: brett at python.org (Brett Cannon) Date: Tue, 14 Aug 2007 11:16:25 -0700 Subject: [Python-3000] No (C) optimization flag In-Reply-To: References: <46BD1185.2080702@canterbury.ac.nz> <46BDAC43.3050904@benjiyork.com> Message-ID: On 8/14/07, Christian Heimes wrote: > Benji York wrote: > >> But wouldn't the only reason you want to step into, > >> e.g. pickle be if there were a bug in pickle itself? > > > > I believe he's talking about a situation where pickle calls back into > > Python. > > Yes, Benji is right. In the past I run into trouble with pickles two or > times. I was successfully able to debug and resolve my problem with the > pickle module and pdb. I like to keep the option in the Python 3.0 > series. 
In my opinion it is very useful to step through Python code to > see how the code is suppose to work. > > I'm trying to get involve in the Python core development process. It > seems that I'm not ready yet to contribute new ideas because I'm missing > the big picture. Just stick around for a while and you will pick up on a general theme in how decisions are made. > On the other hand I don't know how I can contribute to > existing sub projects for Py3k. I find it difficult to get in. :/ Well, don't force it unless you like the subproject. If you are just looking for something to do there are always bugs to squash or patches to evaluate. Otherwise I would suggest just waiting until something comes along that grabs your attention and bugging anyone else who is working on it for any guidance you need. Yes, it can take a little while to get into the groove, but we are all nice guys and are happy to answer your questions. -Brett From jimjjewett at gmail.com Tue Aug 14 21:33:20 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 14 Aug 2007 15:33:20 -0400 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46C0EEBF.3010206@ronadam.com> References: <46B13ADE.7080901@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com> Message-ID: On 8/13/07, Ron Adam wrote: > I reconsidered the split term forms a bit more and I think I've come up > with a better way to think about them. Sometimes a slight conceptual > shift can make a difference. > The basic form is: > {name[:][type][alignment_term][,content_modifying_term]} That sounds good, but make sure it works in practice; I think you were already tempted to violate it yourself in your details section. 
You used the (alignment term) width as the number of digits before the decimal, instead of as the field width. > TYPE: > The specifier type. One of 'deifsrx'. > (and any others I left off) There should be one that says "just trust the object, and if it doesn't have a __format__, then gripe". (We wouldn't need to support arbitrary content_modifying_terms unless it were possible to use more than the builtin types.) > ALIGNMENT TERM: [direction]width[/fill] > > direction: is one of '<^>' > width: a positive integer > /fill: a character So this assumes fixed-width, with fill? Can I leave the width off to say "whatever it takes"? Can I say "width=whatever it takes, up to 72 chars ... but don't pad it if you don't need to"? (And once you support variable-width, then minimum is needed again.) I'm not sure that variable lengths and alignment even *should* be supported in the same expression, but forcing everything to fixed-width would be enough of a change that it needs an explicit callout. > NUMBERS: [sign][0][digits][.decimals][%] I read Greg's question: How does this work with formats where the number of digits before the decimal can vary, but before+after is constant? differently, as being about significant figures. It may be that 403 and 14.1 are both valid values, but 403.0 would imply too much precision. (Would it *always* be OK to write these as 4.03e+2 and 1.41e+1?) Maybe the answer is that sig figs are a special case, and need a template with callbacks instead of a format string ... but that doesn't feel right. > /fill: a character I think you need to specify it a bit more than that. Can you use a comma? (It looks like the start of the content modifier.) How about a quote-mark, or a carriage return?
-jJ From g.brandl at gmx.net Tue Aug 14 22:19:06 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 14 Aug 2007 22:19:06 +0200 Subject: [Python-3000] Documentation switch imminent Message-ID: Now that the converted documentation is fairly bug-free, I want to make the switch. I will replace the old Doc/ trees in the trunk and py3k branches tomorrow, moving over the reST ones found at svn+ssh://svn.python.org/doctools/Doc-{26,3k}. Neal will change his build scripts, so that the 2.6 and 3.0 devel documentation pages at docs.python.org will be built from these new trees soon. Infos for people who will write docs in the new trees can be found in the new "Documenting Python" document, at the moment still available from http://pydoc.gbrandl.de:3000/documenting/, especially the "Differences" section at http://pydoc.gbrandl.de:3000/documenting/fromlatex/ (which is not complete, patches are welcome :) Cheers, Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From lists at cheimes.de Tue Aug 14 22:46:56 2007 From: lists at cheimes.de (Christian Heimes) Date: Tue, 14 Aug 2007 22:46:56 +0200 Subject: [Python-3000] Documentation switch imminent In-Reply-To: References: Message-ID: <46C214C0.3050703@cheimes.de> Georg Brandl wrote: > Infos for people who will write docs in the new trees can be found in the > new "Documenting Python" document, at the moment still available from > http://pydoc.gbrandl.de:3000/documenting/, especially the "Differences" > section at http://pydoc.gbrandl.de:3000/documenting/fromlatex/ (which > is not complete, patches are welcome :) http://pydoc.gbrandl.de:3000/documenting/fromlatex/ doesn't work for me: Keyword Not Found The keyword documenting/fromlatex is not directly associated with a page. 
Christian From g.brandl at gmx.net Tue Aug 14 22:55:36 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 14 Aug 2007 22:55:36 +0200 Subject: [Python-3000] Documentation switch imminent In-Reply-To: <46C214C0.3050703@cheimes.de> References: <46C214C0.3050703@cheimes.de> Message-ID: Christian Heimes schrieb: > Georg Brandl wrote: >> Infos for people who will write docs in the new trees can be found in the >> new "Documenting Python" document, at the moment still available from >> http://pydoc.gbrandl.de:3000/documenting/, especially the "Differences" >> section at http://pydoc.gbrandl.de:3000/documenting/fromlatex/ (which >> is not complete, patches are welcome :) > > http://pydoc.gbrandl.de:3000/documenting/fromlatex/ doesn't work for me: > > Keyword Not Found > > The keyword documenting/fromlatex is not directly associated with a page. Oops... should be fixed now. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From rrr at ronadam.com Wed Aug 15 00:53:17 2007 From: rrr at ronadam.com (Ron Adam) Date: Tue, 14 Aug 2007 17:53:17 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: References: <46B13ADE.7080901@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com> Message-ID: <46C2325D.1010209@ronadam.com> Jim Jewett wrote: > On 8/13/07, Ron Adam wrote: >> I reconsidered the split term forms a bit more and I think I've come up >> with a better way to think about them. Sometimes a slight conceptual >> shift can make a difference. 
> >> The basic form is: > >> {name[:][type][alignment_term][,content_modifying_term]} > > That sounds good, but make sure it works in practice; I think you were > already tempted to violate it yourself in your details section. You > used the (alignment term) width as the number of digits before the > decimal, instead of as the field width. I have a test version to test these ideas with. It uses a centralized parser rather than the distributed method that has been decided on, but it's good for testing the syntax. In some ways it's easier for that. You can leave out either term. So that may have been what you were seeing. There are more details that can be worked out. You can also switch the order of the terms. But I'm leaning towards having it always process the terms left to right. For example you might have... {0:!r,s+30,<30} So in this case, it would first do a repr(), then trim long strings to 30 characters, then left align them in a field 30 characters wide. But I haven't tested the idea of left to right sequential formatting yet as I need to move a lot of stuff around in my test implementation to get that to work. Even with that, the most common cases are just single terms, so it adds capability for those who want it, or need it, without penalizing newbies. {0:f8.2} or {0:^30} Simple expressions like these will be what is used 9 out of 10 times. >> TYPE: >> The specifier type. One of 'deifsrx'. >> (and any others I left off) > > There should be one that says "just trust the object, and if it > doesn't have a __format__, then gripe". (We wouldn't need to support > arbitrary content_modifying_terms unless it were possible to use more > than the builtin types.) That's the default behavior since the object gets it first. >> ALIGNMENT TERM: [direction]width[/fill] >> >> direction: is one of '<^>' >> width: a positive integer >> /fill: a character > > > So this assumes fixed-width, with fill? Minimal width with fill for shorter than width items.
It expands if the length of the item is longer than width. > Can I leave the width off to say "whatever it takes"? Yes > Can I say "width=whatever it takes, up to 72 chars ... but don't pad > it if you don't need to"? That's the default behavior. > (And once you support variable-width, then minimum is needed again.) > > I'm not sure that variable lengths and alignment even *should* be > supported in the same expression, but it forcing everything to > fixed-width would be enough of a change that it needs an explicit > callout. Alignment is needed for when the length of the value is shorter than the length of the field. So if a field has a minimal width, and a value is shorter than that, it will be used. >> NUMBERS: [sign][0][digits][.decimals][%] > > I read Greg's question: > > How does this work with formats where the number of > digits before the decimal can vary, but before+after > is constant? > > differently, as about significant figures. It may be that 403 and > 14.1 are both valid values, but 403.0 would imply too much precision. > (Would it *always* be OK to write these as 4.03e+2 and 1.41e+1?) I've been avoiding scientific notation so far. :-) So any suggestions on this part will be good. I think the format would be 'e1.2' or even just 'e.2'. It should follow the same pattern if possible. > Maybe the answer is that sig figs are a special case, and need a > template with callbacks instead of a format string ... but that > doesn't feel right. > >> /fill: a character > > I think you need to specify it a bit more than that. Can you use a > comma? Yes, any character after a '/' works. > (It looks like the start of the content modifier.) How about > a quote-mark, or a carriage return? Quotes (if they match the string delimiters) will need to be escaped, as will newlines, but I don't see any reason why they couldn't be used. I'm not sure why you would want to use those, but I lean towards letting the programmer figure out that part.
;-) Cheers, Ron From brett at python.org Wed Aug 15 01:57:19 2007 From: brett at python.org (Brett Cannon) Date: Tue, 14 Aug 2007 16:57:19 -0700 Subject: [Python-3000] Documentation switch imminent In-Reply-To: References: Message-ID: On 8/14/07, Georg Brandl wrote: > Now that the converted documentation is fairly bug-free, I want to > make the switch. > > I will replace the old Doc/ trees in the trunk and py3k branches > tomorrow, moving over the reST ones found at > svn+ssh://svn.python.org/doctools/Doc-{26,3k}. First, that address is wrong; missing a 'trunk' in there. Second, are we going to keep the docs in a separate tree forever, or is this just for now? I am not thinking so much about the tools, but whether we will need to do two separate commits in order to make code changes *and* change the docs? Or are you going to add an externals dependency in the trees to their respective doc directories? -Brett From greg.ewing at canterbury.ac.nz Wed Aug 15 02:02:35 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 15 Aug 2007 12:02:35 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46C12526.8040807@ronadam.com> References: <46B13ADE.7080901@acm.org> <46B2E265.5080905@ronadam.com> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com> <46C10D2D.60705@canterbury.ac.nz> 
<46C12526.8040807@ronadam.com> Message-ID: <46C2429B.1090507@canterbury.ac.nz> Ron Adam wrote: > > Greg Ewing wrote: > > > How does this work with formats where the number of > > digits before the decimal can vary, but before+after > > is constant? > > I think this is what you're looking for. > > f>15,.3 #15 field width, 3 decimal places, right aligned. No, I'm talking about formats such as "g" where the number of significant digits is fixed, but the position of the decimal point can change depending on the magnitude of the number. That wouldn't fit into your before.after format. >> Also, my feeling about the whole of this is that >> it's too complicated. > > it's because we are doing a lot in a very little space. Yes, and I think you're trying to do a bit too much. The format strings are starting to look like line noise. -- Greg From greg.ewing at canterbury.ac.nz Wed Aug 15 02:15:24 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 15 Aug 2007 12:15:24 +1200 Subject: [Python-3000] Format specifier proposal In-Reply-To: References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com> <46C11DDF.2080607@acm.org> Message-ID: <46C2459C.1000405@canterbury.ac.nz> Guido van Rossum wrote: > Over lunch we discussed putting !coercion first. IMO {foo!r:20} reads > more naturally from left to right It also has the advantage that the common case of 'r' with no other specifications is one character shorter and looks tidier, i.e. {foo!r} rather than {foo:!r}. But either way, I suspect I'll find it difficult to avoid writing it as {foo:r} in the heat of the moment. > On 8/13/07, Talin wrote: > > Where 'coercion' can be 'r' (to convert to repr()), 's' (to convert to > > string.) Is there ever a case where you would need to convert to a string? > > Originally I liked the idea of putting the type letter at the front, > > instead of at the back like it is in 2.5. However, when you think about > > it, it actually makes sense to have it at the back. 
I'm not so sure about that. Since most of the time it's going to be used as a discriminator that determines how the rest of the format spec is interpreted, it could make more sense to have it at the front. The only reason it's at the back in % formats is because that's the only way of telling where the format spec ends. We don't have that problem here. > > 6) Finally, Guido stressed that he wants to make sure that the > > implementation supports fields within fields, such as: > > > > {0:{1}.{2}} Is that recursive? In other words, can the nested {} contain another full format spec? -- Greg From guido at python.org Wed Aug 15 02:27:37 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 14 Aug 2007 17:27:37 -0700 Subject: [Python-3000] Format specifier proposal In-Reply-To: <46C2459C.1000405@canterbury.ac.nz> References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com> <46C11DDF.2080607@acm.org> <46C2459C.1000405@canterbury.ac.nz> Message-ID: On 8/14/07, Greg Ewing wrote: > Guido van Rossum wrote: > > Over lunch we discussed putting !coercion first. IMO {foo!r:20} reads > > more naturally from left to right > > It also has the advantage that the common case of > 'r' with no other specifications is one character > shorter and looks tidier, i.e. {foo!r} rather than > {foo:!r}. > > But either way, I suspect I'll find it difficult > to avoid writing it as {foo:r} in the heat of the > moment. I guess __format__ implementations should fall back to a default formatting spec rather than raising an exception when they don't understand the spec passed to them. > > On 8/13/07, Talin wrote: > > > Where 'coercion' can be 'r' (to convert to repr()), 's' (to convert to > > > string.) > > Is there ever a case where you would need to > convert to a string? When the default output produced by __format__ is different from that produced by __str__ or __repr__ (hard to imagine), or when you want to use a string-specific option to pad or truncate (more likely). 
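The division of labour described above can be sketched with the str.format() syntax that was eventually released. This is an illustration only: the sample date and the variable names are assumptions made for the example, and the thread itself had not yet settled on this syntax.

```python
from datetime import date

d = date(2007, 8, 14)

# Without a conversion, the object's own __format__ receives the whole spec;
# for dates that spec is a strftime pattern.
own_format = "{0:%Y/%m/%d}".format(d)

# !s converts to str first, so the spec is the ordinary string one,
# which is how padding or truncation gets applied to any type.
padded = "{0!s:>12}".format(d)    # right-align in a field 12 wide
truncated = "{0!s:.4}".format(d)  # keep the first four characters

# !r does the same with repr().
quoted = "{0!r:<8}|".format("ab")

print(own_format, padded, truncated, quoted, sep="\n")
```

A custom strftime pattern combined with padding still takes two steps this way, for example format(format(d, "%Y/%m/%d"), "<20").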
> > > Originally I liked the idea of putting the type letter at the front, > > > instead of at the back like it is in 2.5. However, when you think about > > > it, it actually makes sense to have it at the back. > > I'm not so sure about that. Since most of the time > it's going to be used as a discriminator that determines > how the rest of the format spec is interpreted, it > could make more sense to have it at the front. This can be decided on a type-by-type basis (except for numbers, which should follow the built-in numbers' example). > The only reason it's at the back in % formats is > because that's the only way of telling where the > format spec ends. We don't have that problem here. But it's still more familiar to read "10.3f" than "f10.3". > > > 6) Finally, Guido stressed that he wants to make sure that the > > > implementation supports fields within fields, such as: > > > > > > {0:{1}.{2}} > > Is that recursive? In other words, can the > nested {} contain another full format spec? We were trying not to open that can of worms. It's probably unnecessary to support that, but it may be easier to support it than to forbid it, and I don't see anything wrong with it. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Wed Aug 15 02:56:54 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 15 Aug 2007 12:56:54 +1200 Subject: [Python-3000] Format specifier proposal In-Reply-To: References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com> <46C11DDF.2080607@acm.org> <20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net> Message-ID: <46C24F56.5050104@canterbury.ac.nz> Guido van Rossum wrote: > On 8/13/07, Andrew James Wade wrote: > >>On Mon, 13 Aug 2007 20:53:26 -0700 >>"Guido van Rossum" wrote: >>>I propose that their >>>__format__ be defined like this: >>> >>> def __format__(self, spec): >>> return self.strftime(spec) >> >>You loose the ability to align the field then. 
This might be a use case for the chaining of format specs that Ron mentioned. Suppose you could do "{{1:spec1}:spec2}".format(x) which would be equivalent to format(format(x, "spec1"), "spec2") then you could do "{{1:%Y-%m-%d %H:%M:%S}:<20}".format(my_date) and get your date left-aligned in a 20-wide field. (BTW, I'm not sure about using strftime-style formats as-is, since the % chars look out of place in our new format syntax.) -- Greg From greg.ewing at canterbury.ac.nz Wed Aug 15 02:59:45 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 15 Aug 2007 12:59:45 +1200 Subject: [Python-3000] [Python-Dev] Universal newlines support in Python 3.0 In-Reply-To: References: <87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <46C25001.6080806@canterbury.ac.nz> Guido van Rossum wrote: > Now, this doesn't support bare \r as line terminator, but I doubt you > care much about that (unless you want to port the email package to Mac > OS 9 :-). Haven't we decided that '\r' still occurs in some cases even on MacOSX? -- Greg From guido at python.org Wed Aug 15 03:41:47 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 14 Aug 2007 18:41:47 -0700 Subject: [Python-3000] [Python-Dev] Universal newlines support in Python 3.0 In-Reply-To: <46C25001.6080806@canterbury.ac.nz> References: <87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp> <46C25001.6080806@canterbury.ac.nz> Message-ID: On 8/14/07, Greg Ewing wrote: > Guido van Rossum wrote: > > > Now, this doesn't support bare \r as line terminator, but I doubt you > > care much about that (unless you want to port the email package to Mac > > OS 9 :-). > > Haven't we decided that '\r' still occurs in some > cases even on MacOSX? Yes. I was simply describing what works today. The \r option still needs to be added to io.py. But it is in the PEP. 
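For reference, a minimal sketch of how bare \r handling looks in the io module as released, assuming the newline argument described in the PEP. It wraps an in-memory byte stream, so no file is needed; the sample data and the helper name read_lines are made up for the illustration.

```python
import io

data = b"cr\rcrlf\r\nlf\n"

def read_lines(newline):
    # Wrap an in-memory byte stream the way open() wraps a file in text mode.
    text = io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=newline)
    return text.readlines()

universal_translated = read_lines(None)  # all endings recognized, translated to "\n"
universal_preserved = read_lines("")     # all endings recognized, left untouched
cr_only = read_lines("\r")               # only "\r" ends a line

print(universal_translated)
print(universal_preserved)
print(cr_only)
```

newline="" is the combination asked for earlier in the thread: universal newline recognition with the original line endings preserved.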
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From janssen at parc.com Wed Aug 15 03:44:54 2007 From: janssen at parc.com (Bill Janssen) Date: Tue, 14 Aug 2007 18:44:54 PDT Subject: [Python-3000] Questions about email bytes/str (python 3000) In-Reply-To: References: <200708140422.36818.victor.stinner@haypocalc.com> Message-ID: <07Aug14.184454pdt."57996"@synergy1.parc.xerox.com> > > Let's take an example: multipart (MIME) email with latin-1 and > > base64 (ascii) > > sections. Mix latin-1 and ascii => mix bytes. So the best type > > should be > > bytes. > > > > => bytes > > Except that by the time they're parsed into an email message, they > must be ascii, either encoded as base64 or quoted-printable. We also > have to know at that point the charset being used, so I think it > makes sense to keep everything as strings. Actually, Victor's right here -- it makes more sense to treat them as bytes. It's RFC 821 (SMTP) that requires 7-bit ASCII, not the MIME format. Non-SMTP mail transports do exist, and are popular in various places. Email transported via other transport mechanisms may, for instance, use a Content-Transfer-Encoding of "binary" for some sections of the message. Some parts of the top-most header of the message may be counted on to be encoded as ASCII strings, but not the whole message in general. > > About base64, I agree with Bill Janssen: > > - base64MIME.decode converts string to bytes > > - base64MIME.encode converts bytes to string > > I agree. > > > But decode may accept bytes as input (as base64 modules does): use > > str(value, 'ascii', 'ignore') or str(value, 'ascii', 'strict'). > > Hmm, I'm not sure about this, but I think that .encode() may have to > accept strings. Personally, I think it would avoid more errors if it didn't. Let the user explicitly encode the string to a particular representation before calling base64.encode(). 
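Bill's preference can be illustrated with the standard base64 module as released in Python 3, where b64encode() accepts only bytes-like input. This is a minimal sketch: the sample subject string is borrowed from Barry's earlier message, and whether the email package's own base64MIME helpers should follow the same rule is left open in the thread.

```python
import base64

subject = "a \xfc subject"

# Encoding text is a two-step job: pick a charset explicitly, then base64 the bytes.
raw = subject.encode("utf-8")
encoded = base64.b64encode(raw)

# Passing the str directly is rejected rather than silently guessed at.
try:
    base64.b64encode(subject)
except TypeError:
    rejected = True
else:
    rejected = False

# Decoding yields bytes again; the caller chooses how to turn them into text.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded, rejected, decoded)
```

Keeping the charset choice outside the base64 call means a wrong guess shows up as an explicit encode() or decode() error at the call site.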
Bill From rrr at ronadam.com Wed Aug 15 03:58:59 2007 From: rrr at ronadam.com (Ron Adam) Date: Tue, 14 Aug 2007 20:58:59 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46C2429B.1090507@canterbury.ac.nz> References: <46B13ADE.7080901@acm.org> <46B35295.1030007@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com> <46C10D2D.60705@canterbury.ac.nz> <46C12526.8040807@ronadam.com> <46C2429B.1090507@canterbury.ac.nz> Message-ID: <46C25DE3.6060906@ronadam.com> Greg Ewing wrote: > Ron Adam wrote: >> Greg Ewing wrote: > > >>> How does this work with formats where the number of >>> digits before the decimal can vary, but before+after >>> is constant? >> I think this is what you're looking for. >> >> f>15,.3 #15 field width, 3 decimal places, right aligned. > > No, I'm talking about formats such as "g" where the > number of significant digits is fixed, but the position > of the decimal point can change depending on the magnitude > of the number. That wouldn't fit into your before.after > format. It would probably just a number with no decimal point in it. Something like 'g10' seems simple enough. You will always have the 'g' in this case. >>> Also, my feeling about the whole of this is that >>> it's too complicated. >> it's because we are doing a lot in a very little space. 
> > Yes, and I think you're trying to do a bit too much. > The format strings are starting to look like line > noise. Do you have a specific example or is it just an overall feeling? One of the motivations for finding something else is that the % formatting terms are confusing to some. A few here have said they need to look them up repeatedly and have difficulty remembering the exact forms and order. And part of it is the suggestion of splitting it up into parts that are interpreted by the object's __format__ method, and a part that is interpreted by the format function. For example, the field alignment part can be handled by the format function, and the value format part can be handled by the __format__ method. It helps to have the alignment part be well defined and completely separate from the content formatter part in this case. And it saves everyone from having to parse and implement alignments in their format methods. I think that is really the biggest reason to do this. I'm not sure you can split up field aligning and numeric formatting that way when using the % style formatting. They are combined too tightly. So each type would need to do both in its __format__ method. And chances are there will be many types that do one or the other but not both just because it's too much work, or just due to plain laziness. So before we discard this, I'd like to see a full working version with complete __format__ methods for int, float, and str types and any supporting functions they may use. And my apologies if it's starting to seem like line noise. I'm not that good at explaining things in simple ways. I tend to add too much detail when I don't need to, or not enough when I do. A complaint I get often enough. But I think this one is fixable by anyone who is a bit better at writing and explaining things in simple ways than I am.
:-) Cheers, Ron From rrr at ronadam.com Wed Aug 15 04:12:32 2007 From: rrr at ronadam.com (Ron Adam) Date: Tue, 14 Aug 2007 21:12:32 -0500 Subject: [Python-3000] Format specifier proposal In-Reply-To: <46C24F56.5050104@canterbury.ac.nz> References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com> <46C11DDF.2080607@acm.org> <20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net> <46C24F56.5050104@canterbury.ac.nz> Message-ID: <46C26110.8020001@ronadam.com> Greg Ewing wrote: > Guido van Rossum wrote: >> On 8/13/07, Andrew James Wade wrote: >> >>> On Mon, 13 Aug 2007 20:53:26 -0700 >>> "Guido van Rossum" wrote: > >>>> I propose that their >>>> __format__ be defined like this: >>>> >>>> def __format__(self, spec): >>>> return self.strftime(spec) >>> You loose the ability to align the field then. > > This might be a use case for the chaining of format specs > that Ron mentioned. Suppose you could do > > "{{1:spec1}:spec2}".format(x) > > which would be equivalent to > > format(format(x, "spec1"), "spec2") What I was thinking of was just a simple left to right evaluation order. "{0:spec1, spec2, ... }".format(x) I don't expect this will ever get very long. > then you could do > > "{{1:%Y-%m-%d %H:%M:%S}:<20}".format(my_date) > > and get your date left-aligned in a 20-wide field. So in this case all you would need is... {0:%Y-%m-%d %H:%M:%S,<20} > (BTW, I'm not sure about using strftime-style formats > as-is, since the % chars look out of place in our new > format syntax.) 
> > -- > Greg > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/rrr%40ronadam.com > > From andrew.j.wade at gmail.com Wed Aug 15 05:02:27 2007 From: andrew.j.wade at gmail.com (Andrew James Wade) Date: Tue, 14 Aug 2007 23:02:27 -0400 Subject: [Python-3000] Format specifier proposal In-Reply-To: References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com> <46C11DDF.2080607@acm.org> <20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net> Message-ID: <20070814230227.0c9be356.ajwade+py3k@andrew.wade.networklinux.net> On Tue, 14 Aug 2007 09:41:48 -0700 "Guido van Rossum" wrote: > On 8/13/07, Andrew James Wade wrote: > > On Mon, 13 Aug 2007 20:53:26 -0700 > > "Guido van Rossum" wrote: > > > > ... > > > > > One of my favorite examples of non-numeric types are the date, time > > > and datetime types from the datetime module; here I propose that their > > > __format__ be defined like this: > > > > > > def __format__(self, spec): > > > return self.strftime(spec) > > > > You loose the ability to align the field then. What about: > > > > def __format__(self, align_spec, spec="%Y-%m-%d %H:%M:%S"): > > return format(self.strftime(spec), align_spec) > > > > with > > > > def format(value, spec): > > if "," in spec: > > align_spec, custom_spec = spec.split(",",1) > > return value.__format__(align_spec, custom_spec) > > else: > > return value.__format__(spec) > > > > ":,%Y-%m-%d" may be slightly more gross than ":%Y-%m-%d", but on the plus > > side ":30" would mean the same thing across all types. > > Sorry, I really don't like imposing *any* syntactic constraints on the > spec apart from !r and !s. Does this mean that {1!30:%Y-%m-%d} would be legal syntax, that __format__ can do what it pleases with? 
That'd be great: there's an obvious place for putting standard fields, and another for putting custom formatting where collisions with !r and !s are not a concern: {1!30} {1:%Y-%m-%d} {1:!renewal date: %Y-%m-%d} # no special meaning for ! here. {1!30:%Y-%m-%d} "!" wouldn't necessarily have to be followed by standard codes, but I'm not sure why you'd want to put anything else (aside from !r, !s) there. > You can get the default format with a custom size by using !s:30. > > If you want a custom format *and* padding, just add extra spaces to the spec. That doesn't work for ":%A" or ":%B"; not if you want to pad to a fixed width. I really think you'll have support for the standard string-formatting codes appear in most formatting specifications in some guise or another; they may as well appear in a standard place too. -- Andrew From barry at python.org Wed Aug 15 05:44:26 2007 From: barry at python.org (Barry Warsaw) Date: Tue, 14 Aug 2007 23:44:26 -0400 Subject: [Python-3000] [Python-Dev] Universal newlines support in Python 3.0 In-Reply-To: References: <87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 14, 2007, at 12:52 PM, Guido van Rossum wrote: > On 8/14/07, Barry Warsaw wrote: >> It would have been perfect, I think, if I could have opened the file >> in text mode so that read() gave me strings, with universal newlines >> and preservation of line endings (i.e. no translation to \n). > > You can do that already, by passing newline="\n" to the open() > function when using text mode. Cute, but obscure. I'm not sure I like it as the ultimate way of spelling these semantics. 
> Try this script for a demo: > > f = open("@", "wb") > f.write("bare nl\n" > "crlf\r\n" > "bare nl\n" > "crlf\r\n") > f.close() > > f = open("@", "r") # default, universal newlines mode > print(f.readlines()) > f.close() > > f = open("@", "r", newline="\n") # recognize only \n as newline > print(f.readlines()) > f.close() > > This outputs: > > ['bare nl\n', 'crlf\n', 'bare nl\n', 'crlf\n'] > ['bare nl\n', 'crlf\r\n', 'bare nl\n', 'crlf\r\n'] > > Now, this doesn't support bare \r as line terminator, but I doubt you > care much about that (unless you want to port the email package to Mac > OS 9 :-). Naw, I don't, though someday we'll get just such a file and a bug report about busted line endings ;). There's still a problem though: this works for .readlines() but not for .read() which unconditionally converts \r\n to \n. The FeedParser uses .read() and I think the behavior should be the same for both methods. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRsJ2mnEjvBPtnXfVAQIL8AP/YhVUAoR9yWMniTUls5thI4ubUmPJlln4 R2cDOCw97lsYEDBk80bS2d/ZgncG5EnleIBmg+UtkEoSduhTOLZjot3cgmfy1DqX LHFfUCe8AnHLjuZBV7RbOcpn14X8fGtqNkYq25yvyOIvIYdIBP64ZjbyFD+kZhTA Ss8e10D+YJw= =otBw -----END PGP SIGNATURE----- From guido at python.org Wed Aug 15 06:03:40 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 14 Aug 2007 21:03:40 -0700 Subject: [Python-3000] [Python-Dev] Universal newlines support in Python 3.0 In-Reply-To: References: <87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 8/14/07, Barry Warsaw wrote: > On Aug 14, 2007, at 12:52 PM, Guido van Rossum wrote: > > On 8/14/07, Barry Warsaw wrote: > >> It would have been perfect, I think, if I could have opened the file > >> in text mode so that read() gave me strings, with universal newlines > >> and preservation of line endings (i.e. no translation to \n). > > > > You can do that already, by passing newline="\n" to the open() > > function when using text mode. > > Cute, but obscure. 
I'm not sure I like it as the ultimate way of > spelling these semantics. It was the best we could come up with in the 3 minutes we devoted to this at PyCon when drafting PEP 3116. If you have a better idea, please don't hide it! > > Try this script for a demo: > > > > f = open("@", "wb") > > f.write("bare nl\n" > > "crlf\r\n" > > "bare nl\n" > > "crlf\r\n") > > f.close() > > > > f = open("@", "r") # default, universal newlines mode > > print(f.readlines()) > > f.close() > > > > f = open("@", "r", newline="\n") # recognize only \n as newline > > print(f.readlines()) > > f.close() > > > > This outputs: > > > > ['bare nl\n', 'crlf\n', 'bare nl\n', 'crlf\n'] > > ['bare nl\n', 'crlf\r\n', 'bare nl\n', 'crlf\r\n'] > > > > Now, this doesn't support bare \r as line terminator, but I doubt you > > care much about that (unless you want to port the email package to Mac > > OS 9 :-). > > Naw, I don't, though someday we'll get just such a file and a bug > report about busted line endings ;). > > There's still a problem though: this works for .readlines() but not > for .read() which unconditionally converts \r\n to \n. The > FeedParser uses .read() and I think the behavior should be the same > for both methods. Ow, that's a bug! I'll look into fixing it; this was unintentional! 
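The property being asked for (read() and readlines() agreeing on line endings) can be checked with a short sketch. It assumes the `io` module of a current Python 3 release rather than the 2007 branch under discussion, and it wraps an in-memory buffer instead of the "@" file used in the demo script; the helper name `open_text` is made up for the example.

```python
import io

# Same content as the demo script above, kept in memory.
raw = b"bare nl\ncrlf\r\nbare nl\ncrlf\r\n"

def open_text(newline):
    # Wrap a fresh copy of the bytes each time; ASCII is enough for this sample.
    return io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=newline)

# Whatever the newline mode, read() should match the joined readlines().
for mode in (None, "", "\n"):
    whole = open_text(mode).read()
    lines = open_text(mode).readlines()
    assert whole == "".join(lines)

universal = open_text(None).readlines()   # default universal newlines mode
only_lf = open_text("\n").readlines()     # recognize only \n as newline
```

With these inputs, `universal` and `only_lf` correspond to the two lists printed by the demo script.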
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From talin at acm.org Wed Aug 15 06:27:08 2007 From: talin at acm.org (Talin) Date: Tue, 14 Aug 2007 21:27:08 -0700 Subject: [Python-3000] PEP 3101 Updated Message-ID: <46C2809C.3000806@acm.org> A new version is up, incorporating material from the various discussions on this list: http://www.python.org/dev/peps/pep-3101/ Diffs are here: http://svn.python.org/view/peps/trunk/pep-3101.txt?rev=57044&r1=56535&r2=57044 -- Talin From barry at python.org Wed Aug 15 06:33:46 2007 From: barry at python.org (Barry Warsaw) Date: Wed, 15 Aug 2007 00:33:46 -0400 Subject: [Python-3000] [Python-Dev] Universal newlines support in Python 3.0 In-Reply-To: References: <87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <7375B721-C62F-49ED-945B-5A9107246D6C@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 15, 2007, at 12:03 AM, Guido van Rossum wrote: > On 8/14/07, Barry Warsaw wrote: >> On Aug 14, 2007, at 12:52 PM, Guido van Rossum wrote: >>> On 8/14/07, Barry Warsaw wrote: >>>> It would have been perfect, I think, if I could have opened the >>>> file >>>> in text mode so that read() gave me strings, with universal >>>> newlines >>>> and preservation of line endings (i.e. no translation to \n). >>> >>> You can do that already, by passing newline="\n" to the open() >>> function when using text mode. >> >> Cute, but obscure. I'm not sure I like it as the ultimate way of >> spelling these semantics. > > It was the best we could come up with in the 3 minutes we devoted to > this at PyCon when drafting PEP 3116. If you have a better idea, > please don't hide it! I think you (almost) suggested it in your first message! Add a flag called preserve_eols that defaults to False, is ignored unless universal newline mode is turned on, and when True, disables the replacement on input. >> There's still a problem though: this works for .readlines() but not >> for .read() which unconditionally converts \r\n to \n. 
The >> FeedParser uses .read() and I think the behavior should be the same >> for both methods. > > Ow, that's a bug! I'll look into fixing it; this was unintentional! Oh, excellent! I think that will nicely take care of email package's needs and will allow me to remove the crufty conversion. Thanks! - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRsKCK3EjvBPtnXfVAQLgcgP/eFUci/jmPqEY5TDE1bUHgiMhY3F1GxXX epYc4Q7wDOf05Ky1pjmRDRMkfQkalL/seP58IAW7b1FaWT98bSP56vrLcyuy+oje 23e7bqggEikfS/+E15U7E/xz+h1qbKdEr7c43/sl/s8flBE47MHXAI/sMKKfvS+6 kVqHKWXX0Lk= =81z3 -----END PGP SIGNATURE----- From andrew.j.wade at gmail.com Wed Aug 15 06:42:04 2007 From: andrew.j.wade at gmail.com (Andrew James Wade) Date: Wed, 15 Aug 2007 00:42:04 -0400 Subject: [Python-3000] Format specifier proposal In-Reply-To: <46C26110.8020001@ronadam.com> References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com> <46C11DDF.2080607@acm.org> <20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net> <46C24F56.5050104@canterbury.ac.nz> <46C26110.8020001@ronadam.com> Message-ID: <20070815004204.f986b4b1.ajwade+py3k@andrew.wade.networklinux.net> On Tue, 14 Aug 2007 21:12:32 -0500 Ron Adam wrote: > > > Greg Ewing wrote: > > Guido van Rossum wrote: > >> On 8/13/07, Andrew James Wade wrote: > >> > >>> On Mon, 13 Aug 2007 20:53:26 -0700 > >>> "Guido van Rossum" wrote: > > > >>>> I propose that their > >>>> __format__ be defined like this: > >>>> > >>>> def __format__(self, spec): > >>>> return self.strftime(spec) > >>> You loose the ability to align the field then. > > > > This might be a use case for the chaining of format specs > > that Ron mentioned. Suppose you could do > > > > "{{1:spec1}:spec2}".format(x) > > > > which would be equivalent to > > > > format(format(x, "spec1"), "spec2") That would be a solution to my concerns, though that would have to be: "{ {1:spec1}:spec2}" > > > What I was thinking of was just a simple left to right evaluation order. 
> > "{0:spec1, spec2, ... }".format(x) > > I don't expect this will ever get very long. The first __format__ will return a str, so chains longer than 2 don't make a lot of sense. And the delimiter character should be allowed in spec1; limiting the length of the chain to 2 allows that without escaping: "{0:spec1-with-embedded-comma,}".format(x) My scheme did the same sort of thing with spec1 and spec2 reversed. Your order makes more intuitive sense; I chose my order because I wanted the syntax to be a generalization of formatting strings. Handling the chaining within the __format__ methods should be all of two lines of boilerplate per method. -- Andrew From guido at python.org Wed Aug 15 06:56:23 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 14 Aug 2007 21:56:23 -0700 Subject: [Python-3000] Proposed new language for newline parameter to TextIOBase Message-ID: I thought some more about the universal newlines situation, and I think I can handle all the use cases with a single 'newline' parameter. The use cases are: (A) input use cases: (1) newline=None: input with default universal newlines mode; lines may end in \r, \n, or \r\n, and these are translated to \n. (2) newline='': input with untranslated universal newlines mode; lines may end in \r, \n, or \r\n, and these are returned untranslated. (3) newline='\r', newline='\n', newline='\r\n': input lines must end with the given character(s), and these are translated to \n. (B) output use cases: (1) newline=None: every \n written is translated to os.linesep. (2) newline='': no translation takes place. (3) newline='\r', newline='\n', newline='\r\n': every \n written is translated to the value of newline. Note that cases (2) are new, and case (3) changes from the current PEP and/or from the current implementation (which seems to deviate from the PEP). Also note that it doesn't matter whether .readline(), .read() or .read(N) is used. The PEP is currently unclear on this and the implementation is wrong. 
Proposed language for the PEP: ``.__init__(self, buffer, encoding=None, newline=None)`` ``buffer`` is a reference to the ``BufferedIOBase`` object to be wrapped with the ``TextIOWrapper``. ``encoding`` refers to an encoding to be used for translating between the byte-representation and character-representation. If it is ``None``, then the system's locale setting will be used as the default. ``newline`` can be ``None``, ``''``, ``'\n'``, ``'\r'``, or ``'\r\n'``; all other values are illegal. It controls the handling of line endings. It works as follows: * On input, if ``newline`` is ``None``, universal newlines mode is enabled. Lines in the input can end in ``'\n'``, ``'\r'``, or ``'\r\n'``, and these are translated into ``'\n'`` before being returned to the caller. If it is ``''``, universal newline mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller translated to ``'\n'``. * On output, if ``newline`` is ``None``, any ``'\n'`` characters written are translated to the system default line separator, ``os.linesep``. If ``newline`` is ``''``, no translation takes place. If ``newline`` is any of the other legal values, any ``'\n'`` characters written are translated to the given string. Further notes on the ``newline`` parameter: * ``'\r'`` support is still needed for some OSX applications that produce files using ``'\r'`` line endings; Excel (when exporting to text) and Adobe Illustrator EPS files are the most common examples. * If translation is enabled, it happens regardless of which method is called for reading or writing. For example, {{{f.read()}}} will always produce the same result as {{{''.join(f.readlines())}}}. * If universal newlines without translation are requested on input (i.e. 
``newline=''``), if a system read operation returns a buffer ending in ``'\r'``, another system read operation is done to determine whether it is followed by ``'\n'`` or not. In universal newlines mode with translation, the second system read operation may be postponed until the next read request, and if the following system read operation returns a buffer starting with ``'\n'``, that character is simply discarded. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at python.org Wed Aug 15 07:50:30 2007 From: barry at python.org (Barry Warsaw) Date: Wed, 15 Aug 2007 01:50:30 -0400 Subject: [Python-3000] Questions about email bytes/str (python 3000) In-Reply-To: References: <200708140422.36818.victor.stinner@haypocalc.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 14, 2007, at 11:39 AM, Barry Warsaw wrote: > I will create a sandbox branch and apply my changes later today so > we have something concrete to look at. Done. See: http://svn.python.org/view/sandbox/trunk/emailpkg/5_0-exp/ I'm down to 5 failures and 6 errors (in test_email.py only), and I think most if not all of them are related to the broken header splittable stuff. Please take a look. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRsKUJnEjvBPtnXfVAQISBQQAnEKytL8fqLbe+HADIyIBr1gDFtzbc4nw zY4oEDPV+d4zFiAj9Ap5uePCfQxnqRdBMsHhkbCkB9k0XSDoWv2NxC10KLdE2CEO YMLB+BB5uMjTCkHhaUVr/rIdKv/4LKZFy1v9dJv5X3BF5clugWa3L+tioe0kPk9X jDkjZKc59LE= =73uN -----END PGP SIGNATURE----- From rrr at ronadam.com Wed Aug 15 08:52:33 2007 From: rrr at ronadam.com (Ron Adam) Date: Wed, 15 Aug 2007 01:52:33 -0500 Subject: [Python-3000] Format specifier proposal In-Reply-To: <20070815004204.f986b4b1.ajwade+py3k@andrew.wade.networklinux.net> References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com> <46C11DDF.2080607@acm.org> <20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net> <46C24F56.5050104@canterbury.ac.nz> <46C26110.8020001@ronadam.com> <20070815004204.f986b4b1.ajwade+py3k@andrew.wade.networklinux.net> Message-ID: <46C2A2B1.1010708@ronadam.com> Andrew James Wade wrote: > On Tue, 14 Aug 2007 21:12:32 -0500 > Ron Adam wrote: >> What I was thinking of was just a simple left to right evaluation order. >> >> "{0:spec1, spec2, ... }".format(x) >> >> I don't expect this will ever get very long. > > The first __format__ will return a str, so chains longer than 2 don't > make a lot of sense. And the delimiter character should be allowed in > spec1; limiting the length of the chain to 2 allows that without escaping: > > "{0:spec1-with-embedded-comma,}".format(x) > > My scheme did the same sort of thing with spec1 and spec2 reversed. > Your order makes more intuitive sense; I chose my order because I > wanted the syntax to be a generalization of formatting strings. > > Handling the chaining within the __format__ methods should be all of > two lines of boilerplate per method. I went ahead and tried this out and it actually cleared up some difficulty in organizing the parsing code. That was a very nice surprise. :) (actual doctest) >>> import time >>> class GetTime(object): ... def __init__(self, time=time.gmtime()): ... 
self.time = time ... def __format__(self, spec): ... return fstr(time.strftime(spec, self.time)) >>> start = GetTime(time.gmtime(1187154773.0085449)) >>> fstr("Start: {0:%d/%m/%Y %H:%M:%S,<30}").format(start) 'Start: 15/08/2007 05:12:53 ' After each term is returned from the __format__ call, the result's __format__ method is called with the next specifier. GetTime.__format__ returns a string. str.__format__ aligns it. A nice left to right sequence of events. The chaining is handled before the __format__ method calls so each __format__ method only needs to be concerned with doing its own thing. The alignment is no longer special cased as it's just part of the string formatter. No other types need it as long as their __format__ methods return strings. Which means nobody needs to write parsers to handle field alignments. If you had explicit conversions for other types besides !r and !s, it might be useful to do things like the following. Suppose you had text data with floats in it along with some other junk. You could do the following... # Purposely longish example just to show sequence of events. "The total is: ${0:s-10,!f,(.2),>12}".format(line) Which would grab 10 characters from the end of the line and convert them to a float; the float's __format__ method is then called, which formats it to 2 decimal places, and then it's right aligned in a field 12 characters wide. That could be shortened to {0:s-10,f(.2),>12} as long as string types know how to convert to float. Or if you want the () to line up on both sides, you'd probably just use {0:s-10,f(7.2)}. This, along with the nested substitutions Guido wants, would be a pretty powerful mini formatting language like the one Talin hinted at earlier. I don't think there is any need to limit the number of terms, that sort of spoils the design. The two downsides of this are that it's a bit different from what users are used to, and we would need to escape commas inside of specifiers somehow.
It simplifies the parsing and formatting code underneath like I was hoping, but it may scare some people off. But the simple common cases are still really simple, so I hope not. BTW... I don't think I can add anything more to this idea. The rest is just implementation details and documentation. :) Cheers, Ron From brett at python.org Wed Aug 15 09:47:40 2007 From: brett at python.org (Brett Cannon) Date: Wed, 15 Aug 2007 00:47:40 -0700 Subject: [Python-3000] Proposed new language for newline parameter to TextIOBase In-Reply-To: References: Message-ID: On 8/14/07, Guido van Rossum wrote: > I thought some more about the universal newlines situation, and I > think I can handle all the use cases with a single 'newline' > parameter. The use cases are: > > (A) input use cases: > > (1) newline=None: input with default universal newlines mode; lines > may end in \r, \n, or \r\n, and these are translated to \n. > > (2) newline='': input with untranslated universal newlines mode; lines > may end in \r, \n, or \r\n, and these are returned untranslated. > > (3) newline='\r', newline='\n', newline='\r\n': input lines must end > with the given character(s), and these are translated to \n. > > (B) output use cases: > > (1) newline=None: every \n written is translated to os.linesep. > > (2) newline='': no translation takes place. > > (3) newline='\r', newline='\n', newline='\r\n': every \n written is > translated to the value of newline. > I like the options, but I would swap the meaning of None and the empty string. My reasoning for this is that for option 3 it says to me "here is a string representing EOL, and make it \n". So I would think of the empty string as, "I don't know what EOL is, but I want it translated to \n". Then None means, "I don't want any translation done" by the fact that the argument is not a string. 
In other words, the existence of a string argument means you want EOL translated to \n, and the specific value of 'newline' specifying how to determine what EOL is. -Brett From martin at v.loewis.de Wed Aug 15 10:37:50 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 15 Aug 2007 10:37:50 +0200 Subject: [Python-3000] PEP 3131 is implemented Message-ID: <46C2BB5E.7060102@v.loewis.de> I just implemented PEP 3131 (non-ASCII identifiers). There are several problems with displaying error messages, in particular when the terminal cannot render the string; if anybody wants to work on this, please go ahead. Regards, Martin From eric+python-dev at trueblade.com Wed Aug 15 11:04:32 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Wed, 15 Aug 2007 05:04:32 -0400 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <46C2809C.3000806@acm.org> References: <46C2809C.3000806@acm.org> Message-ID: <46C2C1A0.4060002@trueblade.com> Talin wrote: > A new version is up, incorporating material from the various discussions > on this list: > > http://www.python.org/dev/peps/pep-3101/ I have a number of parts of this implemented. I'm refactoring the original PEP 3101 sandbox code to get it working. Mostly it involves un-optimizing string handling in the original work :( These tests all pass: self.assertEquals('{0[{1}]}'.format('abcdefg', 4), 'e') self.assertEquals('{foo[{bar}]}'.format(foo='abcdefg', bar=4), 'e') self.assertEqual("My name is {0}".format('Fred'), "My name is Fred") self.assertEqual("My name is {0[name]}".format(dict(name='Fred')), "My name is Fred") self.assertEqual("My name is {0} :-{{}}".format('Fred'), "My name is Fred :-{}") I have not added the !r syntax yet. I've only spent 5 minutes looking at this so far, but I can't figure out where to add a __format__ to object. If someone could point me to the right place, that would be helpful. Thanks.
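For anyone wanting to try the simpler of those cases outside the sandbox, here is a small sketch assuming `str.format` as it exists in a current Python 3 release, which is not necessarily identical to the sandbox code described above. It sticks to positional fields, item lookup, and brace escaping; the `{0[{1}]}` form with a field nested inside the index is left out because the released method, as far as I know, only expands nested fields inside the format spec. Variable names are illustrative.

```python
# Positional field, as in the "My name is {0}" test.
greeting = "My name is {0}".format("Fred")

# Item lookup inside the field name; the key is written without quotes.
by_key = "My name is {0[name]}".format(dict(name="Fred"))

# Doubled braces produce literal braces in the output.
escaped = "My name is {0} :-{{}}".format("Fred")

# A literal integer index into a sequence argument.
by_index = "{0[4]}".format("abcdefg")

# A nested field inside the format spec supplies the alignment and width.
nested_spec = "{0:{1}}".format("abc", ">6")
```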
From barry at python.org Wed Aug 15 14:25:49 2007 From: barry at python.org (Barry Warsaw) Date: Wed, 15 Aug 2007 08:25:49 -0400 Subject: [Python-3000] Proposed new language for newline parameter to TextIOBase In-Reply-To: References: Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 15, 2007, at 3:47 AM, Brett Cannon wrote: > On 8/14/07, Guido van Rossum wrote: >> I thought some more about the universal newlines situation, and I >> think I can handle all the use cases with a single 'newline' >> parameter. The use cases are: >> >> (A) input use cases: >> >> (1) newline=None: input with default universal newlines mode; lines >> may end in \r, \n, or \r\n, and these are translated to \n. >> >> (2) newline='': input with untranslated universal newlines mode; >> lines >> may end in \r, \n, or \r\n, and these are returned untranslated. >> >> (3) newline='\r', newline='\n', newline='\r\n': input lines must end >> with the given character(s), and these are translated to \n. >> >> (B) output use cases: >> >> (1) newline=None: every \n written is translated to os.linesep. >> >> (2) newline='': no translation takes place. >> >> (3) newline='\r', newline='\n', newline='\r\n': every \n written is >> translated to the value of newline. >> > > I like the options, but I would swap the meaning of None and the empty > string. My reasoning for this is that for option 3 it says to me > "here is a string representing EOL, and make it \n". So I would think > of the empty string as, "I don't know what EOL is, but I want it > translated to \n". Then None means, "I don't want any translation > done" by the fact that the argument is not a string. In other words, > the existence of a string argument means you want EOL translated to > \n, and the specific value of 'newline' specifying how to determine > what EOL is. What Brett said. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRsLwznEjvBPtnXfVAQKMQAP9FzztQ09re2pLBN/uNKrLCf2i5Z1ENZQU Rbfwv8Ek2ZcBurvDht8Oyj3wgOzOKUhk6XfHdHD0Mf3CW9XL6dMvSZHQOv3sORQF Fh6MI4B9HezL/Fuy2C9OenM0TaYHkH5aoYagIjM9/LOezEkxliHU/gOMGY4657dG Turqz+xPunw= =xC3V -----END PGP SIGNATURE----- From barry at python.org Wed Aug 15 14:28:56 2007 From: barry at python.org (Barry Warsaw) Date: Wed, 15 Aug 2007 08:28:56 -0400 Subject: [Python-3000] PEP 3131 is implemented In-Reply-To: <46C2BB5E.7060102@v.loewis.de> References: <46C2BB5E.7060102@v.loewis.de> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 15, 2007, at 4:37 AM, Martin v. Löwis wrote: > I just implemented PEP 3131 (non-ASCII identifiers). > > There are several problems with displaying error messages, > in particular when the terminal cannot render the string; > if anybody wants to work on this, please go ahead. I'm not sure this is related only to PEP 3131 changes (I haven't tried it yet), but I hit a similar problem when I was working on the email package. I think I posted a message about it in a separate thread. If an exception gets raised with a message containing characters that can't be printed on your terminal, you get a nasty exception inside io, which obscures the real exception. The resulting traceback makes it difficult to debug what's going on. I haven't looked deeper but /my/ solution was to repr the message before instantiating the exception. IWBNI Python itself Did Something Better.
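A rough sketch of that workaround, under stated assumptions: it uses the builtin `ascii()` from current Python 3 in place of `repr()`, since `repr()` there keeps printable non-ASCII characters as they are. The helper name `printable_message` and the sample text are hypothetical, not taken from the email package.

```python
def printable_message(text):
    # ascii() works like repr() but escapes every non-ASCII character,
    # so the result can be written to any terminal encoding.
    return ascii(text)

name = "L\u00f6wis"  # sample text containing a non-ASCII character

# Build the exception from the escaped form of the message.
err = ValueError(printable_message("bad identifier: " + name))
message = str(err)
```

The cost is that the message shows escape sequences instead of the real characters, which is presumably why a fix inside Python itself would be preferable.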
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRsLxiHEjvBPtnXfVAQLSkAQAuC0UAWwFb5kC6uQb9zhCm4zH/BMKaN1k hb6PheDHHl2KnwKoVB+Lw3XrBlbrvZotpYnThQEAG4vNtW92O59zf1uxtRwFo16Q dvQhqwx4fAobWWQYkIK1F7i6SaEyHa+8D8iXy33RTcZKkwKvD69miSFyGxEyHq2x 2zH1Uk+qzos= =P+66 -----END PGP SIGNATURE----- From lists at cheimes.de Wed Aug 15 15:28:05 2007 From: lists at cheimes.de (Christian Heimes) Date: Wed, 15 Aug 2007 15:28:05 +0200 Subject: [Python-3000] Proposed new language for newline parameter to TextIOBase In-Reply-To: References: Message-ID: Brett Cannon wrote: > I like the options, but I would swap the meaning of None and the empty > string. My reasoning for this is that for option 3 it says to me > "here is a string representing EOL, and make it \n". So I would think > of the empty string as, "I don't know what EOL is, but I want it > translated to \n". Then None means, "I don't want any translation > done" by the fact that the argument is not a string. In other words, > the existence of a string argument means you want EOL translated to > \n, and the specific value of 'newline' specifying how to determine > what EOL is. I like to propose some constants which should be used instead of the strings: MAC = '\r' UNIX = '\n' WINDOWS = '\r\n' UNIVERSAL = '' NOTRANSLATE = None I think that open(filename, newline=io.UNIVERSAL) or open(filename, newline=io.WINDOWS) is much more readable than open(filename, newline=''). Besides I always forget if Windows is '\r\n' or '\n\r'. *g* Christian From g.brandl at gmx.net Wed Aug 15 15:36:56 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 15 Aug 2007 15:36:56 +0200 Subject: [Python-3000] Proposed new language for newline parameter to TextIOBase In-Reply-To: References: Message-ID: Brett Cannon schrieb: > On 8/14/07, Guido van Rossum wrote: >> I thought some more about the universal newlines situation, and I >> think I can handle all the use cases with a single 'newline' >> parameter. 
The use cases are: >> >> (A) input use cases: >> >> (1) newline=None: input with default universal newlines mode; lines >> may end in \r, \n, or \r\n, and these are translated to \n. >> >> (2) newline='': input with untranslated universal newlines mode; lines >> may end in \r, \n, or \r\n, and these are returned untranslated. >> >> (3) newline='\r', newline='\n', newline='\r\n': input lines must end >> with the given character(s), and these are translated to \n. >> >> (B) output use cases: >> >> (1) newline=None: every \n written is translated to os.linesep. >> >> (2) newline='': no translation takes place. >> >> (3) newline='\r', newline='\n', newline='\r\n': every \n written is >> translated to the value of newline. >> > > I like the options, but I would swap the meaning of None and the empty > string. My reasoning for this is that for option 3 it says to me > "here is a string representing EOL, and make it \n". So I would think > of the empty string as, "I don't know what EOL is, but I want it > translated to \n". Then None means, "I don't want any translation > done" by the fact that the argument is not a string. In other words, > the existence of a string argument means you want EOL translated to > \n, and the specific value of 'newline' specifying how to determine > what EOL is. I'd use None and "\r"/... as proposed, but "U" instead of the empty string for universal newline mode. "U" already has that established meaning, and you don't have to remember the difference between the two (false) values "" and None. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. 
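The two universal-newline input cases quoted above, (A)(1) and (A)(2), can be sketched with the io module as it exists in current Python 3. This is only an illustration under that assumption, not the 2007 py3k code; the helper name read_lines and the sample bytes are made up for the example.

```python
import io

# One line for each of the three conventions: \r, \n and \r\n.
RAW = b"one\rtwo\nthree\r\n"


def read_lines(newline):
    """Read RAW as text with the given newline setting."""
    stream = io.TextIOWrapper(io.BytesIO(RAW), encoding="ascii",
                              newline=newline)
    return stream.readlines()


# (A)(1): universal newlines, every line ending translated to \n.
print(read_lines(None))  # ['one\n', 'two\n', 'three\n']

# (A)(2): universal newlines, line endings returned untranslated.
print(read_lines(""))    # ['one\r', 'two\n', 'three\r\n']
```

Both settings split the data into the same three lines; they differ only in whether the caller gets to see which line ending was actually used.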
From g.brandl at gmx.net Wed Aug 15 16:05:12 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 15 Aug 2007 16:05:12 +0200 Subject: [Python-3000] Documentation switch imminent In-Reply-To: References: Message-ID: Brett Cannon schrieb: > On 8/14/07, Georg Brandl wrote: >> Now that the converted documentation is fairly bug-free, I want to >> make the switch. >> >> I will replace the old Doc/ trees in the trunk and py3k branches >> tomorrow, moving over the reST ones found at >> svn+ssh://svn.python.org/doctools/Doc-{26,3k}. > > First, that address is wrong; missing a 'trunk' in there. Sorry again. > Second, are we going to keep the docs in a separate tree forever, or > is this just for now? They will be moved (in a few minutes...) to the location where the Latex docs are now. > I am not thinking so much about the tools, but > whether we will need to do two separate commits in order to make code > changes *and* change the docs? Or are you going to add an externals > dependency in the trees to their respective doc directories? No separate commits will be needed to commit changes to the docs. However, the tool to build the docs will not be in the tree under Doc/, but continue to be maintained in the doctools/ toplevel project. I spoke with Martin about including them as externals, but we agreed that they are not needed and cost too much time on every "svn up". Instead, the Doc/ makefile checks out the tools in a separate directory and runs them from there. (The Doc/README.txt file explains this in more detail.) Cheers, Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. 
From g.brandl at gmx.net Wed Aug 15 16:33:21 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 15 Aug 2007 16:33:21 +0200 Subject: [Python-3000] Documentation switch imminent In-Reply-To: References: Message-ID: Georg Brandl wrote: > Now that the converted documentation is fairly bug-free, I want to > make the switch. > > I will replace the old Doc/ trees in the trunk and py3k branches > tomorrow, moving over the reST ones found at > svn+ssh://svn.python.org/doctools/Doc-{26,3k}. > > Neal will change his build scripts, so that the 2.6 and 3.0 devel > documentation pages at docs.python.org will be built from these new > trees soon. Okay, I made the switch. I tagged the state of both Python branches before the switch as tags/py{26,3k}-before-rstdocs/. From now on, I'll make changes that apply to 2.6 and 3.0 only in the trunk and hope that svnmerge will continue to work. I'll also handle the backport of doc changes for bugfixes to the 2.5 branch if you drop me a mail which revision I should backport. Cheers, Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
From python3now at gmail.com Wed Aug 15 16:38:50 2007 From: python3now at gmail.com (James Thiele) Date: Wed, 15 Aug 2007 07:38:50 -0700 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <46C2809C.3000806@acm.org> References: <46C2809C.3000806@acm.org> Message-ID: <8f01efd00708150738y36deee91v918bfd3f80944d9b@mail.gmail.com> I think the example: "My name is {0.name}".format(file('out.txt')) Would be easier to understand if you added: Which would produce: "My name is 'out.txt'" On 8/14/07, Talin wrote: > A new version is up, incorporating material from the various discussions > on this list: > > http://www.python.org/dev/peps/pep-3101/ > > Diffs are here: > > http://svn.python.org/view/peps/trunk/pep-3101.txt?rev=57044&r1=56535&r2=57044 > > > -- Talin > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/python3now%40gmail.com > From python3now at gmail.com Wed Aug 15 16:52:32 2007 From: python3now at gmail.com (James Thiele) Date: Wed, 15 Aug 2007 07:52:32 -0700 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <46C2C1A0.4060002@trueblade.com> References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com> Message-ID: <8f01efd00708150752i1b09464frb8d3217209b47a11@mail.gmail.com> The section on the explicit conversion flag contains the following line: These flags are typically placed before the format specifier: Where else can they be placed? Also there is no description of what action (if any) is taken if an unknown explicit conversion flag is encountered. On 8/15/07, Eric Smith wrote: > Talin wrote: > > A new version is up, incorporating material from the various discussions > > on this list: > > > > http://www.python.org/dev/peps/pep-3101/ > > I have a number of parts of this implemented. I'm refactoring the > original PEP 3101 sandbox code to get it working.
Mostly it involves > un-optimizing string handling in the original work :( > > These tests all pass: > > self.assertEquals('{0[{1}]}'.format('abcdefg', 4), 'e') > self.assertEquals('{foo[{bar}]}'.format(foo='abcdefg', bar=4), 'e') > self.assertEqual("My name is {0}".format('Fred'), "My name is Fred") > self.assertEqual("My name is {0[name]}".format(dict(name='Fred')), > "My name is Fred") > self.assertEqual("My name is {0} :-{{}}".format('Fred'), > "My name is Fred :-{}") > > I have not added the !r syntax yet. > > I've only spent 5 minutes looking at this so far, but I can't figure out > where to add a __format__ to object. If someone could point me to the > right place, that would be helpful. > > Thanks. > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/python3now%40gmail.com > From fdrake at acm.org Wed Aug 15 16:54:01 2007 From: fdrake at acm.org (Fred Drake) Date: Wed, 15 Aug 2007 10:54:01 -0400 Subject: [Python-3000] Proposed new language for newline parameter to TextIOBase In-Reply-To: References: Message-ID: <436D659B-DBF2-4342-AAFB-966F7FA44F86@acm.org> On Aug 15, 2007, at 9:28 AM, Christian Heimes wrote: > I like to propose some constants which should be used instead of the > strings: +1 for this. This should make code easier to read, too; not everyone spends time with line-oriented I/O, and the strings are just magic numbers in that case. -Fred -- Fred Drake From steven.bethard at gmail.com Wed Aug 15 17:02:32 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Wed, 15 Aug 2007 09:02:32 -0600 Subject: [Python-3000] Proposed new language for newline parameter to TextIOBase In-Reply-To: References: Message-ID: On 8/15/07, Georg Brandl wrote: > I'd use None and "\r"/... as proposed, but "U" instead of the empty string > for universal newline mode. 
I know that "U" already has this meaning as a differently named parameter, but if I saw something like:: open(file_name, newline='U') my first intuition would be that it's some weird file format where each chunk of the file is delimited by letter U. Probably just a result of dealing with too many bad file formats though. ;-) STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From jimjjewett at gmail.com Wed Aug 15 18:07:22 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 15 Aug 2007 12:07:22 -0400 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46C2325D.1010209@ronadam.com> References: <46B13ADE.7080901@acm.org> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com> <46C2325D.1010209@ronadam.com> Message-ID: On 8/14/07, Ron Adam wrote: > Jim Jewett wrote: > > On 8/13/07, Ron Adam wrote: > >> {name[:][type][alignment_term][,content_modifying_term]} > > ... You used the (alignment term) > > width as the number of digits before the decimal, > > instead of as the field width. > You can leave out either term. So that may have > been what you are seeing. I thought I was going based on your spec, rather than the examples. > {0:f8.2} Because there is no comma, this should be a field of width 8 -- two after a decimal point, the decimal point itself, and at most 5 digits (or 4 and sign) before the the decimal point I (mis?)read your messge as special-casing float to 8 digits before the decimal point, for a total field width of 11. ... > Minimal width with fill for shorter than width items. It > expands if the length of the item is longer than width. ... > > Can I say "width=whatever it takes, up to 72 chars ... > > but don't pad it if you don't need to"? > That's the default behavior. 
If width is only a minimum, where did it get the maximum of 72 chars part? > > I'm not sure that variable lengths and alignment even > > *should* be supported in the same expression, but if > > forcing everything to fixed-width would be enough of > > a change that it needs an explicit callout. > Alignment is needed for when the length of the value is shorter than the > length of the field. So if a field has a minimal width, and a value is > shorter than that, it will be used. I should have been more explicit -- variable length *field*. How can one report say "use up to ten characters if you need them, but only use three if that is all you need" and another report say "use exactly ten characters; right align and fill with spaces." -jJ From barry at python.org Wed Aug 15 18:11:18 2007 From: barry at python.org (Barry Warsaw) Date: Wed, 15 Aug 2007 12:11:18 -0400 Subject: [Python-3000] Proposed new language for newline parameter to TextIOBase In-Reply-To: References: Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 15, 2007, at 9:28 AM, Christian Heimes wrote: > I like to propose some constants which should be used instead of the > strings: > > MAC = '\r' > UNIX = '\n' > WINDOWS = '\r\n' > UNIVERSAL = '' > NOTRANSLATE = None > > I think that open(filename, newline=io.UNIVERSAL) or open(filename, > newline=io.WINDOWS) is much more readable than open(filename, > newline=''). Besides I always forget if Windows is '\r\n' or '\n > \r'. *g* Yes, excellent idea. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRsMlpnEjvBPtnXfVAQINBAP/en1BYxU9wKErov26dyqo8snJLNnregEO YVP/8b9EM4csEMAJbO/pOBjsOuub/TO5h7nCdiuV0GAGTAzzt4kICHr/cEVGKnOU dCd949uTLeIYVkgJnPnJ/ynE5Q30uMIIysXBbrbNx3rWJt74fNBDuF0xLHgw4d0O cHvT5rzmdvs= =DwhF -----END PGP SIGNATURE----- From eric+python-dev at trueblade.com Wed Aug 15 19:02:55 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Wed, 15 Aug 2007 13:02:55 -0400 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <8f01efd00708150738y36deee91v918bfd3f80944d9b@mail.gmail.com> References: <46C2809C.3000806@acm.org> <8f01efd00708150738y36deee91v918bfd3f80944d9b@mail.gmail.com> Message-ID: <46C331BF.2020104@trueblade.com> James Thiele wrote: > I think the example: > > "My name is {0.name}".format(file('out.txt')) > > Would be easier to understand if you added: > > Which would produce: > > "My name is 'out.txt'" I agree. Also, the example a couple of paragraphs down: "My name is {0[name]}".format(dict(name='Fred')) should show the expected output: "My name is Fred" I was adding test cases from the PEP last night, and I ignored the file one because I didn't want to mess with files. I've looked around for a replacement, but I couldn't find a built-in type with an attribute that would be easy to test. Maybe we could stick with file, and use sys.stdin: "File name is {0.name}".format(sys.stdin) which would produce: 'File name is 0' I don't know if the "0" is platform dependent or not. If anyone has an example of a builtin (or standard module) type, variable, or whatever that has an attribute that can have a known value, I'd like to see it. When working on this, I notice that in 2.3.3 (on the same machine), sys.stdin.name is '<stdin>', but in py3k it's 0. Not sure if that's a bug or intentional.
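One possible way to get an attribute with a known value, sketched for current Python 3 rather than the 2007 branch. It assumes types.SimpleNamespace, which only appeared in later Python 3 releases, as a stand-in object; the name person is made up for the example. Complex numbers are a built-in type whose real and imag attributes are also predictable.

```python
from types import SimpleNamespace

# Hypothetical stand-in for "an object with a predictable attribute".
person = SimpleNamespace(name="Fred")

by_attribute = "My name is {0.name}".format(person)
by_key = "My name is {0[name]}".format(dict(name="Fred"))
by_builtin = "Imaginary part: {0.imag}".format(3 + 4j)

print(by_attribute)  # My name is Fred
print(by_key)        # My name is Fred
print(by_builtin)    # Imaginary part: 4.0
```

None of these touch the file system or depend on the platform, which is the property the file example lacks.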
In any event, if we leave this example in the PEP, not only should we include the expected output, it should probably be changed to use "open" instead of "file": "My name is {0.name}".format(open('out.txt')) since I think file(filename) is deprecated (but still works). At least I thought it was deprecated, now that I look around I can't find any mention of it. Eric. From brett at python.org Wed Aug 15 19:16:10 2007 From: brett at python.org (Brett Cannon) Date: Wed, 15 Aug 2007 10:16:10 -0700 Subject: [Python-3000] Documentation switch imminent In-Reply-To: References: Message-ID: On 8/15/07, Georg Brandl wrote: > Brett Cannon schrieb: > > On 8/14/07, Georg Brandl wrote: > >> Now that the converted documentation is fairly bug-free, I want to > >> make the switch. > >> > >> I will replace the old Doc/ trees in the trunk and py3k branches > >> tomorrow, moving over the reST ones found at > >> svn+ssh://svn.python.org/doctools/Doc-{26,3k}. > > > > First, that address is wrong; missing a 'trunk' in there. > > Sorry again. > Not a problem. I also noticed, though, that the user (pythondev) is missing as well. =) > > Second, are we going to keep the docs in a separate tree forever, or > > is this just for now? > > They will be moved (in a few minutes...) to the location where the > Latex docs are now. > Yep, just did an update. > > I am not thinking so much about the tools, but > > whether we will need to do two separate commits in order to make code > > changes *and* change the docs? Or are you going to add an externals > > dependency in the trees to their respective doc directories? > > No separate commits will be needed to commit changes to the docs. > However, the tool to build the docs will not be in the tree under Doc/, > but continue to be maintained in the doctools/ toplevel project. > OK. > I spoke with Martin about including them as externals, but we agreed that > they are not needed and cost too much time on every "svn up". 
Instead, > the Doc/ makefile checks out the tools in a separate directory and runs > them from there. (The Doc/README.txt file explains this in more detail.) Seems simple enough! Thanks again for doing this, Georg (and the doc SIG)! -Brett From rhamph at gmail.com Wed Aug 15 19:20:27 2007 From: rhamph at gmail.com (Adam Olsen) Date: Wed, 15 Aug 2007 11:20:27 -0600 Subject: [Python-3000] Format specifier proposal In-Reply-To: <46C2A2B1.1010708@ronadam.com> References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com> <46C11DDF.2080607@acm.org> <20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net> <46C24F56.5050104@canterbury.ac.nz> <46C26110.8020001@ronadam.com> <20070815004204.f986b4b1.ajwade+py3k@andrew.wade.networklinux.net> <46C2A2B1.1010708@ronadam.com> Message-ID: On 8/15/07, Ron Adam wrote: > > > Andrew James Wade wrote: > > On Tue, 14 Aug 2007 21:12:32 -0500 > > Ron Adam wrote: > > >> What I was thinking of was just a simple left to right evaluation order. > >> > >> "{0:spec1, spec2, ... }".format(x) > >> > >> I don't expect this will ever get very long. > > > > The first __format__ will return a str, so chains longer than 2 don't > > make a lot of sense. And the delimiter character should be allowed in > > spec1; limiting the length of the chain to 2 allows that without escaping: > > > > "{0:spec1-with-embedded-comma,}".format(x) > > > > My scheme did the same sort of thing with spec1 and spec2 reversed. > > Your order makes more intuitive sense; I chose my order because I > > wanted the syntax to be a generalization of formatting strings. > > > > Handling the chaining within the __format__ methods should be all of > > two lines of boilerplate per method. > > I went ahead and tried this out and it actually cleared up some difficulty > in organizing the parsing code. That was a very nice surprise. :) > > (actual doctest) > > >>> import time > >>> class GetTime(object): > ... def __init__(self, time=time.gmtime()): > ... 
self.time = time > ... def __format__(self, spec): > ... return fstr(time.strftime(spec, self.time)) > > >>> start = GetTime(time.gmtime(1187154773.0085449)) > > >>> fstr("Start: {0:%d/%m/%Y %H:%M:%S,<30}").format(start) > 'Start: 15/08/2007 05:12:53 ' Caveat: some date formats include a comma. I think the only workaround would be splitting them into separate formats (and using the input date twice). -- Adam Olsen, aka Rhamphoryncus From eric+python-dev at trueblade.com Wed Aug 15 19:26:33 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Wed, 15 Aug 2007 13:26:33 -0400 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <8f01efd00708150752i1b09464frb8d3217209b47a11@mail.gmail.com> References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com> <8f01efd00708150752i1b09464frb8d3217209b47a11@mail.gmail.com> Message-ID: <46C33749.5010702@trueblade.com> James Thiele wrote: > The section on the explicit conversion flag contains the following line: > > These flags are typically placed before the format specifier: > > Where else can they be placed? I'd like this to say they can only be placed where the PEP describes them, or maybe to be only at the end. "{0!r:20}".format("Hello") or "{0:20!r}".format("Hello") Putting them at the end makes the parsing easier, although I grant you that that's not a great reason for specifying it that way. Whatever it is, I think there should be only one place they can go. > Also there is no description of what action (if any) is taken if an > unknown explicit conversion flag is encountered. I would assume a ValueError, but yes, it should be explicit.
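For comparison, a short sketch of how str.format in current Python 3 treats these cases. It reflects the released language, not the state of the PEP or the sandbox code at the time of this thread, and the helper name try_format is made up for the example.

```python
def try_format(template, value):
    """Return the formatted string, or the ValueError that was raised."""
    try:
        return template.format(value)
    except ValueError as exc:
        return exc


# The conversion flag goes before the format specifier: repr() is applied
# first, then the resulting string is padded to a width of 20.
padded = try_format("{0!r:20}", "Hello")
print(repr(padded))

# The flag after the specifier is not recognised as a conversion; the text
# "20!r" is passed to str.__format__, which rejects it.
misplaced = try_format("{0:20!r}", "Hello")
print(type(misplaced).__name__)

# An unknown conversion flag is rejected as well.
unknown = try_format("{0!x}", "Hello")
print(type(unknown).__name__)
```

So the released behaviour matches both suggestions above: there is exactly one place for the flag, and an unknown flag raises ValueError.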
From guido at python.org Wed Aug 15 19:28:00 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 15 Aug 2007 10:28:00 -0700 Subject: [Python-3000] Proposed new language for newline parameter to TextIOBase In-Reply-To: References: Message-ID: On 8/14/07, Guido van Rossum wrote: > I thought some more about the universal newlines situation, and I > think I can handle all the use cases with a single 'newline' > parameter. The use cases are: > > (A) input use cases: > > (1) newline=None: input with default universal newlines mode; lines > may end in \r, \n, or \r\n, and these are translated to \n. > > (2) newline='': input with untranslated universal newlines mode; lines > may end in \r, \n, or \r\n, and these are returned untranslated. > > (3) newline='\r', newline='\n', newline='\r\n': input lines must end > with the given character(s), and these are translated to \n. > > (B) output use cases: > > (1) newline=None: every \n written is translated to os.linesep. > > (2) newline='': no translation takes place. > > (3) newline='\r', newline='\n', newline='\r\n': every \n written is > translated to the value of newline. I'm going to respond to several replies in one email. Warning: bikeshedding ahead! On 8/15/07, Brett Cannon wrote: > I like the options, but I would swap the meaning of None and the empty > string. My reasoning for this is that for option 3 it says to me > "here is a string representing EOL, and make it \n". So I would think > of the empty string as, "I don't know what EOL is, but I want it > translated to \n". Then None means, "I don't want any translation > done" by the fact that the argument is not a string. In other words, > the existence of a string argument means you want EOL translated to > \n, and the specific value of 'newline' specifying how to determine > what EOL is. I see it differently. None is the natural default, which is universal newline modes with translation on input, and translation of \n to os.linesep on output. 
On input, all the other forms mean "no translation", and the value is the character string that ends a line (leaving the door open for a future extension to arbitrary record separators, either as an eventual standard feature, or as a compatible user-defined variant). If it is empty, that is clearly an exception (since io.py is not able to paranormally guess when a line ends without searching for a character), so we give that the special meaning "disable translation, but use the default line ending separators". On output, the situation isn't quite symmetrical, since the use cases are different: the natural default is to translate \n to os.linesep, and the most common other choices are probably to translate \n to a specific line ending (this helps keep the line ending choice separate from the code that produces the output). Again, translating \n to the empty string makes no sense, so the empty string can be used for another special case: and again, it is the "give the app the most control" case. Note that translation on input when a specific line ending is given doesn't make much sense, and can even create ambiguities -- e.g. if the line ending is \r\n, an input line of the form XXX\nYYY\r\n would be translated to XXX\nYYY\n, and then one would wonder why it wasn't split at the first \n. (If you want translation, you're apparently not all that interested in the details, so the default is best for you.) For output, it's different: *not* translating on output doesn't require one to specify a line ending when opening the file. Here are a few complete scenarios: - Copy a file (perhaps changing the encoding) while keeping line endings the same: specify newline="" on input and output. - Copy a file translating line endings to the platform default: specify newline=None on input and output. - Copy a file translating line endings to a specific string: specify newline=None on input and newline="<string>" on output.
- Read a Windows file the way it would be interpreted by certain tools on Windows: set newline="\r\n" (this treats a lone \n or \r as a regular character). On 8/15/07, Christian Heimes wrote: > I like to propose some constants which should be used instead of the > strings: > > MAC = '\r' > UNIX = '\n' > WINDOWS = '\r\n' > UNIVERSAL = '' > NOTRANSLATE = None > > I think that open(filename, newline=io.UNIVERSAL) or open(filename, > newline=io.WINDOWS) is much more readable than open(filename, > newline=''). Besides I always forget if Windows is '\r\n' or '\n\r'. *g* I find named constants unpythonic; taken to the extreme you'd also want to define names for modes like "r" and "w+b". I also think it's a bad idea to use platform names -- lots of places besides Windows use \r\n (e.g. most standard internet protocols), and most modern Mac applications use \n, not \r. On 8/15/07, Georg Brandl wrote: > I'd use None and "\r"/... as proposed, but "U" instead of the empty string > for universal newline mode. "U" already has that established meaning, and > you don't have to remember the difference between the two (false) values "" > and None. But it would close off the possible extension to other separators I mentioned above. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rhamph at gmail.com Wed Aug 15 19:39:34 2007 From: rhamph at gmail.com (Adam Olsen) Date: Wed, 15 Aug 2007 11:39:34 -0600 Subject: [Python-3000] Proposed new language for newline parameter to TextIOBase In-Reply-To: References: Message-ID: On 8/15/07, Christian Heimes wrote: > Brett Cannon wrote: > > I like the options, but I would swap the meaning of None and the empty > > string. My reasoning for this is that for option 3 it says to me > > "here is a string representing EOL, and make it \n". So I would think > > of the empty string as, "I don't know what EOL is, but I want it > > translated to \n". 
Then None means, "I don't want any translation > > done" by the fact that the argument is not a string. In other words, > > the existence of a string argument means you want EOL translated to > > \n, and the specific value of 'newline' specifying how to determine > > what EOL is. > > I like to propose some constants which should be used instead of the > strings: > > MAC = '\r' > UNIX = '\n' > WINDOWS = '\r\n' > UNIVERSAL = '' > NOTRANSLATE = None > > I think that open(filename, newline=io.UNIVERSAL) or open(filename, > newline=io.WINDOWS) is much more readable than open(filename, > newline=''). Besides I always forget if Windows is '\r\n' or '\n\r'. *g* I agree, but please make the constants opaque. I don't want to see a random mix of constants and non-constants. Plus, opaque constants could be self-documenting. -- Adam Olsen, aka Rhamphoryncus From jjb5 at cornell.edu Wed Aug 15 19:44:04 2007 From: jjb5 at cornell.edu (Joel Bender) Date: Wed, 15 Aug 2007 13:44:04 -0400 Subject: [Python-3000] Fix imghdr module for bytes In-Reply-To: <46BFA0FC.2060707@canterbury.ac.nz> References: <200708110235.43664.victor.stinner@haypocalc.com> <79990c6b0708121219x3aecef78hc58443b592c0a13d@mail.gmail.com> <46BFA0FC.2060707@canterbury.ac.nz> Message-ID: <46C33B64.9040104@cornell.edu> Greg Ewing wrote: > I'm wondering whether we want a "byte character literal" > to go along with "byte string literals": > > h[0] == c"P" > > After all, if it makes sense to write an array of bytes > as though they were ASCII characters, it must make sense > to write a single byte that way as well. Would you propose these to be mutable as well? Ugh. :-) Joel From martin at v.loewis.de Wed Aug 15 19:51:02 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 15 Aug 2007 19:51:02 +0200 Subject: [Python-3000] Documentation switch imminent In-Reply-To: References: Message-ID: <46C33D06.9030607@v.loewis.de> > Okay, I made the switch. 
I tagged the state of both Python branches > before the switch as tags/py{26,3k}-before-rstdocs/. Update instructions: 1. svn diff Doc; any pending changes will need to be redone 2. svn up; this will remove the tex sources, and then likely fail if there were still other files present in Doc, e.g. from building the documentation 3. review any files left in Doc 4. rm -rf Doc 5. svn up If you are certain there is nothing of interest in your sandbox copy of Doc, you can start with step 4. Regards, Martin From tony at PageDNA.com Wed Aug 15 20:27:45 2007 From: tony at PageDNA.com (Tony Lownds) Date: Wed, 15 Aug 2007 11:27:45 -0700 Subject: [Python-3000] Proposed new language for newline parameter to TextIOBase In-Reply-To: References: Message-ID: On Aug 14, 2007, at 9:56 PM, Guido van Rossum wrote: > I thought some more about the universal newlines situation, and I > think I can handle all the use cases with a single 'newline' > parameter. The use cases are: > > (A) input use cases: > > (1) newline=None: input with default universal newlines mode; lines > may end in \r, \n, or \r\n, and these are translated to \n. > > (2) newline='': input with untranslated universal newlines mode; lines > may end in \r, \n, or \r\n, and these are returned untranslated. > > (3) newline='\r', newline='\n', newline='\r\n': input lines must end > with the given character(s), and these are translated to \n. > > (B) output use cases: > > (1) newline=None: every \n written is translated to os.linesep. > > (2) newline='': no translation takes place. > > (3) newline='\r', newline='\n', newline='\r\n': every \n written is > translated to the value of newline. > These make a lot of sense to me. I'm working on test cases / cleanup, but I have a patch that implements the behavior above. And the newlines attribute. 
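The output cases (B)(1) to (B)(3) and the newlines attribute can be sketched with the io module of current Python 3. This is an illustration under that assumption, not the patch mentioned above; the helper names written_bytes and seen_newlines are made up for the example.

```python
import io
import os


def written_bytes(text, newline):
    """Write text through a TextIOWrapper and return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


def seen_newlines(data):
    """Read data in universal newlines mode and report the endings seen."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="ascii",
                              newline=None)
    stream.read()
    return stream.newlines


# (B)(1): \n becomes os.linesep.  (B)(2): no translation.
# (B)(3): \n becomes the given string.
print(written_bytes("a\nb\n", None))
print(written_bytes("a\nb\n", ""))
print(written_bytes("a\nb\n", "\r\n"))

# newlines is a single string if one kind of ending was seen,
# and a tuple if several kinds were seen.
print(seen_newlines(b"a\r\nb\r\n"))
print(seen_newlines(b"one\rtwo\nthree\r\n"))
```

Separating the line-ending choice from the code that produces the text is the point of (B)(3): the writer only ever emits \n.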
Thanks -Tony From victor.stinner at haypocalc.com Wed Aug 15 21:52:38 2007 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Wed, 15 Aug 2007 21:52:38 +0200 Subject: [Python-3000] Questions about email bytes/str (python 3000) In-Reply-To: <07Aug14.184454pdt."57996"@synergy1.parc.xerox.com> References: <200708140422.36818.victor.stinner@haypocalc.com> <07Aug14.184454pdt."57996"@synergy1.parc.xerox.com> Message-ID: <200708152152.38839.victor.stinner@haypocalc.com> On Wednesday 15 August 2007 03:44:54 Bill Janssen wrote: > > (...) I think that base64MIME.encode() may have to accept strings. > > Personally, I think it would avoid more errors if it didn't. Yeah, how can you guess which charset the user wants to use? For most users, there is only one charset: latin-1. So if you use UTF-8, he will not understand conversion errors. Another argument: I like bidirectional codecs: decode(encode(x)) == x encode(decode(x)) == x So if you mix bytes and str, these relations will be wrong. Victor Stinner aka haypo http://hachoir.org/ From rhamph at gmail.com Wed Aug 15 22:14:25 2007 From: rhamph at gmail.com (Adam Olsen) Date: Wed, 15 Aug 2007 14:14:25 -0600 Subject: [Python-3000] Proposed new language for newline parameter to TextIOBase In-Reply-To: References: Message-ID: On 8/14/07, Guido van Rossum wrote: > I thought some more about the universal newlines situation, and I > think I can handle all the use cases with a single 'newline' > parameter. The use cases are: > > (A) input use cases: > > (1) newline=None: input with default universal newlines mode; lines > may end in \r, \n, or \r\n, and these are translated to \n. > > (2) newline='': input with untranslated universal newlines mode; lines > may end in \r, \n, or \r\n, and these are returned untranslated. Caveat: this mode cannot be supported by sockets. When reading a lone \r you need to peek ahead to ensure the next character is not a \n, but for sockets that may block indefinitely.
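The peek-ahead problem can be sketched with a small line splitter that receives text in chunks, the way data arrives from a socket. This is a hypothetical illustration (the name iter_lines is made up), not how io.py is implemented.

```python
import re

# A line is any run of non-newline characters followed by \r\n, \r or \n.
_LINE = re.compile(r"[^\r\n]*(?:\r\n|\r|\n)")


def iter_lines(chunks):
    """Yield untranslated lines from an iterable of text chunks."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        pos = 0
        while True:
            match = _LINE.match(buf, pos)
            if match is None:
                break
            # A \r at the very end of the buffer is ambiguous: the next
            # chunk may start with \n.  Hold the line back until then.
            if match.group().endswith("\r") and match.end() == len(buf):
                break
            yield match.group()
            pos = match.end()
        buf = buf[pos:]
    if buf:
        # End of input: whatever is left is the final line.
        yield buf


print(list(iter_lines(["one\r", "\ntwo\r", "three\n"])))
# ['one\r\n', 'two\r', 'three\n']
```

After the first chunk nothing can be yielded, even though a complete line may already have arrived. On an interactive stream the next chunk might never come, which is the blocking behaviour described above.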
I don't expect sockets to use the file API by default, but there's enough overlap (named pipes?) that limitations like this should be well documented (and if possible, produce an explicit error!) > (3) newline='\r', newline='\n', newline='\r\n': input lines must end > with the given character(s), and these are translated to \n. > > (B) output use cases: > > (1) newline=None: every \n written is translated to os.linesep. > > (2) newline='': no translation takes place. > > (3) newline='\r', newline='\n', newline='\r\n': every \n written is > translated to the value of newline. > > Note that cases (2) are new, and case (3) changes from the current PEP > and/or from the current implementation (which seems to deviate from > the PEP). [snip] > > * If universal newlines without translation are requested on > input (i.e. ``newline=''``), if a system read operation > returns a buffer ending in ``'\r'``, another system read > operation is done to determine whether it is followed by > ``'\n'`` or not. In universal newlines mode with > translation, the second system read operation may be > postponed until the next read request, and if the following > system read operation returns a buffer starting with > ``'\n'``, that character is simply discarded. -- Adam Olsen, aka Rhamphoryncus From guido at python.org Wed Aug 15 22:17:15 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 15 Aug 2007 13:17:15 -0700 Subject: [Python-3000] Proposed new language for newline parameter to TextIOBase In-Reply-To: References: Message-ID: On 8/15/07, Adam Olsen wrote: > On 8/14/07, Guido van Rossum wrote: > > (2) newline='': input with untranslated universal newlines mode; lines > > may end in \r, \n, or \r\n, and these are returned untranslated. > > Caveat: this mode cannot be supported by sockets. When reading a lone > \r you need to peek ahead to ensure the next character is not a \n, > but for sockets that may block indefinitely. It depends on what you want. 
In general *any* read from a socket may block indefinitely. If the protocol requires turning around at \r *or* \r\n I'd say the protocol is insane. > I don't expect sockets to use the file API by default, but there's > enough overlap (named pipes?) that limitations like this should be > well documented (and if possible, produce an explicit error!) Why do you want it to produce an error? Who says I don't know what I'm doing when I request that mode? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rhamph at gmail.com Wed Aug 15 22:36:58 2007 From: rhamph at gmail.com (Adam Olsen) Date: Wed, 15 Aug 2007 14:36:58 -0600 Subject: [Python-3000] Proposed new language for newline parameter to TextIOBase In-Reply-To: References: Message-ID: On 8/15/07, Guido van Rossum wrote: > On 8/15/07, Adam Olsen wrote: > > On 8/14/07, Guido van Rossum wrote: > > > (2) newline='': input with untranslated universal newlines mode; lines > > > may end in \r, \n, or \r\n, and these are returned untranslated. > > > > Caveat: this mode cannot be supported by sockets. When reading a lone > > \r you need to peek ahead to ensure the next character is not a \n, > > but for sockets that may block indefinitely. > > It depends on what you want. In general *any* read from a socket may > block indefinitely. If the protocol requires turning around at \r *or* > \r\n I'd say the protocol is insane. > > > I don't expect sockets to use the file API by default, but there's > > enough overlap (named pipes?) that limitations like this should be > > well documented (and if possible, produce an explicit error!) > > Why do you want it to produce an error? Who says I don't know what I'm > doing when I request that mode? As you just said, you'd be insane to require it. But on second thought I don't think we can reliably say it's wrong. A named pipe may just have a file cat'd through it, which would handle this mode just fine. 
It should be documented that interactive streams cannot safely use this mode. -- Adam Olsen, aka Rhamphoryncus From rrr at ronadam.com Wed Aug 15 22:52:56 2007 From: rrr at ronadam.com (Ron Adam) Date: Wed, 15 Aug 2007 15:52:56 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: References: <46B13ADE.7080901@acm.org> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com> <46C2325D.1010209@ronadam.com> Message-ID: <46C367A8.4040601@ronadam.com> Jim Jewett wrote: > On 8/14/07, Ron Adam wrote: >> Jim Jewett wrote: >>> On 8/13/07, Ron Adam wrote: > >>>> {name[:][type][alignment_term][,content_modifying_term]} > >>> ... You used the (alignment term) >>> width as the number of digits before the decimal, >>> instead of as the field width. > >> You can leave out either term. So that may have >> been what you are seeing. > > I thought I was going based on your spec, rather than the examples. > >> {0:f8.2} > > Because there is no comma, this should be a field of width 8 -- two > after a decimal point, the decimal point itself, and at most 5 digits > (or 4 and sign) before the the decimal point > > I (mis?)read your messge as special-casing float to 8 digits before > the decimal point, for a total field width of 11. No, but I see where the confusion is coming from. The spec is split up a bit different than what you are expecting. Maybe this will help... {name[:[type][align][,format]]} If the alignment is left out, the comma isn't needed so it becomes.. {name[:[type][fomrat]} Which is what you are seeing. BUT, after trying a sequential specifications terms... and finding it works better this changes a bit.. more on this at the end of this message. I'll try to answer your questions first. > ... >> Minimal width with fill for shorter than width items. 
It >> expands if the length of the item is longer than width. > ... >>> Can I say "width=whatever it takes, up to 72 chars ... >>> but don't pad it if you don't need to"? > >> That's the default behavior. > > If width is only a minimum, where did it get the maximum of 72 chars part? Ok.. we aren't communicating... Lets see if I can clear this up. width = whatever it takes # size of a field max_width = 72 Padding doesn't make sense here. Whatever it takes is basically saying there is no minimum width, so there is nothing to say how much padding to use. Unless you want a fixed 72 width with padding. Then the whatever it takes part doesn't makes sense. So if you want a padded width of 30, and a maximum width of 72, you would use the following. {0:s^30/_,+72} Pad strings shorter than 30 with '_', trim long strings to 72 max. In this case the +72 is evaluated in the context of a string. >>> I'm not sure that variable lengths and alignment even >>> *should* be supported in the same expression, but if >>> forcing everything to fixed-width would be enough of >>> a change that it needs an explicit callout. > >> Alignment is needed for when the length of the value is shorter than the >> length of the field. So if a field has a minimal width, and a value is >> shorter than that, it will be used. > > I should have been more explicit -- variable length *field*. How can > one report say "use up to ten characters if you need them, but only > use three if that is all you need" and another report say "use exactly > ten characters; right align and fill with spaces." > > -jJ {0:s<3,+10} The second part is like s[:+10] Minimum left aligned of 3, (strings of 1 and 2 go to the left with spaces as padding up to 3 characters total, but expand up to 10. Anything over that is trimmed to 10 characters. {0:s>10,+10} Right align shorter than 10 character strings and pad with spaces, and cut longer strings to 10 characters from the left. 
{0:s>10,-10} The second term is like s[-10:] Right align shorter than 10 character strings, and pad with spaces, and cut longer strings to 10 characters from the right. Now... For different variation. I've found we can do all this by chaining multiple specifiers in a simple left to right evaluated expression. It doesn't split the type letters from the terms like what confused you earlier, so you may like this better. The biggest problem with this is the use of commas for separators, they can clash with commas in the specifiers. This is a problem with any multi part split specifier as well, so if you can think of a way to resolve that, it would be good. BTW, The examples above work unchanged. {0:s>10,+10} This is equivalent to... s>10,s+10 We can drop the 's' off the second term because the result of the first term is a string. Or it can be left on for clarity. So this evaluates to... value.__format__('s>10').__format__('+10') We could combine the terms, but because the behavior is so different, aligning vs clipping, I don't think that's a good idea even though they are both handled by the string type. With numbers it works the same way... {0:f10.2,>12} value.__format__('f10.2').__format__('>12') Here a string is returned after the f format spec is applied, and then string's __format__ handles the field alignment. The leading 's' isn't needed for 's>12' because the type is already a string. It's much simpler than it may sound in practice. I also have a test (proof of concept) implementation if you'd like to see it. I've worked into parts of Talons latest PEP update as well. There are still plenty of things to add to it though like scientific notation and general number forms. Unused argument testing, nested terms, and attribute access. General form: {label[:spec1[,spec2][,spec3]...]} Each values.__format__ method is called with the next specifier from left to right sequence. In most cases only one or two specifiers are needed. 
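[Editorial sketch] The left-to-right chaining of string terms described above can be sketched as below. This is not the proof-of-concept mentioned in the message (that code is not shown in the thread); the names apply_string_term and format_chain are hypothetical, only the 's' type letter is handled, and the grammar is an assumption reconstructed from the alignment and clipping rules given in this message.

```python
import re

# Assumed grammar for a single string term:
#   alignment: [s][justify][width][/padding_char]
#   clipping:  [s](+|-)n
_ALIGN = re.compile(r"^s?([<>^])?(\d+)?(?:/(.))?$")
_CLIP = re.compile(r"^s?([+-])(\d+)$")

def apply_string_term(text, term):
    """Apply one alignment or clipping term to a string."""
    clip = _CLIP.match(term)
    if clip:
        n = int(clip.group(2))
        if clip.group(1) == "+":
            return text[:n]                    # like s[:+n]
        return text[max(len(text) - n, 0):]    # like s[-n:], safe for n == 0
    align = _ALIGN.match(term)
    if align is None:
        raise ValueError("unsupported term: %r" % term)
    justify = align.group(1) or "<"            # left justify is the default
    width = int(align.group(2) or 0)           # minimum width only
    pad_char = align.group(3) or " "
    pad = max(width - len(text), 0)
    if justify == "<":
        return text + pad_char * pad
    if justify == ">":
        return pad_char * pad + text
    left = pad // 2                            # centering: extra pad on the right
    return pad_char * left + text + pad_char * (pad - left)

def format_chain(text, spec):
    """Evaluate comma-separated terms left to right, feeding each result on."""
    for term in spec.split(","):
        text = apply_string_term(text, term)
    return text

print(repr(format_chain("World", "s^10/_")))
print(repr(format_chain("World" * 3, "s+12,<10")))
print(repr(format_chain("World" * 3, "s-12")))
```

Because every term returns a plain string, the second and later terms never need the leading 's', which matches the remark above that it can be dropped.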
* = Not implemented yet

The general string presentation type.

    's' - String. Outputs aligned and/or trimmed strings.
    'r' - Repr string. Outputs a string by using repr().

Strings are the default for anything without a __format__ method.

The available integer presentation types are:

  * 'b' - Binary. Outputs the number in base 2.
  * 'c' - Character. Converts the integer to the corresponding
          unicode character before printing.
    'd' - Decimal Integer. Outputs the number in base 10.
    'o' - Octal format. Outputs the number in base 8.
    'x' - Hex format. Outputs the number in base 16, using
          lower-case letters for the digits above 9.
    'X' - Hex format. Outputs the number in base 16, using
          upper-case letters for the digits above 9.

The available floating point presentation types are:

  * 'e' - Exponent notation. Prints the number in scientific
          notation using the letter 'e' to indicate the exponent.
  * 'E' - Exponent notation. Same as 'e' except it uses an upper
          case 'E' as the separator character.
    'f' - Fixed point. Displays the number as a fixed-point number.
    'F' - Fixed point. Same as 'f'.
  * 'g' - General format. This prints the number as a fixed-point
          number, unless the number is too large, in which case it
          switches to 'e' exponent notation.
  * 'G' - General format. Same as 'g' except switches to 'E' if the
          number gets too large.
  * 'n' - Number. This is the same as 'g', except that it uses the
          current locale setting to insert the appropriate number
          separator characters.
    '%' - Percentage. Multiplies the number by 100 and displays in
          fixed ('f') format, followed by a percent sign.

String alignment:

    Sets minimum field width, justification, and padding chars.

    [s|r][justify][width][/padding_char]

    <              left justify (default)
    >              right justify
    ^              center
    width          minimum field width
    /padding_char  A single character

String clipping:

    [s|r][trimmed width]

    +n   Use n characters from beginning. s[:+n]
    -n   Use n characters from end.
         s[-n:]

Numeric formats:

    [type][sign][0][digits][.decimals][%]

    signs:
        -   show negative sign (default)
        +   show both negative and positive sign
        (   parentheses around negatives, spaces around positives
            (ending ')' is optional.)
    %         multiplies number by 100 and places ' %' after it.
    0         pad digits with leading zeros.
    digits    number of digits before the decimal
    decimals  number of digits after the decimal

EXAMPLES:

>>> short_str = fstr("World")
>>> long_str = fstr("World" * 3)
>>> pos_int = fint(12345)
>>> neg_int = fint(-12345)
>>> pos_float = ffloat(123.45678)
>>> neg_float = ffloat(-123.45678)

>>> fstr("Hello {0}").format(short_str) # default __str__
'Hello World'
>>> fstr("Hello {0:s}").format(short_str)
'Hello World'
>>> fstr("Hello {0:r}").format(short_str)
"Hello 'World'"
>>> fstr("Hello {0:s<10}").format(short_str)
'Hello World '
>>> fstr("Hello {0:s^10/_}").format(short_str)
'Hello __World___'
>>> fstr("Hello {0:s+12,<10}").format(short_str)
'Hello World '
>>> fstr("Hello {0:s+12,<10}").format(long_str)
'Hello WorldWorldWo'
>>> fstr("Hello {0:r+12,<10}").format(long_str)
"Hello 'WorldWorldW"
>>> fstr("Hello {0:s-12}").format(long_str)
'Hello ldWorldWorld'

INTEGERS:

>>> fstr("Item Number: {0}").format(pos_int)
'Item Number: 12345'
>>> fstr("Item Number: {0:i}").format(pos_int)
'Item Number: 12345'
>>> fstr("Item Number: {0:x}").format(pos_int)
'Item Number: 0x3039'
>>> fstr("Item Number: {0:X}").format(pos_int)
'Item Number: 0X3039'
>>> fstr("Item Number: {0:o}").format(pos_int)
'Item Number: 030071'
>>> fstr("Item Number: {0:>10}").format(pos_int)
'Item Number: 12345'
>>> fstr("Item Number: {0:<10}").format(pos_int)
'Item Number: 12345 '
>>> fstr("Item Number: {0:^10}").format(pos_int)
'Item Number: 12345 '
>>> fstr("Item Number: {0:i010%}").format(neg_int)
'Item Number: -0001234500 %'
>>> fstr("Item Number: {0:i()}").format(pos_int)
'Item Number: 12345 '
>>> fstr("Item Number: {0:i()}").format(neg_int)
'Item Number: (12345)'

FIXEDPOINT:

>>> fstr("Item Number:
{0}").format(pos_float)
'Item Number: 123.45678'
>>> fstr("Item Number: {0:f}").format(pos_float)
'Item Number: 123.45678'
>>> fstr("Item Number: {0:>12}").format(pos_float)
'Item Number: 123.45678'
>>> fstr("Item Number: {0:<12}").format(pos_float)
'Item Number: 123.45678 '
>>> fstr("Item Number: {0:^12}").format(pos_float)
'Item Number: 123.45678 '
>>> fstr("Item Number: {0:f07.2%}").format(neg_float)
'Item Number: -0012345.68 %'
>>> fstr("Item Number: {0:F.3}").format(neg_float)
'Item Number: -123.457'
>>> fstr("Item Number: {0:f.7}").format(neg_float)
'Item Number: -123.4567800'
>>> fstr("Item Number: {0:f(05.3)}").format(neg_float)
'Item Number: (00123.457)'
>>> fstr("Item Number: {0:f05.7}").format(neg_float)
'Item Number: -00123.4567800'
>>> fstr("Item Number: {0:f06.2}").format(neg_float)
'Item Number: -000123.46'

>>> import time
>>> class GetTime(object):
...     def __init__(self, time=time.gmtime()):
...         self.time = time
...     def __format__(self, spec):
...         return fstr(time.strftime(spec, self.time))

>>> start = GetTime(time.gmtime(1187154773.0085449))
>>> fstr("Start: {0:%d/%m/%Y %H:%M:%S,<30}").format(start)
'Start: 15/08/2007 05:12:53 '

-----------------------------------------------------------------

Examples from python3000 list:
(With only a few changes where it makes sense or to make it work.)

>>> floatvalue = ffloat(123.456)

# Floating point number of natural width
>>> fstr('{0:f}').format(floatvalue)
'123.456'

# Floating point number, with 10 digits before the decimal
>>> fstr('{0:f10}').format(floatvalue)
' 123.456'

# Floating point number, in field of width 10, right justified.
>>> fstr('{0:f,>10}').format(floatvalue)
' 123.456'

# Floating point number, width at least 10 digits before
# the decimal, leading zeros
>>> fstr('{0:f010}').format(floatvalue)
'0000000123.456'

# Floating point number with two decimal digits
>>> fstr('{0:f.2}').format(floatvalue)
'123.46'

# Minimum width 8, type defaults to natural type
>>> fstr('{0:8}').format(floatvalue)
'123.456 '

# Integer number, 5 digits, sign always shown
>>> fstr('{0:d+5}').format(floatvalue)
'+ 123'

# repr() format
>>> fstr('{0:r}').format(floatvalue)
"'123.456'"

# Field width 10, repr() format
>>> fstr('{0:r10}').format(floatvalue)
"'123.456' "

# String right-aligned within field of minimum width
>>> fstr('{0:s10}').format(floatvalue)
'123.456 '

# String right-aligned within field of minimum width
>>> fstr('{0:s+10,10}').format(floatvalue)
'123.456 '

# String left-aligned in 10 char (min) field.
>>> fstr('{0:s<10}').format(floatvalue)
'123.456 '

# Integer centered in 15 character field
>>> fstr('{0:d,^15}').format(floatvalue)
' 123 '

# Right align and pad with '.' chars
>>> fstr('{0:>15/.}').format(floatvalue)
'........123.456'

# Floating point, always show sign,
# leading zeros, 10 digits before decimal, 5 decimal places.
>>> fstr('{0:f010.5}').format(floatvalue) '0000000123.45600' From rrr at ronadam.com Wed Aug 15 23:29:40 2007 From: rrr at ronadam.com (Ron Adam) Date: Wed, 15 Aug 2007 16:29:40 -0500 Subject: [Python-3000] Format specifier proposal In-Reply-To: References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com> <46C11DDF.2080607@acm.org> <20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net> <46C24F56.5050104@canterbury.ac.nz> <46C26110.8020001@ronadam.com> <20070815004204.f986b4b1.ajwade+py3k@andrew.wade.networklinux.net> <46C2A2B1.1010708@ronadam.com> Message-ID: <46C37044.6000406@ronadam.com> Adam Olsen wrote: > On 8/15/07, Ron Adam wrote: >> >> Andrew James Wade wrote: >>> On Tue, 14 Aug 2007 21:12:32 -0500 >>> Ron Adam wrote: >>>> What I was thinking of was just a simple left to right evaluation order. >>>> >>>> "{0:spec1, spec2, ... }".format(x) >>>> >>>> I don't expect this will ever get very long. >>> The first __format__ will return a str, so chains longer than 2 don't >>> make a lot of sense. And the delimiter character should be allowed in >>> spec1; limiting the length of the chain to 2 allows that without escaping: >>> >>> "{0:spec1-with-embedded-comma,}".format(x) >>> >>> My scheme did the same sort of thing with spec1 and spec2 reversed. >>> Your order makes more intuitive sense; I chose my order because I >>> wanted the syntax to be a generalization of formatting strings. >> > >>> Handling the chaining within the __format__ methods should be all of >>> two lines of boilerplate per method. >> I went ahead and tried this out and it actually cleared up some difficulty >> in organizing the parsing code. That was a very nice surprise. :) >> >> (actual doctest) >> >> >>> import time >> >>> class GetTime(object): >> ... def __init__(self, time=time.gmtime()): >> ... self.time = time >> ... def __format__(self, spec): >> ... 
return fstr(time.strftime(spec, self.time)) >> >> >>> start = GetTime(time.gmtime(1187154773.0085449)) >> >> >>> fstr("Start: {0:%d/%m/%Y %H:%M:%S,<30}").format(start) >> 'Start: 15/08/2007 05:12:53 ' > > Caveat: some date formats include a comma. I think the only > workaround would be splitting them into separate formats (and using > the input date twice). Maybe having an escaped comma? '\,' It really isn't any different than escaping quotes. It could be limited to just inside format {} expressions I think. Using raw strings with '\54' won't work. Ron From eric+python-dev at trueblade.com Wed Aug 15 23:34:26 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Wed, 15 Aug 2007 17:34:26 -0400 Subject: [Python-3000] Change in sys.stdin.name Message-ID: <46C37162.5020705@trueblade.com> I mentioned this in another message, but I thought I'd mention it here. I see this change in the behavior of sys.stdin.name, between 2.3.3 and 3.0x (checked out a few minutes ago). $ python Python 2.3.3 (#1, May 7 2004, 10:31:40) [GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.stdin.name '' $ ./python Python 3.0x (py3k:57077M, Aug 15 2007, 17:27:26) [GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.stdin.name 0 I see similar behavior with sys.stdout and sys.stderr. Is this deliberate? I can file a bug report if need be, just let me know. Eric. From brett at python.org Wed Aug 15 23:40:22 2007 From: brett at python.org (Brett Cannon) Date: Wed, 15 Aug 2007 14:40:22 -0700 Subject: [Python-3000] [Python-Dev] Documentation switch imminent In-Reply-To: <46C33D06.9030607@v.loewis.de> References: <46C33D06.9030607@v.loewis.de> Message-ID: On 8/15/07, "Martin v. L?wis" wrote: > > Okay, I made the switch. 
I tagged the state of both Python branches > > before the switch as tags/py{26,3k}-before-rstdocs/. > > Update instructions: > > 1. svn diff Doc; any pending changes will need to be redone > 2. svn up; this will remove the tex sources, and then likely > fail if there were still other files present in Doc, e.g. > from building the documentation > 3. review any files left in Doc > 4. rm -rf Doc > 5. svn up > > If you are certain there is nothing of interest in your sandbox > copy of Doc, you can start with step 4. Why the 'rm' call? When I did ``svn update`` it deleted the files for me. Is this to ditch some metadata? -Brett From martin at v.loewis.de Wed Aug 15 23:54:01 2007 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 15 Aug 2007 23:54:01 +0200 Subject: [Python-3000] [Python-Dev] Documentation switch imminent In-Reply-To: References: <46C33D06.9030607@v.loewis.de> Message-ID: <46C375F9.3020908@v.loewis.de> >> 1. svn diff Doc; any pending changes will need to be redone >> 2. svn up; this will remove the tex sources, and then likely >> fail if there were still other files present in Doc, e.g. >> from building the documentation >> 3. review any files left in Doc >> 4. rm -rf Doc >> 5. svn up >> >> If you are certain there is nothing of interest in your sandbox >> copy of Doc, you can start with step 4. > > Why the 'rm' call? When I did ``svn update`` it deleted the files for > me. Is this to ditch some metadata? No, it's to delete any files in this tree not under version control, see step 2. If you had any such files, step 2 would abort with an error message svn: Konnte Verzeichnis ?Doc? 
nicht hinzufügen: ein Objekt mit demselben Namen existiert bereits (or some such) Regards, Martin From jimjjewett at gmail.com Thu Aug 16 01:27:57 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 15 Aug 2007 19:27:57 -0400 Subject: [Python-3000] Format specifier proposal In-Reply-To: <46C2459C.1000405@canterbury.ac.nz> References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com> <46C11DDF.2080607@acm.org> <46C2459C.1000405@canterbury.ac.nz> Message-ID: On 8/14/07, Greg Ewing wrote: > ... {foo!r} rather than {foo:!r} > But either way, I suspect I'll find it difficult > to avoid writing it as {foo:r} in the heat of the > moment. Me too. And I don't like using up the "!" character. I know that it is still available outside of strings, but ... it is one more character that python used to leave for other tools. Giving a meaning to "@" mostly went OK, but ... From guido at python.org Thu Aug 16 01:40:27 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 15 Aug 2007 16:40:27 -0700 Subject: [Python-3000] Questions about email bytes/str (python 3000) In-Reply-To: <200708152152.38839.victor.stinner@haypocalc.com> References: <200708140422.36818.victor.stinner@haypocalc.com> <200708152152.38839.victor.stinner@haypocalc.com> Message-ID: (Warning: quotation intentionally out of context!) On 8/15/07, somebody wrote: > For most users, there is only one charset: latin-1. Whoa! Careful with those assumptions. This is very culture- and platform-dependent. For Americans it's ASCII (only half joking :-). For most of the world it's likely UTF-8. In Asia, it's anything *but* Latin-1. On my Mac laptop, the file system and the Terminal program default to UTF-8.
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From jimjjewett at gmail.com Thu Aug 16 01:41:59 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 15 Aug 2007 19:41:59 -0400 Subject: [Python-3000] Format specifier proposal In-Reply-To: <46C2A2B1.1010708@ronadam.com> References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com> <46C11DDF.2080607@acm.org> <20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net> <46C24F56.5050104@canterbury.ac.nz> <46C26110.8020001@ronadam.com> <20070815004204.f986b4b1.ajwade+py3k@andrew.wade.networklinux.net> <46C2A2B1.1010708@ronadam.com> Message-ID: On 8/15/07, Ron Adam wrote: > After each term is returned from the __format__ call, the results > __format__ method is called with the next specifier. GetTime.__format__ > returns a string. str.__format__, aligns it. A nice left to right > sequence of events. Is this a pattern that objects should normally follow, or a convention enforced by format itself? In other words, does "{0:abc,def,ghi}".format(value) mean # Assume value.__format__ will delegate properly, to # result1.__format__("def,ghi") # # There are some surprises when a trailing field size gets ignored by # value.__class__. # # Are infinite loops more likely? value.__format__("abc,def,ghi") or # The separator character (","?) gets hard to use in format strings... value.__format__("abc").__format__("def").__format__("ghi") -jJ From guido at python.org Thu Aug 16 02:37:19 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 15 Aug 2007 17:37:19 -0700 Subject: [Python-3000] Change in sys.stdin.name In-Reply-To: <46C37162.5020705@trueblade.com> References: <46C37162.5020705@trueblade.com> Message-ID: It sort of is -- the new I/O library uses the file descriptor if no filename is given. There were no unit tests that verified the old behavior, and I think it was of pretty marginal usefulness. 
Code inspecting f.name can tell the difference by looking at its type -- if it is an int, it's a file descriptor, if it is a string, it's a file name. On 8/15/07, Eric Smith wrote: > I mentioned this in another message, but I thought I'd mention it here. > > I see this change in the behavior of sys.stdin.name, between 2.3.3 and > 3.0x (checked out a few minutes ago). > > $ python > Python 2.3.3 (#1, May 7 2004, 10:31:40) > [GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import sys > >>> sys.stdin.name > '' > > > $ ./python > Python 3.0x (py3k:57077M, Aug 15 2007, 17:27:26) > [GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import sys > >>> sys.stdin.name > 0 > > > I see similar behavior with sys.stdout and sys.stderr. > > Is this deliberate? I can file a bug report if need be, just let me know. > > Eric. > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From lists at cheimes.de Thu Aug 16 02:47:27 2007 From: lists at cheimes.de (Christian Heimes) Date: Thu, 16 Aug 2007 02:47:27 +0200 Subject: [Python-3000] Change in sys.stdin.name In-Reply-To: <46C37162.5020705@trueblade.com> References: <46C37162.5020705@trueblade.com> Message-ID: Eric Smith wrote: > Is this deliberate? I can file a bug report if need be, just let me know. I'm sure it is a bug. The site.installnewio() function doesn't set the names. The attached patch fixes the issue and adds an unit test, too. Christian -------------- next part -------------- A non-text attachment was scrubbed... 
Name: stdnames.patch Type: text/x-patch Size: 1060 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070816/a40faf04/attachment.bin From greg.ewing at canterbury.ac.nz Thu Aug 16 03:06:42 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 16 Aug 2007 13:06:42 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46C25DE3.6060906@ronadam.com> References: <46B13ADE.7080901@acm.org> <46B39BBF.80809@ronadam.com> <46B46DE1.1090403@canterbury.ac.nz> <46B4950F.40905@ronadam.com> <46B4A9A0.9070206@ronadam.com> <46B52422.2090006@canterbury.ac.nz> <46B568F3.9060105@ronadam.com> <46B66BE0.7090005@canterbury.ac.nz> <46B6851C.1030204@ronadam.com> <46B6C219.4040900@canterbury.ac.nz> <46B6DABB.3080509@ronadam.com> <46B7CD8C.5070807@acm.org> <46B8D58E.5040501@ronadam.com> <46BAC2D9.2020902@acm.org> <46BAEFB0.9050400@ronadam.com> <46BBB1AE.5010207@canterbury.ac.nz> <46BBEB16.2040205@ronadam.com> <46BC1479.30405@canterbury.ac.nz> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com> <46C10D2D.60705@canterbury.ac.nz> <46C12526.8040807@ronadam.com> <46C2429B.1090507@canterbury.ac.nz> <46C25DE3.6060906@ronadam.com> Message-ID: <46C3A322.8040904@canterbury.ac.nz> Ron Adam wrote: > > Greg Ewing wrote: > > The format strings are starting to look like line > > noise. > > Do you have a specific example or is it just an overall feeling? It's an overall feeling from looking at your examples. I can't take them in at a glance -- I have to minutely examine them character by character, which is tiring. With the traditional format strings, at least I can visually parse them without much trouble, even if I don't know precisely what all the parts mean.` > For example the the field alignment > part can be handled by the format function, and the value format part > can be handled by the __format__ method. 
Yes, although that seems to be about the *only* thing that can be separated, and it can be specified using just one character, which should be easy enough to strip out before passing on the format string. > And my apologies if its starting to seem like line noise. I'm not that > good at explaining things in simple ways. It doesn't really have anything to do with explanation. As I indicated above, even if I understand exactly what each part means, it's still hard work parsing the string if it contains more than a couple of the allowed elements. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From rrr at ronadam.com Thu Aug 16 03:07:02 2007 From: rrr at ronadam.com (Ron Adam) Date: Wed, 15 Aug 2007 20:07:02 -0500 Subject: [Python-3000] Format specifier proposal In-Reply-To: References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com> <46C11DDF.2080607@acm.org> <20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net> <46C24F56.5050104@canterbury.ac.nz> <46C26110.8020001@ronadam.com> <20070815004204.f986b4b1.ajwade+py3k@andrew.wade.networklinux.net> <46C2A2B1.1010708@ronadam.com> Message-ID: <46C3A336.7080706@ronadam.com> Jim Jewett wrote: > On 8/15/07, Ron Adam wrote: > >> After each term is returned from the __format__ call, the results >> __format__ method is called with the next specifier. GetTime.__format__ >> returns a string. str.__format__, aligns it. A nice left to right >> sequence of events. > > Is this a pattern that objects should normally follow, or a convention > enforced by format itself? 
In other words, does > > "{0:abc,def,ghi}".format(value) > > mean > > # Assume value.__format__ will delegate properly, to > # result1.__format__("def,ghi") > # > # There are some surprises when a trailing field size gets ignored by > # value.__class__. > # > # Are infinite loops more likely? > value.__format__("abc,def,ghi") > > or > > # The separator character (","?) gets hard to use in format strings... > value.__format__("abc").__format__("def").__format__("ghi") It would have to be this version. There isn't any way for the vformat method (*) to decide which ',' belongs where unless you create some strict and awkward rules about when you can use comma's and when you can't. * vformat is the method described in the pep responsible for calling format() with each value and specifier. So it is where the chaining is done. Currently I have only the following for this part, but it could be more sophisticated. value = self.get_value(key, args, kwargs) for term in spec.split(','): value = format(value, term) There are two ways around this, one is to have a comma escape sequence such as '\,'. Then after it split, it can replace the '\,' with ',' and then call format with the specifier with the un-escaped commas. Another way might be to be able to designate an alternative separator in some way. {0:|:abc|def,ghi} Where :sep: is the separator to use other than comma. Or :: to force a single term with no chaining. Or some other syntax might work? It's not an impossible problem to solve. The idea is __format__ methods only need to be concerned with their part. They shouldn't have to parse some other objects specifier and pass it along. (But you can still do that if you really want to.) 
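[Editorial sketch] The escaped-comma variant mentioned above could look roughly like this. It is only an illustration of the idea, not code from the thread: split_terms and chain_format are hypothetical names, the built-in format() as it exists in today's Python 3 stands in for the proposed vformat step, and a literal backslash directly before a comma is not handled.

```python
import re
import time

def split_terms(spec):
    """Split on commas that are not escaped as '\\,', then unescape them."""
    parts = re.split(r"(?<!\\),", spec)
    return [part.replace("\\,", ",") for part in parts]

def chain_format(value, spec):
    """Feed the result of each term into format() with the next term."""
    for term in split_terms(spec):
        value = format(value, term)
    return value

class GetTime:
    """Stand-in for the GetTime class from the doctest earlier in the thread."""
    def __init__(self, when):
        self.when = when
    def __format__(self, spec):
        # Only the date part is handled here; alignment is left to str.
        return time.strftime(spec, self.when)

start = GetTime(time.gmtime(1187154773))

# Two terms: strftime first, then left-align in a 30 character field.
print(repr(chain_format(start, "%d/%m/%Y %H:%M:%S,<30")))

# An escaped comma stays inside the first term instead of splitting it.
print(repr(chain_format(start, r"%d/%m/%Y\, %H:%M,<24")))
```

With this split each __format__ method only ever sees its own term, which is the separation of concerns argued for above; the cost is that format strings containing commas need the extra backslash.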
Cheers, Ron From greg.ewing at canterbury.ac.nz Thu Aug 16 03:08:54 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 16 Aug 2007 13:08:54 +1200 Subject: [Python-3000] Format specifier proposal In-Reply-To: <46C26110.8020001@ronadam.com> References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com> <46C11DDF.2080607@acm.org> <20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net> <46C24F56.5050104@canterbury.ac.nz> <46C26110.8020001@ronadam.com> Message-ID: <46C3A3A6.2060109@canterbury.ac.nz> Ron Adam wrote: > What I was thinking of was just a simple left to right evaluation order. > > "{0:spec1, spec2, ... }".format(x) That would work too, as long as you were willing to forbid "," as a possible character in a type-specific format string. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From greg.ewing at canterbury.ac.nz Thu Aug 16 03:18:12 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 16 Aug 2007 13:18:12 +1200 Subject: [Python-3000] Format specifier proposal In-Reply-To: <20070814230227.0c9be356.ajwade+py3k@andrew.wade.networklinux.net> References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com> <46C11DDF.2080607@acm.org> <20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net> <20070814230227.0c9be356.ajwade+py3k@andrew.wade.networklinux.net> Message-ID: <46C3A5D4.3060602@canterbury.ac.nz> Andrew James Wade wrote: > {1:!renewal date: %Y-%m-%d} # no special meaning for ! here. Yuck. 
Although it might happen to work due to reuse of strftime, I'd consider that bad style -- constant parts of the output string should be outside of the format specs, i.e.: "renewal date: {1:%Y-%m-%d}".format(my_date) -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From greg.ewing at canterbury.ac.nz Thu Aug 16 03:22:47 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 16 Aug 2007 13:22:47 +1200 Subject: [Python-3000] [Python-Dev] Universal newlines support in Python 3.0 In-Reply-To: <7375B721-C62F-49ED-945B-5A9107246D6C@python.org> References: <87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp> <7375B721-C62F-49ED-945B-5A9107246D6C@python.org> Message-ID: <46C3A6E7.4010702@canterbury.ac.nz> Barry Warsaw wrote: > Add a flag > called preserve_eols that defaults to False, is ignored unless > universal newline mode is turned on, Is there any reason it shouldn't work in non-universal- newlines mode too? -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From greg.ewing at canterbury.ac.nz Thu Aug 16 03:35:36 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 16 Aug 2007 13:35:36 +1200 Subject: [Python-3000] Proposed new language for newline parameter to TextIOBase In-Reply-To: References: Message-ID: <46C3A9E8.7040207@canterbury.ac.nz> Christian Heimes wrote: > Besides I always forget if Windows is '\r\n' or '\n\r'. Oh, that's easy. The teletype needs to get the CR before the LF so that it can start moving the carriage back to the left while it's scrolling the paper up. What? You don't have a teletype? Well, er, in that case... 
-- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From greg.ewing at canterbury.ac.nz Thu Aug 16 03:38:09 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 16 Aug 2007 13:38:09 +1200 Subject: [Python-3000] Proposed new language for newline parameter to TextIOBase In-Reply-To: References: Message-ID: <46C3AA81.9090602@canterbury.ac.nz> Georg Brandl wrote: > "U" instead of the empty string for universal newline mode. It would be good to leave open the possibility of allowing arbitrary line separation strings at some time in the future. In that case, newline == "U" would mean lines separated by "U". Predefined constants seem like a good idea to me. Otherwise I'm sure I'll always have to look up what '' and None mean. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From greg.ewing at canterbury.ac.nz Thu Aug 16 03:56:50 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 16 Aug 2007 13:56:50 +1200 Subject: [Python-3000] Fix imghdr module for bytes In-Reply-To: <46C33B64.9040104@cornell.edu> References: <200708110235.43664.victor.stinner@haypocalc.com> <79990c6b0708121219x3aecef78hc58443b592c0a13d@mail.gmail.com> <46BFA0FC.2060707@canterbury.ac.nz> <46C33B64.9040104@cornell.edu> Message-ID: <46C3AEE2.1030201@canterbury.ac.nz> Joel Bender wrote: > Would you propose these to be mutable as well? No, they'd be integers. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) 
| greg.ewing at canterbury.ac.nz +--------------------------------------+ From greg.ewing at canterbury.ac.nz Thu Aug 16 04:01:04 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 16 Aug 2007 14:01:04 +1200 Subject: [Python-3000] Proposed new language for newline parameter to TextIOBase In-Reply-To: References: Message-ID: <46C3AFE0.4030906@canterbury.ac.nz> Adam Olsen wrote: > On 8/14/07, Guido van Rossum wrote: > > > (2) newline='': input with untranslated universal newlines mode; lines > > may end in \r, \n, or \r\n, and these are returned untranslated. > > Caveat: this mode cannot be supported by sockets. When reading a lone > \r you need to peek ahead to ensure the next character is not a \n, > but for sockets that may block indefinitely. You could return as soon as you see the '\r', with a flag set indicating that if the next character that comes in is '\n' it should be ignored. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From rhamph at gmail.com Thu Aug 16 04:41:36 2007 From: rhamph at gmail.com (Adam Olsen) Date: Wed, 15 Aug 2007 20:41:36 -0600 Subject: [Python-3000] Proposed new language for newline parameter to TextIOBase In-Reply-To: <46C3AFE0.4030906@canterbury.ac.nz> References: <46C3AFE0.4030906@canterbury.ac.nz> Message-ID: On 8/15/07, Greg Ewing wrote: > Adam Olsen wrote: > > On 8/14/07, Guido van Rossum wrote: > > > > > (2) newline='': input with untranslated universal newlines mode; lines > > > may end in \r, \n, or \r\n, and these are returned untranslated. > > > > Caveat: this mode cannot be supported by sockets. When reading a lone > > \r you need to peek ahead to ensure the next character is not a \n, > > but for sockets that may block indefinitely. 
> > You could return as soon as you see the '\r', with > a flag set indicating that if the next character > that comes in is '\n' it should be ignored. That would be the *other* universal newlines mode. ;) (Once you're already modifying the output, you might as well convert everything to '\n'.) -- Adam Olsen, aka Rhamphoryncus From talin at acm.org Thu Aug 16 05:12:24 2007 From: talin at acm.org (Talin) Date: Wed, 15 Aug 2007 20:12:24 -0700 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <46C331BF.2020104@trueblade.com> References: <46C2809C.3000806@acm.org> <8f01efd00708150738y36deee91v918bfd3f80944d9b@mail.gmail.com> <46C331BF.2020104@trueblade.com> Message-ID: <46C3C098.6080601@acm.org> Eric Smith wrote: > James Thiele wrote: >> I think the example: >> >> "My name is {0.name}".format(file('out.txt')) >> >> Would be easier to understand if you added: >> >> Which would produce: >> >> "My name is 'out.txt'" > > I agree. > > Also, the example a couple of paragraphs down: > "My name is {0[name]}".format(dict(name='Fred')) > should show the expected output: > "My name is Fred" Those examples are kind of contrived to begin with. Maybe we should replace them with more realistic ones. -- Talin From talin at acm.org Thu Aug 16 05:13:20 2007 From: talin at acm.org (Talin) Date: Wed, 15 Aug 2007 20:13:20 -0700 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <46C33749.5010702@trueblade.com> References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com> <8f01efd00708150752i1b09464frb8d3217209b47a11@mail.gmail.com> <46C33749.5010702@trueblade.com> Message-ID: <46C3C0D0.50101@acm.org> Eric Smith wrote: > James Thiele wrote: >> The section on the explicit conversion flag contains the following line: >> >> These flags are typically placed before the format specifier: >> >> Where else can they be placed? > > I'd like this to say they can only be placed where the PEP describes > them, or maybe to be only at the end. 
>    "{0!r:20}".format("Hello")
> or
>    "{0:20!r}".format("Hello")
>
> Putting them at the end makes the parsing easier, although I grant you
> that that's not a great reason for specifying it that way.  Whatever it
> is, I think there should be only one place they can go.

Guido expressed a definite preference for having them be first.

>> Also there is no description of what action (if any) is taken if an
>> unknown explicit conversion flag is encountered.
>
> I would assume a ValueError, but yes, it should be explicit.

This is one of those things I leave up to the implementor and then
document later :)

From eric+python-dev at trueblade.com Thu Aug 16 05:26:44 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Wed, 15 Aug 2007 23:26:44 -0400
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46C3C0D0.50101@acm.org>
References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com>
 <8f01efd00708150752i1b09464frb8d3217209b47a11@mail.gmail.com>
 <46C33749.5010702@trueblade.com> <46C3C0D0.50101@acm.org>
Message-ID: <46C3C3F4.7060307@trueblade.com>

Talin wrote:
> Eric Smith wrote:
>> James Thiele wrote:
>>> The section on the explicit conversion flag contains the following line:
>>>
>>>       These flags are typically placed before the format specifier:
>>>
>>> Where else can they be placed?
>>
>> I'd like this to say they can only be placed where the PEP describes
>> them, or maybe to be only at the end.
>>    "{0!r:20}".format("Hello")
>> or
>>    "{0:20!r}".format("Hello")
>>
>> Putting them at the end makes the parsing easier, although I grant you
>> that that's not a great reason for specifying it that way.  Whatever
>> it is, I think there should be only one place they can go.
>
> Guido expressed a definite preference for having them be first.

I was afraid of that.  Then can we say they'll always go first?  Or is
the intent really to say they can go anywhere (PEP says "typically
placed")?

The sample implementation of vformat in the PEP says they'll go last:

     # Check for explicit type conversion
     field_spec, _, explicit = field_spec.partition("!")

Eric.

From talin at acm.org Thu Aug 16 05:31:18 2007
From: talin at acm.org (Talin)
Date: Wed, 15 Aug 2007 20:31:18 -0700
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46C3C3F4.7060307@trueblade.com>
References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com>
 <8f01efd00708150752i1b09464frb8d3217209b47a11@mail.gmail.com>
 <46C33749.5010702@trueblade.com> <46C3C0D0.50101@acm.org>
 <46C3C3F4.7060307@trueblade.com>
Message-ID: <46C3C506.7050802@acm.org>

Eric Smith wrote:
> Talin wrote:
>> Eric Smith wrote:
>>> James Thiele wrote:
>>>> The section on the explicit conversion flag contains the following
>>>> line:
>>>>
>>>>       These flags are typically placed before the format specifier:
>>>>
>>>> Where else can they be placed?
>>>
>>> I'd like this to say they can only be placed where the PEP describes
>>> them, or maybe to be only at the end.
>>>    "{0!r:20}".format("Hello")
>>> or
>>>    "{0:20!r}".format("Hello")
>>>
>>> Putting them at the end makes the parsing easier, although I grant
>>> you that that's not a great reason for specifying it that way.
>>> Whatever it is, I think there should be only one place they can go.
>>
>> Guido expressed a definite preference for having them be first.
>
> I was afraid of that.  Then can we say they'll always go first?  Or is
> the intent really to say they can go anywhere (PEP says "typically
> placed")?

I can revise it to say that they always come first if that would make
it easier.

> The sample implementation of vformat in the PEP says they'll go last:
>
>     # Check for explicit type conversion
>     field_spec, _, explicit = field_spec.partition("!")

That's a bug.

Too bad there are no unit tests for pseudo-code :)

-- Talin

From eric+python-dev at trueblade.com Thu Aug 16 05:37:42 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Wed, 15 Aug 2007 23:37:42 -0400
Subject: [Python-3000] PEP 3101 Updated
In-Reply-To: <46C3C506.7050802@acm.org>
References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com>
 <8f01efd00708150752i1b09464frb8d3217209b47a11@mail.gmail.com>
 <46C33749.5010702@trueblade.com> <46C3C0D0.50101@acm.org>
 <46C3C3F4.7060307@trueblade.com> <46C3C506.7050802@acm.org>
Message-ID: <46C3C686.1000500@trueblade.com>

Talin wrote:
> Eric Smith wrote:
>> Talin wrote:
>>> Guido expressed a definite preference for having them be first.
>>
>> I was afraid of that.  Then can we say they'll always go first?  Or is
>> the intent really to say they can go anywhere (PEP says "typically
>> placed")?
>
> I can revise it to say that they always come first if that would make
> it easier.

That would make it easier to code, and I suspect easier to read the
Python code that uses them.  I'll keep coding as if it says they're
first; no sense updating the PEP until we batch up some changes.

>> The sample implementation of vformat in the PEP says they'll go last:
>>
>>     # Check for explicit type conversion
>>     field_spec, _, explicit = field_spec.partition("!")
>
> That's a bug.
>
> Too bad there are no unit tests for pseudo-code :)

There's a task for someone!

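A rough sketch of what "the conversion flag always comes first" could
look like in code, with a couple of checks standing in for the missing
unit tests. The names split_field and apply_field are hypothetical, only
the 'r' and 's' flags are assumed, and raising ValueError for an unknown
flag follows Eric's guess above rather than anything the PEP states.

```python
# Hypothetical helpers, not the PEP's sample implementation.
KNOWN_CONVERSIONS = ("r", "s")


def split_field(field):
    """Split 'name!conv:spec' into (name, conv or None, spec)."""
    # Everything after the first ':' belongs to the format specifier,
    # so a '!' that appears there is left for __format__ to interpret.
    head, _, format_spec = field.partition(":")
    name, bang, conversion = head.partition("!")
    if bang and conversion not in KNOWN_CONVERSIONS:
        raise ValueError("unknown conversion flag: %r" % conversion)
    return name, (conversion if bang else None), format_spec


def apply_field(value, field):
    """Apply the conversion first, then format the converted value."""
    _, conversion, format_spec = split_field(field)
    if conversion == "r":
        value = repr(value)
    elif conversion == "s":
        value = str(value)
    return format(value, format_spec)


print(split_field("0!r:20"))   # ('0', 'r', '20')
print(split_field("0:20!r"))   # ('0', None, '20!r')
print(repr(apply_field("Hello", "0!r:20")))
```

Splitting on the colon before looking for '!' is what keeps the flag
position unambiguous: a '!' inside the specifier is never mistaken for a
conversion.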
From aholkner at cs.rmit.edu.au Thu Aug 16 05:17:50 2007 From: aholkner at cs.rmit.edu.au (Alex Holkner) Date: Thu, 16 Aug 2007 13:17:50 +1000 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <46C2809C.3000806@acm.org> References: <46C2809C.3000806@acm.org> Message-ID: <46C3C1DE.6070302@cs.rmit.edu.au> Talin wrote: > A new version is up, incorporating material from the various discussions > on this list: > > http://www.python.org/dev/peps/pep-3101/ I've been following this thread for a few weeks, and I believe the following issues haven't yet been addressed: The PEP abstract says this proposal will replace the '%' operator, yet all the examples use the more verbose .format() method. Can a later section in the PEP (perhaps "String Methods") confirm that '%' on string is synonymous with the format method in Python 3000? What is the behaviour of whitespace in a format specifier? e.g. how much of the following is valid? "{ foo . name : 20s }".format(foo=open('bar')) One use-case might be to visually line up fields (in source) with a minimum field width. Even if not permitted, I believe this should be mentioned in the PEP. Does a brace that does not begin a format specifier raise an exception or get treated as character data? e.g. "{@foo}" or, if no whitespace is permitted: "{ foo }" or, an unmatched closing brace: " } " I don't have any preference on either behaviour, but would like to see it clarified in the PEP. Has there been any consideration for omitting the field name? The behaviour would be the same as the current string interpolation: "The {:s} sat on the {:s}".format('cat', 'mat') IMO this has gives a nicer abbreviated form for the most common use case: "Your name is {}.".format(name) This has the benefit of having similar syntax to the default interpolation for UNIX find and xargs commands, and eliminating errors from giving the wrong field number (there have been several posts in this thread erroneously using a 1-based index). 
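For reference, a short sketch of how the str.format that eventually
shipped in CPython behaves on these cases. This reflects released
behaviour (the empty field name needs Python 3.1 or later), not anything
decided in this thread; the helper name outcome is made up for the
example.

```python
def outcome(template, *args, **kwargs):
    """Return the formatted string, or the name of the exception raised."""
    try:
        return template.format(*args, **kwargs)
    except (KeyError, IndexError, ValueError) as exc:
        return type(exc).__name__


# Omitted field names are numbered automatically, left to right.
print(outcome("The {:s} sat on the {:s}", "cat", "mat"))
print(outcome("Your name is {}.", "Alex"))

# Whitespace is treated as part of the field name, so the lookup fails.
print(outcome("{ foo }", foo=1))    # KeyError

# '@foo' is looked up as a keyword argument that does not exist.
print(outcome("{@foo}"))            # KeyError

# An unmatched closing brace is rejected; '}}' is the escaped form.
print(outcome(" } "))               # ValueError
print(outcome(" }} "))
```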
Apologies if I'm repeating answered questions. Cheers Alex. From talin at acm.org Thu Aug 16 06:07:50 2007 From: talin at acm.org (Talin) Date: Wed, 15 Aug 2007 21:07:50 -0700 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <46C3C1DE.6070302@cs.rmit.edu.au> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> Message-ID: <46C3CD96.4070902@acm.org> Alex Holkner wrote: > Talin wrote: >> A new version is up, incorporating material from the various discussions >> on this list: >> >> http://www.python.org/dev/peps/pep-3101/ > > I've been following this thread for a few weeks, and I believe the > following issues haven't yet been addressed: > > The PEP abstract says this proposal will replace the '%' operator, yet > all the examples use the more verbose .format() method. Can a later > section in the PEP (perhaps "String Methods") confirm that '%' on string > is synonymous with the format method in Python 3000? Well, originally it was my intent that the .format method would co-exist beside the '%' operator, but Guido appears to want to deprecate the '%' operator (it will continue to be supported until 3.1 at least however.) > What is the behaviour of whitespace in a format specifier? e.g. > how much of the following is valid? > > "{ foo . name : 20s }".format(foo=open('bar')) Eric, it's your call :) > One use-case might be to visually line up fields (in source) with a > minimum field width. Even if not permitted, I believe this should be > mentioned in the PEP. > > Does a brace that does not begin a format specifier raise an exception > or get treated as character data? e.g. > > "{@foo}" > > or, if no whitespace is permitted: > > "{ foo }" > > or, an unmatched closing brace: > > " } " I would say unmatched brace should not be considered an error, but I'm the permissive type. > I don't have any preference on either behaviour, but would like to see > it clarified in the PEP. > > Has there been any consideration for omitting the field name? 
The > behaviour would be the same as the current string interpolation: > > "The {:s} sat on the {:s}".format('cat', 'mat') I suspect that this is something many people would like to move away from. Particularly in cases where different format strings are being used on the same data (a common example is localized strings), it's useful to be able to change around field order without changing the arguments to the function. > IMO this has gives a nicer abbreviated form for the most common use case: > > "Your name is {}.".format(name) > > This has the benefit of having similar syntax to the default > interpolation for UNIX find and xargs commands, and eliminating errors > from giving the wrong field number (there have been several posts in > this thread erroneously using a 1-based index). > > Apologies if I'm repeating answered questions. > > Cheers > Alex. > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/talin%40acm.org > From nnorwitz at gmail.com Thu Aug 16 07:07:05 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Wed, 15 Aug 2007 22:07:05 -0700 Subject: [Python-3000] Documentation switch imminent In-Reply-To: References: Message-ID: On 8/15/07, Georg Brandl wrote: > Georg Brandl schrieb: > > > > Neal will change his build scripts, so that the 2.6 and 3.0 devel > > documentation pages at docs.python.org will be built from these new > > trees soon. > > Okay, I made the switch. I tagged the state of both Python branches > before the switch as tags/py{26,3k}-before-rstdocs/. http://docs.python.org/dev/ http://docs.python.org/dev/3.0/ The upgrade went smoothly. Below are all the issues I noticed. I had to install a version of python 2.5 since that is a minimum requirement. I had to change from a plain 'make' in the Doc directory to 'make html'. 
The output is in build/html rather than html/ now. 2.6 output: trying to load pickled env... failed: [Errno 2] No such file or directory: 'build/doctrees/environment.pickle' writing output... ... library/contextlib.rst:3: Warning: 'with' will become a reserved keyword in Python 2.6 tutorial/errors.rst:1: Warning: 'with' will become a reserved keyword in Python 3.0 output: Traceback (most recent call last): File "tools/sphinx-build.py", line 13, in from sphinx import main File "/home/neal/python/py3k/Doc/tools/sphinx/__init__.py", line 16, in from .builder import builders File "/home/neal/python/py3k/Doc/tools/sphinx/builder.py", line 35, in from .environment import BuildEnvironment File "/home/neal/python/py3k/Doc/tools/sphinx/environment.py", line 34, in from docutils.parsers.rst.states import Body File "/home/neal/python/py3k/Doc/tools/docutils/parsers/rst/__init__.py", line 77, in from docutils.parsers.rst import states File "/home/neal/python/py3k/Doc/tools/docutils/parsers/rst/states.py", line 110, in import roman ImportError: No module named roman After this error, I just linked my tools directory to the one in 2.6 (trunk) and that worked. I'm not sure if this will create problems in the future. trying to load pickled env... failed: [Errno 2] No such file or directory: 'build/doctrees/environment.pickle' writing output... ... library/contextlib.rst:3: Warning: 'with' will become a reserved keyword in Python 2.6 library/shutil.rst:17: Warning: 'as' will become a reserved keyword in Python 2.6 library/subprocess.rst:7: Warning: 'as' will become a reserved keyword in Python 2.6 tutorial/errors.rst:1: Warning: 'with' will become a reserved keyword in Python 2.6 I realize none of these are a big deal. However, it would be nice if it was cleaned up so that people unfamiliar with building the docs aren't surprised. 
n

From andrew.j.wade at gmail.com Mon Aug 13 01:58:56 2007
From: andrew.j.wade at gmail.com (Andrew James Wade)
Date: Sun, 12 Aug 2007 19:58:56 -0400
Subject: [Python-3000] Format specifier proposal
In-Reply-To: <46BD79EC.1020301@acm.org>
References: <46BD79EC.1020301@acm.org>
Message-ID: <20070812195856.d6f085e8.ajwade+00@andrew.wade.networklinux.net>

On Sat, 11 Aug 2007 01:57:16 -0700
Talin wrote:

> Taking some ideas from the various threads, here's what I'd like to propose:
>
> (Assume that brackets [] means 'optional field')
>
>    [:[type][align][sign][[0]minwidth][.precision]][/fill][!r]
>
> Examples:
>
>     :f          # Floating point number of natural width
>     :f10        # Floating point number, width at least 10
>     :f010       # Floating point number, width at least 10, leading zeros
>     :f.2        # Floating point number with two decimal digits
>     :8          # Minimum width 8, type defaults to natural type
>     :d+2        # Integer number, 2 digits, sign always shown
>     !r          # repr() format
>     :10!r       # Field width 10, repr() format
>     :s10        # String right-aligned within field of minimum width
>                 # of 10 chars.
>     :s10.10     # String right-aligned within field of minimum width
>                 # of 10 chars, maximum width 10.
>     :s<10       # String left-aligned in 10 char (min) field.
>     :d^15       # Integer centered in 15 character field
>     :>15/.      # Right align and pad with '.' chars
>     :f<+015.5   # Floating point, left aligned, always show sign,
>                 # leading zeros, field width 15 (min), 5 decimal places.
>
> Notes:
>
>    -- Leading zeros is different than fill character, although the two
>       are mutually exclusive. (Leading zeros always go between the sign and
>       the number, padding does not.)
>    -- For strings, precision is used as maximum field width.
>    -- __format__ functions are not allowed to re-interpret '!r'.
>
> I realize that the grouping of things is a little odd - for example, it
> would be nice to put minwidth, padding and alignment in their own little
> group so that they could be processed independently from __format__.

Most custom formatting specs will probably end up putting width, padding and alignment in their own little group and will delegate those functions to str.__format__. Like so: :>30/.,yyyy-MM-dd HH:mm:ss def __format__(self, specifiers): align_spec, foo_spec = (specifiers.split(",",1) + [""])[:2] ... format foo ... return str.__format__(formatted_foo, align_spec.replace("foo", "s")) (I would suggest allowing ,yyyy-MM-dd as a short form of :,yyyy-MM-dd). I suspect there will be few cases where it makes sense to intermingle the width/alignment/padding fields with other fields. I would move !r to the start of the formatting specification; it should be prominent when it appears, and format will want to find it easily and unambiguously rather than leaving it to boilerplate in each __format__ method. -- Andrew From carlmj at hawaii.edu Mon Aug 13 12:08:50 2007 From: carlmj at hawaii.edu (Carl Johnson) Date: Mon, 13 Aug 2007 00:08:50 -1000 Subject: [Python-3000] More PEP 3101 changes incoming Message-ID: <633738EA-3F45-4933-BF81-0410BACBAF1E@hawaii.edu> (First, let me apologize for diving into a bike shed discussion.) There are two proposed ways to handle custom __format__ methods: > class MyInt: > def __format__(self, spec): > if int.is_int_specifier(spec): > return int(self).__format__(spec) > return "MyInt instance with custom specifier " + spec > def __int__(self): > return and > class MyInt: > def __format__(self, spec): > if is_custom_spec(spec): > return "MyInt instance with custom specifier " + spec > return NotImplemented > def __int__(self): > return I think this would be more straightforward as: class MyInt: def __format__(self, spec): if is_MyInt_specific_spec(spec): return "MyInt instance with custom specifier " + spec else: return int(self).__format__(spec) def __int__(self): return The makers of the MyInt class should be the ones responsible for knowing that MyInt can be converted to int as needed for output. 
If they want MyInt to handle all the same format spec options as MyInt, it's up to them to either implement them all in their __format__ or to cast the instance object to int then call its __format__ object by themselves. I don't see the point in having format guess what MyInt should be converted to if it can't handle the options passed to it. If we go too far down this road, if MyInt craps out when given ":MM-DD-YY", then format will be obliged to try casting to Date just to see if it will work. No, I think the format function should be somewhat dumb, since dumb makes more sense to __format__ implementers than clever. Let them figure out what their type can be cast into. In the case that regular int can't handle the given format spec either, int.__format__ will raise (return?) NotImplemented, in which case the format function will try string conversion, and then if that also pukes, a runtime exception should be raised. I also like the idea of using "!r" for calling repr and agree that it should be listed first. The syntax seems to be calling out for a little bit of extension though. Might it be nice to be able to do something like this? s = "10" print("{0!i:+d}".format(s)) #prints "+10" The !i attempts to cast the string to int. If it fails, then an exception is raised. If it succeeds, then the int.__format__ method is used on the remainder of the spec string. The logic is that ! commands are abbreviated functions that are applied to the input before other formatting options are given. On the one hand, this does risk a descent into "line noise" if too many ! options are provided. On the other hand, I think that providing ! options for just repr, str, int, and float probably wouldn't be too bad, and might save some tedious writing of int(s), etc. in spots. It seems like if we're going to have a weird syntax for repr anyway, we might as well use it to make things more convenient in other ways. Or is this too TMTOWTDI-ish, since one could just write int(s) instead? 
(But by that logic, one could write repr (s) too?) The format function would end up looking like this: def format(obj, spec): if spec[0] == "!": switch statement for applying obj = repr(obj), obj = int (obj), etc. spec = spec[2:] if obj.__format__ and type(obj) is not str: try: #if spec contains letters not understood, __format__ raises NI return obj.__format__(spec) except NotImplemented: pass #everything gets put through str as a last resort return str(obj).__format__(spec) #last chance before throwing exception Does this make sense to anyone else? --Carl Johnson From talin at acm.org Thu Aug 16 09:03:52 2007 From: talin at acm.org (Talin) Date: Thu, 16 Aug 2007 00:03:52 -0700 Subject: [Python-3000] [Python-Dev] Documentation switch imminent In-Reply-To: References: Message-ID: <46C3F6D8.2090502@acm.org> Neal Norwitz wrote: > On 8/15/07, Georg Brandl wrote: >> Georg Brandl schrieb: >>> Neal will change his build scripts, so that the 2.6 and 3.0 devel >>> documentation pages at docs.python.org will be built from these new >>> trees soon. >> Okay, I made the switch. I tagged the state of both Python branches >> before the switch as tags/py{26,3k}-before-rstdocs/. > > http://docs.python.org/dev/ > http://docs.python.org/dev/3.0/ So awesome. Great job everyone! -- Talin From talin at acm.org Thu Aug 16 09:09:39 2007 From: talin at acm.org (Talin) Date: Thu, 16 Aug 2007 00:09:39 -0700 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <633738EA-3F45-4933-BF81-0410BACBAF1E@hawaii.edu> References: <633738EA-3F45-4933-BF81-0410BACBAF1E@hawaii.edu> Message-ID: <46C3F833.7040209@acm.org> Carl Johnson wrote: > (First, let me apologize for diving into a bike shed discussion.) 
> > There are two proposed ways to handle custom __format__ methods: > >> class MyInt: >> def __format__(self, spec): >> if int.is_int_specifier(spec): >> return int(self).__format__(spec) >> return "MyInt instance with custom specifier " + spec >> def __int__(self): >> return > > and > >> class MyInt: >> def __format__(self, spec): >> if is_custom_spec(spec): >> return "MyInt instance with custom specifier " + spec >> return NotImplemented >> def __int__(self): >> return > > I think this would be more straightforward as: > > class MyInt: > def __format__(self, spec): > if is_MyInt_specific_spec(spec): > return "MyInt instance with custom specifier " + spec > else: > return int(self).__format__(spec) > def __int__(self): > return > > The makers of the MyInt class should be the ones responsible for > knowing that > MyInt can be converted to int as needed for output. If they want > MyInt to > handle all the same format spec options as MyInt, it's up to them to > either > implement them all in their __format__ or to cast the instance object > to int > then call its __format__ object by themselves. I don't see the point > in having > format guess what MyInt should be converted to if it can't handle the > options > passed to it. If we go too far down this road, if MyInt craps out > when given > ":MM-DD-YY", then format will be obliged to try casting to Date just > to see if > it will work. No, I think the format function should be somewhat > dumb, since > dumb makes more sense to __format__ implementers than clever. Let > them figure > out what their type can be cast into. +1 > In the case that regular int can't handle the given format spec either, > int.__format__ will raise (return?) NotImplemented, in which case the > format > function will try string conversion, and then if that also pukes, a > runtime > exception should be raised. > > I also like the idea of using "!r" for calling repr and agree that it > should be > listed first. 
The syntax seems to be calling out for a little bit of > extension > though. Might it be nice to be able to do something like this? > > s = "10" > print("{0!i:+d}".format(s)) #prints "+10" It's been talked about extending it. The plan is to first implement the more restricted version and let people hack on it, adding what features are deemed useful in practice. > The !i attempts to cast the string to int. If it fails, then an > exception is > raised. If it succeeds, then the int.__format__ method is used on the > remainder > of the spec string. The logic is that ! commands are abbreviated > functions that > are applied to the input before other formatting options are given. > > On the one hand, this does risk a descent into "line noise" if too > many ! > options are provided. On the other hand, I think that providing ! > options for > just repr, str, int, and float probably wouldn't be too bad, and > might save > some tedious writing of int(s), etc. in spots. It seems like if we're > going to > have a weird syntax for repr anyway, we might as well use it to make > things > more convenient in other ways. Or is this too TMTOWTDI-ish, since one > could > just write int(s) instead? (But by that logic, one could write repr > (s) too?) > > The format function would end up looking like this: > > def format(obj, spec): > if spec[0] == "!": > switch statement for applying obj = repr(obj), obj = int > (obj), etc. > spec = spec[2:] > if obj.__format__ and type(obj) is not str: > try: > #if spec contains letters not understood, __format__ > raises NI > return obj.__format__(spec) > except NotImplemented: > pass #everything gets put through str as a last resort > return str(obj).__format__(spec) #last chance before throwing > exception The built-in 'format' function doesn't handle '!r', that's done by the caller. The 'spec' argument passed in to 'format' is the part *after* the colon. 
Also, there's no need to test for the existence of __format__, because all objects will have a __format__ method which is inherited from object.__format__. > Does this make sense to anyone else? Perfectly. > --Carl Johnson From andrew.j.wade at gmail.com Thu Aug 16 09:28:06 2007 From: andrew.j.wade at gmail.com (Andrew James Wade) Date: Thu, 16 Aug 2007 03:28:06 -0400 Subject: [Python-3000] Format specifier proposal In-Reply-To: <46C3A5D4.3060602@canterbury.ac.nz> References: <46BD79EC.1020301@acm.org> <46C095CF.2060507@ronadam.com> <46C11DDF.2080607@acm.org> <20070814022805.4a2b44a2.ajwade+py3k@andrew.wade.networklinux.net> <20070814230227.0c9be356.ajwade+py3k@andrew.wade.networklinux.net> <46C3A5D4.3060602@canterbury.ac.nz> Message-ID: <20070816032806.a8427bbd.ajwade+py3k@andrew.wade.networklinux.net> On Thu, 16 Aug 2007 13:18:12 +1200 Greg Ewing wrote: > Andrew James Wade wrote: > > {1:!renewal date: %Y-%m-%d} # no special meaning for ! here. > > Yuck. Although it might happen to work due to reuse of > strftime, I'd consider that bad style -- constant parts > of the output string should be outside of the format > specs, i.e.: > > "renewal date: {1:%Y-%m-%d}".format(my_date) To be sure; it's just that I couldn't think of a better example. My point is that by putting spec1 last, the only things you need to escape are { and }. (They can be escaped as {lb} and {rb} by passing the right parameters.) The alternatives I see are: 1. [:spec1[,spec2]] - {1: %B %d, %Y} doesn't work as expected. 2. [!spec2][:spec1] - order reversed. - meaning of spec2 is overloaded by !r, !s.[1] 3. [:spec2[,spec1]] - order reversed. - spec1-only syntax is too elaborate: {1:,%Y-%m-%d} 4.
[,spec2][:spec1] long discussion here: http://mail.python.org/pipermail/python-3000/2007-August/009066.html - order reversed problem is particularly bad, because : looks like it should have low precedence. - meaning of spec2 is overloaded by ,r ,s.[1] - On the positive side, this is similar to .NET syntax. 5. { {1:spec1}:spec2} - looks like a replacement field for the name specifier. (Though a spec1 like %Y-%m-%d would tend to counteract that impression.) [1] This is particularly awkward since spec2 should be applied after spec1, but !s and !r should be applied before spec1. And in Talin's proposal, spec2 will be superfluous for strings and integers. It's also not needed when all you want to do is align str(x). I don't think any of them will fly :-(. My guess is that __format__ methods will do the chaining themselves with little standardization on the syntax to do so. -- Andrew From p.f.moore at gmail.com Thu Aug 16 11:44:21 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 16 Aug 2007 10:44:21 +0100 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46C367A8.4040601@ronadam.com> References: <46B13ADE.7080901@acm.org> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com> <46C2325D.1010209@ronadam.com> <46C367A8.4040601@ronadam.com> Message-ID: <79990c6b0708160244t51902b6es50540c5684a99345@mail.gmail.com> On 15/08/07, Ron Adam wrote: > EXAMPLES: > [...] > Examples from python3000 list: [...] Can I suggest that these all go into the PEP, to give readers some flavour of what the new syntax will look like? I'd also repeat the suggestion that these examples be posted to comp.lang.python, to get more general community feedback. Paul. 
From rrr at ronadam.com Thu Aug 16 12:21:00 2007 From: rrr at ronadam.com (Ron Adam) Date: Thu, 16 Aug 2007 05:21:00 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <79990c6b0708160244t51902b6es50540c5684a99345@mail.gmail.com> References: <46B13ADE.7080901@acm.org> <46BCA9C9.1010306@ronadam.com> <46BD5BD8.7030706@acm.org> <46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com> <46C2325D.1010209@ronadam.com> <46C367A8.4040601@ronadam.com> <79990c6b0708160244t51902b6es50540c5684a99345@mail.gmail.com> Message-ID: <46C4250C.3050806@ronadam.com> Paul Moore wrote: > On 15/08/07, Ron Adam wrote: >> EXAMPLES: >> > [...] >> Examples from python3000 list: > [...] > > Can I suggest that these all go into the PEP, to give readers some > flavour of what the new syntax will look like? > > I'd also repeat the suggestion that these examples be posted to > comp.lang.python, to get more general community feedback. > > Paul. Currently these particular examples aren't the syntax supported by the PEP. They show a possible alternative syntax, and only if there is enough support for a serial left-to-right specification pattern as outlined. What the PEP supports is a single value that is passed to the __format__ function. So the PEP syntax combines alignment and other options into one term that the __format__ methods must decode all at once. I think most of the developers here are still looking at various details and are still undecided. Do you have a preference for one or the other yet?
Cheers, Ron From p.f.moore at gmail.com Thu Aug 16 13:08:28 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 16 Aug 2007 12:08:28 +0100 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <46C4250C.3050806@ronadam.com> References: <46B13ADE.7080901@acm.org> <46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com> <46C2325D.1010209@ronadam.com> <46C367A8.4040601@ronadam.com> <79990c6b0708160244t51902b6es50540c5684a99345@mail.gmail.com> <46C4250C.3050806@ronadam.com> Message-ID: <79990c6b0708160408h79c91a9eq7195497ce43a53cb@mail.gmail.com> On 16/08/07, Ron Adam wrote: > Currently these particular examples aren't the syntax supported by the PEP. > It's an alternative/possibly syntax only if there is enough support for a > serial left to right specification pattern as outlined. Ah, I hadn't realised that. I've been skipping most of the discussions, mainly because of the lack of concrete examples :-) > I think most of developers here are still looking at various details and > are still undecided. Do you have a preference for one or the other yet? As evidenced by the fact that I failed to notice the difference, I can't distinguish the two :-) All of the examples I've seen are hard to read. As Greg said, I find that I have to study the format string, mentally breaking it into parts, before I understand it. This is in complete contrast to printf-style "%.10s" formats. I'm not at all sure this is anything more than unfamiliarity, compounded by the fact that most of the examples I see on the list are relatively complex, or edge cases. But it's a major barrier to both understanding and acceptance of the new proposals. I'd still really like to see: 1. A simple listing of common cases, maybe taken from something like stdlib uses of %-formats. Yes, most of them would be pretty trivial. That's the point! 2. 
A *very short* comparison of a few more advanced cases - I'd suggest formatting floats as fixed width, 2 decimal places (%5.2f), formatting 8-digit hex (%.8X) and maybe a simple date format (%Y-%m-%d). Yes, those are the sort of things I consider advanced. Examples I've seen in the discussion aren't "advanced" in my book, they are "I'll never use that" :-) 3. Another very short list of a couple of things you can do with the new format, which you can't do with the existing % formats. Concentrate here on real-world use cases - tabular reports, reordering fields for internationalisation, things like that. As a data point, I've never needed to centre a field in a print statement. Heck, I don't even recall ever needing to specify how the sign of a number was printed! I get the impression that the "clever" new features aren't actually going to address the sorts of formatting problems I hit a lot. That's fine, I can write code to do what I want, but there's a sense of YAGNI about the discussion, because (for example) by the time I need to format a centred, max-18, min-5 character number with 3 decimal places and the sign hard to the left, I'm also going to want to dot-fill a string to 30 characters and insert commas into the number, and I'm writing code anyway, so why bother with an obscure format string that only does half the job? (It goes without saying that if the format string can do everything I want, that argument doesn't work, but then we get to the complexity issues that hit regular expressions :-)) Sorry if this sounds a bit skeptical, but there's a *lot* of discussion here over a feature I expect to use pretty infrequently. 99% of my % formats use nothing more than %s! Paul. 
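For illustration only (this is not part of Paul's message): a rough sketch of the three "advanced" cases he lists, written both ways. It assumes the str.format() syntax as it eventually shipped in Python 2.6/3.0, which postdates this thread; the variable names are made up for the example.

```python
import datetime

value = 3.14159
number = 255
day = datetime.date(2007, 8, 16)

# Fixed-width float with 2 decimal places: the spec after the colon
# is the same as the %-style conversion without the '%'.
old_float = "%5.2f" % value
new_float = "{0:5.2f}".format(value)

# 8-digit hex. %-formatting accepts a precision on integers; str.format()
# does not, so a zero-padded width is used to get the same result.
old_hex = "%.8X" % number
new_hex = "{0:08X}".format(number)

# Simple date format: the date type interprets the spec as a strftime pattern.
old_date = day.strftime("%Y-%m-%d")
new_date = "{0:%Y-%m-%d}".format(day)

# Something the %-operator cannot do with positional arguments:
# reordering fields, e.g. for a translated message.
reordered = "{1}, {0}!".format("world", "hello")

print(new_float, new_hex, new_date, reordered)
```

The first and third cases translate almost mechanically; the hex case is the one place in this sketch where the spec has to be respelled rather than copied.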
From eric+python-dev at trueblade.com Thu Aug 16 14:08:14 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Thu, 16 Aug 2007 08:08:14 -0400 Subject: [Python-3000] Adding __format__ to object Message-ID: <46C43E2E.3000308@trueblade.com> As part of implementing PEP 3101, I need to add __format__ to object, to achieve the equivalent of: class object: def __format__(self, format_spec): return format(str(self), format_spec) I've added __format__ to int, unicode, etc., but I can't figure out where or how to add it to object. Any pointers are appreciated. Something as simple as "look at foo.c" or "grep for __baz__" would be good enough. Thanks! Eric. From skip at pobox.com Thu Aug 16 14:29:12 2007 From: skip at pobox.com (skip at pobox.com) Date: Thu, 16 Aug 2007 07:29:12 -0500 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: <46C3C1DE.6070302@cs.rmit.edu.au> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> Message-ID: <18116.17176.123168.265491@montanaro.dyndns.org> Alex> The PEP abstract says this proposal will replace the '%' operator, I hope this really doesn't happen. printf-style formatting has a long history both in C and Python and is well-understood. Its few limitations are mostly due to the binary nature of the % operator, not to the power or flexibility of the format strings themselves. In contrast, the new format "language" seems to have no history (is it based on features in other languages? does anyone know if it will actually be usable in common practice?) and at least to the casual observer of recent threads on this topic seems extremely baroque. Python has a tradition of incorporating the best ideas from other languages. String formatting is so common that it doesn't seem to me we should need to invent a new, unproven mechanism to do this. 
Skip From eric+python-dev at trueblade.com Thu Aug 16 14:50:21 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Thu, 16 Aug 2007 08:50:21 -0400 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <46C3CD96.4070902@acm.org> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <46C3CD96.4070902@acm.org> Message-ID: <46C4480D.1060603@trueblade.com> Talin wrote: > Alex Holkner wrote: >> What is the behaviour of whitespace in a format specifier? e.g. >> how much of the following is valid? >> >> "{ foo . name : 20s }".format(foo=open('bar')) > > Eric, it's your call :) I'm okay with whitespace before the colon (or !, as the case may be). After the colon, I'd say it's significant and can't be automatically removed, because a particular formatter might care (for example, "%Y %M %D" for dates). Currently the code doesn't allow whitespace before the colon. I can add this if time permits, once everything else is implemented. From g.brandl at gmx.net Thu Aug 16 14:58:08 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 16 Aug 2007 14:58:08 +0200 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: <18116.17176.123168.265491@montanaro.dyndns.org> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> Message-ID: skip at pobox.com schrieb: > Alex> The PEP abstract says this proposal will replace the '%' operator, > > I hope this really doesn't happen. printf-style formatting has a long > history both in C and Python and is well-understood. Its few limitations > are mostly due to the binary nature of the % operator, not to the power or > flexibility of the format strings themselves. In contrast, the new format > "language" seems to have no history (is it based on features in other > languages? does anyone know if it will actually be usable in common > practice?) 
and at least to the casual observer of recent threads on this > topic seems extremely baroque. > > Python has a tradition of incorporating the best ideas from other languages. > String formatting is so common that it doesn't seem to me we should need to > invent a new, unproven mechanism to do this. Not to mention the pain of porting %-style format strings and % formatting to {}-style format strings and .format() in Py3k. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From lists at cheimes.de Thu Aug 16 15:12:25 2007 From: lists at cheimes.de (Christian Heimes) Date: Thu, 16 Aug 2007 15:12:25 +0200 Subject: [Python-3000] Adding __format__ to object In-Reply-To: <46C43E2E.3000308@trueblade.com> References: <46C43E2E.3000308@trueblade.com> Message-ID: <46C44D39.7040408@cheimes.de> Eric Smith wrote: > Any pointers are appreciated. Something as simple as "look at foo.c" or > "grep for __baz__" would be good enough. look at Objects/typeobject.c and grep for PyMethodDef object_methods[] Christian From barry at python.org Thu Aug 16 15:43:29 2007 From: barry at python.org (Barry Warsaw) Date: Thu, 16 Aug 2007 09:43:29 -0400 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: <18116.17176.123168.265491@montanaro.dyndns.org> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> Message-ID: <32B81623-9B2E-4B0C-99DF-12415E3F79E3@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 16, 2007, at 8:29 AM, skip at pobox.com wrote: > Alex> The PEP abstract says this proposal will replace the '%' > operator, > > I hope this really doesn't happen. 
printf-style formatting has a long > history both in C and Python and is well-understood. Its few > limitations > are mostly due to the binary nature of the % operator, not to the > power or > flexibility of the format strings themselves. In contrast, the new > format > "language" seems to have no history (is it based on features in other > languages? does anyone know if it will actually be usable in common > practice?) and at least to the casual observer of recent threads on > this > topic seems extremely baroque. > > Python has a tradition of incorporating the best ideas from other > languages. > String formatting is so common that it doesn't seem to me we should > need to > invent a new, unproven mechanism to do this. There are two parts to this: one is the language you use to define formatting and the other is the syntax you use to invoke it. I've been mostly ignoring the PEP 3101 discussions because every time I see examples of an advanced format string I shudder and resign myself to never remembering how to use it. It certainly doesn't feel like it's going to fit /my/ brain. OTOH, I'm not saying that a super whizzy all-encompassing inscrutable-but-powerful format language is a bad thing for Python, but it may not be the best /default/ language for formatting. OTOH, I don't think the three different formatting languages need three different syntaxes to invoke them. The three formatting languages are, in order of decreasing simplicity and familiarity, but increasing power: - PEP 292 $-strings - Python 2 style % substitutions - PEP 3101 format specifiers I think all three languages have their place, but I would like to see if there's some way to make spelling their use more consistent. Not that I have any great ideas on that front, but I think the proposals in PEP 3101 (which I've only skimmed) are tied too closely to the last format language.
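As a small illustration of the point about invocation syntax (a sketch added for clarity, not from the original message), here are the three formatting languages producing the same string, each with its own spelling. The str.format() call assumes the syntax that eventually shipped; the names `name`, `dollar`, `percent` and `braces` are invented for the example.

```python
from string import Template

name = "world"

# PEP 292 $-strings: a Template object and a method call.
dollar = Template("Hello, $name").substitute(name=name)

# Python 2 style % substitution: a binary operator on the string.
percent = "Hello, %s" % (name,)

# PEP 3101 format specifiers: a method on the string itself.
braces = "Hello, {0}".format(name)

print(dollar, percent, braces)
```

Three languages, three different ways of invoking them, which is the inconsistency being described.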
While I agree with Skip's sentiment, I'd say that it's not the %- operator I care as much about as it is the middle road that the formatting language it uses takes. For example, the logging package uses the same language but exposes a better (IMO) way to spell its use: >>> log.info('User %s ate %s', user, food) So the question is, is there some way to unify the use of format strings in the three different formatting languages, giving equal weight to each under the acknowledgment that all three use cases are equally valid? - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRsRUgnEjvBPtnXfVAQIgPgP9EeHyZRdco1w5yUG1ro8UoTMFJ5ppsxcK Lyif38XaXTCL0t5nvxbvvI1GZksOHY4qyUwmUYrs+APhJbfSXfoGU1Ih+CzJhWPE 1PMng4s2z2pubpqGbAgV6etHx7Uiy8RPxp9lsD6rBo4GdtJwfTAFGgRgU67foMBl ijlxMujdOR0= =tw3Y -----END PGP SIGNATURE----- From guido at python.org Thu Aug 16 16:53:01 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 16 Aug 2007 07:53:01 -0700 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> Message-ID: > > Alex> The PEP abstract says this proposal will replace the '%' operator, > skip at pobox.com: > > I hope this really doesn't happen. printf-style formatting has a long > > history both in C and Python and is well-understood. Its few limitations > > are mostly due to the binary nature of the % operator, not to the power or > > flexibility of the format strings themselves. In contrast, the new format > > "language" seems to have no history (is it based on features in other > > languages? does anyone know if it will actually be usable in common > > practice?) and at least to the casual observer of recent threads on this > > topic seems extremely baroque. > > > > Python has a tradition of incorporating the best ideas from other languages. 
> > String formatting is so common that it doesn't seem to me we should need to > > invent a new, unproven mechanism to do this. On 8/16/07, Georg Brandl wrote: > Not to mention the pain of porting %-style format strings and % formatting > to {}-style format strings and .format() in Py3k. There are many aspects to this. First of all, the discussion of PEP 3101 is taking too long, and some of the proposals are indeed outright scary. I have long stopped following it -- basically I only pay attention when Talin personally tells me that there's a new proposal. I *think* that with Monday's breakthrough we're actually close, but that apparently doesn't stop a few folks from continuing the heated discussion. Second, most aspects of the proposal have actually been lifted from other languages. The {...} notation is common in many web templating languages and also in .NET. In .NET, for example, you can write {0}, {1} etc. to reference positional parameters just like in the PEP. I don't recall if it supports {x} to reference parameters by name, but that's common in web templating languages. The idea of allowing {x.name} and {x[key]} or {x.name[1]} also comes from web templating languages. In .NET, if you have additional formatting requirements, you can write {0,10} to format parameter 0 with a minimum width of 10, and {0,-10} to left-align it. In .NET you can also write {0:xxx} where xxx is a mini-language used to express more details; this is used to request things like hex output, or the many variants of formatting floats. While we're not copying .NET *exactly*, most of the basic ideas are very similar; the discussion at this point is mostly about the type-specific mini-languages.
My proposal is to use *exactly the same mini-language as used in 2.x %-formatting* (but without the '%' character), in particular: {0:10.3f} will format a float in a field 10 characters wide with 3 digits behind the decimal point, and {0:08x} will format an int in hex with 0-padding in a field 8 characters wide. For strings, you can write {0:10.20s} to specify a min width of 10 and a max width of 20 (I betcha you didn't even know you could do this with %10.20s :-). The only added wrinkle is that you can also write {0!r} to *force* using repr() on the value. This is similar to %r in 2.x. Of course, non-numeric types can define their own mini-language, but that's all advanced stuff. (The concept of type-specific mini-languages is straight from .NET though.) I'm afraid there's an awful lot of bikeshedding going on trying to improve on this, e.g. people want the 'f' or 'x' in front, but I think the first alpha release is so close that we should stop discussing this and start implementing. (Fortunately at least one person has already implemented most of this.) Much of the earliest discussion was also terribly misguided because of an earlier assumption that the mini-language should coerce the type. This caused endless confusion about what to do with types that have their own __format__ override. In the end we (I) wisely decided that the object's __format__ always wins and numeric types will just have to support the same mini-language by convention. The user of the format() method won't care about any of this. Now on to the transition. On the one hand I always planned this to *replace* the old %-formatting syntax, which has a number of real problems: "%s" % x raises an exception if x happens to be a tuple, and you have to write "%s" % (x,) to format an object if you aren't sure about its type; also, it's very common to forget the trailing 's' in "%(name)s" % {'name': ...}. On the other hand it's too close to the alpha 1 release to fix all the current uses of %.
(In fact it would be just short of a miracle if a working format() implementation made it into 3.0a1 at all. But I believe in miracles.) The mechanical translation is relatively straightforward when the format string is given as a literal, and this part is well within the scope of the 2to3 tool (someone just has to write the converter). The problems come, however, when formatting strings are passed around in variables or arguments. We can't very well assume that every string that happens to contain a % sign is a format string, and we can't assume that every use of the % operator is a formatting operator, either. Talin has jokingly proposed to translate *all* occurrences of x%y into _legacy_percent(x, y) which would be a function that does on-the-fly translation of format strings if x is a string, and returns x%y if it isn't, but that doesn't sound attractive at all. I don't know what percentage of %-formatting uses a string literal on the left; if it's a really high number (high 90s), I'd like to kill %-formatting and go with mechanical translation; otherwise, I think we'll have to phase out %-formatting in 3.x or 4.0. I hope this takes away some of the fears; and gives the PEP 3101 crowd the incentive to stop bikeshedding and start coding! -- --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Thu Aug 16 17:05:36 2007 From: skip at pobox.com (skip at pobox.com) Date: Thu, 16 Aug 2007 10:05:36 -0500 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> Message-ID: <18116.26560.234919.274765@montanaro.dyndns.org> Thanks for the detailed response. Guido> Now on to the transition. 
On the one hand I always planned this Guido> to *replace* the old %-formatting syntax, which has a number of Guido> real problems: "%s" % x raises an exception if x happens to be a Guido> tuple, and you have to write "%s" % (x,) to format an object if Guido> you aren't sure about its type; also, it's very common to forget Guido> the trailing 's' in "%(name)s" % {'name': ...}. I was conflating the format string and the % operator in some of my (casual) thinking. I'm much less married to retaining the % operator itself (that is the source of most of the current warts I believe), but as you pointed out some of the format string proposals are pretty scary. Guido> I don't know what percentage of %-formatting uses a string Guido> literal on the left; if it's a really high number (high 90s), I'd Guido> like to kill %-formatting and go with mechanical translation; Guido> otherwise, I think we'll have to phase out %-formatting in 3.x or Guido> 4.0. Yow! You're already thinking about Python 4??? Skip From eric+python-dev at trueblade.com Thu Aug 16 17:09:56 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Thu, 16 Aug 2007 11:09:56 -0400 Subject: [Python-3000] Adding __format__ to object In-Reply-To: <46C44D39.7040408@cheimes.de> References: <46C43E2E.3000308@trueblade.com> <46C44D39.7040408@cheimes.de> Message-ID: <46C468C4.3020907@trueblade.com> Christian Heimes wrote: > Eric Smith wrote: >> Any pointers are appreciated. Something as simple as "look at foo.c" or >> "grep for __baz__" would be good enough. > > look at Objects/typeobject.c and grep for PyMethodDef object_methods[] I should have mentioned that's among the things I've already tried. But that appears to add methods to 'type', not to an instance of 'object'. If you do dir(object()): $ ./python Python 3.0x (py3k:57077M, Aug 16 2007, 10:10:04) [GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> dir(object()) ['__class__', '__delattr__', '__doc__', '__eq__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__str__'] You don't see the methods in typeobject.c (__mro__, etc). This is pretty much the last hurdle in finishing my implementation of PEP 3101. The rest of it is either done, or just involves refactoring existing code. Eric. From jimjjewett at gmail.com Thu Aug 16 17:17:18 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 16 Aug 2007 11:17:18 -0400 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <633738EA-3F45-4933-BF81-0410BACBAF1E@hawaii.edu> References: <633738EA-3F45-4933-BF81-0410BACBAF1E@hawaii.edu> Message-ID: On 8/13/07, Carl Johnson wrote: > I also like the idea of using "!r" for calling repr ... > s = "10" > print("{0!i:+d}".format(s)) #prints "+10" > The !i attempts to cast the string to int. ... > The logic is that ! commands are abbreviated functions ... Which does the "i" mean? (1) Call s.__format__(...) with a flag indicating that it should format itself like an integer. (2) Ignore s.__format__, and instead call s.__index__().__format__(...) If it is (case 1) an instruction to the object, then I don't see why it needs to be special-cased; objects can handle (or not) any format string, and "i" may well typically mean integer, but not always. If it is (case 2) an instruction to the format function, then what are the limits? I see the value of r for repr, because that is already a built-in alternative representation. If we also allow int, then we might as well allow arbitrary functions to check for validity constraints. def valcheck(val, spec=None): v=index(v) if not v in range(11): raise ValueError("Expected an integer in [0..10], but got {0!r}".format(v)) if spec is None: return v return spec.format(v) ... "You rated your experience as {0!valcheck:d} out of 10." > ... 
is this too TMTOWTDI-ish, since one could > just write int(s) instead? You can't write int(s) if you're passing a mapping (or tuple) from someone else; at best you can copy the mapping and modify certain values. -jJ From p.f.moore at gmail.com Thu Aug 16 17:32:43 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 16 Aug 2007 16:32:43 +0100 Subject: [Python-3000] Adding __format__ to object In-Reply-To: <46C468C4.3020907@trueblade.com> References: <46C43E2E.3000308@trueblade.com> <46C44D39.7040408@cheimes.de> <46C468C4.3020907@trueblade.com> Message-ID: <79990c6b0708160832v2a7a105bp9f6323a77b05361b@mail.gmail.com> On 16/08/07, Eric Smith wrote: > Christian Heimes wrote: > > look at Objects/typeobject.c and grep for PyMethodDef object_methods[] > > I should have mentioned that's among the things I've already tried. [...] > You don't see the methods in typeobject.c (__mro__, etc). __mro__ is in type_members (at the top of the file). You want object_methods (lower down). All it currently defines is __reduce__ and __reduce_ex__ (which are in dir(object()). I've not tested (or ever used) this, so I could be wrong, of course. Paul. From talin at acm.org Thu Aug 16 17:49:28 2007 From: talin at acm.org (Talin) Date: Thu, 16 Aug 2007 08:49:28 -0700 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> Message-ID: <46C47208.4070103@acm.org> Guido van Rossum wrote: > While we're not copying .NET *exactly*, most of the basic ideas are > very similar; the discussion at this point is mostly about the > type-specific mini-languages. 
My proposal is to use *exactly the same > mini-language as used in 2.x %-formatting* (but without the '%' > character), in particular: {0:10.3f} will format a float in a field of > 10 characters wide with 3 digits behind the decimal point, and {0:08x} > will format an int in hex with 0-padding in a field 8 characters wide. > For strings, you can write {0:10.20s} to specify a min width of 10 and > a max width of 20 (I betcha you didn't even know you could do this > with %10.20s :-). The only added wrinkle is that you can also write > {0!r} to *force* using repr() on the value. This is similar to %r in > 2.x. Of course, non-numeric types can define their own mini-language, > but that's all advanced stuff. (The concept of type-specific > mini-languages is straight from .NET though.) Just to follow up on what Guido said: The current language of the PEP uses a formatting mini-language which is very close to the conversion specifiers of the existing '%' operator, and which is backwards compatible with it. So essentially if you understand printf-style formatting, you can use what you know. It may be that additional formatting options can be added in a future release of Python, however they should (a) be backwards compatible with what we have now, (b) demonstrate a compelling need, and (c) require their own PEP. In other words, I'm not going to add any more creeping features to the current PEP. Except for updating the examples and adding some clarifications that people have asked for, it's *done*. I admit that part of this whole syntax discussion was my fault - I did ask for a bit of a bikeshed discussion and I got way more than I bargained for :) -- Talin From janssen at parc.com Thu Aug 16 18:29:05 2007 From: janssen at parc.com (Bill Janssen) Date: Thu, 16 Aug 2007 09:29:05 PDT Subject: [Python-3000] Please don't kill the % operator... 
In-Reply-To: <18116.17176.123168.265491@montanaro.dyndns.org> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> Message-ID: <07Aug16.092913pdt."57996"@synergy1.parc.xerox.com> > Alex> The PEP abstract says this proposal will replace the '%' operator, > > I hope this really doesn't happen. printf-style formatting has a long > history both in C and Python and is well-understood. Its few limitations > are mostly due to the binary nature of the % operator, not to the power or > flexibility of the format strings themselves. In contrast, the new format > "language" seems to have no history (is it based on features in other > languages? does anyone know if it will actually be usable in common > practice?) and at least to the casual observer of recent threads on this > topic seems extremely baroque. > String formatting is so common that it doesn't seem to me we should need to > invent a new, unproven mechanism to do this. I strongly agree with Skip here. The current proposal seems to me a poor solution to a non-problem. Bill From guido at python.org Thu Aug 16 18:33:31 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 16 Aug 2007 09:33:31 -0700 Subject: [Python-3000] Adding __format__ to object In-Reply-To: <79990c6b0708160832v2a7a105bp9f6323a77b05361b@mail.gmail.com> References: <46C43E2E.3000308@trueblade.com> <46C44D39.7040408@cheimes.de> <46C468C4.3020907@trueblade.com> <79990c6b0708160832v2a7a105bp9f6323a77b05361b@mail.gmail.com> Message-ID: Paul's right. I agree it's confusing that object and type are both defined in the same file (though there's probably a good reason, given that type is derived from object and object is an instance of type :-). To add methods to object, add them to object_methods in that file. I've tested this. 
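For readers following along, here is a pure-Python sketch of the default behaviour Eric describes earlier in this thread (format the str() of the object with the given spec). It is only an approximation: the real change goes into object_methods in Objects/typeobject.c, and the class names `Base` and `Point` are hypothetical stand-ins. Later CPython releases also tightened the real object.__format__ to reject non-empty specs, so this mirrors the proposal in the thread rather than current behaviour.

```python
class Base:
    """Hypothetical stand-in for `object` with the proposed default."""

    def __format__(self, format_spec):
        # Fall back to formatting the string form of the instance.
        return format(str(self), format_spec)


class Point(Base):
    """Example type that defines __str__ but no __format__ of its own."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return "({0}, {1})".format(self.x, self.y)


p = Point(1, 2)

# Both spellings end up in Base.__format__, which applies the
# string mini-language (here: right-align in a field of width 10).
padded = format(p, ">10")
embedded = "[{0:>10}]".format(p)

print(padded)
print(embedded)
```

Because every class inherits the method, callers never need to test whether __format__ exists, which is the point Talin makes earlier in the thread.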
On 8/16/07, Paul Moore wrote: > On 16/08/07, Eric Smith wrote: > > Christian Heimes wrote: > > > look at Objects/typeobject.c and grep for PyMethodDef object_methods[] > > > > I should have mentioned that's among the things I've already tried. > [...] > > You don't see the methods in typeobject.c (__mro__, etc). > > __mro__ is in type_members (at the top of the file). You want > object_methods (lower down). All it currently defines is __reduce__ > and __reduce_ex__ (which are in dir(object()). > > I've not tested (or ever used) this, so I could be wrong, of course. > Paul. > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Aug 16 18:38:32 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 16 Aug 2007 09:38:32 -0700 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: <18116.26560.234919.274765@montanaro.dyndns.org> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <18116.26560.234919.274765@montanaro.dyndns.org> Message-ID: On 8/16/07, skip at pobox.com wrote: > > Thanks for the detailed response. > > Guido> Now on to the transition. On the one hand I always planned this > Guido> to *replace* the old %-formatting syntax, which has a number of > Guido> real problems: "%s" % x raises an exception if x happens to be a > Guido> tuple, and you have to write "%s" % (x,) to format an object if > Guido> you aren't sure about its type; also, it's very common to forget > Guido> the trailing 's' in "%(name)s" % {'name': ...}. > > I was conflating the format string and the % operator in some of my (casual) > thinking. 
I'm much less married to retaining the % operator itself (that is > the source of most of the current warts I believe), but as you pointed out > some of the format string proposals are pretty scary. Which is why they won't be accepted. ;-) To clarify the % operator is the cause of the first wart; the second wart is caused by the syntax for individual formats. Keeping one but changing the other seems to keep some of the flaws of one or the other while still paying the full price of change, so this isn't an option. > Guido> I don't know what percentage of %-formatting uses a string > Guido> literal on the left; if it's a really high number (high 90s), I'd > Guido> like to kill %-formatting and go with mechanical translation; > Guido> otherwise, I think we'll have to phase out %-formatting in 3.x or > Guido> 4.0. > > Yow! You're already thinking about Python 4??? I use it in the same way we used to refer to Python 3000 in the past. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rhamph at gmail.com Thu Aug 16 18:46:38 2007 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 16 Aug 2007 10:46:38 -0600 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <79990c6b0708160408h79c91a9eq7195497ce43a53cb@mail.gmail.com> References: <46B13ADE.7080901@acm.org> <46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com> <46C2325D.1010209@ronadam.com> <46C367A8.4040601@ronadam.com> <79990c6b0708160244t51902b6es50540c5684a99345@mail.gmail.com> <46C4250C.3050806@ronadam.com> <79990c6b0708160408h79c91a9eq7195497ce43a53cb@mail.gmail.com> Message-ID: On 8/16/07, Paul Moore wrote: > On 16/08/07, Ron Adam wrote: > > Currently these particular examples aren't the syntax supported by the PEP. > > It's an alternative/possibly syntax only if there is enough support for a > > serial left to right specification pattern as outlined. > > Ah, I hadn't realised that. 
I've been skipping most of the > discussions, mainly because of the lack of concrete examples :-) > > > I think most of developers here are still looking at various details and > > are still undecided. Do you have a preference for one or the other yet? > > As evidenced by the fact that I failed to notice the difference, I > can't distinguish the two :-) > > All of the examples I've seen are hard to read. As Greg said, I find > that I have to study the format string, mentally breaking it into > parts, before I understand it. This is in complete contrast to > printf-style "%.10s" formats. I'm not at all sure this is anything > more than unfamiliarity, compounded by the fact that most of the > examples I see on the list are relatively complex, or edge cases. But > it's a major barrier to both understanding and acceptance of the new > proposals. > > I'd still really like to see: > > 1. A simple listing of common cases, maybe taken from something like > stdlib uses of %-formats. Yes, most of them would be pretty trivial. > That's the point! Seconded! This discussion needs some grounding. > > 2. A *very short* comparison of a few more advanced cases - I'd > suggest formatting floats as fixed width, 2 decimal places (%5.2f), > formatting 8-digit hex (%.8X) and maybe a simple date format > (%Y-%m-%d). Yes, those are the sort of things I consider advanced. > Examples I've seen in the discussion aren't "advanced" in my book, > they are "I'll never use that" :-) > > 3. Another very short list of a couple of things you can do with the > new format, which you can't do with the existing % formats. > Concentrate here on real-world use cases - tabular reports, reordering > fields for internationalisation, things like that. As a data point, > I've never needed to centre a field in a print statement. Heck, I > don't even recall ever needing to specify how the sign of a number was > printed! 
> > I get the impression that the "clever" new features aren't actually > going to address the sorts of formatting problems I hit a lot. That's > fine, I can write code to do what I want, but there's a sense of YAGNI > about the discussion, because (for example) by the time I need to > format a centred, max-18, min-5 character number with 3 decimal places > and the sign hard to the left, I'm also going to want to dot-fill a > string to 30 characters and insert commas into the number, and I'm > writing code anyway, so why bother with an obscure format string that > only does half the job? (It goes without saying that if the format > string can do everything I want, that argument doesn't work, but then > we get to the complexity issues that hit regular expressions :-)) > > Sorry if this sounds a bit skeptical, but there's a *lot* of > discussion here over a feature I expect to use pretty infrequently. > 99% of my % formats use nothing more than %s! > > Paul. > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/rhamph%40gmail.com > -- Adam Olsen, aka Rhamphoryncus From rhamph at gmail.com Thu Aug 16 18:48:12 2007 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 16 Aug 2007 10:48:12 -0600 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <46C4480D.1060603@trueblade.com> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <46C3CD96.4070902@acm.org> <46C4480D.1060603@trueblade.com> Message-ID: On 8/16/07, Eric Smith wrote: > Talin wrote: > > Alex Holkner wrote: > >> What is the behaviour of whitespace in a format specifier? e.g. > >> how much of the following is valid? > >> > >> "{ foo . name : 20s }".format(foo=open('bar')) > > > > Eric, it's your call :) > > I'm okay with whitespace before the colon (or !, as the case may be). 
> After the colon, I'd say it's significant and can't be automatically > removed, because a particular formatter might care (for example, "%Y %M > %D" for dates). > > Currently the code doesn't allow whitespace before the colon. I can add > this if time permits, once everything else is implemented. YAGNI. -- Adam Olsen, aka Rhamphoryncus From eric+python-dev at trueblade.com Thu Aug 16 18:57:13 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Thu, 16 Aug 2007 12:57:13 -0400 Subject: [Python-3000] Adding __format__ to object In-Reply-To: References: <46C43E2E.3000308@trueblade.com> <46C44D39.7040408@cheimes.de> <46C468C4.3020907@trueblade.com> <79990c6b0708160832v2a7a105bp9f6323a77b05361b@mail.gmail.com> Message-ID: <46C481E9.9010006@trueblade.com> Guido van Rossum wrote: > Paul's right. I agree it's confusing that object and type are both > defined in the same file (though there's probably a good reason, given > that type is derived from object and object is an instance of type > :-). To add methods to object, add them to object_methods in that > file. I've tested this. Awesome! I thought I looked for all occurrences in that file, but apparently not. Thanks all for the help. Eric. From walter at livinglogic.de Thu Aug 16 19:10:48 2007 From: walter at livinglogic.de (=?UTF-8?B?V2FsdGVyIETDtnJ3YWxk?=) Date: Thu, 16 Aug 2007 19:10:48 +0200 Subject: [Python-3000] UTF-32 codecs Message-ID: <46C48518.3070701@livinglogic.de> I have a patch against the py3k branch (http://bugs.python.org/1775604) that adds UTF-32 codecs. On a narrow build it combines surrogate pairs in the unicode object into one codepoint on encoding and creates surrogate pairs for codepoints outside the BMP on decoding. Should I apply this to the py3k branch only, or do we want that for Python 2.6 too (using str instead of bytes)? 
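The surrogate-pair point above can be sketched as follows. This is not the patch itself; it uses the utf-32 and utf-16 codecs as they exist in current Python 3 (which no longer has narrow builds), and the variable names are illustrative assumptions.

```python
# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP.
clef = "\U0001D11E"

# UTF-32 stores the code point directly in four bytes.
utf32 = clef.encode("utf-32-be")

# UTF-16 needs a surrogate pair: two 16-bit code units.
utf16 = clef.encode("utf-16-be")
high = int.from_bytes(utf16[:2], "big")
low = int.from_bytes(utf16[2:], "big")

# Combining the pair recovers the original code point, which is what
# the codec has to do when encoding from a narrow build's storage.
combined = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# The endian-neutral "utf-32" codec prepends a 4-byte byte order mark.
with_bom = clef.encode("utf-32")
```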
Servus, Walter From fdrake at acm.org Thu Aug 16 19:18:44 2007 From: fdrake at acm.org (Fred Drake) Date: Thu, 16 Aug 2007 13:18:44 -0400 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <18116.26560.234919.274765@montanaro.dyndns.org> Message-ID: On Aug 16, 2007, at 12:38 PM, Guido van Rossum wrote: > I use it in the same way we used to refer to Python 3000 in the > past. :-) If we acknowledge that Python 3.0 isn't the same as Python 3000, we don't even need a new name for it. ;-) -Fred -- Fred Drake From martin at v.loewis.de Thu Aug 16 19:42:09 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Thu, 16 Aug 2007 19:42:09 +0200 Subject: [Python-3000] UTF-32 codecs In-Reply-To: <46C48518.3070701@livinglogic.de> References: <46C48518.3070701@livinglogic.de> Message-ID: <46C48C71.3000400@v.loewis.de> Walter Dörwald wrote: > I have a patch against the py3k branch (http://bugs.python.org/1775604) > that adds UTF-32 codecs. On a narrow build it combines surrogate pairs > in the unicode object into one codepoint on encoding and creates > surrogate pairs for codepoints outside the BMP on decoding. > > Should I apply this to the py3k branch only, or do we want that for > Python 2.6 too (using str instead of bytes)? If it's no effort, I would like to see this on the trunk also. In general, I'm skeptical about the "new features only in 3k" strategy. Some features can be added easily with no backwards-compatibility issues in 2.x, and would have normally been added to the next major 2.x release without much discussion.
Regards, Martin From eric+python-dev at trueblade.com Thu Aug 16 19:54:03 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Thu, 16 Aug 2007 13:54:03 -0400 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <46C3C098.6080601@acm.org> References: <46C2809C.3000806@acm.org> <8f01efd00708150738y36deee91v918bfd3f80944d9b@mail.gmail.com> <46C331BF.2020104@trueblade.com> <46C3C098.6080601@acm.org> Message-ID: <46C48F3B.50203@trueblade.com> Talin wrote: > Eric Smith wrote: >> James Thiele wrote: >>> I think the example: >>> >>> "My name is {0.name}".format(file('out.txt')) > Those examples are kind of contrived to begin with. Maybe we should > replace them with more realistic ones. I just added this test case: d = datetime.date(2007, 8, 18) self.assertEqual("The year is {0.year}".format(d), "The year is 2007") Maybe we can use it. From lists at cheimes.de Thu Aug 16 20:31:29 2007 From: lists at cheimes.de (Christian Heimes) Date: Thu, 16 Aug 2007 20:31:29 +0200 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: <18116.17176.123168.265491@montanaro.dyndns.org> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> Message-ID: skip at pobox.com wrote: > Alex> The PEP abstract says this proposal will replace the '%' operator, [...] I agree with Skip, too. The % printf operator is a very useful and powerful feature. I'm doing newbie support at my university and in #python. Newbies are often astonished how easy and powerful printf() is in Python. I like the % format operator, too. It's easy and fast to type for small jobs. I beg you to keep the feature. I agree that the new PEP 3101 style format is useful and required for more complex string formatting. But please keep a simple one for simple jobs.
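The "simple jobs" raised in this thread can be written both ways; a side-by-side sketch follows. It assumes the format syntax of current Python 3, which differs in small details from the 2007 draft (for example, left alignment is spelled "<" and the integer code is "d"). The sample values are made up for illustration.

```python
import datetime

name, count, price = "John", 8, 3.78

# A mixed line of text: left-aligned name, zero-padded count, 2 decimals.
old_line = "%-10s bought %02i apples for %3.2f" % (name, count, price)
new_line = "{0:<10} bought {1:02d} apples for {2:3.2f}".format(name, count, price)

# Fixed-width float with two decimal places.
old_float = "%5.2f" % 3.14159
new_float = "{0:5.2f}".format(3.14159)

# Eight-digit upper-case hex.
old_hex = "%.8X" % 0xBEEF
new_hex = "{0:08X}".format(0xBEEF)

# Dates: the % operator cannot format them, so the old way is strftime();
# the new way lets the date type interpret its own format specifier.
d = datetime.date(2007, 8, 18)
old_date = d.strftime("%Y-%m-%d")
new_date = "{0:%Y-%m-%d}".format(d)

# Attribute access inside the field, as in the test case quoted above.
year_text = "The year is {0.year}".format(d)
```

For the first three cases the two spellings produce identical strings; the date case is one where the type-specific mini-language adds something the % operator does not offer.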
Christian From guido at python.org Thu Aug 16 20:33:19 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 16 Aug 2007 11:33:19 -0700 Subject: [Python-3000] UTF-32 codecs In-Reply-To: <46C48C71.3000400@v.loewis.de> References: <46C48518.3070701@livinglogic.de> <46C48C71.3000400@v.loewis.de> Message-ID: On 8/16/07, "Martin v. Löwis" wrote: > Walter Dörwald wrote: > > I have a patch against the py3k branch (http://bugs.python.org/1775604) > > that adds UTF-32 codecs. On a narrow build it combines surrogate pairs > > in the unicode object into one codepoint on encoding and creates > > surrogate pairs for codepoints outside the BMP on decoding. > > > > Should I apply this to the py3k branch only, or do we want that for > > Python 2.6 too (using str instead of bytes)? > > If it's no effort, I would like to see this on the trunk also. > > In general, I'm skeptical about the "new features only in 3k" strategy. > Some features can be added easily with no backwards-compatibility issues > in 2.x, and would have normally been added to the next major 2.x release > without much discussion. Agreed, especially since we're planning on backporting much to 2.6. I want to draw the line at *dropping* stuff from 2.6 though (or replacing it, or changing it). 2.6 needs to be *very* compatible with 2.5, in order to lure most users into upgrading to 2.6, which is a prerequisite for porting to 3.0 eventually. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From steven.bethard at gmail.com Thu Aug 16 20:47:59 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Thu, 16 Aug 2007 12:47:59 -0600 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> Message-ID: On 8/16/07, Christian Heimes wrote: > skip at pobox.com wrote: > > Alex> The PEP abstract says this proposal will replace the '%' operator, > > [...]
> > I agree with Skip, too. The % printf operator is a very useful and > powerful feature. I'm doing newbie support at my university and in > #python. Newbies are often astonished how easy and powerful printf() is > in Python. I like the % format operator, too. It's easy and fast to type > for small jobs. > > I beg you to keep the feature. I agree that the new PEP 3101 style > format is useful and required for more complex string formating. But > please keep a simple one for simple jobs. I honestly can't see the point of keeping this:: >>> '%-10s bought %02i apples for %3.2f' % ('John', 8, 3.78) 'John bought 08 apples for 3.78' alongside this:: >>> '{0:-10} bought {1:02i} apples for {2:3.2f}'.format('John', 8, 3.78) 'John bought 08 apples for 3.78' They're so similar I don't see why you think the latter is no longer "easy and powerful". STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From lists at cheimes.de Thu Aug 16 20:53:31 2007 From: lists at cheimes.de (Christian Heimes) Date: Thu, 16 Aug 2007 20:53:31 +0200 Subject: [Python-3000] Adding __format__ to object In-Reply-To: <46C468C4.3020907@trueblade.com> References: <46C43E2E.3000308@trueblade.com> <46C44D39.7040408@cheimes.de> <46C468C4.3020907@trueblade.com> Message-ID: <46C49D2B.709@cheimes.de> Eric Smith wrote: > I should have mentioned that's among the things I've already tried. > > But that appears to add methods to 'type', not to an instance of > 'object'. If you do dir(object()): > > You don't see the methods in typeobject.c (__mro__, etc). > > This is pretty much the last hurdle in finishing my implementation of > PEP 3101. The rest of it is either done, or just involves refactoring > existing code. 
It works for me: $ LC_ALL=C svn diff Objects/typeobject.c Index: Objects/typeobject.c =================================================================== --- Objects/typeobject.c (revision 57099) +++ Objects/typeobject.c (working copy) @@ -2938,6 +2938,8 @@ PyDoc_STR("helper for pickle")}, {"__reduce__", object_reduce, METH_VARARGS, PyDoc_STR("helper for pickle")}, + {"__format__", object_reduce, METH_VARARGS, + PyDoc_STR("helper for pickle")}, {0} }; $ ./python Python 3.0x (py3k:57099M, Aug 16 2007, 20:45:17) [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> dir(object()) ['__class__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__str__'] >>> object().__format__ >>> object().__format__() (, (, , None)) Are you sure that you have changed the correct array? Christian From brett at python.org Thu Aug 16 21:05:57 2007 From: brett at python.org (Brett Cannon) Date: Thu, 16 Aug 2007 12:05:57 -0700 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> Message-ID: [Tonight, the role of old, cranky python-dev'er will be played by Brett Cannon. Don't take this personally, Christian, your email just happened to be last. =)] On 8/16/07, Christian Heimes wrote: > skip at pobox.com wrote: > > Alex> The PEP abstract says this proposal will replace the '%' operator, > > [...] > > I agree with Skip, too. The % printf operator is a very useful and > powerful feature. I'm doing newbie support at my university and in > #python. Newbies are often astonished how easy and powerful printf() is > in Python. I like the % format operator, too. It's easy and fast to type > for small jobs. 
> But how is:: "{0} is happy to see {1}".format('Brett', 'Christian') that less easier to read than:: "%s is happy to see %s" % ('Brett', 'Christian') ? Yes, PEP 3101 style is more to type but it isn't grievous; we have just been spoiled by the overloading of the % operator. And I don't know how newbies think these days, but I know I find the numeric markers much easier to follow then the '%s', especially if the string ends up becoming long. And if it is the use of a method instead of an operator that the newbies might have issues with, well methods and functions come up quick so they probably won't go long without knowing what is going on. > I beg you to keep the feature. I agree that the new PEP 3101 style > format is useful and required for more complex string formating. But > please keep a simple one for simple jobs. This is where the cranky python-dev'er comes in: PEP 3101 was published in April 2006 which is over a year ago! This is not a new PEP or a new plan. I personally stayed out of the discussions on this subject as I knew reasonable people were keeping an eye on it and I didn't feel I had anything to contribute. That means I just go with what they decide whether I like it or not. I understand the feeling of catching up on a thread and going, "oh no, I don't like that!", but that is the nature of the beast. In my view, if you just don't have the time or energy (which I completely understand not having, don't get me wrong) for a thread, you basically have to defer to the people who do and trust that the proper things were discussed and that the group as a whole (or Guido in those cases where his gut tells him to ignore everyone) is going to make a sound decision. At this point the discussion has gone long enough with Guido participating and agreeing with key decisions, that the only way to get this course of action changed is to come up with really good examples of how the % format is hugely better than PEP 3101 and convince the people involved. 
But just saying you like %s over {0} is like saying you don't like the decorator syntax: that's nice and all, but that is not a compelling reason to change the decision being made. -Brett From eric+python-dev at trueblade.com Thu Aug 16 21:23:01 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Thu, 16 Aug 2007 15:23:01 -0400 Subject: [Python-3000] Adding __format__ to object In-Reply-To: <46C49D2B.709@cheimes.de> References: <46C43E2E.3000308@trueblade.com> <46C44D39.7040408@cheimes.de> <46C468C4.3020907@trueblade.com> <46C49D2B.709@cheimes.de> Message-ID: <46C4A415.7090408@trueblade.com> > Are you sure that you have changed the correct array? Yes, that was the issue. I changed the wrong array. I stupidly assumed that it was one object per file, but of course there's no valid reason to make that assumption. I'm sure I don't have the most best version of this coded up, but that's a problem for another day, and I'll ask for help on that when all of my tests pass. Thanks again for your (and others) help on this. I now have object.__format__ working, so I can finally get back to unicode.__format__ and parsing format specifiers. Eric. From rrr at ronadam.com Thu Aug 16 21:35:33 2007 From: rrr at ronadam.com (Ron Adam) Date: Thu, 16 Aug 2007 14:35:33 -0500 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: <79990c6b0708160408h79c91a9eq7195497ce43a53cb@mail.gmail.com> References: <46B13ADE.7080901@acm.org> <46BF22D8.2090309@trueblade.com> <46BF3CC7.6010405@acm.org> <46C0EEBF.3010206@ronadam.com> <46C2325D.1010209@ronadam.com> <46C367A8.4040601@ronadam.com> <79990c6b0708160244t51902b6es50540c5684a99345@mail.gmail.com> <46C4250C.3050806@ronadam.com> <79990c6b0708160408h79c91a9eq7195497ce43a53cb@mail.gmail.com> Message-ID: <46C4A705.9030506@ronadam.com> Paul Moore wrote: > On 16/08/07, Ron Adam wrote: >> Currently these particular examples aren't the syntax supported by the PEP. 
>> It's an alternative/possibly syntax only if there is enough support for a >> serial left to right specification pattern as outlined. > > Ah, I hadn't realised that. I've been skipping most of the > discussions, mainly because of the lack of concrete examples :-) And the discussion has progressed and changed in ways that makes thing a bit more confusing as well. So the earlier parts of it don't connect with the later parts well. It looks like the consensus is to keep something very close to the current %xxx style syntax just without the '%' on the front after all. So this is all academic, although it may be useful sometime in th future. >> I think most of developers here are still looking at various details and >> are still undecided. Do you have a preference for one or the other yet? > > As evidenced by the fact that I failed to notice the difference, I > can't distinguish the two :-) > > All of the examples I've seen are hard to read. As Greg said, I find > that I have to study the format string, mentally breaking it into > parts, before I understand it. This is in complete contrast to > printf-style "%.10s" formats. I'm not at all sure this is anything > more than unfamiliarity, compounded by the fact that most of the > examples I see on the list are relatively complex, or edge cases. But > it's a major barrier to both understanding and acceptance of the new > proposals. Yes, familiarity is a big part. I think the last version was the simplest because it breaks thing up logically to start with, (you don't have to do it mentally), it's just a matter of reading it left to right once you are familiar with the basic idea. But it's still quite a bit different from what others are used to. Enough so that it may take a bit of unlearning when things are changed this much. And enough so that it may seem overly complex at first glance. In most cases it wouldn't be. Anyways... 
Skip to the new threads where they are discussing what the current status and things left to do are. Even with the older syntax, it will still be better than what we had because you can still create objects with custom formatting if you want to. Most people won't even need that I suspect. Cheers, Ron > I'd still really like to see: > > 1. A simple listing of common cases, maybe taken from something like > stdlib uses of %-formats. Yes, most of them would be pretty trivial. > That's the point! > 2. A *very short* comparison of a few more advanced cases - I'd > suggest formatting floats as fixed width, 2 decimal places (%5.2f), > formatting 8-digit hex (%.8X) and maybe a simple date format > (%Y-%m-%d). Yes, those are the sort of things I consider advanced. > Examples I've seen in the discussion aren't "advanced" in my book, > they are "I'll never use that" :-) > > 3. Another very short list of a couple of things you can do with the > new format, which you can't do with the existing % formats. > Concentrate here on real-world use cases - tabular reports, reordering > fields for internationalisation, things like that. As a data point, > I've never needed to centre a field in a print statement. Heck, I > don't even recall ever needing to specify how the sign of a number was > printed! > > I get the impression that the "clever" new features aren't actually > going to address the sorts of formatting problems I hit a lot. That's > fine, I can write code to do what I want, but there's a sense of YAGNI > about the discussion, because (for example) by the time I need to > format a centred, max-18, min-5 character number with 3 decimal places > and the sign hard to the left, I'm also going to want to dot-fill a > string to 30 characters and insert commas into the number, and I'm > writing code anyway, so why bother with an obscure format string that > only does half the job? 
(It goes without saying that if the format > string can do everything I want, that argument doesn't work, but then > we get to the complexity issues that hit regular expressions :-)) > > Sorry if this sounds a bit skeptical, but there's a *lot* of > discussion here over a feature I expect to use pretty infrequently. > 99% of my % formats use nothing more than %s! > > Paul. > > From walter at livinglogic.de Thu Aug 16 21:47:19 2007 From: walter at livinglogic.de (Walter Dörwald) Date: Thu, 16 Aug 2007 21:47:19 +0200 Subject: [Python-3000] UTF-32 codecs In-Reply-To: References: <46C48518.3070701@livinglogic.de> <46C48C71.3000400@v.loewis.de> Message-ID: <46C4A9C7.9060408@livinglogic.de> Guido van Rossum wrote: > On 8/16/07, "Martin v. Löwis" wrote: >> Walter Dörwald wrote: >>> I have a patch against the py3k branch (http://bugs.python.org/1775604) >>> that adds UTF-32 codecs. On a narrow build it combines surrogate pairs >>> in the unicode object into one codepoint on encoding and creates >>> surrogate pairs for codepoints outside the BMP on decoding. >>> >>> Should I apply this to the py3k branch only, or do we want that for >>> Python 2.6 too (using str instead of bytes)? >> If it's no effort, I would like to see this on the trunk also. >> >> In general, I'm skeptical about the "new features only in 3k" strategy. >> Some features can be added easily with no backwards-compatibility issues >> in 2.x, and would have normally been added to the next major 2.x release >> without much discussion. > > Agreed, especially since we're planning on backporting much to 2.6. I > want to draw the line at *dropping* stuff from 2.6 though (or > replacing it, or changing it). 2.6 needs to be *very* compatible with > 2.5, in order to lure most users into upgrading to 2.6, which is a > prerequisite for porting to 3.0 eventually. OK, then I'll check it into the py3k branch, and backport it to the trunk.
Servus, Walter From skip at pobox.com Thu Aug 16 21:52:21 2007 From: skip at pobox.com (skip at pobox.com) Date: Thu, 16 Aug 2007 14:52:21 -0500 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> Message-ID: <18116.43765.194435.952513@montanaro.dyndns.org> STeVe> I honestly can't see the point of keeping this:: >>>> '%-10s bought %02i apples for %3.2f' % ('John', 8, 3.78) STeVe> 'John bought 08 apples for 3.78' STeVe> alongside this:: >>>> '{0:-10} bought {1:02i} apples for {2:3.2f}'.format('John', 8, 3.78) STeVe> 'John bought 08 apples for 3.78' STeVe> They're so similar I don't see why you think the latter is no STeVe> longer "easy and powerful". You mean other than: * the new is more verbose than the old * the curly braces and [012]: prefixes are just syntactic sugar when converting old to new * in situations where the format string isn't a literal that mechanical translation from old to new won't be possible * lots of people are familiar with the old format, few with the new ? I suppose nothing. Skip From martin at v.loewis.de Thu Aug 16 21:56:50 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 16 Aug 2007 21:56:50 +0200 Subject: [Python-3000] UTF-32 codecs In-Reply-To: <46C4A9C7.9060408@livinglogic.de> References: <46C48518.3070701@livinglogic.de> <46C48C71.3000400@v.loewis.de> <46C4A9C7.9060408@livinglogic.de> Message-ID: <46C4AC02.3020203@v.loewis.de> > OK, then I'll check it into the py3k branch, and backport it to the trunk. This raises another procedural question: are we still merging from the trunk to the 3k branch, or are they now officially split? 
If we still merge, and assuming that the implementations are sufficiently similar and live in the same files, it would be better to commit into the trunk, then merge (or wait for somebody else to merge), then apply any modifications that the 3k branch needs. Regards, Martin From skip at pobox.com Thu Aug 16 21:57:33 2007 From: skip at pobox.com (skip at pobox.com) Date: Thu, 16 Aug 2007 14:57:33 -0500 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> Message-ID: <18116.44077.156543.671421@montanaro.dyndns.org> Brett> But how is:: Brett> "{0} is happy to see {1}".format('Brett', 'Christian') Brett> that less easier to read than:: Brett> "%s is happy to see %s" % ('Brett', 'Christian') Brett> ? Yes, PEP 3101 style is more to type but it isn't grievous; we Brett> have just been spoiled by the overloading of the % operator. And Brett> I don't know how newbies think these days, but I know I find the Brett> numeric markers much easier to follow then the '%s', especially Brett> if the string ends up becoming long. If you decide to insert another format token in the middle the new is more error-prone than the old: "{0} asks {1} if he is happy to see {1}".format('Brett', 'Skip', 'Christian') ^^^ whoops vs: "%s asks %s if he is happy to see %s" % ('Brett', 'Skip', 'Christian') Now extend that to format strings with more than a couple expansions. Brett> This is where the cranky python-dev'er comes in: PEP 3101 was Brett> published in April 2006 which is over a year ago! This is not a Brett> new PEP or a new plan. Yes, but Python 3 is more real today than 15 months ago, hence the greater focus now than before. 
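The slip described above is easy to reproduce. The sketch below runs on current Python 3; the last line uses automatic field numbering, which did not exist in the 2007 draft of the PEP and was added to the language later.

```python
names = ("Brett", "Skip", "Christian")

# A field was inserted in the middle, but the last index was not bumped
# from 1 to 2. Nothing fails; the wrong name is silently repeated.
mistaken = "{0} asks {1} if he is happy to see {1}".format(*names)

# The % version has no indices to keep in sync.
percent = "%s asks %s if he is happy to see %s" % names

# With automatic numbering the new style has none either.
auto = "{} asks {} if he is happy to see {}".format(*names)
```

Note that the mistaken call raises no error even though the third argument is never used, so this kind of slip is only caught by reading the output.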
Skip

From walter at livinglogic.de Thu Aug 16 22:03:07 2007
From: walter at livinglogic.de (Walter Dörwald)
Date: Thu, 16 Aug 2007 22:03:07 +0200
Subject: [Python-3000] UTF-32 codecs
In-Reply-To: <46C4AC02.3020203@v.loewis.de>
References: <46C48518.3070701@livinglogic.de> <46C48C71.3000400@v.loewis.de> <46C4A9C7.9060408@livinglogic.de> <46C4AC02.3020203@v.loewis.de>
Message-ID: <46C4AD7B.5040808@livinglogic.de>

Martin v. Löwis wrote:
>> OK, then I'll check it into the py3k branch, and backport it to the trunk.
>
> This raises another procedural question: are we still merging from the
> trunk to the 3k branch, or are they now officially split?
>
> If we still merge, and assuming that the implementations are
> sufficiently similar

See below.

> and live in the same files,

Mostly they do, but there are three new files in Lib/encodings:
utf_32.py, utf_32_le.py and utf_32_be.py

> it would be better
> to commit into the trunk, then merge (or wait for somebody else to
> merge), then apply any modifications that the 3k branch needs.

A simple merge won't work, because in 3.0 the codec uses bytes and in
2.6 it uses str. Also the call to the decoding error handler has
changed, because in 3.0 the error handler could modify the mutable input
buffer.

Servus,
   Walter

From martin at v.loewis.de Thu Aug 16 22:11:19 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Thu, 16 Aug 2007 22:11:19 +0200
Subject: [Python-3000] UTF-32 codecs
In-Reply-To: <46C4AD7B.5040808@livinglogic.de>
References: <46C48518.3070701@livinglogic.de> <46C48C71.3000400@v.loewis.de> <46C4A9C7.9060408@livinglogic.de> <46C4AC02.3020203@v.loewis.de> <46C4AD7B.5040808@livinglogic.de>
Message-ID: <46C4AF67.1090209@v.loewis.de>

> A simple merge won't work, because in 3.0 the codec uses bytes and in
> 2.6 it uses str. Also the call to the decoding error handler has
> changed, because in 3.0 the error handler could modify the mutable input
> buffer.
So what's the strategy then? Block the trunk revision from merging? Regards, Martin From lists at cheimes.de Thu Aug 16 22:15:05 2007 From: lists at cheimes.de (Christian Heimes) Date: Thu, 16 Aug 2007 22:15:05 +0200 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> Message-ID: Brett Cannon wrote: > [Tonight, the role of old, cranky python-dev'er will be played by > Brett Cannon. Don't take this personally, Christian, your email just > happened to be last. =)] hehe :) I don't feel offended. > But how is:: > > "{0} is happy to see {1}".format('Brett', 'Christian') > > that less easier to read than:: > > "%s is happy to see %s" % ('Brett', 'Christian') > > ? Yes, PEP 3101 style is more to type but it isn't grievous; we have > just been spoiled by the overloading of the % operator. And I don't > know how newbies think these days, but I know I find the numeric > markers much easier to follow then the '%s', especially if the string > ends up becoming long. > > And if it is the use of a method instead of an operator that the > newbies might have issues with, well methods and functions come up > quick so they probably won't go long without knowing what is going on. My concerns are partly based on my laziness and my antipathy against '{' and '}'. On my German keyboard I have to move my whole hand to another position to enter a { or }. It's right ALT (Alt Gr) + 7 and 0. The % character is much easier to type for me. :] > This is where the cranky python-dev'er comes in: PEP 3101 was > published in April 2006 which is over a year ago! This is not a new > PEP or a new plan. I personally stayed out of the discussions on this > subject as I knew reasonable people were keeping an eye on it and I > didn't feel I had anything to contribute. That means I just go with > what they decide whether I like it or not. 
I've read the PEP about an year ago. I was always under the impression that the PEP was going to *add* an alternative and more powerful format to Python. I didn't noticed that the PEP was about a *replacement* for the % format operator. My fault ;) > I understand the feeling of catching up on a thread and going, "oh no, > I don't like that!", but that is the nature of the beast. In my view, > if you just don't have the time or energy (which I completely > understand not having, don't get me wrong) for a thread, you basically > have to defer to the people who do and trust that the proper things > were discussed and that the group as a whole (or Guido in those cases > where his gut tells him to ignore everyone) is going to make a sound > decision. > > At this point the discussion has gone long enough with Guido > participating and agreeing with key decisions, that the only way to > get this course of action changed is to come up with really good > examples of how the % format is hugely better than PEP 3101 and > convince the people involved. But just saying you like %s over {0} is > like saying you don't like the decorator syntax: that's nice and all, > but that is not a compelling reason to change the decision being made. You are right. I'm guilty as charged to be a participant of a red bike shed discussion. :) I'm seeing myself as a small Python user and developer who is trying to get in touch with the gods in the temple of python core development (exaggerated *G*). I've been using Python for about 5 years and I'm trying to give something back to the community. In the past months I've submitted patches for small bugs (low hanging fruits) and I've raised my voice to show my personal - sometimes inadequate - opinion. By the way it's great that the core developers are taking their time to discuss this matter with a newbie. Although it is sometimes disappointing to see that my ideas don't make it into the core I don't feel denied. 
It gives me the feeling that my work is appreciated but not (yet) good enough to meet the quality standards. I'll stick around and see how I can be of service in the future. Christian From barry at python.org Thu Aug 16 22:16:49 2007 From: barry at python.org (Barry Warsaw) Date: Thu, 16 Aug 2007 16:16:49 -0400 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: <18116.43765.194435.952513@montanaro.dyndns.org> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <18116.43765.194435.952513@montanaro.dyndns.org> Message-ID: <18861C49-7121-44B2-B17A-30992DF25E0D@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 16, 2007, at 3:52 PM, skip at pobox.com wrote: > STeVe> I honestly can't see the point of keeping this:: > >>>>> '%-10s bought %02i apples for %3.2f' % ('John', 8, 3.78) > STeVe> 'John bought 08 apples for 3.78' > > STeVe> alongside this:: > >>>>> '{0:-10} bought {1:02i} apples for {2:3.2f}'.format('John', 8, >>>>> 3.78) > STeVe> 'John bought 08 apples for 3.78' > > STeVe> They're so similar I don't see why you think the latter > is no > STeVe> longer "easy and powerful". > > You mean other than: > > * the new is more verbose than the old > * the curly braces and [012]: prefixes are just syntactic sugar > when > converting old to new > * in situations where the format string isn't a literal that > mechanical > translation from old to new won't be possible > * lots of people are familiar with the old format, few with the new There's one other problem that I see, though it might be minor or infrequent enough not to matter. %s positional placeholders are easily to generate programmatically than {#} placeholders. 
Think about translating this:

def make_query(flag1, flag2):
    base_query = 'SELECT %s from %s WHERE name = %s '
    if flag1:
        base_query += 'AND age = %s '
    if flag2:
        base_query += 'AND height = %s '
    base_query += 'AND gender = %s'
    return base_query

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRsSwsXEjvBPtnXfVAQLyPgP+P/bnNZhZUGbAcM6lDJdACVYmEFh3bGDR
NJH874DLXp7fsn5iLJ3Fel7eiWqLZ0/lvGYEmAAz/4SYagKFnrYAFTsDKglFroiL
bHzZBKHHuf/Db1oNJBcuQakpbhddX0WMu+XxcKXbgUK87tJE4kbaZPTjU8WF5XDW
EriR/UZBZ40=
=qHQs
-----END PGP SIGNATURE-----

From brett at python.org Thu Aug 16 22:19:49 2007
From: brett at python.org (Brett Cannon)
Date: Thu, 16 Aug 2007 13:19:49 -0700
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <18116.44077.156543.671421@montanaro.dyndns.org>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <18116.44077.156543.671421@montanaro.dyndns.org>
Message-ID:

On 8/16/07, skip at pobox.com wrote:
>
> Brett> But how is::
>
> Brett> "{0} is happy to see {1}".format('Brett', 'Christian')
>
> Brett> that less easier to read than::
>
> Brett> "%s is happy to see %s" % ('Brett', 'Christian')
>
> Brett> ? Yes, PEP 3101 style is more to type but it isn't grievous; we
> Brett> have just been spoiled by the overloading of the % operator. And
> Brett> I don't know how newbies think these days, but I know I find the
> Brett> numeric markers much easier to follow than the '%s', especially
> Brett> if the string ends up becoming long.
>
> If you decide to insert another format token in the middle the new is more
> error-prone than the old:
>
> "{0} asks {1} if he is happy to see {1}".format('Brett', 'Skip', 'Christian')
>
> ^^^ whoops
>
> vs:
>
> "%s asks %s if he is happy to see %s" % ('Brett', 'Skip', 'Christian')
>
> Now extend that to format strings with more than a couple expansions.
>

Sure, but I find the %s form harder to read honestly.
Plus you didn't need to insert in that order; you could have put it as::

    "{0} asks {2} if he is happy to see {1}".format("Brett", "Christian", "Skip")

Not perfect, but it works. Or you just name your arguments. We are
talking simple things here, and as soon as you start to scale you will
most likely move to name-based arguments anyway once your quick hack
format scheme doesn't hold.

> Brett> This is where the cranky python-dev'er comes in: PEP 3101 was
> Brett> published in April 2006 which is over a year ago! This is not a
> Brett> new PEP or a new plan.
>
> Yes, but Python 3 is more real today than 15 months ago, hence the greater
> focus now than before.

Well, for me the "realness" of Py3K is the same now as it was back
when Guido created the p3yk branch.

-Brett

From steven.bethard at gmail.com Thu Aug 16 22:29:31 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Thu, 16 Aug 2007 14:29:31 -0600
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <18116.43765.194435.952513@montanaro.dyndns.org>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <18116.43765.194435.952513@montanaro.dyndns.org>
Message-ID:

On 8/16/07, skip at pobox.com wrote:
>
> STeVe> I honestly can't see the point of keeping this::
>
> >>>> '%-10s bought %02i apples for %3.2f' % ('John', 8, 3.78)
> STeVe> 'John bought 08 apples for 3.78'
>
> STeVe> alongside this::
>
> >>>> '{0:-10} bought {1:02i} apples for {2:3.2f}'.format('John', 8, 3.78)
> STeVe> 'John bought 08 apples for 3.78'
>
> STeVe> They're so similar I don't see why you think the latter is no
> STeVe> longer "easy and powerful".
> > You mean other than: > > * the new is more verbose than the old > * the curly braces and [012]: prefixes are just syntactic sugar when > converting old to new > * in situations where the format string isn't a literal that mechanical > translation from old to new won't be possible > * lots of people are familiar with the old format, few with the new > > ? As I understand it, it's already been decided that {}-style formatting will be present in Python 3. So the question is not about the merits of {}-style formatting vs. %-style formatting. That debate's already been had. The question is whether it makes sense to keep %-style formatting around when {}-style formatting is so similar. Since %-style formatting saves at most a couple of characters per specifier, that doesn't seem to justify the massive duplication to me. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From eric+python-dev at trueblade.com Thu Aug 16 22:34:19 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Thu, 16 Aug 2007 16:34:19 -0400 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: <18116.44077.156543.671421@montanaro.dyndns.org> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <18116.44077.156543.671421@montanaro.dyndns.org> Message-ID: <46C4B4CB.7020001@trueblade.com> skip at pobox.com wrote: > Brett> But how is:: > > Brett> "{0} is happy to see {1}".format('Brett', 'Christian') > > Brett> that less easier to read than:: > > Brett> "%s is happy to see %s" % ('Brett', 'Christian') > > Brett> ? Yes, PEP 3101 style is more to type but it isn't grievous; we > Brett> have just been spoiled by the overloading of the % operator. 
And
> Brett> I don't know how newbies think these days, but I know I find the
> Brett> numeric markers much easier to follow than the '%s', especially
> Brett> if the string ends up becoming long.
>
> If you decide to insert another format token in the middle the new is more
> error-prone than the old:
>
> "{0} asks {1} if he is happy to see {1}".format('Brett', 'Skip', 'Christian')
>
> ^^^ whoops
>
> vs:
>
> "%s asks %s if he is happy to see %s" % ('Brett', 'Skip', 'Christian')

The whole point of the indexes is that the order now doesn't matter:

    "{0} asks {2} if he is happy to see {1}".format('Brett', 'Christian', 'Skip')

If you really have many items to expand, name them, and then it matters
even less:

    "{asker} asks {askee} if he is happy to see {person}".format(asker='Brett', person='Christian', askee='Skip')

Which I think is way better than the %-formatting equivalent.

From fdrake at acm.org Thu Aug 16 22:47:31 2007
From: fdrake at acm.org (Fred Drake)
Date: Thu, 16 Aug 2007 16:47:31 -0400
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To:
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <18116.44077.156543.671421@montanaro.dyndns.org>
Message-ID:

On Aug 16, 2007, at 4:19 PM, Brett Cannon wrote:
> Well, for me the "realness" of Py3K is the same now as it was back
> when Guido created the p3yk branch.

Somehow, I suspect the reality of Python 3.0 for any individual is
strongly tied to the amount of time they've had to read the emails,
PEPs, blogs(!) and what-not related to it.

From where I stand, I'm not yet certain that 2.5 is real.
;-/

-Fred

--
Fred Drake

From guido at python.org Thu Aug 16 23:15:27 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 16 Aug 2007 14:15:27 -0700
Subject: [Python-3000] UTF-32 codecs
In-Reply-To: <46C4AC02.3020203@v.loewis.de>
References: <46C48518.3070701@livinglogic.de> <46C48C71.3000400@v.loewis.de> <46C4A9C7.9060408@livinglogic.de> <46C4AC02.3020203@v.loewis.de>
Message-ID:

On 8/16/07, "Martin v. Löwis" wrote:
> > OK, then I'll check it into the py3k branch, and backport it to the trunk.
>
> This raises another procedural question: are we still merging from the
> trunk to the 3k branch, or are they now officially split?

I plan to set up merging again; I still think it's useful.

> If we still merge, and assuming that the implementations are
> sufficiently similar and live in the same files, it would be better
> to commit into the trunk, then merge (or wait for somebody else to
> merge), then apply any modifications that the 3k branch needs.

Yes. But no biggie for new code if it's done the other way around.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From walter at livinglogic.de Thu Aug 16 23:32:17 2007
From: walter at livinglogic.de (Walter Dörwald)
Date: Thu, 16 Aug 2007 23:32:17 +0200
Subject: [Python-3000] UTF-32 codecs
In-Reply-To: <46C4AF67.1090209@v.loewis.de>
References: <46C48518.3070701@livinglogic.de> <46C48C71.3000400@v.loewis.de> <46C4A9C7.9060408@livinglogic.de> <46C4AC02.3020203@v.loewis.de> <46C4AD7B.5040808@livinglogic.de> <46C4AF67.1090209@v.loewis.de>
Message-ID: <46C4C261.2080304@livinglogic.de>

Martin v. Löwis wrote:
>> A simple merge won't work, because in 3.0 the codec uses bytes and in
>> 2.6 it uses str. Also the call to the decoding error handler has
>> changed, because in 3.0 the error handler could modify the mutable input
>> buffer.
>
> So what's the strategy then? Block the trunk revision from merging?
I've never used svnmerge, so I don't know what the strategy for
automatic merging would be. What I would do is check in the patch for
the py3k branch, then apply the patch to the trunk, get it to work and
check it in.

Servus,
   Walter

From guido at python.org Thu Aug 16 23:43:49 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 16 Aug 2007 14:43:49 -0700
Subject: [Python-3000] UTF-32 codecs
In-Reply-To: <46C4C261.2080304@livinglogic.de>
References: <46C48518.3070701@livinglogic.de> <46C48C71.3000400@v.loewis.de> <46C4A9C7.9060408@livinglogic.de> <46C4AC02.3020203@v.loewis.de> <46C4AD7B.5040808@livinglogic.de> <46C4AF67.1090209@v.loewis.de> <46C4C261.2080304@livinglogic.de>
Message-ID:

On 8/16/07, Walter Dörwald wrote:
> Martin v. Löwis wrote:
>
> >> A simple merge won't work, because in 3.0 the codec uses bytes and in
> >> 2.6 it uses str. Also the call to the decoding error handler has
> >> changed, because in 3.0 the error handler could modify the mutable input
> >> buffer.
> >
> > So what's the strategy then? Block the trunk revision from merging?
>
> I've never used svnmerge, so I don't know what the strategy for
> automatic merging would be. What I would do is check in the patch for
> the py3k branch, then apply the patch to the trunk, get it to work and
> check it in.

Go right ahead. I'll clean up afterwards.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jimjjewett at gmail.com Thu Aug 16 23:50:50 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Thu, 16 Aug 2007 17:50:50 -0400
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To:
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org>
Message-ID:

On 8/16/07, Brett Cannon wrote:

> But how is::
> "{0} is happy to see {1}".format('Brett', 'Christian')
> that less easier to read than::
> "%s is happy to see %s" % ('Brett', 'Christian')

Excluding minor aesthetics, they are equivalent.
The new format only has advantages when the formatting string is complicated. So the real question is: Should we keep the old way for backwards compatibility? Or should we force people to upgrade their code (and their translation data files), even if their code doesn't benefit, and wouldn't need to change otherwise? Remember that most of the time, the old way worked fine, and it will be the new way that seems redundant. Remember also that 2to3 won't get this change entirely right. Remember that people can already subclass string.Template if they really do need fancy logic. Note that this removal alone would go a huge way toward preventing code that works in both 3.0 and 2.5 (or 2.2). > But just saying you like %s over {0} is > like saying you don't like the decorator syntax: that's nice and all, > but that is not a compelling reason to change the decision being made. It is more like saying you prefer the old style of rebinding the name. Adding the new format is one thing; removing the old is another. -jJ From martin at v.loewis.de Thu Aug 16 23:52:07 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 16 Aug 2007 23:52:07 +0200 Subject: [Python-3000] UTF-32 codecs In-Reply-To: <46C4C261.2080304@livinglogic.de> References: <46C48518.3070701@livinglogic.de> <46C48C71.3000400@v.loewis.de> <46C4A9C7.9060408@livinglogic.de> <46C4AC02.3020203@v.loewis.de> <46C4AD7B.5040808@livinglogic.de> <46C4AF67.1090209@v.loewis.de> <46C4C261.2080304@livinglogic.de> Message-ID: <46C4C707.5010302@v.loewis.de> > I've never used svnmerge, so I don't know what the strategy for > automatic merging would be. Reading about svnmerge tells me that you probably should use "svnmerge merge -r -S trunk -M" on the 3k branch; this should record the revision from the branch as already (manually) merged. 
Regards, Martin From brett at python.org Fri Aug 17 00:11:42 2007 From: brett at python.org (Brett Cannon) Date: Thu, 16 Aug 2007 15:11:42 -0700 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> Message-ID: On 8/16/07, Christian Heimes wrote: [SNIP] > You are right. I'm guilty as charged to be a participant of a red bike > shed discussion. :) I'm seeing myself as a small Python user and > developer who is trying to get in touch with the gods in the temple of > python core development (exaggerated *G*). I've been using Python for > about 5 years and I'm trying to give something back to the community. Which is great! > In > the past months I've submitted patches for small bugs (low hanging > fruits) and I've raised my voice to show my personal - sometimes > inadequate - opinion. > Your opinion can't be inadequate; perk of it being subjective. =) And I partially did the email as I did to explain to you and to other people who have not been around for a long time how stuff like this goes when a late-in-the-process objection tends to be taken. > By the way it's great that the core developers are taking their time to > discuss this matter with a newbie. Reasonable discussions are fine, newbie or not. Perk of the Python community being friendly is most people are happy to answer questions. > Although it is sometimes > disappointing to see that my ideas don't make it into the core I don't > feel denied. That's good. We all have ideas that have been rejected (including Guido =). > It gives me the feeling that my work is appreciated but not > (yet) good enough to meet the quality standards. > > I'll stick around and see how I can be of service in the future. Wonderful! 
-Brett From martin at v.loewis.de Fri Aug 17 00:35:39 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 17 Aug 2007 00:35:39 +0200 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: <18116.43765.194435.952513@montanaro.dyndns.org> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <18116.43765.194435.952513@montanaro.dyndns.org> Message-ID: <46C4D13B.8070608@v.loewis.de> > * the new is more verbose than the old > * the curly braces and [012]: prefixes are just syntactic sugar when > converting old to new > * in situations where the format string isn't a literal that mechanical > translation from old to new won't be possible > * lots of people are familiar with the old format, few with the new I think most of these points are irrelevant. The curly braces are not just syntactic sugar, at least the opening brace is not; the digit is not syntactic sugar in the case of message translations. That lots of people are familiar with the old format and only few are with the new is merely a matter of time. As Guido van Rossum says: the number of Python programs yet to be written is hopefully larger than the number of programs already written (or else continuing the Python development is futile). That the new format is more verbose than the old one is true, but only slightly so - typing .format is actually easier for me than typing % (which requires a shift key). Porting programs that have computed format strings is indeed a challenge. The theory here is that this affects only few programs. Regards, Martin From martin at v.loewis.de Fri Aug 17 00:45:32 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 17 Aug 2007 00:45:32 +0200 Subject: [Python-3000] Please don't kill the % operator... 
In-Reply-To: <18861C49-7121-44B2-B17A-30992DF25E0D@python.org> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <18116.43765.194435.952513@montanaro.dyndns.org> <18861C49-7121-44B2-B17A-30992DF25E0D@python.org> Message-ID: <46C4D38C.8000402@v.loewis.de> > There's one other problem that I see, though it might be minor or > infrequent enough not to matter. %s positional placeholders are > easily to generate programmatically than {#} placeholders. Think > about translating this: > > def make_query(flag1, flag2): > base_query = 'SELECT %s from %s WHERE name = %s ' > if flag1: > base_query += 'AND age = %s ' > if flag2: > base_query += 'AND height = %s ' > base_query = 'AND gender = %s' > return base_query Of course, *this* specific example is flawed: you are likely to pass the result to a DB-API library, which supports %s as a placeholder independent of whether strings support the modulo operator (it is then flawed also in that you don't typically have placeholders for the result fields and table name - not sure whether you even can in DB-API). If I had to generate a computed format string, I'd probably use the named placeholders, rather than the indexed ones. base_query = 'SELECT {field} FROM {table} WHERE name = {name} ' if flag1: base_query += 'AND age = {age} ' if flag2: base_query += 'AND height = {height} ' base_query += 'AND gender = {gender}' return base_query Regards, Martin From janssen at parc.com Fri Aug 17 01:09:32 2007 From: janssen at parc.com (Bill Janssen) Date: Thu, 16 Aug 2007 16:09:32 PDT Subject: [Python-3000] Please don't kill the % operator... 
In-Reply-To:
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org>
Message-ID: <07Aug16.160935pdt."57996"@synergy1.parc.xerox.com>

> But just saying you like %s over {0} is
> like saying you don't like the decorator syntax: that's nice and all,
> but that is not a compelling reason to change the decision being made.

My guess is that the folks who object to it are, like me, folks who
primarily work in Python and C, and don't want to try to keep two
really different sets of formatting codes in their heads. Folks who
work primarily in Python and .NET (or wherever these new-fangled
codes come from) probably feel the opposite.

But it's a mistake to say that it's just "taste"; for this one there's
a real cognitive load that affects ease of programming.

Bill

From janssen at parc.com Fri Aug 17 01:16:10 2007
From: janssen at parc.com (Bill Janssen)
Date: Thu, 16 Aug 2007 16:16:10 PDT
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <46C4D13B.8070608@v.loewis.de>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <18116.43765.194435.952513@montanaro.dyndns.org> <46C4D13B.8070608@v.loewis.de>
Message-ID: <07Aug16.161620pdt."57996"@synergy1.parc.xerox.com>

> I think most of these points are irrelevant. The curly braces are not
> just syntactic sugar, at least the opening brace is not; the digit
> is not syntactic sugar in the case of message translations.

Are there "computation of matching braces" problems here?

> That lots of people are familiar with the old format and only few are
> with the new is merely a matter of time.

Sure, but the problem is that there are a lot of Python programmers
*now* and learning the new syntax imposes a burden on all of *them*.
Who cares how many people know the syntax in the future?
> That the new format is more verbose than the old one is true, but only > slightly so - typing .format is actually easier for me than typing > % (which requires a shift key). I don't mind the switch to ".format"; it's the formatting codes that I don't want to see changed. > Porting programs that have computed format strings is indeed a > challenge. The theory here is that this affects only few programs. I think you'll find it's more than a few. This issue is obviously an iceberg issue; most folks never thought you were going to remove the old formatting codes, just add a newer and more capable set. Bill From alexandre at peadrop.com Fri Aug 17 01:43:10 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Thu, 16 Aug 2007 19:43:10 -0400 Subject: [Python-3000] [Python-Dev] Documentation switch imminent In-Reply-To: References: Message-ID: On 8/16/07, Neal Norwitz wrote: > On 8/15/07, Georg Brandl wrote: > > Okay, I made the switch. I tagged the state of both Python branches > > before the switch as tags/py{26,3k}-before-rstdocs/. > > http://docs.python.org/dev/ > http://docs.python.org/dev/3.0/ > Is it just me, or the markup of the new docs is quite heavy? alex% wget -q -O- http://docs.python.org/api/genindex.html | wc -c 77868 alex% wget -q -O- http://docs.python.org/dev/3.0/genindex.html | wc -c 918359 Firefox, on my fairly recent machine, takes ~5 seconds rendering the index of the new docs from disk, compared to a fraction of a second for the old one. -- Alexandre From greg.ewing at canterbury.ac.nz Fri Aug 17 03:05:32 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 17 Aug 2007 13:05:32 +1200 Subject: [Python-3000] Please don't kill the % operator... 
In-Reply-To: References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> Message-ID: <46C4F45C.1040904@canterbury.ac.nz> Guido van Rossum wrote: > The only added wrinkle is that you can also write > {0!r} to *force* using repr() on the value. What if you want a field width with that? Will you be able to write {0!10r} or will it have to be {0!r:10}? -- Greg From greg.ewing at canterbury.ac.nz Fri Aug 17 03:16:20 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 17 Aug 2007 13:16:20 +1200 Subject: [Python-3000] More PEP 3101 changes incoming In-Reply-To: References: <633738EA-3F45-4933-BF81-0410BACBAF1E@hawaii.edu> Message-ID: <46C4F6E4.9050602@canterbury.ac.nz> Jim Jewett wrote: > You can't write int(s) if you're passing a mapping (or tuple) from > someone else; at best you can copy the mapping and modify certain > values. Maybe this could be handled using a wrapper object that takes a sequence or mapping and a collection of functions to be applied to specified items. "i = {0}, x = {1}".format(convert(stuff, int, float)) or using names "i = {i}, x = {x}".format(convert(stuff, i = int, x = float)) This would have the advantage of allowing arbitrarily complex conversions while keeping the potentially verbose specifications of those conversions out of the format string. Plus the convert() wrapper could be useful in its own right for other things besides formatting. -- Greg From greg.ewing at canterbury.ac.nz Fri Aug 17 03:36:51 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 17 Aug 2007 13:36:51 +1200 Subject: [Python-3000] Please don't kill the % operator... 
In-Reply-To: <18116.44077.156543.671421@montanaro.dyndns.org>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <18116.44077.156543.671421@montanaro.dyndns.org>
Message-ID: <46C4FBB3.4060607@canterbury.ac.nz>

skip at pobox.com wrote:
> "{0} asks {1} if he is happy to see {1}".format('Brett', 'Skip', 'Christian')
>
> ^^^ whoops

This kind of mistake is easy to spot if the format string
is short. If it's not short, it would be better to use
names:

    "{asker} asks {askee} if he is happy to see {friend}".format(
        asker = 'Brett', askee = 'Skip', friend = 'Christian')

--
Greg

From skip at pobox.com Fri Aug 17 03:58:28 2007
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 16 Aug 2007 20:58:28 -0500
Subject: [Python-3000] Is it possible to avoid filenames with spaces?
Message-ID: <18117.196.883933.981385@montanaro.dyndns.org>

Given that filenames containing spaces make things a bit more challenging
for tools like find and xargs would it be possible to get rid of them in the
Python source tree? I only see two files containing spaces at the moment:

    ./Mac/Icons/Disk Image.icns
    ./Mac/Icons/Python Folder.icns

Can they be easily renamed without creating havoc somewhere else?

Thx,

Skip

From greg.ewing at canterbury.ac.nz Fri Aug 17 04:03:21 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 17 Aug 2007 14:03:21 +1200
Subject: [Python-3000] Please don't kill the % operator...
In-Reply-To: <46C4D38C.8000402@v.loewis.de>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <18116.43765.194435.952513@montanaro.dyndns.org> <18861C49-7121-44B2-B17A-30992DF25E0D@python.org> <46C4D38C.8000402@v.loewis.de>
Message-ID: <46C501E9.5030408@canterbury.ac.nz>

Martin v. Löwis wrote:
> (it is then flawed also in that you don't typically
> have placeholders for the result fields and table name - not sure
> whether you even can in DB-API).
It probably depends on the underlying DB, but most DB interfaces I know of don't allow such a thing. My biggest gripe about the DB-API is that you have to use one of about five different ways of marking parameters *depending on the DB*. Which kind of defeats the purpose of having a DB-independent API in the first place... -- Greg From guido at python.org Fri Aug 17 04:11:11 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 16 Aug 2007 19:11:11 -0700 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: <46C4F45C.1040904@canterbury.ac.nz> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <46C4F45C.1040904@canterbury.ac.nz> Message-ID: On 8/16/07, Greg Ewing wrote: > Guido van Rossum wrote: > > The only added wrinkle is that you can also write > > {0!r} to *force* using repr() on the value. > > What if you want a field width with that? Will you be > able to write {0!10r} or will it have to be {0!r:10}? The latter. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Aug 17 05:53:29 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 16 Aug 2007 20:53:29 -0700 Subject: [Python-3000] Two new test failures (one OSX PPC only) Message-ID: I see two new tests failing tonight: - test_xmlrpc fails on all platforms I have. This is due to several new tests that were merged in from the trunk; presumably those tests need changes due to str vs. bytes. - test_codecs fails on OSX PPC only. This is in the new UTF-32 codecs; probably a byte order issue. There's still one leak that Neal would like to see fixed, in test_zipimport. Instructions to reproduce: in a *debug* build, run this command: ./python Lib/test/regrtest.py -R1:1: test_zipimport This reports 29 leaked references. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Fri Aug 17 06:40:37 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 17 Aug 2007 06:40:37 +0200 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: <07Aug16.161620pdt."57996"@synergy1.parc.xerox.com> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <18116.43765.194435.952513@montanaro.dyndns.org> <46C4D13B.8070608@v.loewis.de> <07Aug16.161620pdt."57996"@synergy1.parc.xerox.com> Message-ID: <46C526C5.20906@v.loewis.de> Bill Janssen schrieb: >> I think most of these points are irrelevant. The curly braces are not >> just syntactic sugar, at least the opening brace is not; the digit >> is not syntactic sugar in the case of message translations. > > Are there "computation of matching braces" problems here? I don't understand: AFAIK, the braces don't nest, so the closing brace just marks the end of the place holder (which in the printf format is defined by the type letter). >> That lots of people are familiar with the old format and only few are >> with the new is merely a matter of time. > > Sure, but the problem is that there are a lot of Python programmers > *now* and learning the new syntax imposes a burden on all of *them*. > Who cares how many people know the syntax in the future? That is the problem of any change, right? People know the current language; they don't know the changed language. Still, there are conditions when change is "allowed". For example, the syntax of the except clause changes in Py3k, replacing the comma with "as"; this is also a burden for all Python programmers, yet the change has been made. >> That the new format is more verbose than the old one is true, but only >> slightly so - typing .format is actually easier for me than typing >> % (which requires a shift key). 
> > I don't mind the switch to ".format"; it's the formatting codes that I > don't want to see changed. Ok. For these, the "more verbose" argument holds even less: in the most simple case, it's just one character more verbose per placeholder. Regards, Martin From eric+python-dev at trueblade.com Fri Aug 17 07:09:21 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Fri, 17 Aug 2007 01:09:21 -0400 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: <46C526C5.20906@v.loewis.de> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <18116.43765.194435.952513@montanaro.dyndns.org> <46C4D13B.8070608@v.loewis.de> <07Aug16.161620pdt."57996"@synergy1.parc.xerox.com> <46C526C5.20906@v.loewis.de> Message-ID: <46C52D81.9010003@trueblade.com> Martin v. L?wis wrote: > Bill Janssen schrieb: >>> I think most of these points are irrelevant. The curly braces are not >>> just syntactic sugar, at least the opening brace is not; the digit >>> is not syntactic sugar in the case of message translations. >> Are there "computation of matching braces" problems here? > > I don't understand: AFAIK, the braces don't nest, so the closing > brace just marks the end of the place holder (which in the printf > format is defined by the type letter). I don't understand, either. The braces do nest, but I don't know what the "computation of matching brace" problem is. This test currently passes in my implementation: self.assertEqual('{0[{bar}]}'.format('abcdefg', bar=4), 'e') This shows nesting braces working. Bill, what problem are you thinking of? Eric. From rrr at ronadam.com Fri Aug 17 07:49:33 2007 From: rrr at ronadam.com (Ron Adam) Date: Fri, 17 Aug 2007 00:49:33 -0500 Subject: [Python-3000] Please don't kill the % operator... 
In-Reply-To: <46C526C5.20906@v.loewis.de> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <18116.43765.194435.952513@montanaro.dyndns.org> <46C4D13B.8070608@v.loewis.de> <07Aug16.161620pdt."57996"@synergy1.parc.xerox.com> <46C526C5.20906@v.loewis.de> Message-ID: <46C536ED.6060500@ronadam.com> Martin v. Lo"wis wrote: > Bill Janssen schrieb: >>> I think most of these points are irrelevant. The curly braces are not >>> just syntactic sugar, at least the opening brace is not; the digit >>> is not syntactic sugar in the case of message translations. >> Are there "computation of matching braces" problems here? > > I don't understand: AFAIK, the braces don't nest, so the closing > brace just marks the end of the place holder (which in the printf > format is defined by the type letter). They can nest. See these tests that Eric posted earlier, second example. Eric Smith wrote: > These tests all pass: > > self.assertEquals('{0[{1}]}'.format('abcdefg', 4), 'e') > self.assertEquals('{foo[{bar}]}'.format(foo='abcdefg', bar=4), 'e') > self.assertEqual("My name is {0}".format('Fred'), "My name is Fred") > self.assertEqual("My name is {0[name]}".format(dict(name='Fred')), > "My name is Fred") > self.assertEqual("My name is {0} :-{{}}".format('Fred'), > "My name is Fred :-{}") So expressions like the following might be difficult to spell. '{{foo}{bar}}'.format(foo='FOO', bar='BAR', FOOBAR = "Fred") This would probably produce an unmatched brace error on the first '}'. >>> That lots of people are familiar with the old format and only few are >>> with the new is merely a matter of time. >> Sure, but the problem is that there are a lot of Python programmers >> *now* and learning the new syntax imposes a burden on all of *them*. >> Who cares how many people know the syntax in the future? > > That is the problem of any change, right? People know the current > language; they don't know the changed language. 
Still, there are > conditions when change is "allowed". > > For example, the syntax of the except clause changes in Py3k, replacing > the comma with "as"; this is also a burden for all Python programmers, > yet the change has been made. > >>> That the new format is more verbose than the old one is true, but only >>> slightly so - typing .format is actually easier for me than typing >>> % (which requires a shift key). >> I don't mind the switch to ".format"; it's the formatting codes that I >> don't want to see changed. > > Ok. For these, the "more verbose" argument holds even less: in the most > simple case, it's just one character more verbose per placeholder. I think having more verbose syntax is a matter of trade offs. I don't mind one or two characters if it saves me from writing 20 or 30 someplace else. Such is the case if you can't do something in the format string, it means you need to do it someplace else that will often take up a few lines rather than a few characters. _RON From g.brandl at gmx.net Fri Aug 17 08:16:55 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 17 Aug 2007 08:16:55 +0200 Subject: [Python-3000] [Python-Dev] Documentation switch imminent In-Reply-To: References: Message-ID: Alexandre Vassalotti schrieb: > On 8/16/07, Neal Norwitz wrote: >> On 8/15/07, Georg Brandl wrote: >> > Okay, I made the switch. I tagged the state of both Python branches >> > before the switch as tags/py{26,3k}-before-rstdocs/. >> >> http://docs.python.org/dev/ >> http://docs.python.org/dev/3.0/ >> > > Is it just me, or the markup of the new docs is quite heavy? Docutils markup tends to be a bit verbose, yes, but the index is not even generated by them. 
> alex% wget -q -O- http://docs.python.org/api/genindex.html | wc -c > 77868 > alex% wget -q -O- http://docs.python.org/dev/3.0/genindex.html | wc -c > 918359 The new index includes all documents (api, lib, ref, ...), so the ratio is more like 678000 : 950000 (using 2.6 here), and the difference can be explained quite easily because (a) sphinx uses different anchor names ("mailbox.Mailbox.__contains__" vs "l2h-849") and the hrefs have to include subdirs like "reference/". I've now removed leading spaces in the index output, and the character count is down to 850000. > Firefox, on my fairly recent machine, takes ~5 seconds rendering the > index of the new docs from disk, compared to a fraction of a second > for the old one. But you're right that rendering is slow there. It may be caused by the more complicated CSS... perhaps the index should be split up in several pages. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From walter at livinglogic.de Fri Aug 17 10:20:22 2007 From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Fri, 17 Aug 2007 10:20:22 +0200 Subject: [Python-3000] Two new test failures (one OSX PPC only) In-Reply-To: References: Message-ID: <46C55A46.4040608@livinglogic.de> Guido van Rossum wrote: > I see two new tests failing tonight: > > - test_xmlrpc fails on all platforms I have. This is due to several > new tests that were merged in from the trunk; presumably those tests > need changes due to str vs. bytes. > > - test_codecs fails on OSX PPC only. This is in the new UTF-32 codecs; > probably a byte order issue. We have a PPC Mac here at work, so I can investigate where the problem lies. > [...] 
Servus, Walter From victor.stinner at haypocalc.com Fri Aug 17 11:23:00 2007 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 17 Aug 2007 11:23:00 +0200 Subject: [Python-3000] format() method and % operator Message-ID: <200708171123.00674.victor.stinner@haypocalc.com> Hi, I read many people saying that "{0} {1}".format('Hello', 'World') is easier to read than "%s %s" % ('Hello', 'World') But for me it looks to be more complex: we have to maintain indexes (0, 1, 2, ...), marker is different ({0} != {1}), etc. I didn't read the PEP nor all email discussions. So can you tell me if it would be possible to write simply: "{} {}".format('Hello', 'World') Victor Stinner aka haypo http://hachoir.org/ From eric+python-dev at trueblade.com Fri Aug 17 12:45:54 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Fri, 17 Aug 2007 06:45:54 -0400 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: <46C536ED.6060500@ronadam.com> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <18116.43765.194435.952513@montanaro.dyndns.org> <46C4D13B.8070608@v.loewis.de> <07Aug16.161620pdt."57996"@synergy1.parc.xerox.com> <46C526C5.20906@v.loewis.de> <46C536ED.6060500@ronadam.com> Message-ID: <46C57C62.3040000@trueblade.com> Ron Adam wrote: > > Martin v. Löwis wrote: >> Bill Janssen wrote: >>>> I think most of these points are irrelevant. The curly braces are not >>>> just syntactic sugar, at least the opening brace is not; the digit >>>> is not syntactic sugar in the case of message translations. >>> Are there "computation of matching braces" problems here? >> I don't understand: AFAIK, the braces don't nest, so the closing >> brace just marks the end of the place holder (which in the printf >> format is defined by the type letter). > So expressions like the following might be difficult to spell. 
> > '{{foo}{bar}}'.format(foo='FOO', bar='BAR', FOOBAR = "Fred") > > This would probably produce an unmatched brace error on the first '}'. Ah, I see. I hadn't thought of that case. You're correct, it gives an error on the first '}'. This is a case where allowing whitespace would solve the problem, sort of like C++'s "< <" template issue (which I think they've since addressed). I'm not sure if it's worth doing, though: '{ {foo}{bar} }'.format(foo='FOO', bar='BAR', FOOBAR = "Fred") On second thought, that won't work. For example, this currently doesn't work: '{0[{foo}{bar}]}'.format({'FOOBAR': 'abc'}, foo='FOO', bar='BAR') KeyError: 'FOO' I can't decide if that's a bug or not. Eric. From skip at pobox.com Fri Aug 17 15:02:35 2007 From: skip at pobox.com (skip at pobox.com) Date: Fri, 17 Aug 2007 08:02:35 -0500 Subject: [Python-3000] AtheOS? Message-ID: <18117.40043.124714.520626@montanaro.dyndns.org> I just got rid of BeOS and RiscOS. I'm about to launch into Irix and Tru64, the other two listed on http://wiki.python.org/moin/Py3kDeprecated I wonder, should AtheOS support be removed as well? According to Wikipedia it's no longer being developed, having been superseded by something called Syllable. According to the Syllable Wikipedia page, it supports Python. Skip From lists at cheimes.de Fri Aug 17 14:44:55 2007 From: lists at cheimes.de (Christian Heimes) Date: Fri, 17 Aug 2007 14:44:55 +0200 Subject: [Python-3000] Two new test failures (one OSX PPC only) In-Reply-To: References: Message-ID: <46C59847.9040501@cheimes.de> Guido van Rossum wrote: > There's still one leak that Neal would like to see fixed, in > test_zipimport. Instructions to reproduce: in a *debug* build, run > this command: > > ./python Lib/test/regrtest.py -R1:1: test_zipimport > > This reports 29 leaked references. > Err, this patch is using a better name for the data. 
*blush*

Index: Modules/zipimport.c
===================================================================
--- Modules/zipimport.c (revision 57115)
+++ Modules/zipimport.c (working copy)
@@ -851,10 +851,11 @@
         }
         buf[data_size] = '\0';

-        if (compress == 0) { /* data is not compressed */
-                raw_data = PyBytes_FromStringAndSize(buf, data_size);
-                return raw_data;
-        }
+        if (compress == 0) { /* data is not compressed */
+                data = PyBytes_FromStringAndSize(buf, data_size);
+                Py_DECREF(raw_data);
+                return data;
+        }

         /* Decompress with zlib */
         decompress = get_decompress_func();

Christian
From lists at cheimes.de Fri Aug 17 14:41:07 2007 From: lists at cheimes.de (Christian Heimes) Date: Fri, 17 Aug 2007 14:41:07 +0200 Subject: [Python-3000] Two new test failures (one OSX PPC only) In-Reply-To: References: Message-ID: <46C59763.5080408@cheimes.de> Guido van Rossum wrote: > There's still one leak that Neal would like to see fixed, in > test_zipimport. Instructions to reproduce: in a *debug* build, run > this command: > > ./python Lib/test/regrtest.py -R1:1: test_zipimport > > This reports 29 leaked references. > I found the problem in Modules/zipimport.c around line 850. raw_data wasn't DECREFed. 
LC_ALL=C svn diff Modules/zipimport.c
Index: Modules/zipimport.c
===================================================================
--- Modules/zipimport.c (revision 57115)
+++ Modules/zipimport.c (working copy)
@@ -851,10 +851,11 @@
         }
         buf[data_size] = '\0';

-        if (compress == 0) { /* data is not compressed */
-                raw_data = PyBytes_FromStringAndSize(buf, data_size);
-                return raw_data;
-        }
+        if (compress == 0) { /* data is not compressed */
+                decompress = PyBytes_FromStringAndSize(buf, data_size);
+                Py_DECREF(raw_data);
+                return decompress;
+        }

         /* Decompress with zlib */
         decompress = get_decompress_func();

Christian
From ncoghlan at gmail.com Fri Aug 17 16:00:25 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 18 Aug 2007 00:00:25 +1000 Subject: [Python-3000] [Python-Dev] Documentation switch imminent In-Reply-To: References: Message-ID: <46C5A9F9.1070002@gmail.com> Georg Brandl wrote: >> Firefox, on my fairly recent machine, takes ~5 seconds rendering the >> index of the new docs from disk, compared to a fraction of a second >> for the old one. > > But you're right that rendering is slow there. It may be caused by the > more complicated CSS... perhaps the index should be split up in several > pages. Splitting out the C API index would probably be a reasonable start. (It may also be worth considering ignoring a leading Py or _Py in that index - many of the C API index entries end up under just two index groups). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From guido at python.org Fri Aug 17 16:32:15 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 17 Aug 2007 07:32:15 -0700 Subject: [Python-3000] AtheOS? 
In-Reply-To: <18117.40043.124714.520626@montanaro.dyndns.org> References: <18117.40043.124714.520626@montanaro.dyndns.org> Message-ID: I'd get in touch with the last known maintainer of the AtheOS port, or some of the Syllable maintainers (if all else fails, spam their wiki's front page :-). If it's just a renaming maybe they're relying on the same #ifdefs still. Thanks for doing this BTW! I love cleanups. (If there are other people interested in helping out with cleanups, getting rid of deprecated behavior is also a great starter project. Look for DeprecationWarning in Python or C code.) On 8/17/07, skip at pobox.com wrote: > I just got rid of BeOS and RiscOS. I'm about to launch into Irix and Tru64, > the other two listed on > > http://wiki.python.org/moin/Py3kDeprecated > > I wonder, should AtheOS support be removed as well? According to Wikipedia > it's no longer being developed, having been superseded by something called > Syllable. According to the Syllable Wikipedia page, it supports Python. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rrr at ronadam.com Fri Aug 17 17:37:52 2007 From: rrr at ronadam.com (Ron Adam) Date: Fri, 17 Aug 2007 10:37:52 -0500 Subject: [Python-3000] Please don't kill the % operator... In-Reply-To: <46C57C62.3040000@trueblade.com> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <18116.43765.194435.952513@montanaro.dyndns.org> <46C4D13B.8070608@v.loewis.de> <07Aug16.161620pdt."57996"@synergy1.parc.xerox.com> <46C526C5.20906@v.loewis.de> <46C536ED.6060500@ronadam.com> <46C57C62.3040000@trueblade.com> Message-ID: <46C5C0D0.7040609@ronadam.com> Eric Smith wrote: > Ron Adam wrote: >> >> Martin v. Löwis wrote: >>> Bill Janssen wrote: >>>>> I think most of these points are irrelevant. 
The curly braces are not >>>>> just syntactic sugar, at least the opening brace is not; the digit >>>>> is not syntactic sugar in the case of message translations. >>>> Are there "computation of matching braces" problems here? >>> I don't understand: AFAIK, the braces don't nest, so the closing >>> brace just marks the end of the place holder (which in the printf >>> format is defined by the type letter). > >> So expressions like the following might be difficult to spell. >> >> '{{foo}{bar}}'.format(foo='FOO', bar='BAR', FOOBAR = "Fred") >> >> This would probably produce an unmatched brace error on the first '}'. > > Ah, I see. I hadn't thought of that case. You're correct, it gives an > error on the first '}'. This is a case where allowing whitespace would > solve the problem, sort of like C++'s "< <" template issue (which I > think they've since addressed). I'm not sure if it's worth doing, though: > > '{ {foo}{bar} }'.format(foo='FOO', bar='BAR', FOOBAR = "Fred") > > On second thought, that won't work. For example, this currently doesn't > work: > '{0[{foo}{bar}]}'.format({'FOOBAR': 'abc'}, foo='FOO', bar='BAR') > KeyError: 'FOO' > > I can't decide if that's a bug or not. I think it will be a bug. Some one is bound to run into it at some point if they are using nested braces routinely. Although most people never will, so it may be a limitation we can live with. White space will only work on the name side, not the specifier side of the colon as it's significant on that side. _RON From guido at python.org Fri Aug 17 17:42:56 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 17 Aug 2007 08:42:56 -0700 Subject: [Python-3000] Please don't kill the % operator... 
In-Reply-To: <46C57C62.3040000@trueblade.com> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <18116.43765.194435.952513@montanaro.dyndns.org> <46C4D13B.8070608@v.loewis.de> <46C526C5.20906@v.loewis.de> <46C536ED.6060500@ronadam.com> <46C57C62.3040000@trueblade.com> Message-ID: I think you should just disallow {...} for the start of the variable reference. I.e. {0.{1}} is okay, but {{1}} is not. On 8/17/07, Eric Smith wrote: > Ron Adam wrote: > > > > Martin v. Lo"wis wrote: > >> Bill Janssen schrieb: > >>>> I think most of these points are irrelevant. The curly braces are not > >>>> just syntactic sugar, at least the opening brace is not; the digit > >>>> is not syntactic sugar in the case of message translations. > >>> Are there "computation of matching braces" problems here? > >> I don't understand: AFAIK, the braces don't nest, so the closing > >> brace just marks the end of the place holder (which in the printf > >> format is defined by the type letter). > > > So expressions like the following might be difficult to spell. > > > > '{{foo}{bar}}'.format(foo='FOO', bar='BAR', FOOBAR = "Fred") > > > > This would probably produce an unmatched brace error on the first '}'. > > Ah, I see. I hadn't thought of that case. You're correct, it gives an > error on the first '}'. This is a case where allowing whitespace would > solve the problem, sort of like C++'s "< <" template issue (which I > think they've since addressed). I'm not sure if it's worth doing, though: > > '{ {foo}{bar} }'.format(foo='FOO', bar='BAR', FOOBAR = "Fred") > > On second thought, that won't work. For example, this currently doesn't > work: > '{0[{foo}{bar}]}'.format({'FOOBAR': 'abc'}, foo='FOO', bar='BAR') > KeyError: 'FOO' > > I can't decide if that's a bug or not. > > Eric. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From cjw at sympatico.ca Fri Aug 17 16:42:51 2007 From: cjw at sympatico.ca (Colin J. Williams) Date: Fri, 17 Aug 2007 10:42:51 -0400 Subject: [Python-3000] format() method and % operator In-Reply-To: <200708171123.00674.victor.stinner@haypocalc.com> References: <200708171123.00674.victor.stinner@haypocalc.com> Message-ID: Victor Stinner wrote: > Hi, > > I read many people saying that > "{0} {1}".format('Hello', 'World') > is easier to read than > "%s %s" % ('Hello', 'World') > Not me. > > But for me it looks to be more complex: we have to maintain indexes (0, 1, > 2, ...), marker is different ({0} != {1}), etc. > > > I didn't read the PEP nor all email discussions. Ditto Colin W. From martin at v.loewis.de Fri Aug 17 18:17:48 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 17 Aug 2007 18:17:48 +0200 Subject: [Python-3000] Nested brackets (Was: Please don't kill the % operator...) In-Reply-To: <46C5C0D0.7040609@ronadam.com> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <18116.43765.194435.952513@montanaro.dyndns.org> <46C4D13B.8070608@v.loewis.de> <07Aug16.161620pdt."57996"@synergy1.parc.xerox.com> <46C526C5.20906@v.loewis.de> <46C536ED.6060500@ronadam.com> <46C57C62.3040000@trueblade.com> <46C5C0D0.7040609@ronadam.com> Message-ID: <46C5CA2C.1000306@v.loewis.de> >> On second thought, that won't work. For example, this currently >> doesn't work: >> '{0[{foo}{bar}]}'.format({'FOOBAR': 'abc'}, foo='FOO', bar='BAR') >> KeyError: 'FOO' >> >> I can't decide if that's a bug or not. > > I think it will be a bug. 
Some one is bound to run into it at some > point if they are using nested braces routinely. Although most people > never will, so it may be a limitation we can live with. OK, I think both the PEP and the understanding must get some serious tightening. According to the PEP, "The rules for parsing an item key are very simple" - unfortunately without specifying what the rules actually *are*, other than "If it starts with a digit, then its treated as a number, otherwise it is used as a string". So we know the key is a string (it does not start with a digit); the next question: which string? The PEP says, as an implementation note, "The str.format() function will have a minimalist parser which only attempts to figure out when it is "done" with an identifier (by finding a '.' or a ']', or '}', etc.)." This probably means to say that it looks for ']' in this context (a getitem operator), so then the string would be "{foo}{bar}". I would expect that this produces KeyError: '{foo}{bar}' I.e. according to the PEP a) nested curly braces are not supported in compound field names (*), the only valid operators are '.' and '[]'. b) concatenation of strings in keys is not supported (again because the only operators are getattr and getitem) I now agree with Bill that we have a "computation of matching braces problem", surprisingly: people disagree with each other and with the PEP what the meaning of the braces in above example is. Regards, Martin (*) they are supported in format specifiers From alexandre at peadrop.com Fri Aug 17 18:28:43 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Fri, 17 Aug 2007 12:28:43 -0400 Subject: [Python-3000] [Python-Dev] Documentation switch imminent In-Reply-To: References: Message-ID: On 8/17/07, Georg Brandl wrote: > Alexandre Vassalotti schrieb: > > On 8/16/07, Neal Norwitz wrote: > >> On 8/15/07, Georg Brandl wrote: > >> > Okay, I made the switch. 
I tagged the state of both Python branches > >> > before the switch as tags/py{26,3k}-before-rstdocs/. > >> > >> http://docs.python.org/dev/ > >> http://docs.python.org/dev/3.0/ > >> > > > > Is it just me, or the markup of the new docs is quite heavy? > > Docutils markup tends to be a bit verbose, yes, but the index is not > even generated by them. > > > alex% wget -q -O- http://docs.python.org/api/genindex.html | wc -c > > 77868 > > alex% wget -q -O- http://docs.python.org/dev/3.0/genindex.html | wc -c > > 918359 > > The new index includes all documents (api, lib, ref, ...), so the ratio > is more like 678000 : 950000 (using 2.6 here), and the difference can be > explained quite easily because (a) sphinx uses different anchor names > ("mailbox.Mailbox.__contains__" vs "l2h-849") and the hrefs have to > include subdirs like "reference/". Ah, I didn't notice that index included all the documents. That explains the huge size increase. However, would it be possible to keep the indexes separated? I noticed that I find I want more quickly when the indexes are separated. > I've now removed leading spaces in the index output, and the character > count is down to 850000. > > > Firefox, on my fairly recent machine, takes ~5 seconds rendering the > > index of the new docs from disk, compared to a fraction of a second > > for the old one. > > But you're right that rendering is slow there. It may be caused by the > more complicated CSS... perhaps the index should be split up in several > pages. > I disabled CSS-support (with View->Page Style->No Style), but it didn't affect the initial rendering speed. However, scrolling was *much* faster without CSS. -- Alexandre From martin at v.loewis.de Fri Aug 17 18:32:01 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 17 Aug 2007 18:32:01 +0200 Subject: [Python-3000] AtheOS? 
In-Reply-To: References: <18117.40043.124714.520626@montanaro.dyndns.org> Message-ID: <46C5CD81.1040903@v.loewis.de> Guido van Rossum wrote: > I'd get in touch with the last known maintainer of the AtheOS port, or > some of the Syllable maintainers (if all else fails, spam their wiki's > front page :-). If it's just a renaming maybe they're relying on the > same #ifdefs still. The port was originally contributed by Octavian Cerna (sf:tavyc), in bugs.python.org/488073. He did that because the version of Python that came with AtheOS was 1.5.2. > Thanks for doing this BTW! I love cleanups. It took some effort to integrate this for 2.3, so I feel sad that this is now all ripped out again. I'm not certain the code gets cleaner that way - just smaller. Perhaps I should just reject patches that port Python to minor platforms in the future, as the chance is high that the original contributor won't keep it up-to-date, and nobody else will, either, for several years. Regards, Martin From guido at python.org Fri Aug 17 18:40:25 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 17 Aug 2007 09:40:25 -0700 Subject: [Python-3000] AtheOS? In-Reply-To: <46C5CD81.1040903@v.loewis.de> References: <18117.40043.124714.520626@montanaro.dyndns.org> <46C5CD81.1040903@v.loewis.de> Message-ID: On 8/17/07, "Martin v. Löwis" wrote: > Guido van Rossum wrote: > > I'd get in touch with the last known maintainer of the AtheOS port, or > > some of the Syllable maintainers (if all else fails, spam their wiki's > > front page :-). If it's just a renaming maybe they're relying on the > > same #ifdefs still. > > The port was originally contributed by Octavian Cerna (sf:tavyc), > in bugs.python.org/488073. He did that because the version of Python > that came with AtheOS was 1.5.2. > > > Thanks for doing this BTW! I love cleanups. > > It took some effort to integrate this for 2.3, so I feel sad that this > is now all ripped out again. 
I'm not certain the code gets cleaner > that way - just smaller. Perhaps I should just reject patches that > port Python to minor platforms in the future, as the chance is high > that the original contributor won't keep it up-to-date, and nobody else > will, either, for several years. True, I'm also a bit sad -- my pride used to be the number of platforms that ran Python. But minority platforms need to learn that they should support the established conventions rather than invent their own if they want to be able to run most open source software. And I think they *are* learning. Now all we need to do is get rid of all the silly differences between the *BSD versions. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From walter at livinglogic.de Fri Aug 17 18:42:22 2007 From: walter at livinglogic.de (=?UTF-8?B?V2FsdGVyIETDtnJ3YWxk?=) Date: Fri, 17 Aug 2007 18:42:22 +0200 Subject: [Python-3000] UTF-32 codecs In-Reply-To: References: <46C48518.3070701@livinglogic.de> <46C48C71.3000400@v.loewis.de> <46C4A9C7.9060408@livinglogic.de> <46C4AC02.3020203@v.loewis.de> <46C4AD7B.5040808@livinglogic.de> <46C4AF67.1090209@v.loewis.de> <46C4C261.2080304@livinglogic.de> Message-ID: <46C5CFEE.8030301@livinglogic.de> Guido van Rossum wrote: > On 8/16/07, Walter Dörwald wrote: >> Martin v. Löwis wrote: >> >>>> A simple merge won't work, because in 3.0 the codec uses bytes and in >>>> 2.6 it uses str. Also the call to the decoding error handler has >>>> changed, because in 3.0 the error handler could modify the mutable input >>>> buffer. >>> So what's the strategy then? Block the trunk revision from merging? >> I've never used svnmerge, so I don't know what the strategy for >> automatic merging would be. What I would do is check in the patch for >> the py3k branch, then apply the patch to the trunk, get it to work and >> check it in. > > Go right ahead. Done! The bug surfacing on Mac is fixed too (stupid typo). > I'll clean up afterwards. Thanks! 
Servus,
   Walter

From jimjjewett at gmail.com Fri Aug 17 18:47:12 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 17 Aug 2007 12:47:12 -0400
Subject: [Python-3000] format() method and % operator
In-Reply-To: <200708171123.00674.victor.stinner@haypocalc.com>
References: <200708171123.00674.victor.stinner@haypocalc.com>
Message-ID:

On 8/17/07, Victor Stinner wrote:
> But for me it looks to be more complex: we have to maintain indexes (0, 1,
> 2, ...), marker is different ({0} != {1}), etc.
> ... tell me if it would be possible to write simply:
> "{} {}".format('Hello', 'World')

It would be possible to support that, but I think it was excluded
intentionally, as a nudge toward more robust formatting strings.

(1) Translators may need to reorder the arguments. So the format
string might change from

    "{0} xxx {1}"

to a more idiomatic (in the other language)

    "yyy {1} {0}"

This doesn't by itself rule out {} for the default case, but being
explicit makes things more parallel, and easier to verify.

(2) You already have to maintain indices mentally; it is just
bug-prone on strings long enough for the formatting language to
matter. For example, if

    gossip = "%s told %s that %s"

changes to

    gossip = "%s told %s on %s that %s"

then in some other part of the program, you will also have to change

    gossip % (name1, name2, msg)

to

    gossip % (name1, date, name2, msg)

Using a name mapping (speaker=..., hearer=..., ) is a better answer,
but explicit numbers are a halfway measure.

-jJ

From skip at pobox.com Fri Aug 17 19:02:23 2007
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 17 Aug 2007 12:02:23 -0500
Subject: [Python-3000] AtheOS?
In-Reply-To: <46C5CD81.1040903@v.loewis.de>
References: <18117.40043.124714.520626@montanaro.dyndns.org> <46C5CD81.1040903@v.loewis.de>
Message-ID: <18117.54431.306667.256066@montanaro.dyndns.org>

    Martin> It took some effort to integrate this for 2.3, so I feel sad
    Martin> that this is now all ripped out again.
I'm not certain the code Martin> gets cleaner that way - just smaller. Well, fewer #ifdefs can't be a bad thing. I noticed this on the Syllable Wikipedia page: It was forked from the stagnant AtheOS in July 2002. It would appear that AtheOS has been defunct for quite awhile. Skip From martin at v.loewis.de Fri Aug 17 19:22:18 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 17 Aug 2007 19:22:18 +0200 Subject: [Python-3000] should rfc822 accept text io or binary io? In-Reply-To: References: <18103.34967.170146.660275@montanaro.dyndns.org> <3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org> Message-ID: <46C5D94A.4030808@v.loewis.de> > The odd thing here is that RFC 2047 (MIME) seems to be about encoding > non-ASCII character sets in ASCII. So the spec is kind of odd here. > The actual bytes on the wire seem to be ASCII, but they may an > interpretation where those ASCII bytes represent a non-ASCII string. HTTP is fairly confused about usage of non-ASCII characters in headers. For example, RFC 2617 specifies that, for Basic authentication, userid and password are *TEXT (excluding : in the userid); it then says that user-pass is base64-encoded. It nowhere says what the charset of userid or password should be. People now interpret that as saying: it's TEXT, so you need to encode it according to RFC 2047 before using it in a header, requiring that the userid first gets MIME-Q-encoded (say, or B), and then the result gets base64-encoded again, then transmitted. Neither web browsers nor web servers implement that correctly today. But in short, the intention seems to be that the HTTP headers are strict ASCII on the wire, with non-ASCII encoded using MIME header encoding. A library implementing that in Python should certainly use bytes at the network (stream) side, and strings at the application side. Even though the format is human-readable, the protocol is byte-oriented, not character-oriented. 
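A minimal sketch of that bytes-outside, strings-inside split in Python 3, assuming the standard library's `email.header` helpers handle the RFC 2047 step. `parse_header_line` is a hypothetical helper named here only for illustration; it is not part of any library, and it ignores header folding and multiple encoded-words per line.

```python
from email.header import decode_header, make_header


def parse_header_line(raw):
    """Turn one raw header line (bytes off the wire) into (name, value) strings."""
    # On the wire the header is strict ASCII; anything else fails loudly here.
    text = raw.decode("ascii")
    name, _, value = text.partition(":")
    # RFC 2047 encoded-words carry non-ASCII text inside those ASCII bytes.
    decoded = str(make_header(decode_header(value.strip())))
    return name.strip(), decoded


# Network side: bytes. Application side: str.
name, value = parse_header_line(b"Subject: =?iso-8859-1?q?L=F6wis?=\r\n")
```

The application only ever sees `name` and `value` as strings; the charset named inside the encoded-word is consumed at the boundary.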
Regards,
Martin

From jimjjewett at gmail.com Fri Aug 17 19:27:57 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 17 Aug 2007 13:27:57 -0400
Subject: [Python-3000] PEP 3101 clarification requests
Message-ID:

The PEP says:

    The general form of a standard format specifier is:

        [[fill]align][sign][width][.precision][type]

but then says:

    A zero fill character without an alignment flag
    implies an alignment type of '='.

In the above form, how can you get a fill character without an
alignment character? And why would you want to support it? It just
makes the width look like an (old-style) octal. (The spec already
says that you need the alignment if you use a digit other than zero.)

--------------

The explicit conversion flag is limited to "r" and "s", but I assume
that can be overridden in a Formatter subclass. That possibility
might be worth mentioning explicitly.

--------------

    'check_unused_args' is used to implement checking
    for unused arguments ... The intersection of these two
    sets will be the set of unused args.

Huh? I *think* the actual intent is (args union kwargs) - used. I
can't find an intersection in there.

-----------------

    This can easily be done by overriding get_named() as follows:

I assume that should be get_value.

    class NamespaceFormatter(Formatter):
        def __init__(self, namespace={}, flags=0):
            Formatter.__init__(self, flags)

but the Formatter class took no init parameters -- should flags be
added to the Formatter constructor, or taken out of here?

The get_value override can be expressed more simply as

    def get_value(self, key, args, kwds):
        try:
            # simplify even more by assuming PEP 3135?
            return super(NamespaceFormatter, self).get_value(key, args, kwds)
        except KeyError:
            return self.namespace[key]

The example usage then takes globals()...
    fmt = NamespaceFormatter(globals())
    greeting = "hello"
    print(fmt("{greeting}, world!"))

Is there now a promise that the objects returned by locals() and
globals() will be "live", so that they would reflect the new value of
"greeting", even though it was set after the Formatter was created?

-jJ

From martin at v.loewis.de Fri Aug 17 19:36:09 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 17 Aug 2007 19:36:09 +0200
Subject: [Python-3000] AtheOS?
In-Reply-To: <18117.54431.306667.256066@montanaro.dyndns.org>
References: <18117.40043.124714.520626@montanaro.dyndns.org> <46C5CD81.1040903@v.loewis.de> <18117.54431.306667.256066@montanaro.dyndns.org>
Message-ID: <46C5DC89.5040207@v.loewis.de>

>     Martin> It took some effort to integrate this for 2.3, so I feel sad
>     Martin> that this is now all ripped out again. I'm not certain the code
>     Martin> gets cleaner that way - just smaller.
>
> Well, fewer #ifdefs can't be a bad thing.

By that principle, it would be best if Python supported only a single
platform - I would choose Linux (that would also save me creating
Windows binaries :-)

Fewer ifdefs are a bad thing if they also go along with less
functionality, or worse portability. As I said, people contributed their
time to write this code (in this case, it took me several hours of work
to understand and adjust the patch being contributed), and I do find
it bad, in principle, that this work is now all declared wasted.

I'm in favor of removing code that is clearly not needed anymore,
and I (sadly) agree to removing the AtheOS port - although it's
not obvious to me that it isn't needed anymore.

My only plea is that PEP 11 gets followed strictly, i.e. that code is
only removed after users of a platform have been given the chance to
object. If it isn't followed, I'd better withdraw it (I notice that
AtheOS is listed for "unsupported" status in 2.6, so in this case, it's
fine).
Regards,
Martin

From steven.bethard at gmail.com Fri Aug 17 20:13:05 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Fri, 17 Aug 2007 12:13:05 -0600
Subject: [Python-3000] format() method and % operator
In-Reply-To: <200708171123.00674.victor.stinner@haypocalc.com>
References: <200708171123.00674.victor.stinner@haypocalc.com>
Message-ID:

On 8/17/07, Victor Stinner wrote:
> I didn't read the PEP nor all email discussions. So can you tell me if it
> would be possible to write simply:
> "{} {}".format('Hello', 'World')

If you really want to write this, I suggest adding the following
helper function to your code somewhere::

    >>> import itertools, re
    >>> def fix_format(fmt):
    ...     # A fresh counter is created on every call to fix_format.
    ...     def get_index(match, indices=itertools.count()):
    ...         return str(next(indices))
    ...     # Fill every empty '{}' with the next index.
    ...     return re.sub(r'(?<={)(?=})', get_index, fmt)
    ...
    >>> fix_format('{} {}')
    '{0} {1}'
    >>> fix_format('{} {} blah {}')
    '{0} {1} blah {2}'

That way, if you really want to bypass the precautions that the new
format strings try to take for you, you can do it in only four lines
of code.

STeVe
--
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From eric+python-dev at trueblade.com Fri Aug 17 21:03:46 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Fri, 17 Aug 2007 15:03:46 -0400
Subject: [Python-3000] Looking for advice on PEP 3101 implementation details
Message-ID: <46C5F112.5050101@trueblade.com>

I'm refactoring the sandbox implementation, and I need to add the code
that parses the standard format specifiers to py3k. Since strings,
ints, and floats share the same format specifiers, I want to have only a
single parser.

My first question is: where should this parser code live? Should I
create a file Python/format.c, or is there a better place? Should the
.h file be Include/format.h?

I also need to have C code that is called by both str.format, and that
is also used by the Formatter implementation.
So my second question is: should I create a Module/_format.c for this code? And why do some of these modules have leading underscores? Is it a problem if str.format uses code in Module/_format.c? Where would the .h file for this code go, if str.format (implemented in unicodeobject.c) needs to get access to it? Thanks for your help, and ongoing patience with a Python internals newbie (but C/C++ veteran). Eric. PS: I realize that both of my questions have multiple parts. Sorry if that's confusing. From barry at python.org Fri Aug 17 21:03:11 2007 From: barry at python.org (Barry Warsaw) Date: Fri, 17 Aug 2007 15:03:11 -0400 Subject: [Python-3000] should rfc822 accept text io or binary io? In-Reply-To: <46C5D94A.4030808@v.loewis.de> References: <18103.34967.170146.660275@montanaro.dyndns.org> <3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org> <46C5D94A.4030808@v.loewis.de> Message-ID: <786213B3-2067-41DA-9EE0-75ECB78B240A@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 17, 2007, at 1:22 PM, Martin v. L?wis wrote: > A library implementing that in Python should certainly use bytes > at the network (stream) side, and strings at the application side. > Even though the format is human-readable, the protocol is byte- > oriented, > not character-oriented. I should point out that the email package does not really intend to operate from the wire endpoints. For example, for sending a message over an smtp connection it expects that something like smtplib would properly transform \n to \r\n as required by RFC 2821. It's a bit dicier on the input side since you could envision a milter or something taking an on-the-wire email representation from an smptd and parsing it into an internal representation. As I'm working on the email package I'm realizing that classes like the parser and generator need to be stricter about how they interpret their input, and that both use cases are reasonable in many situations. 
Sometimes you want the parser to accept strings, but bytes are not always unreasonable. Similarly with generating output. Internally though, I feel fairly strongly that an email message should be represented as strings, though sometimes (certainly for idempotency) you still need to carry around the charset (i.e. encoding). Headers are an example of this. The email package conflates 8-bit strings and bytes all over the place and I'm trying now to make its semantics much clearer. Ideally, the package would be well suited not only for wire-to-wire and all-internal uses, but also related domains like HTTP and other RFC 2822-like contexts. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRsXw8HEjvBPtnXfVAQIjrAP+LJ5X3CPqYYMpTZHl3WQeMPq1p4SA36yo exM518OJl/10i5DGDCxnwdVylnlQpvKG+wnjNCXSdfEf9O/Fk63tDrpGqlGBNBkx lNGcHl/s2b+vMm8uhkqu0d1wjOo90od8HFtMA3Y1iSsJw73F4/6sZ7XPR6ERd0yU o1EIR1sHuwE= =pE1O -----END PGP SIGNATURE----- From brett at python.org Fri Aug 17 21:23:46 2007 From: brett at python.org (Brett Cannon) Date: Fri, 17 Aug 2007 12:23:46 -0700 Subject: [Python-3000] AtheOS? In-Reply-To: <18117.40043.124714.520626@montanaro.dyndns.org> References: <18117.40043.124714.520626@montanaro.dyndns.org> Message-ID: On 8/17/07, skip at pobox.com wrote: > I just got rid of BeOS and RiscOS. Just so you know, Skip, BeOS still has a maintainer on the 2.x branch. Whether we want to continue support past 2.x is another question (as Guido says in another email, it's a hassle and so we should try to minimize the OS support to those that follow convention). > I'm about to launch into Irix and Tru64, > the other two listed on > > http://wiki.python.org/moin/Py3kDeprecated > > I wonder, should AtheOS support be removed as well? According to Wikipedia > it's no longer being developed, having been superceded by something called > Syllable. According to the Syllable Wikipedia page, it supports Python. 
AtheOS has been slated for removal after 2.6 already, so you should be able to get rid of it. I couldn't get a hold of a maintainer for it. -Brett From guido at python.org Fri Aug 17 21:36:37 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 17 Aug 2007 12:36:37 -0700 Subject: [Python-3000] Looking for advice on PEP 3101 implementation details In-Reply-To: <46C5F112.5050101@trueblade.com> References: <46C5F112.5050101@trueblade.com> Message-ID: On 8/17/07, Eric Smith wrote: > I'm refactoring the sandbox implementation, and I need to add the code > that parses the standard format specifiers to py3k. Since strings, > ints, and floats share same format specifiers, I want to have only a > single parser. Really? Strings support only a tiny subset of the numeric mini-language (only [-]N[.N]). > My first question is: where should this parser code live? Should I > create a file Python/format.c, or is there a better place? Should the > .h file be Include/format.h? Is it only callable from C? Or is it also callable from Python? If so, how would Python access it? > I also need to have C code that is called by both str.format, and that > is also used by the Formatter implementation. > > So my second question is: should I create a Module/_format.c for this > code? And why do some of these modules have leading underscores? Is it > a problem if str.format uses code in Module/_format.c? Where would the > .h file for this code go, if str.format (implemented in unicodeobject.c) > needs to get access to it? > > Thanks for your help, and ongoing patience with a Python internals > newbie (but C/C++ veteran). Unless the plan is for it to be importable from Python, it should not live in Modules. Modules with an underscore are typically imported only by a "wrapper" .py module (e.g. _hashlib.c vs. hashlib.py). Modules without an underscore are for direct import (though there are a few legacy exceptions, e.g. socket.c should really be _socket.c). 
Putting it in Modules makes it harder to access from C, as those modules are dynamically loaded. If you can't put it in floatobject.c, and it's not for import, you could create a new file under Python/. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From eric+python-dev at trueblade.com Fri Aug 17 22:14:31 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Fri, 17 Aug 2007 16:14:31 -0400 Subject: [Python-3000] Looking for advice on PEP 3101 implementation details In-Reply-To: References: <46C5F112.5050101@trueblade.com> Message-ID: <46C601A7.6050408@trueblade.com> Guido van Rossum wrote: > On 8/17/07, Eric Smith wrote: >> I'm refactoring the sandbox implementation, and I need to add the code >> that parses the standard format specifiers to py3k. Since strings, >> ints, and floats share same format specifiers, I want to have only a >> single parser. > > Really? Strings support only a tiny subset of the numeric > mini-language (only [-]N[.N]). I think strings are: [[fill]align][width][.precision][type] ints are: [[fill]align][sign][width][type] floats are the full thing: [[fill]align][sign][width][.precision][type] They seem similar enough that a single parser would make sense. Is it acceptable to put this parse in unicodeobject.c, and have it callable by floatobject.c and longobject.c? I'm okay with that, I just want to make sure I'm not violating some convention that objects don't call into each other's implementation files. >> My first question is: where should this parser code live? Should I >> create a file Python/format.c, or is there a better place? Should the >> .h file be Include/format.h? > > Is it only callable from C? Or is it also callable from Python? If so, > how would Python access it? I think the parser only needs to be callable from C. >> I also need to have C code that is called by both str.format, and that >> is also used by the Formatter implementation. 
>> >> So my second question is: should I create a Module/_format.c for this >> code? And why do some of these modules have leading underscores? Is it >> a problem if str.format uses code in Module/_format.c? Where would the >> .h file for this code go, if str.format (implemented in unicodeobject.c) >> needs to get access to it? >> >> Thanks for your help, and ongoing patience with a Python internals >> newbie (but C/C++ veteran). > > Unless the plan is for it to be importable from Python, it should not > live in Modules. Modules with an underscore are typically imported > only by a "wrapper" .py module (e.g. _hashlib.c vs. hashlib.py). > Modules without an underscore are for direct import (though there are > a few legacy exceptions, e.g. socket.c should really be _socket.c). The PEP calls for a string.Formatter class, that is subclassable in Python code. I was originally thinking that this class would be written in Python, but now I'm not so sure. Let me digest your answers here and I'll re-read the PEP, and see where it takes me. > Putting it in Modules makes it harder to access from C, as those > modules are dynamically loaded. If you can't put it in floatobject.c, > and it's not for import, you could create a new file under Python/. Okay. Thanks for the help. Eric. From skip at pobox.com Fri Aug 17 22:17:19 2007 From: skip at pobox.com (skip at pobox.com) Date: Fri, 17 Aug 2007 15:17:19 -0500 Subject: [Python-3000] AtheOS? In-Reply-To: References: <18117.40043.124714.520626@montanaro.dyndns.org> Message-ID: <18118.591.126364.329836@montanaro.dyndns.org> Brett> On 8/17/07, skip at pobox.com wrote: >> I just got rid of BeOS and RiscOS. Brett> Just so you know, Skip, BeOS still has a maintainer on the 2.x Brett> branch. Whether we want to continue support past 2.x is another Brett> question (as Guido says in another email, it's a hassle and so we Brett> should try to minimize the OS support to those that follow Brett> convention). 
I was going by the list on the wiki. BeOS was on that list. I removed it in a single checkin so if it's decided in the near future to put it back that should be easy to do. >> I'm about to launch into Irix and Tru64, >> the other two listed on >> >> http://wiki.python.org/moin/Py3kDeprecated >> >> I wonder, should AtheOS support be removed as well? According to >> Wikipedia it's no longer being developed, having been superceded by >> something called Syllable. According to the Syllable Wikipedia page, >> it supports Python. Brett> AtheOS has been slated for removal after 2.6 already, so you Brett> should be able to get rid of it. I couldn't get a hold of a Brett> maintainer for it. I'm curious about this Syllable thing. It is a fork of AtheOS, appears to be currently maintained, and advertises that Python is supported. I posted a note to a discussion forum asking whether Syllable relies on the AtheOS bits: http://www.syllable.org/discussion.php?topic_id=2320 and got a reply back saying that yes, they do use the AtheOS stuff. I will hold off on removing the AtheOS bits until we clear up things with the Syllable folks. Skip From alexandre at peadrop.com Fri Aug 17 22:50:24 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Fri, 17 Aug 2007 16:50:24 -0400 Subject: [Python-3000] AtheOS? In-Reply-To: <18117.54431.306667.256066@montanaro.dyndns.org> References: <18117.40043.124714.520626@montanaro.dyndns.org> <46C5CD81.1040903@v.loewis.de> <18117.54431.306667.256066@montanaro.dyndns.org> Message-ID: [disclaimer: I am a clueless newbie in the portability area.] On 8/17/07, skip at pobox.com wrote: > > Martin> It took some effort to integrate this for 2.3, so I feel sad > Martin> that this is now all ripped out again. I'm not certain the code > Martin> gets cleaner that way - just smaller. > > Well, fewer #ifdefs can't be a bad thing. 
> Perhaps, it would be a good idea to take Plan9's approach to portability -- i.e., you develop an extreme allergy to code filled with #if, #ifdef, #else, #elseif; localize system dependencies in separate files and hide them behind interfaces. By the way, there is a great chapter about portability in The Practice of Programming, by Brian W. Kernighan and Rob Pike (http://plan9.bell-labs.com/cm/cs/tpop/). That is where I first learned about this approach. -- Alexandre From janssen at parc.com Fri Aug 17 22:54:38 2007 From: janssen at parc.com (Bill Janssen) Date: Fri, 17 Aug 2007 13:54:38 PDT Subject: [Python-3000] should rfc822 accept text io or binary io? In-Reply-To: <786213B3-2067-41DA-9EE0-75ECB78B240A@python.org> References: <18103.34967.170146.660275@montanaro.dyndns.org> <3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org> <46C5D94A.4030808@v.loewis.de> <786213B3-2067-41DA-9EE0-75ECB78B240A@python.org> Message-ID: <07Aug17.135444pdt."57996"@synergy1.parc.xerox.com> > Ideally, the package would be well suited not only for wire-to-wire > and all-internal uses, but also related domains like HTTP and other > RFC 2822-like contexts. But that's exactly why the internal representation should be bytes, not strings. HTTP's use of MIME, for instance, uses "binary" quite a lot. > Internally though, I feel fairly strongly that an email > message should be represented as strings, though sometimes (certainly > for idempotency) you still need to carry around the charset (i.e. > encoding). What if you've got a PNG as one of the multipart components? With a Content-Transfer-Encoding of "binary"? There's no way to represent that as a string. I wonder if we're misunderstanding each other here. The "mail message" itself is essentially a binary data structure, not a sequence of strings, though many of its fields consist of carefully specified string values. Is that what you're saying? 
That when decoding the message, the fields which are string-valued should be kept as strings in the internal Python representation of the message? Bill From guido at python.org Fri Aug 17 22:58:51 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 17 Aug 2007 13:58:51 -0700 Subject: [Python-3000] Looking for advice on PEP 3101 implementation details In-Reply-To: <46C601A7.6050408@trueblade.com> References: <46C5F112.5050101@trueblade.com> <46C601A7.6050408@trueblade.com> Message-ID: On 8/17/07, Eric Smith wrote: > Guido van Rossum wrote: > > On 8/17/07, Eric Smith wrote: > >> I'm refactoring the sandbox implementation, and I need to add the code > >> that parses the standard format specifiers to py3k. Since strings, > >> ints, and floats share same format specifiers, I want to have only a > >> single parser. > > > > Really? Strings support only a tiny subset of the numeric > > mini-language (only [-]N[.N]). > > I think strings are: > [[fill]align][width][.precision][type] The fill doesn't do anything in 2.x. > ints are: > [[fill]align][sign][width][type] I thought the sign came first. But it appears both orders are accepted. > floats are the full thing: > [[fill]align][sign][width][.precision][type] > > They seem similar enough that a single parser would make sense. Is it > acceptable to put this parse in unicodeobject.c, and have it callable by > floatobject.c and longobject.c? I'm okay with that, I just want to make > sure I'm not violating some convention that objects don't call into each > other's implementation files. Sure, that's fine. > >> My first question is: where should this parser code live? Should I > >> create a file Python/format.c, or is there a better place? Should the > >> .h file be Include/format.h? > > > > Is it only callable from C? Or is it also callable from Python? If so, > > how would Python access it? > > I think the parser only needs to be callable from C. Great. 
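The three grammars quoted above differ only in which optional parts are permitted, which is the argument for a single parser. A rough sketch in Python, purely to illustrate the idea (the parser discussed here would be written in C): `parse_spec`, its keyword arguments, and the returned field names are all hypothetical, and the align characters `<>=^` and sign characters are assumed from my reading of PEP 3101.

```python
import re

# [[fill]align][sign][width][.precision][type]
_SPEC = re.compile(
    r"(?:(?P<fill>.)?(?P<align>[<>=^]))?"
    r"(?P<sign>[-+ ])?"
    r"(?P<width>\d+)?"
    r"(?:\.(?P<precision>\d+))?"
    r"(?P<type>[a-zA-Z%])?"
    r"\Z"
)


def parse_spec(spec, allow_sign=True, allow_precision=True):
    """Split a format specifier into its parts; reject parts a type disallows."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError("invalid format specifier: %r" % (spec,))
    fields = match.groupdict()
    if fields["sign"] and not allow_sign:
        raise ValueError("sign not allowed for this type")
    if fields["precision"] and not allow_precision:
        raise ValueError("precision not allowed for this type")
    return fields


# One parser, three callers: strings pass allow_sign=False,
# ints pass allow_precision=False, floats accept the full form.
float_fields = parse_spec("*>+10.3f")
str_fields = parse_spec("<20.5", allow_sign=False)
```

Each type's formatter would then interpret the returned parts itself; only the splitting is shared.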
> >> I also need to have C code that is called by both str.format, and that > >> is also used by the Formatter implementation. > >> > >> So my second question is: should I create a Module/_format.c for this > >> code? And why do some of these modules have leading underscores? Is it > >> a problem if str.format uses code in Module/_format.c? Where would the > >> .h file for this code go, if str.format (implemented in unicodeobject.c) > >> needs to get access to it? > >> > >> Thanks for your help, and ongoing patience with a Python internals > >> newbie (but C/C++ veteran). > > > > Unless the plan is for it to be importable from Python, it should not > > live in Modules. Modules with an underscore are typically imported > > only by a "wrapper" .py module (e.g. _hashlib.c vs. hashlib.py). > > Modules without an underscore are for direct import (though there are > > a few legacy exceptions, e.g. socket.c should really be _socket.c). > > The PEP calls for a string.Formatter class, that is subclassable in > Python code. I was originally thinking that this class would be written > in Python, but now I'm not so sure. Let me digest your answers here and > I'll re-read the PEP, and see where it takes me. Also talk to Talin, we had long discussions about this at some point. I think the Formatter class can be written in Python, because none of the C code involved in the built-in format() needs it. > > Putting it in Modules makes it harder to access from C, as those > > modules are dynamically loaded. If you can't put it in floatobject.c, > > and it's not for import, you could create a new file under Python/. > > Okay. Thanks for the help. You're welcome. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rrr at ronadam.com Fri Aug 17 23:03:59 2007 From: rrr at ronadam.com (Ron Adam) Date: Fri, 17 Aug 2007 16:03:59 -0500 Subject: [Python-3000] Please don't kill the % operator... 
In-Reply-To: <46C57C62.3040000@trueblade.com> References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <18116.43765.194435.952513@montanaro.dyndns.org> <46C4D13B.8070608@v.loewis.de> <07Aug16.161620pdt."57996"@synergy1.parc.xerox.com> <46C526C5.20906@v.loewis.de> <46C536ED.6060500@ronadam.com> <46C57C62.3040000@trueblade.com> Message-ID: <46C60D3F.1000808@ronadam.com> Eric Smith wrote: > Ron Adam wrote: >> >> Martin v. Lo"wis wrote: >>> Bill Janssen schrieb: >>>>> I think most of these points are irrelevant. The curly braces are not >>>>> just syntactic sugar, at least the opening brace is not; the digit >>>>> is not syntactic sugar in the case of message translations. >>>> Are there "computation of matching braces" problems here? >>> I don't understand: AFAIK, the braces don't nest, so the closing >>> brace just marks the end of the place holder (which in the printf >>> format is defined by the type letter). > >> So expressions like the following might be difficult to spell. >> >> '{{foo}{bar}}'.format(foo='FOO', bar='BAR', FOOBAR = "Fred") >> >> This would probably produce an unmatched brace error on the first '}'. > > Ah, I see. I hadn't thought of that case. You're correct, it gives an > error on the first '}'. This is a case where allowing whitespace would > solve the problem, sort of like C++'s "< <" template issue (which I > think they've since addressed). I'm not sure if it's worth doing, though: > > '{ {foo}{bar} }'.format(foo='FOO', bar='BAR', FOOBAR = "Fred") > > On second thought, that won't work. For example, this currently doesn't > work: > '{0[{foo}{bar}]}'.format({'FOOBAR': 'abc'}, foo='FOO', bar='BAR') > KeyError: 'FOO' > > I can't decide if that's a bug or not. I think if we escaped the braces with '\' it will work nicer. I used the following to test the idea and it seems to work and should convert to C without any trouble. 
So to those who can say, would something like this be an ok solution?

    def vformat(self, format_string, args, kwargs):
        # Needs unused args check code.
        while 1:
            front, field, back = self._get_inner_field(format_string)
            if not field:
                break
            key, sep, spec = field.partition(':')
            value = self.get_value(key, args, kwargs)
            result = self.format_field(value, spec)
            format_string = front + result + back
        return format_string.replace('\\{', '{').replace('\\}', '}')

    def _get_inner_field(self, s):
        # Get an innermost field: find the first unescaped '}',
        # then scan back from there to its unescaped '{'.
        end = 0
        while end < len(s):
            if s[end] == '}' and not self._is_escaped(s, end, '}'):
                break
            end += 1
        if end == len(s):
            return s, '', ''
        start = end - 1
        while start >= 0:
            if s[start] == '{' and not self._is_escaped(s, start, '{'):
                break
            start -= 1
        if start < 0:
            raise ValueError("mismatched braces")
        return s[:start], s[start+1:end], s[end+1:]

    def _is_escaped(self, s, i, char):
        # Determine if the char is escaped with '\'.
        if s[i] != char or i == 0:
            return False
        i -= 1
        n = 0
        while i >= 0 and s[i] == '\\':
            n += 1
            i -= 1
        return n % 2 == 1

From brett at python.org Fri Aug 17 23:06:48 2007
From: brett at python.org (Brett Cannon)
Date: Fri, 17 Aug 2007 14:06:48 -0700
Subject: [Python-3000] AtheOS?
In-Reply-To: <18118.591.126364.329836@montanaro.dyndns.org>
References: <18117.40043.124714.520626@montanaro.dyndns.org> <18118.591.126364.329836@montanaro.dyndns.org>
Message-ID:

On 8/17/07, skip at pobox.com wrote:
>
>     Brett> On 8/17/07, skip at pobox.com wrote:
>     >> I just got rid of BeOS and RiscOS.
>
>     Brett> Just so you know, Skip, BeOS still has a maintainer on the 2.x
>     Brett> branch. Whether we want to continue support past 2.x is another
>     Brett> question (as Guido says in another email, it's a hassle and so we
>     Brett> should try to minimize the OS support to those that follow
>     Brett> convention).
>
> I was going by the list on the wiki. BeOS was on that list.

Don't know who created the list.
> I removed it > in a single checkin so if it's decided in the near future to put it back > that should be easy to do. > OK. I will contact the BeOS maintainer to see if they are up for doing Python 3.0 as well. -Brett From jeremy at alum.mit.edu Fri Aug 17 23:17:44 2007 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Fri, 17 Aug 2007 17:17:44 -0400 Subject: [Python-3000] should rfc822 accept text io or binary io? In-Reply-To: <-7726188769332043533@unknownmsgid> References: <3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org> <46C5D94A.4030808@v.loewis.de> <786213B3-2067-41DA-9EE0-75ECB78B240A@python.org> <-7726188769332043533@unknownmsgid> Message-ID: On 8/17/07, Bill Janssen wrote: > > Ideally, the package would be well suited not only for wire-to-wire > > and all-internal uses, but also related domains like HTTP and other > > RFC 2822-like contexts. > > But that's exactly why the internal representation should be bytes, > not strings. HTTP's use of MIME, for instance, uses "binary" quite a > lot. In the specific case of HTTP, it certainly looks like the headers are represented on the wire as 7-bit ASCII and could be treated as bytes or strings by the header processing code it uses via rfc822.py. The actual body of the response should still be represented as bytes, which can be converted to strings by the application. I assume the current rfc822 handling means that MIME-encoded binary data in HTTP headers will come back as un-decoded strings. (But I'm not sure.) We don't have any tests in the httplib code for these cases. I would expect an application would prefer bytes for the un-decoded data or strings for the decoded data. Will email / rfc822 support this? Jeremy > > Internally though, I feel fairly strongly that an email > > message should be represented as strings, though sometimes (certainly > > for idempotency) you still need to carry around the charset (i.e. > > encoding). > > What if you've got a PNG as one of the multipart components? 
With a > Content-Transfer-Encoding of "binary"? There's no way to represent that > as a string. > > I wonder if we're misunderstanding each other here. The "mail > message" itself is essentially a binary data structure, not a sequence > of strings, though many of its fields consist of carefully specified > string values. Is that what you're saying? That when decoding the > message, the fields which are string-valued should be kept as strings > in the internal Python representation of the message? > > Bill > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/jeremy%40alum.mit.edu > From martin at v.loewis.de Fri Aug 17 23:22:41 2007 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Fri, 17 Aug 2007 23:22:41 +0200 Subject: [Python-3000] AtheOS? In-Reply-To: References: <18117.40043.124714.520626@montanaro.dyndns.org> <46C5CD81.1040903@v.loewis.de> <18117.54431.306667.256066@montanaro.dyndns.org> Message-ID: <46C611A1.9080604@v.loewis.de> > Perhaps, it would be a good idea to take Plan9's approach to > portability -- i.e., you develop an extreme allergy to code filled > with #if, #ifdef, #else, #elseif; localize system dependencies in > separate files and hide them behind interfaces. > > By the way, there is a great chapter about portability in The Practice > of Programming, by Brian W. Kernighan and Rob Pike > (http://plan9.bell-labs.com/cm/cs/tpop/). That is where I first > learned about this approach. I'm doubtful whether that makes the code more readable, as you need to go through layers of indirections to find the place where something is actually implemented. In any case, contributions to apply this strategy to selected places are welcome, assuming they don't slow down the code too much. 
Regards, Martin From martin at v.loewis.de Fri Aug 17 23:26:12 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 17 Aug 2007 23:26:12 +0200 Subject: [Python-3000] should rfc822 accept text io or binary io? In-Reply-To: <07Aug17.135444pdt."57996"@synergy1.parc.xerox.com> References: <18103.34967.170146.660275@montanaro.dyndns.org> <3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org> <46C5D94A.4030808@v.loewis.de> <786213B3-2067-41DA-9EE0-75ECB78B240A@python.org> <07Aug17.135444pdt."57996"@synergy1.parc.xerox.com> Message-ID: <46C61274.1050501@v.loewis.de> > What if you've got a PNG as one of the multipart components? With a > Content-Transfer-Encoding of "binary"? There's no way to represent that > as a string. Sure is. Any byte sequence can be interpreted as latin-1. Not that I think this would be a good thing to do. > I wonder if we're misunderstanding each other here. The "mail > message" itself is essentially a binary data structure, not a sequence > of strings, though many of its fields consist of carefully specified > string values. Is that what you're saying? I don't think so - I assume Barry really wants to use strings as the data type to represent the internal structure. It works fine for all aspects except for the 8bit and binary content-transfer-encodings. Regards, Martin From guido at python.org Fri Aug 17 23:44:56 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 17 Aug 2007 14:44:56 -0700 Subject: [Python-3000] AtheOS? In-Reply-To: <46C611A1.9080604@v.loewis.de> References: <18117.40043.124714.520626@montanaro.dyndns.org> <46C5CD81.1040903@v.loewis.de> <18117.54431.306667.256066@montanaro.dyndns.org> <46C611A1.9080604@v.loewis.de> Message-ID: On 8/17/07, "Martin v. 
L?wis" wrote: > > Perhaps, it would be a good idea to take Plan9's approach to > > portability -- i.e., you develop an extreme allergy to code filled > > with #if, #ifdef, #else, #elseif; localize system dependencies in > > separate files and hide them behind interfaces. > > > > By the way, there is a great chapter about portability in The Practice > > of Programming, by Brian W. Kernighan and Rob Pike > > (http://plan9.bell-labs.com/cm/cs/tpop/). That is where I first > > learned about this approach. > > I'm doubtful whether that makes the code more readable, as you need > to go through layers of indirections to find the place where something > is actually implemented. In any case, contributions to apply this > strategy to selected places are welcome, assuming they don't slow down > the code too much. We already do this, e.g. pyport.h contains lots of stuff like that, and sometimes thinking about it some more makes it possible to move more stuff there (or to another platform-specific file). It just doesn't make sense to turn *every* silly little #ifdef into a system API, no matter what my esteemed colleagues say. ;-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From steve at holdenweb.com Sat Aug 18 00:58:44 2007 From: steve at holdenweb.com (Steve Holden) Date: Fri, 17 Aug 2007 18:58:44 -0400 Subject: [Python-3000] [Python-Dev] Documentation switch imminent In-Reply-To: References: Message-ID: <46C62824.90002@holdenweb.com> Alexandre Vassalotti wrote: > On 8/17/07, Georg Brandl wrote: [...] > Ah, I didn't notice that index included all the documents. That > explains the huge size increase. However, would it be possible to keep > the indexes separated? I noticed that I find I want more quickly when > the indexes are separated. > Which is fine when you know which section to expect to find your content in. 
But let's retain an "all-documentation" index if we can, as this is particularly helpful to the newcomers who aren't that familiar with the structure of the documentation. >> I've now removed leading spaces in the index output, and the character >> count is down to 850000. >> >>> Firefox, on my fairly recent machine, takes ~5 seconds rendering the >>> index of the new docs from disk, compared to a fraction of a second >>> for the old one. >> But you're right that rendering is slow there. It may be caused by the >> more complicated CSS... perhaps the index should be split up in several >> pages. >> > > I disabled CSS-support (with View->Page Style->No Style), but it > didn't affect the initial rendering speed. However, scrolling was > *much* faster without CSS. > Probably because the positional calculations are more straightforward then. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://del.icio.us/steve.holden --------------- Asciimercial ------------------ Get on the web: Blog, lens and tag the Internet Many services currently offer free registration ----------- Thank You for Reading ------------- From janssen at parc.com Sat Aug 18 01:28:14 2007 From: janssen at parc.com (Bill Janssen) Date: Fri, 17 Aug 2007 16:28:14 PDT Subject: [Python-3000] should rfc822 accept text io or binary io? In-Reply-To: References: <3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org> <46C5D94A.4030808@v.loewis.de> <786213B3-2067-41DA-9EE0-75ECB78B240A@python.org> <-7726188769332043533@unknownmsgid> Message-ID: <07Aug17.162825pdt."57996"@synergy1.parc.xerox.com> > On 8/17/07, Bill Janssen wrote: > > > Ideally, the package would be well suited not only for wire-to-wire > > > and all-internal uses, but also related domains like HTTP and other > > > RFC 2822-like contexts. > > > > But that's exactly why the internal representation should be bytes, > > not strings. 
HTTP's use of MIME, for instance, uses "binary" quite a > > lot. > > In the specific case of HTTP, it certainly looks like the headers are > represented on the wire as 7-bit ASCII and could be treated as bytes > or strings by the header processing code it uses via rfc822.py. The > actual body of the response should still be represented as bytes, > which can be converted to strings by the application. Note that, in the case of HTTP, both the request message and the response message may contain MIME-tagged binary data. And some of the header values for those message types may contain arbitrary RFC-8859-1 octets, not necessarily encoded. See sections 4.2 and 2.2 of RFC 2616. But we're not really interested in those message headers -- that's a consideration for the HTTP libraries. I'm just concerned about the MIME standard, which both HTTP and email use, though in different ways. The MIME processing in the "email" module must follow the MIME spec, RFC 2045, 2046, etc., rather than assume RFC 2821 (SMTP) and RFC 2822 encoding everywhere. SMTP is only one form of message envelope. The important thing is that we understand that raw mail messages -- say in MH format in a file -- do not consist of "lines" of "text"; they are complicated binary data structures, often largely composed of pieces of text encoded in very specific ways. As such, the raw message *must* be treated as a sequence of bytes. And the content of any body part may also be an arbitrary sequence of bytes (which, in an RFC 2822 context, must be encoded into ASCII octets). The values of any header may be an arbitrary string in an arbitrary language in an arbitrary character set (see RFCs 2047 and 2231), though it must be put into the message appropriately encoded as a sequence of octets which must be drawn from a set of octets which happens to be a subset of the octets in ASCII. Maybe all of this argues for separating "mime" and "email" into two different packages. 
And maybe renaming "email" "internet-email" or "rfc2822-email". Bill From janssen at parc.com Sat Aug 18 01:33:05 2007 From: janssen at parc.com (Bill Janssen) Date: Fri, 17 Aug 2007 16:33:05 PDT Subject: [Python-3000] should rfc822 accept text io or binary io? In-Reply-To: <46C61274.1050501@v.loewis.de> References: <18103.34967.170146.660275@montanaro.dyndns.org> <3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org> <46C5D94A.4030808@v.loewis.de> <786213B3-2067-41DA-9EE0-75ECB78B240A@python.org> <07Aug17.135444pdt."57996"@synergy1.parc.xerox.com> <46C61274.1050501@v.loewis.de> Message-ID: <07Aug17.163310pdt."57996"@synergy1.parc.xerox.com> > > What if you've got a PNG as one of the multipart components? With a > > Content-Transfer-Encoding of "binary"? There's no way to represent that > > as a string. > > Sure is. Any byte sequence can be interpreted as latin-1. Last time I looked, Latin-1 didn't cover the octets 0x80 - 0x9F. Maybe you're thinking of Microsoft codepage 1252? > > I wonder if we're misunderstanding each other here. The "mail > > message" itself is essentially a binary data structure, not a sequence > > of strings, though many of its fields consist of carefully specified > > string values. Is that what you're saying? > > I don't think so - I assume Barry really wants to use strings as the > data type to represent the internal structure. It works fine for all > aspects except for the 8bit and binary content-transfer-encodings. Yep, that's what I'm saying -- doing it that way breaks on certain content-transfer-encodings. There's also a problem with line endings; the mail standards call for an explicit CRLF sequence. These things really aren't strings. Few data packets are. 
Bill

From bjourne at gmail.com Sat Aug 18 02:31:10 2007
From: bjourne at gmail.com (BJörn Lindqvist)
Date: Sat, 18 Aug 2007 00:31:10 +0000
Subject: [Python-3000] Documentation switch imminent
In-Reply-To:
References:
Message-ID: <740c3aec0708171731qc9324c3o17debfafe4c1530d@mail.gmail.com>

It is fantastic! Totally super work. I just have one small request; pretty please do not set the font. I'm very happy with my browser's default (Verdana), and Bitstream Vera Sans renders badly for me.

--
mvh Björn

From greg.ewing at canterbury.ac.nz Sat Aug 18 03:01:48 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 18 Aug 2007 13:01:48 +1200
Subject: [Python-3000] Nested brackets (Was: Please don't kill the % operator...)
In-Reply-To: <46C5CA2C.1000306@v.loewis.de>
References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> <18116.43765.194435.952513@montanaro.dyndns.org> <46C4D13B.8070608@v.loewis.de> <07Aug16.161620pdt.57996@synergy1.parc.xerox.com> <46C526C5.20906@v.loewis.de> <46C536ED.6060500@ronadam.com> <46C57C62.3040000@trueblade.com> <46C5C0D0.7040609@ronadam.com> <46C5CA2C.1000306@v.loewis.de>
Message-ID: <46C644FC.7020108@canterbury.ac.nz>

Martin v. Löwis wrote:
> I now agree with Bill that we have a "computation of matching braces
> problem", surprisingly: people disagree with each other and with the
> PEP what the meaning of the braces in above example is.

I think it should be considered an error to use a name which is not a valid identifier, even if the implementation doesn't detect this.

--
Greg

From stephen at xemacs.org Sat Aug 18 03:20:11 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 18 Aug 2007 10:20:11 +0900
Subject: [Python-3000] AtheOS?
In-Reply-To: References: <18117.40043.124714.520626@montanaro.dyndns.org> <46C5CD81.1040903@v.loewis.de> <18117.54431.306667.256066@montanaro.dyndns.org> Message-ID: <87tzqx3agk.fsf@uwakimon.sk.tsukuba.ac.jp> Alexandre Vassalotti writes: > Perhaps, it would be a good idea to take Plan9's approach to > portability -- i.e., you develop an extreme allergy to code filled > with #if, #ifdef, #else, #elseif; localize system dependencies in > separate files and hide them behind interfaces. If I understand correctly, Emacs uses this approach (although not thoroughly). The resulting portability files are called s & m files for several reasons. One of course is that they are kept in subdirectories called "s" for "system" and "m" for "machine". The more relevant here is the pain they inflict on developers, because they make it more difficult to find implementations of functions, and hide (or at least distribute across many files) potentially relevant information. Although parsing a deep #ifdef tree is very error-prone and annoying, this can be improved with technology (eg, hideif.el in Emacs). The #ifdef approach also has an advantage that all alternative implementations and their comments are in your face. Mostly they're irrelevant, but often enough they're very suggestive. Though I suppose a sufficiently disciplined programmer would think to use that resource if split out into files, I am not one. This is an important difference between the approaches for me and perhaps for others who only intermittently work on the code, and only on parts relevant to their daily lives. In the end, the difference doesn't seem to be that great, but for us preferred practice (especially as we add support for new platforms) is definitely to put the platform dependencies into configure, to try to organize them by feature rather than by cpu-os-versions-thereof, and to use #ifdefs. FWIW YMMV. 
From talin at acm.org Sat Aug 18 03:48:54 2007 From: talin at acm.org (Talin) Date: Fri, 17 Aug 2007 18:48:54 -0700 Subject: [Python-3000] PEP 3101 clarification requests In-Reply-To: References: Message-ID: <46C65006.9030907@acm.org> Wow, excellent feedback. I've added your email to the list of reminders for the next round of edits. Jim Jewett wrote: > The PEP says: > > The general form of a standard format specifier is: > > [[fill]align][sign][width][.precision][type] > > but then says: > > A zero fill character without an alignment flag > implies an alignment type of '='. > > In the above form, how can you get a fill character without an > alignment character? And why would you want to support it; It just > makes the width look like an (old-style) octal. (The spec already > says that you need the alignment if you use a digit other than zero.) > > -------------- > > The explicit conversion flag is limited to "r" and "s", but I assume > that can be overridden in a Formatter subclass. That possibility > might be worth mentioning explicitly. > > -------------- > > > 'check_unused_args' is used to implement checking > for unused arguments ... The intersection of these two > sets will be the set of unused args. > > Huh? I *think* the actual intent is (args union kwargs)-used. I > can't find an intersection in there. > > ----------------- > This can easily be done by overriding get_named() as follows: > > I assume that should be get_value. > > class NamespaceFormatter(Formatter): > def __init__(self, namespace={}, flags=0): > Formatter.__init__(self, flags) > > but the Formatter class took no init parameters -- should flags be > added to the Formatter constructor, or taken out of here? > > The get_value override can be expressed more simply as > > def get_value(self, key, args, kwds): > try: > # simplify even more by assuming PEP 3135? 
> super(NamespaceFormatter, self).get_value(key, args, kwds) > except KeyError: > return self.namespace[name] > > The example usage then takes globals()... > > fmt = NamespaceFormatter(globals()) > greeting = "hello" > print(fmt("{greeting}, world!")) > > Is there now a promise that the objects returned by locals() and > globals() will be "live", so that they would reflect the new value of > "greeting", even though it was set after the Formatter was created? > > -jJ > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/talin%40acm.org > From talin at acm.org Sat Aug 18 03:54:36 2007 From: talin at acm.org (Talin) Date: Fri, 17 Aug 2007 18:54:36 -0700 Subject: [Python-3000] AtheOS? In-Reply-To: <46C5DC89.5040207@v.loewis.de> References: <18117.40043.124714.520626@montanaro.dyndns.org> <46C5CD81.1040903@v.loewis.de> <18117.54431.306667.256066@montanaro.dyndns.org> <46C5DC89.5040207@v.loewis.de> Message-ID: <46C6515C.20804@acm.org> Martin v. L?wis wrote: >> Martin> It took some effort to integrate this for 2.3, so I feel sad >> Martin> that this is now all ripped out again. I'm not certain the code >> Martin> gets cleaner that way - just smaller. >> >> Well, fewer #ifdefs can't be a bad thing. > > By that principle, it would be best if Python supported only a single > platform - I would chose Linux (that would also save me creating > Windows binaries :-) > > Fewer ifdefs are a bad thing if they also go along with fewer > functionality, or worse portability. As I said, people contributed their > time to write this code (in this case, it took me several hours of work, > to understand and adjust the patch being contributed), and I do find > it bad, in principle, that this work is now all declared wasted. 
I wonder how hard it would be - and how much it would distort the Python code base - if most if not all platform-specific differences could be externalized from the core Python source code. Ideally, a platform wishing to support Python shouldn't have to be part of the core Python distribution, and a "port" of Python should consist of the concatenation of two packages, the universal Python sources, and a set of platform-specific adapters, which may or may not be hosted on the main Python site. However, this just might be wishful thinking on my part... -- Talin From guido at python.org Sat Aug 18 04:35:37 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 17 Aug 2007 19:35:37 -0700 Subject: [Python-3000] AtheOS? In-Reply-To: <46C6515C.20804@acm.org> References: <18117.40043.124714.520626@montanaro.dyndns.org> <46C5CD81.1040903@v.loewis.de> <18117.54431.306667.256066@montanaro.dyndns.org> <46C5DC89.5040207@v.loewis.de> <46C6515C.20804@acm.org> Message-ID: On 8/17/07, Talin wrote: > I wonder how hard it would be - and how much it would distort the Python > code base - if most if not all platform-specific differences could be > externalized from the core Python source code. Ideally, a platform > wishing to support Python shouldn't have to be part of the core Python > distribution, and a "port" of Python should consist of the concatenation > of two packages, the universal Python sources, and a set of > platform-specific adapters, which may or may not be hosted on the main > Python site. Go read the source code and look for #ifdefs. They are all over the place and for all sorts of reasons. It would be nice if there was a limited number of established platform dependent APIs, like "open", "read", "write" etc. But those rarely are the problem. 
The real platform differences are things like which pre-release version of pthreads they support, what the symbol is you have to #define to get the BSD extensions added to the header files, whether there's a bug in their va_args implementation (and which bug it is), what the header file name is to get the ANSI C-compatible C signal definitions, what the error code is for interrupted I/O, whether I/O is interruptable at all, etc., etc. Don't forget that the POSIX and C standards *require* #ifdefs for many features that are optional or that have different possible semantics. Just read through fileobject.c or import.c or posixmodule.c. Sure, there are *some* situations where this approach could clarify some code a bit. But *most* places are too ad-hoc to define an interface -- having three lines of random code in a different file instead of inside an #ifdef doesn't really have any benefits. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Sat Aug 18 07:50:33 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 18 Aug 2007 07:50:33 +0200 Subject: [Python-3000] should rfc822 accept text io or binary io? In-Reply-To: <07Aug17.163310pdt."57996"@synergy1.parc.xerox.com> References: <18103.34967.170146.660275@montanaro.dyndns.org> <3CC0BA3B-38D2-44F6-A230-E3B59C1016DE@acm.org> <46C5D94A.4030808@v.loewis.de> <786213B3-2067-41DA-9EE0-75ECB78B240A@python.org> <07Aug17.135444pdt."57996"@synergy1.parc.xerox.com> <46C61274.1050501@v.loewis.de> <07Aug17.163310pdt."57996"@synergy1.parc.xerox.com> Message-ID: <46C688A9.9020702@v.loewis.de> >>> What if you've got a PNG as one of the multipart components? With a >>> Content-Transfer-Encoding of "binary"? There's no way to represent that >>> as a string. >> Sure is. Any byte sequence can be interpreted as latin-1. > > Last time I looked, Latin-1 didn't cover the octets 0x80 - 0x9F. Depends on where you looked. 
The IANA charset ISO_8859-1:1987 (MIBenum 4, alias latin1), defined in RFC 1345, has the C1 controls in this place. Python's Latin-1 codec implements that specification, and when Unicode says that the first 256 Unicode code points are identical to Latin-1, they also refer to this definition of Latin-1. If you look at section 1 of ISO 8859-1, you'll find that it can be used with the coded control functions in ISO 6429. People typically assume that it is indeed used in such a way, because you could not encode line breaks otherwise (among other things). > Maybe you're thinking of Microsoft codepage 1252? Definitely not. Regards, Martin From oliphant.travis at ieee.org Sat Aug 18 13:33:51 2007 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Sat, 18 Aug 2007 05:33:51 -0600 Subject: [Python-3000] Py3k-buffer branch merged back to py3k branch Message-ID: Hello all, I'm sorry I won't be attending the Google sprints next week. I'm going to be moving from Utah to Texas next week and will be offline for several days. In preparation for the sprints, I have converted all Python objects to use the new buffer protocol PEP and implemented most of the C-API. This work took place in the py3k-buffer branch which now passes all the tests that py3k does. So, I merged the changes back to the py3k branch in hopes that others can continue working on what I've done. The merge took place after fully syncing the py3k-buffer branch with the current trunk. There will be somebody from our community that will be at the Sprints next week. He has agreed to try and work on the buffer protocol some more. He is new to Python and so will probably need some help. He has my cell phone number and will call me with questions which I hope to answer. Left to do: 1) Finish the MemoryViewObject (getitem/setitem needs work). 2) Finish the struct module changes (I've started, but have not checked the changes in). 
3) Add tests Possible problems: It seems that whenever a PyExc_BufferError is raised, problems (like segfaults) occur. I tried to add a new error object by copying how Python did it for other errors, but it's likely that I didn't do it right. I will have email contact for a few days (until Tuesday) but will not have much time to work. Thanks, -Travis Oliphant From skip at pobox.com Sat Aug 18 14:07:30 2007 From: skip at pobox.com (skip at pobox.com) Date: Sat, 18 Aug 2007 07:07:30 -0500 Subject: [Python-3000] Bus error after updating this morning Message-ID: <18118.57602.910814.663900@montanaro.dyndns.org> After reading Travis's email about the py3k-buffer merge this morning I updated my sandbox on my Mac and rebuilt. I got a bus error when trying to run the tests. (gdb) run -E -tt ./Lib/test/regrtest.py -l Starting program: /Users/skip/src/python-svn/py3k/python.exe -E -tt ./Lib/test/regrtest.py -l Reading symbols for shared libraries . done Reading symbols for shared libraries . done Reading symbols for shared libraries . done Reading symbols for shared libraries . done Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: KERN_PROTECTION_FAILURE at address: 0x00000021 0x006d0ec8 in binascii_hexlify (self=0x0, args=0x1012e70) at /Users/skip/src/python-svn/py3k/Modules/binascii.c:953 953 retbuf[j++] = c; (gdb) bt #0 0x006d0ec8 in binascii_hexlify (self=0x0, args=0x1012e70) at /Users/skip/src/python-svn/py3k/Modules/binascii.c:953 #1 0x0011b020 in PyCFunction_Call (func=0x1011c78, arg=0x1012e70, kw=0x0) at Objects/methodobject.c:73 ... The build was configured like so: ./configure --prefix=/Users/skip/local LDFLAGS=-L/opt/local/lib \ CPPFLAGS=-I/opt/local/include --with-pydebug Thinking maybe something didn't get rebuilt that should have I am rebuilding from scratch after a make distclean. I'll report back when I have more info if someone with a faster computer doesn't beat me to it. 
Skip From skip at pobox.com Sat Aug 18 14:21:12 2007 From: skip at pobox.com (skip at pobox.com) Date: Sat, 18 Aug 2007 07:21:12 -0500 Subject: [Python-3000] Bus error after updating this morning Message-ID: <18118.58424.155135.287777@montanaro.dyndns.org> Thinking maybe something didn't get rebuilt that should have I am rebuilding from scratch after a make distclean. make test is actually getting to the point where it's actually running tests, so the make distclean seems to have solved the problem. Perhaps there's a missing Makefile dependency somewhere. Skip From ncoghlan at gmail.com Sat Aug 18 15:04:38 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 18 Aug 2007 23:04:38 +1000 Subject: [Python-3000] bytes: compare bytes to integer In-Reply-To: <200708110225.28056.victor.stinner@haypocalc.com> References: <200708110225.28056.victor.stinner@haypocalc.com> Message-ID: <46C6EE66.8010701@gmail.com> Victor Stinner wrote: > Hi, > > I don't like the behaviour of Python 3000 when we compare a bytes strings > with length=1: > >>> b'xyz'[0] == b'x' > False > > The code can be see as: > >>> ord(b'x') == b'x' > False This seems to suggest its own solution: bytes_obj[0] == ord('x') (Given that ord converts *characters* to bytes, does it actually make sense to allow a bytes object as an argument to ord()?) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From guido at python.org Sat Aug 18 18:45:00 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 18 Aug 2007 09:45:00 -0700 Subject: [Python-3000] Bus error after updating this morning In-Reply-To: <18118.58424.155135.287777@montanaro.dyndns.org> References: <18118.58424.155135.287777@montanaro.dyndns.org> Message-ID: On 8/18/07, skip at pobox.com wrote: > Thinking maybe something didn't get rebuilt that should have I am > rebuilding from scratch after a make distclean. 
> > make test is actually getting to the point where it's actually running > tests, so the make distclean seems to have solved the problem. Perhaps > there's a missing Makefile dependency somewhere. This typically happens when an essential .h file changes -- the setup.py scripts that builds the extension module doesn't check these dependencies. Nothing we can fix in the Makefile, alas. The shorter fix is rm -rf build. Thanks for figuring this out! I was just about to start an investigation. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Aug 18 19:05:57 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 18 Aug 2007 10:05:57 -0700 Subject: [Python-3000] Wanted: tasks for Py3k Sprint next week Message-ID: I'm soliciting ideas for things that need to be done for the 3.0 release that would make good sprint topics. Assume we'll have a mix of more and less experienced developers on hand. (See wiki.python.org/moin/GoogleSprint .) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Sat Aug 18 20:28:57 2007 From: brett at python.org (Brett Cannon) Date: Sat, 18 Aug 2007 11:28:57 -0700 Subject: [Python-3000] AtheOS? In-Reply-To: <18118.591.126364.329836@montanaro.dyndns.org> References: <18117.40043.124714.520626@montanaro.dyndns.org> <18118.591.126364.329836@montanaro.dyndns.org> Message-ID: On 8/17/07, skip at pobox.com wrote: > > Brett> On 8/17/07, skip at pobox.com wrote: > >> I just got rid of BeOS and RiscOS. > > Brett> Just so you know, Skip, BeOS still has a maintainer on the 2.x > Brett> branch. Whether we want to continue support past 2.x is another > Brett> question (as Guido says in another email, it's a hassle and so we > Brett> should try to minimize the OS support to those that follow > Brett> convention). > > I was going by the list on the wiki. BeOS was on that list. 
I removed it > in a single checkin so if it's decided in the near future to put it back > that should be easy to do. > Well, the maintainer of the current port said he has been moving away from BeOS. He guessed the Haiku developers didn't need the special support (but that's a guess). Looks like this can probably be removed from 2.6 as well if there is no maintainer. -Brett From skip at pobox.com Sat Aug 18 21:10:15 2007 From: skip at pobox.com (skip at pobox.com) Date: Sat, 18 Aug 2007 14:10:15 -0500 Subject: [Python-3000] AtheOS? In-Reply-To: References: <18117.40043.124714.520626@montanaro.dyndns.org> <18118.591.126364.329836@montanaro.dyndns.org> Message-ID: <18119.17431.670483.153498@montanaro.dyndns.org> Brett> Well, the maintainer of the current port said he has been moving Brett> away from BeOS. He guessed the Haiku developers didn't need the Brett> special support (but that's a guess). What does poetry have to do with BeOS? Brett> Looks like this can probably be removed from 2.6 as well if there Brett> is no maintainer. I'll update PEP 11. Deprecate in 2.6. Break the build in 2.7. Gone altogether in 3.0. Skip From brett at python.org Sat Aug 18 21:13:08 2007 From: brett at python.org (Brett Cannon) Date: Sat, 18 Aug 2007 12:13:08 -0700 Subject: [Python-3000] AtheOS? In-Reply-To: <18119.17431.670483.153498@montanaro.dyndns.org> References: <18117.40043.124714.520626@montanaro.dyndns.org> <18118.591.126364.329836@montanaro.dyndns.org> <18119.17431.670483.153498@montanaro.dyndns.org> Message-ID: On 8/18/07, skip at pobox.com wrote: > > Brett> Well, the maintainer of the current port said he has been moving > Brett> away from BeOS. He guessed the Haiku developers didn't need the > Brett> special support (but that's a guess). > > What does poetry have to do with BeOS? I am assuming you are referencing Haiku. =) Haiku is to Syllabus what AtheOS is to Be; a group of people who loved a dead OS enough to start a new open source project to mimick the original. 
> > Brett> Looks like this can probably be removed from 2.6 as well if there > Brett> is no maintainer. > > I'll update PEP 11. Deprecate in 2.6. Break the build in 2.7. Gone > altogether in 3.0. Great! -Brett From guido at python.org Sat Aug 18 21:48:54 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 18 Aug 2007 12:48:54 -0700 Subject: [Python-3000] bytes: compare bytes to integer In-Reply-To: <46C6EE66.8010701@gmail.com> References: <200708110225.28056.victor.stinner@haypocalc.com> <46C6EE66.8010701@gmail.com> Message-ID: On 8/18/07, Nick Coghlan wrote: > Victor Stinner wrote: > > Hi, > > > > I don't like the behaviour of Python 3000 when we compare a bytes string > > with length=1: > > >>> b'xyz'[0] == b'x' > > False > > > > The code can be seen as: > > >>> ord(b'x') == b'x' > > False > > This seems to suggest its own solution: > > bytes_obj[0] == ord('x') > > (Given that ord converts *characters* to bytes, does it actually make > sense to allow a bytes object as an argument to ord()?) No, I added that as a quick hack during the transition. If someone has the time, please kill this behavior and fix the (hopefully) few places that were relying on it. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Aug 18 22:18:47 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 18 Aug 2007 13:18:47 -0700 Subject: [Python-3000] Py3k-buffer branch merged back to py3k branch In-Reply-To: References: Message-ID: Wow. Thanks for a great job, Travis! I'll accept your PEP now. :-) We'll attend to the details at the sprint. --Guido On 8/18/07, Travis E. Oliphant wrote: > > Hello all, > > I'm sorry I won't be attending the Google sprints next week. I'm going > to be moving from Utah to Texas next week and will be offline for > several days. > > In preparation for the sprints, I have converted all Python objects to > use the new buffer protocol PEP and implemented most of the C-API.
This > work took place in the py3k-buffer branch which now passes all the tests > that py3k does. > > So, I merged the changes back to the py3k branch in hopes that others > can continue working on what I've done. The merge took place after > fully syncing the py3k-buffer branch with the current trunk. > > There will be somebody from our community that will be at the Sprints > next week. He has agreed to try and work on the buffer protocol some > more. He is new to Python and so will probably need some help. He has > my cell phone number and will call me with questions which I hope to > answer. > > Left to do: > > 1) Finish the MemoryViewObject (getitem/setitem needs work). > 2) Finish the struct module changes (I've started, but have not checked > the changes in). > 3) Add tests > > Possible problems: > > It seems that whenever a PyExc_BufferError is raised, problems (like > segfaults) occur. I tried to add a new error object by copying how > Python did it for other errors, but it's likely that I didn't do it right. > > I will have email contact for a few days (until Tuesday) but will not > have much time to work. > > Thanks, > > > -Travis Oliphant > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz at pythoncraft.com Sun Aug 19 00:13:44 2007 From: aahz at pythoncraft.com (Aahz) Date: Sat, 18 Aug 2007 15:13:44 -0700 Subject: [Python-3000] Please don't kill the % operator... 
In-Reply-To: References: <46C2809C.3000806@acm.org> <46C3C1DE.6070302@cs.rmit.edu.au> <18116.17176.123168.265491@montanaro.dyndns.org> Message-ID: <20070818221344.GA5742@panix.com> On Thu, Aug 16, 2007, Guido van Rossum wrote: > > I don't know what percentage of %-formatting uses a string literal on > the left; if it's a really high number (high 90s), I'd like to kill > %-formatting and go with mechanical translation; otherwise, I think > we'll have to phase out %-formatting in 3.x or 4.0. Then there's the pseudo-literal for long lines: if not data: msg = "Syntax error at line %s: missing 'data' element" msg %= line_number Including those, my code probably has quite a few non-literals. Even when you exclude them, going through and finding the non-literals will cause much pain, because we do use "%" for numeric purposes and because our homebrew templating language uses "%" to indicate a variable. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "And if that makes me an elitist...I couldn't be happier." --JMS From skip at pobox.com Sun Aug 19 00:27:28 2007 From: skip at pobox.com (skip at pobox.com) Date: Sat, 18 Aug 2007 17:27:28 -0500 Subject: [Python-3000] PEP 11 update - Call for port maintainers to step forward Message-ID: <18119.29264.543717.894262@montanaro.dyndns.org> I made a quick update to PEP 11, "Removing support for little used platforms". I added details about ending support for AtheOS/Syllable and BeOS. I also added a yet-to-be-fleshed out section entitled "Platform Maintainers". I intend that to the extent possible we document the responsible parties for various platforms. Obviously, common platforms like Windows, Mac OS X, Linux and common Unix platforms (Solaris, *BSD, what else?) will continue to be supported by the core Python developer community, but lesser platforms should have one or more champions, and we should be able to get ahold of them to determine their continued interest in supporting Python on their platform(s). 
If you are the "owner" of a minor platform, please drop me a note. Ones I'm aware of that probably need specialized support outside the core Python developers include: IRIX Tru64 (aka OSF/1 and other names (what else?)) OS2/EMX (Andrew MacIntyre?) Cygwin MinGW HP-UX AIX Solaris < version 8 SCO Unixware IRIX and Tru64 are likely to go the way of the dodo if someone doesn't step up soon to offer support. I don't expect the others to disappear soon, but they tend to need more specialized support, especially in more "challenging" areas (shared library support, threading, etc). If you maintain the platform-specific aspects for any of these platforms, please let me know. If you aren't that person but know who is, please pass this note along to them. If I've missed any other platforms (I know I must have missed something), let me know that as well. Thanks, -- Skip Montanaro - skip at pobox.com - http://www.webfast.com/~skip/ From greg.ewing at canterbury.ac.nz Sun Aug 19 02:27:38 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 19 Aug 2007 12:27:38 +1200 Subject: [Python-3000] bytes: compare bytes to integer In-Reply-To: <46C6EE66.8010701@gmail.com> References: <200708110225.28056.victor.stinner@haypocalc.com> <46C6EE66.8010701@gmail.com> Message-ID: <46C78E7A.10600@canterbury.ac.nz> Nick Coghlan wrote: > bytes_obj[0] == ord('x') That's a rather expensive way of comparing an integer with a constant, though. -- Greg From lists at cheimes.de Sun Aug 19 03:18:29 2007 From: lists at cheimes.de (Christian Heimes) Date: Sun, 19 Aug 2007 03:18:29 +0200 Subject: [Python-3000] Py3k-buffer branch merged back to py3k branch In-Reply-To: References: Message-ID: Travis E. Oliphant wrote: > Left to do: > > 1) Finish the MemoryViewObject (getitem/setitem needs work). > 2) Finish the struct module changes (I've started, but have not checked > the changes in).
> 3) Add tests > > Possible problems: > > It seems that whenever a PyExc_BufferError is raised, problems (like > segfaults) occur. I tried to add a new error object by copying how > Python did it for other errors, but it's likely that I didn't do it right. > > I will have email contact for a few days (until Tuesday) but will not > have much time to work. I was wondering what the memoryview is doing so I tried it with a string: ./python -c "memoryview('test')" Segmentation fault Ooops! gdb says this about the error: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread -1210415424 (LWP 14436)] 0x080f77a0 in PyErr_SetObject (exception=0x81962c0, value=0xb7cee3a8) at Python/errors.c:55 55 if (exception != NULL && Bug report: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1777057&group_id=5470 Christian From nnorwitz at gmail.com Sun Aug 19 06:32:24 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Sat, 18 Aug 2007 21:32:24 -0700 Subject: [Python-3000] Py3k-buffer branch merged back to py3k branch In-Reply-To: References: Message-ID: On 8/18/07, Travis E. Oliphant wrote: > > In preparation for the sprints, I have converted all Python objects to > use the new buffer protocol PEP and implemented most of the C-API. This > work took place in the py3k-buffer branch which now passes all the tests > that py3k does. > > So, I merged the changes back to the py3k branch in hopes that others > can continue working on what I've done. The merge took place after > fully syncing the py3k-buffer branch with the current trunk. > > Left to do: > > 1) Finish the MemoryViewObject (getitem/setitem needs work). > 2) Finish the struct module changes (I've started, but have not checked > the changes in). > 3) Add tests Also need to add doc. I noticed not all the new APIs mentioned the meaning of the return value. Do all the new functions which return int only return 0 on success and -1 on failure. Or do any return a size. 
I'm thinking of possible issues with Py_ssize_t vs int mismatches. I saw a couple which might have been a problem. See below. > Possible problems: > > It seems that whenever a PyExc_BufferError is raised, problems (like > segfaults) occur. I tried to add a new error object by copying how > Python did it for other errors, but it's likely that I didn't do it right. I think I fixed this. Needed to add PRE_INIT and POST_INIT for the new exception. This fixed the problem reported by Christian Heimes in this thread. I checked in revision 57193 which was a code review. I pointed out all the places I thought there were problems. Since some of this code is tricky, I expect there will be more issues. This code really, really needs tests. I added a comment about a memory leak. Below is the stack trace of where the memory was allocated. I added a comment (in release buffer) where I thought it could be freed, but I'm not sure that's the right place. When I ran the test suite test_xmlrpc failed. I'm not sure if this was from your checkin, my checkin, or something else. n -- Memory leaked when allocated from: array_buffer_getbuf (arraymodule.c:1775) buffer_getbuf (bufferobject.c:28) bytes_init (bytesobject.c:807) type_call (typeobject.c:429) From guido at python.org Sun Aug 19 06:37:42 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 18 Aug 2007 21:37:42 -0700 Subject: [Python-3000] Py3k-buffer branch merged back to py3k branch In-Reply-To: References: Message-ID: On 8/18/07, Neal Norwitz wrote: > When I ran the test suite test_xmlrpc failed. I'm not sure if this > was from your checkin, my checkin, or something else. This was already failing before; I think I reported it Friday or Thursday night. This started happening after a merge from the trunk brought in a bunch of new unit test code for xmlrpc. I'm guessing it's str/bytes issues. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From rhamph at gmail.com Sun Aug 19 06:50:10 2007 From: rhamph at gmail.com (Adam Olsen) Date: Sat, 18 Aug 2007 22:50:10 -0600 Subject: [Python-3000] Wanted: tasks for Py3k Sprint next week In-Reply-To: References: Message-ID: On 8/18/07, Guido van Rossum wrote: > I'm soliciting ideas for things that need to be done for the 3.0 > release that would make good sprint topics. Assume we'll have a mix of > more and less experienced developers on hand. > > (See wiki.python.org/moin/GoogleSprint .) Would ripping out the malloc macros[1] be a suitable suggestion? [1] Include/objimpl.h:#define PyObject_MALLOC PyMem_MALLOC Include/pymem.h:#define PyMem_MALLOC PyObject_MALLOC -- Adam Olsen, aka Rhamphoryncus From steve at holdenweb.com Sat Aug 18 00:58:44 2007 From: steve at holdenweb.com (Steve Holden) Date: Fri, 17 Aug 2007 18:58:44 -0400 Subject: [Python-3000] [Python-Dev] Documentation switch imminent In-Reply-To: References: Message-ID: <46C62824.90002@holdenweb.com> Alexandre Vassalotti wrote: > On 8/17/07, Georg Brandl wrote: [...] > Ah, I didn't notice that index included all the documents. That > explains the huge size increase. However, would it be possible to keep > the indexes separated? I noticed that I find I want more quickly when > the indexes are separated. > Which is fine when you know which section to expect to find your content in. But let's retain an "all-documentation" index if we can, as this is particularly helpful to the newcomers who aren't that familiar with the structure of the documentation. >> I've now removed leading spaces in the index output, and the character >> count is down to 850000. >> >>> Firefox, on my fairly recent machine, takes ~5 seconds rendering the >>> index of the new docs from disk, compared to a fraction of a second >>> for the old one. >> But you're right that rendering is slow there. It may be caused by the >> more complicated CSS... 
perhaps the index should be split up in several >> pages. >> > > I disabled CSS-support (with View->Page Style->No Style), but it > didn't affect the initial rendering speed. However, scrolling was > *much* faster without CSS. > Probably because the positional calculations are more straightforward then. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://del.icio.us/steve.holden --------------- Asciimercial ------------------ Get on the web: Blog, lens and tag the Internet Many services currently offer free registration ----------- Thank You for Reading ------------- From nnorwitz at gmail.com Sun Aug 19 20:50:47 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Sun, 19 Aug 2007 11:50:47 -0700 Subject: [Python-3000] cleaning up different ways to free an object Message-ID: I just fixed a bug in the new memoryview that used PyObject_DEL which caused a problem in debug mode. I had to change it to a Py_DECREF. It seems we have a lot of spellings of ways to free an object and I wonder if there are more problems lurking in there. $ cat */*.c | grep -c PyObject_Del 103 $ cat */*.c | grep -c PyObject_DEL 16 $ cat */*.c | grep -c PyObject_Free 16 $ cat */*.c | grep -c PyObject_FREE 19 I don't know how many of these are correct or incorrect. Note in Include/objimpl, the Del and Free variants are the same. I plan to get rid of one of them. #define PyObject_Del PyObject_Free #define PyObject_DEL PyObject_FREE PyObject_{MALLOC,REALLOC,FREE} depend upon whether python is compiled with debug mode, pymalloc, or not. What are the rules for when a particular API should be used (or not used) to free an object? 
n From brett at python.org Sun Aug 19 23:08:41 2007 From: brett at python.org (Brett Cannon) Date: Sun, 19 Aug 2007 14:08:41 -0700 Subject: [Python-3000] cleaning up different ways to free an object In-Reply-To: References: Message-ID: On 8/19/07, Neal Norwitz wrote: > I just fixed a bug in the new memoryview that used PyObject_DEL which > caused a problem in debug mode. I had to change it to a Py_DECREF. > It seems we have a lot of spellings of ways to free an object and I > wonder if there are more problems lurking in there. > > $ cat */*.c | grep -c PyObject_Del > 103 > $ cat */*.c | grep -c PyObject_DEL > 16 > $ cat */*.c | grep -c PyObject_Free > 16 > $ cat */*.c | grep -c PyObject_FREE > 19 > > I don't know how many of these are correct or incorrect. > > Note in Include/objimpl, the Del and Free variants are the same. I > plan to get rid of one of them. > > #define PyObject_Del PyObject_Free > #define PyObject_DEL PyObject_FREE > > PyObject_{MALLOC,REALLOC,FREE} depend upon whether python is compiled > with debug mode, pymalloc, or not. > > What are the rules for when a particular API should be used (or not > used) to free an object? If you read the comment at the top of Include/objimpl.h, it says that PyObject_(New|NewVar|Del) are for object allocation while PyObject_(Malloc|Realloc|Free) are just like malloc/free, but they use pymalloc instead of the system malloc. After that there are the usual performance macros. I am sure that prefixing the pymalloc versions of malloc/free PyObject is confusing for people. Maybe that can change to something like PyMalloc_* or something to disambiguate better. 
-Brett From rhamph at gmail.com Sun Aug 19 23:33:48 2007 From: rhamph at gmail.com (Adam Olsen) Date: Sun, 19 Aug 2007 15:33:48 -0600 Subject: [Python-3000] cleaning up different ways to free an object In-Reply-To: References: Message-ID: On 8/19/07, Neal Norwitz wrote: > I just fixed a bug in the new memoryview that used PyObject_DEL which > caused a problem in debug mode. I had to change it to a Py_DECREF. > It seems we have a lot of spellings of ways to free an object and I > wonder if there are more problems lurking in there. > > $ cat */*.c | grep -c PyObject_Del > 103 > $ cat */*.c | grep -c PyObject_DEL > 16 > $ cat */*.c | grep -c PyObject_Free > 16 > $ cat */*.c | grep -c PyObject_FREE > 19 > > I don't know how many of these are correct or incorrect. > > Note in Include/objimpl, the Del and Free variants are the same. I > plan to get rid of one of them. > > #define PyObject_Del PyObject_Free > #define PyObject_DEL PyObject_FREE > > PyObject_{MALLOC,REALLOC,FREE} depend upon whether python is compiled > with debug mode, pymalloc, or not. > > What are the rules for when a particular API should be used (or not > used) to free an object? Going from the lowest level to the highest level we have: {malloc,realloc,free} - libc's functions, arguments are bytes. PyMem_{Malloc,Realloc,Free} - Simple wrapper of {malloc,realloc,free} or PyObject_{Malloc,Realloc,Free}, but guarantees 0-byte allocations will always succeed. Do we really need this? At best it seems synonymous with PyObject_{Malloc,Realloc,Free}. It is a better name though. PyObject_{Malloc,Realloc,Free} - obmalloc.c's functions, arguments are bytes. Despite the name, I believe it can be used for arbitrary allocations (not just PyObjects.) Probably shouldn't be in Objects/. configure calls these pymalloc and they are controlled by the WITH_PYMALLOC define. Also guarantees 0-byte allocations will succeed. _PyObject_{New,NewVar} - object.c's functions, arguments are a PyTypeObject and optionally a size. 
Determines the number of bytes automatically, initializes ob_type and ob_refcnt fields. _PyObject_Del - Does nothing in particular (wraps free/Free), but the argument is intended to be a PyObject returned by _PyObject_{New,NewVar}. Exists only to complement other functions. Currently only a macro. Could be extended to sanity-check ob_refcnt field on debug builds. _PyObject_GC_{New,NewVar,Del} - As _PyObject_{New,NewVar,Del}, but adds hidden accounting info needed by cycle GC. PyObject{,_GC}_{New,NewVar,Del} - Macros that add typecasting to the above. -- Adam Olsen, aka Rhamphoryncus From nnorwitz at gmail.com Mon Aug 20 02:18:28 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Sun, 19 Aug 2007 17:18:28 -0700 Subject: [Python-3000] cleaning up different ways to free an object In-Reply-To: References: Message-ID: On 8/19/07, Brett Cannon wrote: > On 8/19/07, Neal Norwitz wrote: > > I just fixed a bug in the new memoryview that used PyObject_DEL which > > caused a problem in debug mode. I had to change it to a Py_DECREF. > > It seems we have a lot of spellings of ways to free an object and I > > wonder if there are more problems lurking in there. > > > > $ cat */*.c | grep -c PyObject_Del > > 103 > > $ cat */*.c | grep -c PyObject_DEL > > 16 > > $ cat */*.c | grep -c PyObject_Free > > 16 > > $ cat */*.c | grep -c PyObject_FREE > > 19 > > > > I don't know how many of these are correct or incorrect. > > > > Note in Include/objimpl, the Del and Free variants are the same. I > > plan to get rid of one of them. > > > > #define PyObject_Del PyObject_Free > > #define PyObject_DEL PyObject_FREE > > > > PyObject_{MALLOC,REALLOC,FREE} depend upon whether python is compiled > > with debug mode, pymalloc, or not. > > > > What are the rules for when a particular API should be used (or not > > used) to free an object? 
> > If you read the comment at the top of Include/objimpl.h, it says that > PyObject_(New|NewVar|Del) are for object allocation while > PyObject_(Malloc|Realloc|Free) are just like malloc/free, but they use > pymalloc instead of the system malloc. Ya, I'm not talking about the distinctions/categories. They make sense. The 'correctness' I was referring to (thus the rules) was when to use PyObject_Del vs Py_DECREF (ie, the problem with memoryview). I was trying to point out with the greps that the DELs/FREEs were infrequently used. I know there are some cases in _bsddb.c and I'm wondering if those are correct (there are a handful of other modules which also use them). The Del variants are used more in Modules while the Free variants are used more in the core. Changing PyObject_Del to Py_DECREF may require that more of a structure needs to be initialized before DECREFing, otherwise the dealloc might access uninitialized memory. I guess I can't really get rid of the aliases though, not without making the API inconsistent. > After that there are the usual performance macros. Another thing that kinda bugs me is that the 'macro' versions are not normally macros. In a default build (ie, with pymalloc), they are non-inlined function calls. n From eric+python-dev at trueblade.com Mon Aug 20 02:56:03 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Sun, 19 Aug 2007 20:56:03 -0400 Subject: [Python-3000] PEP 3101 clarification requests In-Reply-To: <46C65006.9030907@acm.org> References: <46C65006.9030907@acm.org> Message-ID: <46C8E6A3.4070202@trueblade.com> Talin wrote: > Wow, excellent feedback. I've added your email to the list of reminders > for the next round of edits. Here's something else for future edits: 1. When converting a string to an integer, what should the rules be? Should: format("0xd", "d") produce "13", or should it be an error? 2. I'm making the format specifiers as strict as I can.
So, I've made these ValueError's: For strings: - specifying a sign - specifying an alignment of '=' For longs: - specify a precision - specify a sign with type of 'c' Eric. From eric+python-dev at trueblade.com Mon Aug 20 03:16:34 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Sun, 19 Aug 2007 21:16:34 -0400 Subject: [Python-3000] PEP 3101 clarification requests In-Reply-To: <46C8E6A3.4070202@trueblade.com> References: <46C65006.9030907@acm.org> <46C8E6A3.4070202@trueblade.com> Message-ID: <46C8EB72.2030505@trueblade.com> Eric Smith wrote: > 2. I'm making the format specifiers as strict as I can. So, I've made > these ValueError's: I should have mentioned that I expect there to be criticism of this decision. I'd like to start with making the specifier parser strict, we can always loosen it if we find the need, when converting actual code. Eric. From oliphant.travis at ieee.org Mon Aug 20 09:21:18 2007 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Mon, 20 Aug 2007 01:21:18 -0600 Subject: [Python-3000] Py3k-buffer branch merged back to py3k branch In-Reply-To: References: Message-ID: Neal Norwitz wrote: > On 8/18/07, Travis E. Oliphant wrote: >> In preparation for the sprints, I have converted all Python objects to >> use the new buffer protocol PEP and implemented most of the C-API. This >> work took place in the py3k-buffer branch which now passes all the tests >> that py3k does. >> >> So, I merged the changes back to the py3k branch in hopes that others >> can continue working on what I've done. The merge took place after >> fully syncing the py3k-buffer branch with the current trunk. >> >> Left to do: >> >> 1) Finish the MemoryViewObject (getitem/setitem needs work). >> 2) Finish the struct module changes (I've started, but have not checked >> the changes in). >> 3) Add tests > > Also need to add doc. I noticed not all the new APIs mentioned the > meaning of the return value. 
Do all the new functions which return > int only return 0 on success and -1 on failure. Or do any return a > size. I'm thinking of possible issues with Py_ssize_t vs int > mismatches. I saw a couple which might have been a problem. See > below. Yes, IIRC that is correct. > >> Possible problems: >> >> It seems that whenever a PyExc_BufferError is raised, problems (like >> segfaults) occur. I tried to add a new error object by copying how >> Python did it for other errors, but it's likely that I didn't do it right. > > I think I fixed this. Needed to add PRE_INIT and POST_INIT for the > new exception. This fixed the problem reported by Christian Heimes in > this thread. Thanks very much. > > I checked in revision 57193 which was a code review. I pointed out > all the places I thought there were problems. Since some of this code > is tricky, I expect there will be more issues. This code really, > really needs tests. If Chris (the guy who will be at the sprint) does not write tests, I will, but it will probably be after about Aug. 27. > > I added a comment about a memory leak. Below is the stack trace of > where the memory was allocated. I added a comment (in release buffer) > where I thought it could be freed, but I'm not sure that's the right > place. There should be no memory to free there. The get and release buffer mechanism doesn't allocate or free any memory (there was a hack in arrayobject which I just fixed). Now, perhaps there are some reference counting issues, but the mechanism doesn't really play with reference counts either. I will be around after August 27th to test the code more (it will help to finish implementing the MemoryView Object -- i.e. get its tolist function working, and so forth). > > When I ran the test suite test_xmlrpc failed. I'm not sure if this > was from your checkin, my checkin, or something else. > This was definitely happening prior to my checking in. 
> n > -- > Memory leaked when allocated from: > array_buffer_getbuf (arraymodule.c:1775) > buffer_getbuf (bufferobject.c:28) > bytes_init (bytesobject.c:807) > type_call (typeobject.c:429) Hmm. I'm not sure what memory is being leaked unless there are reference counting issues I'm not seeing. In bytes_init for example, that line number is a static memory allocation? How is static memory being leaked? The arraymodule.c malloc call should be gone now as the possible strings needed are now in the source code itself. From brett at python.org Mon Aug 20 09:51:41 2007 From: brett at python.org (Brett Cannon) Date: Mon, 20 Aug 2007 00:51:41 -0700 Subject: [Python-3000] Planning to switch to new tracker on August 23rd Message-ID: Having squashed the final issues, we are now ready to switch over to the new tracker! The plan is to do it on the 23rd. But before I announce to the community I wanted to make sure there was not some specific objection by python-dev or python-3000. If there is please let me know by midday Monday so that we can postpone to next week if needed. -Brett From nnorwitz at gmail.com Mon Aug 20 18:37:30 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Mon, 20 Aug 2007 09:37:30 -0700 Subject: [Python-3000] Py3k-buffer branch merged back to py3k branch In-Reply-To: References: Message-ID: On 8/20/07, Travis E. Oliphant wrote: > > > Memory leaked when allocated from: > > array_buffer_getbuf (arraymodule.c:1775) > > buffer_getbuf (bufferobject.c:28) > > bytes_init (bytesobject.c:807) > > type_call (typeobject.c:429) > > Hmm. I'm not sure what memory is being leaked unless there are > reference counting issues I'm not seeing. > > In bytes_init for example, that line number is a static memory > allocation? How is static memory being leaked? I'm not sure if this was before or after my checkin, so the line numbers could have been off a bit. > The arraymodule.c malloc call should be gone now as the possible strings > needed are now in the source code itself. 
That was the only leak AFAIK. So hopefully by removing it there aren't any more. Once there are tests it will be worthwhile to check again. I don't think I checked for refleaks. n From rhamph at gmail.com Mon Aug 20 18:49:09 2007 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 20 Aug 2007 10:49:09 -0600 Subject: [Python-3000] cleaning up different ways to free an object In-Reply-To: References: Message-ID: On 8/19/07, Neal Norwitz wrote: > On 8/19/07, Brett Cannon wrote: > > On 8/19/07, Neal Norwitz wrote: > > > I just fixed a bug in the new memoryview that used PyObject_DEL which > > > caused a problem in debug mode. I had to change it to a Py_DECREF. > > > It seems we have a lot of spellings of ways to free an object and I > > > wonder if there are more problems lurking in there. > > > > > > $ cat */*.c | grep -c PyObject_Del > > > 103 > > > $ cat */*.c | grep -c PyObject_DEL > > > 16 > > > $ cat */*.c | grep -c PyObject_Free > > > 16 > > > $ cat */*.c | grep -c PyObject_FREE > > > 19 > > > > > > I don't know how many of these are correct or incorrect. > > > > > > Note in Include/objimpl, the Del and Free variants are the same. I > > > plan to get rid of one of them. > > > > > > #define PyObject_Del PyObject_Free > > > #define PyObject_DEL PyObject_FREE > > > > > > PyObject_{MALLOC,REALLOC,FREE} depend upon whether python is compiled > > > with debug mode, pymalloc, or not. > > > > > > What are the rules for when a particular API should be used (or not > > > used) to free an object? > > > > If you read the comment at the top of Include/objimpl.h, it says that > > PyObject_(New|NewVar|Del) are for object allocation while > > PyObject_(Malloc|Realloc|Free) are just like malloc/free, but they use > > pymalloc instead of the system malloc. > > Ya, I'm not talking about the distinctions/categories. They makes > sense. The 'correctness' I was referring to (thus the rules) was when > to use PyObject_Del vs Py_DECREF (ie, the problem with memoryview). 
I > was trying to point out with the greps that the DELs/FREEs were > infrequently used. I know there are some cases in _bsddb.c and I'm > wondering if those are correct (there are a handful of other modules > which also use them). The Del variants are used more in Modules while > the Free variants are used more in the core. Thus my suggestion to add a refcount check to _PyObject_Del. It should only be used when the refcount hits 0. Using it at 1 could be allowed too, or maybe that should be a ForceDel variant? > Changing PyObject_Del to Py_DECREF may require that more of a > structure needs to be initialized before DECREFing, otherwise the > dealloc might access uninitialized memory. > > I guess I can't really get rid of the aliases though, not without > making the API inconsistent. > > > After that there are the usual performance macros. > > Another thing that kinda bugs me is that the 'macro' versions are not > normally macros. In a default build (ie, with pymalloc), they are > non-inlined function calls. I'd much like to see the macros (other than the type casting ones) ripped out. I doubt a performance advantage for normal non-debug uses can be demonstrated. (Prove me wrong of course.) The hardest part of trying to understand what to call is because of all these macros and ifdefs. -- Adam Olsen, aka Rhamphoryncus From guido at python.org Mon Aug 20 21:01:16 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 20 Aug 2007 12:01:16 -0700 Subject: [Python-3000] PEP 3101 clarification requests In-Reply-To: <46C8E6A3.4070202@trueblade.com> References: <46C65006.9030907@acm.org> <46C8E6A3.4070202@trueblade.com> Message-ID: On 8/19/07, Eric Smith wrote: > Talin wrote: > > Wow, excellent feedback. I've added your email to the list of reminders > > for the next round of edits. > > Here's something else for future edits: > > 1. When converting a string to an integer, what should the rules be?
> Should: > format("0xd", "d") > produce "13", or should it be an error? I can't see that as anything besides an error. There should be no implicit conversions from strings to ints. > 2. I'm making the format specifiers as strict as I can. So, I've made > these ValueError's: > > For strings: > - specifying a sign > - specifying an alignment of '=' > > For longs: > - specify a precision > - specify a sign with type of 'c' Works for me. Will probably catch a few bugs. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Aug 20 21:05:13 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 20 Aug 2007 12:05:13 -0700 Subject: [Python-3000] Wanted: tasks for Py3k Sprint next week In-Reply-To: References: Message-ID: If one of these pairs exists solely for backwards compatibility, yes. I think Neal Norwitz started a discussion of a similar issue. On 8/18/07, Adam Olsen wrote: > On 8/18/07, Guido van Rossum wrote: > > I'm soliciting ideas for things that need to be done for the 3.0 > > release that would make good sprint topics. Assume we'll have a mix of > > more and less experienced developers on hand. > > > > (See wiki.python.org/moin/GoogleSprint .) > > Would ripping out the malloc macros[1] be a suitable suggestion? > > [1] > Include/objimpl.h:#define PyObject_MALLOC PyMem_MALLOC > Include/pymem.h:#define PyMem_MALLOC PyObject_MALLOC > > -- > Adam Olsen, aka Rhamphoryncus > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From eric+python-dev at trueblade.com Mon Aug 20 21:11:27 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Mon, 20 Aug 2007 15:11:27 -0400 Subject: [Python-3000] PEP 3101 clarification requests In-Reply-To: References: <46C65006.9030907@acm.org> <46C8E6A3.4070202@trueblade.com> Message-ID: <46C9E75F.2040208@trueblade.com> Guido van Rossum wrote: > On 8/19/07, Eric Smith wrote: >> Talin wrote: >>> Wow, excellent feedback. 
I've added your email to the list of reminders >>> for the next round of edits. >> Here's something else for future edits: >> >> 1. When converting a string to an integer, what should the rules be? >> Should: >> format("0xd", "d") >> produce "13", or should it be an error? > > I can't see that as anything besides an error. There should be no > implicit conversions from strings to ints. OK. I had been planning on implicitly converting between strings, ints, and floats (in all directions). The PEP doesn't really say. So the only implicit conversions will be: int->float int->string float->int float->string Now that I look at it, % doesn't support string->float or string->int conversions. Not sure where I got the idea it was needed. I'll remove it and update my test cases. Converting to strings doesn't really buy you much, since we have the !s specifier. But I think it's needed for backward compatibility with % formatting. From guido at python.org Mon Aug 20 21:16:08 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 20 Aug 2007 12:16:08 -0700 Subject: [Python-3000] PEP 3101 clarification requests In-Reply-To: <46C9E75F.2040208@trueblade.com> References: <46C65006.9030907@acm.org> <46C8E6A3.4070202@trueblade.com> <46C9E75F.2040208@trueblade.com> Message-ID: On 8/20/07, Eric Smith wrote: > Guido van Rossum wrote: > > On 8/19/07, Eric Smith wrote: > >> Talin wrote: > >>> Wow, excellent feedback. I've added your email to the list of reminders > >>> for the next round of edits. > >> Here's something else for future edits: > >> > >> 1. When converting a string to an integer, what should the rules be? > >> Should: > >> format("0xd", "d") > >> produce "13", or should it be an error? > > > > I can't see that as anything besides an error. There should be no > > implicit conversions from strings to ints. > > OK. I had been planning on implicitly converting between strings, ints, > and floats (in all directions). The PEP doesn't really say. 
> > So the only implicit conversions will be: > int->float > int->string > float->int > float->string > > Now that I look at it, % doesn't support string->float or string->int > conversions. Not sure where I got the idea it was needed. I'll remove > it and update my test cases. > > Converting to strings doesn't really buy you much, since we have the !s > specifier. But I think it's needed for backward compatibility with % > formatting. Why? The conversion code can just generate !s:-20 instead of :-20s. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From eric+python-dev at trueblade.com Mon Aug 20 21:46:41 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Mon, 20 Aug 2007 15:46:41 -0400 Subject: [Python-3000] PEP 3101 clarification requests In-Reply-To: References: <46C65006.9030907@acm.org> <46C8E6A3.4070202@trueblade.com> <46C9E75F.2040208@trueblade.com> Message-ID: <46C9EFA1.5050906@trueblade.com> Guido van Rossum wrote: >> Converting to strings doesn't really buy you much, since we have the !s >> specifier. But I think it's needed for backward compatibility with % >> formatting. > > Why? The conversion code can just generate !s:-20 instead of :-20s. True enough. I'll take it out, too. Talin: On your list of to-do items for the PEP, could you add that the only conversions for the standard conversion specifiers are int <-> float? Thanks. Eric. From eric+python-dev at trueblade.com Tue Aug 21 02:18:42 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Mon, 20 Aug 2007 20:18:42 -0400 Subject: [Python-3000] Looking for advice on PEP 3101 implementation details In-Reply-To: References: <46C5F112.5050101@trueblade.com> <46C601A7.6050408@trueblade.com> Message-ID: <46CA2F62.8060903@trueblade.com> I'm basically done with format(), string.format(), object.__format__(), string.__format__(), long.__format__(), and float.__format__(). 
I have some cleanup left to do from all of the refactoring, but it's passing the vast majority of my tests. The only real remaining work is to implement string.Formatter. This is a class designed to be overridden to customize the formatting behavior. It will share much of the C code with string.format(). My plan is to write this class in Python, and put it in Lib/string.py. Given the complexities and book-keeping involved, writing it in C doesn't seem worth the hassle. In order to talk back to the C implementation code, I'll create a private module in Modules/_formatter.c. Does this seem reasonable? If so, my question is how to add module in the Modules directory. There is some logic in the top level Makefile.pre.in, but it doesn't look like it applies to all of the code in Modules, just some of the files. Modules/Setup.dist contains this comment: # This only contains the minimal set of modules required to run the # setup.py script in the root of the Python source tree. I think this applies to me, as setup.py indirectly includes string. So, is the right thing to do to insert my _formatter.c into Modules/Setup.dist? Is there anything else I need to do? Is there some existing code in Modules that I should base my approach on? I googled for help on this, but didn't get anywhere. Thanks again for your assistance. Eric. From greg.ewing at canterbury.ac.nz Tue Aug 21 02:48:25 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 21 Aug 2007 12:48:25 +1200 Subject: [Python-3000] PEP 3101 clarification requests In-Reply-To: <46C9EFA1.5050906@trueblade.com> References: <46C65006.9030907@acm.org> <46C8E6A3.4070202@trueblade.com> <46C9E75F.2040208@trueblade.com> <46C9EFA1.5050906@trueblade.com> Message-ID: <46CA3659.6010304@canterbury.ac.nz> Eric Smith wrote: > Guido van Rossum wrote: > > > Why? The conversion code can just generate !s:-20 instead of :-20s. 
> > Talin: On your list of to-do items for the PEP, could you add that the > only conversions for the standard conversion specifiers are int <-> float? Please, no! While the converter may be able to handle it, "!s:-20" is terribly ugly for humans. -- Greg From guido at python.org Tue Aug 21 03:00:55 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 20 Aug 2007 18:00:55 -0700 Subject: [Python-3000] Looking for advice on PEP 3101 implementation details In-Reply-To: <46CA2F62.8060903@trueblade.com> References: <46C5F112.5050101@trueblade.com> <46C601A7.6050408@trueblade.com> <46CA2F62.8060903@trueblade.com> Message-ID: On 8/20/07, Eric Smith wrote: > I'm basically done with format(), string.format(), object.__format__(), > string.__format__(), long.__format__(), and float.__format__(). I have > some cleanup left to do from all of the refactoring, but it's passing > the vast majority of my tests. > > The only real remaining work is to implement string.Formatter. This is > a class designed to be overridden to customize the formatting behavior. > It will share much of the C code with string.format(). > > My plan is to write this class in Python, and put it in Lib/string.py. > Given the complexities and book-keeping involved, writing it in C > doesn't seem worth the hassle. In order to talk back to the C > implementation code, I'll create a private module in Modules/_formatter.c. > > Does this seem reasonable? Sure. > If so, my question is how to add module in the Modules directory. There > is some logic in the top level Makefile.pre.in, but it doesn't look like > it applies to all of the code in Modules, just some of the files. > > Modules/Setup.dist contains this comment: > # This only contains the minimal set of modules required to run the > # setup.py script in the root of the Python source tree. > > I think this applies to me, as setup.py indirectly includes string. > > So, is the right thing to do to insert my _formatter.c into > Modules/Setup.dist? 
Is there anything else I need to do? Is there some > existing code in Modules that I should base my approach on? > > I googled for help on this, but didn't get anywhere. > > Thanks again for your assistance. You can ignore Makefile* and Modules/Setup*; instead, you should be editing setup.py at the toplevel. Since your new module doesn't depend on anything external it should be a one-line change, modeled after this one: exts.append( Extension('_weakref', ['_weakref.c']) ) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Aug 21 03:03:09 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 20 Aug 2007 18:03:09 -0700 Subject: [Python-3000] PEP 3101 clarification requests In-Reply-To: <46CA3659.6010304@canterbury.ac.nz> References: <46C65006.9030907@acm.org> <46C8E6A3.4070202@trueblade.com> <46C9E75F.2040208@trueblade.com> <46C9EFA1.5050906@trueblade.com> <46CA3659.6010304@canterbury.ac.nz> Message-ID: On 8/20/07, Greg Ewing wrote: > Eric Smith wrote: > > Guido van Rossum wrote: > > > > > Why? The conversion code can just generate !s:-20 instead of :-20s. > > > > Talin: On your list of to-do items for the PEP, could you add that the > > only conversions for the standard conversion specifiers are int <-> float? > > Please, no! While the converter may be able to handle > it, "!s:-20" is terribly ugly for humans. But how often will you need this? (You only need the !s part if you don't know that the argument is a string.) The alternative would require every type's formatter to interpret -20s the same way, which goes against the idea that the conversion mini-language is an object's own business. 
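
A minimal sketch of the point above, not taken from the thread. It assumes the str.format() syntax as it later shipped in Python 3, so '<20' stands in for the '-20' spelling under discussion, and it uses datetime.date purely as an example of a type whose format spec is its own business. The variable names are made up for illustration.

```python
import datetime

# Assumption: a type with its own format mini-language (date uses
# strftime patterns); this is only an example type, not one named in the thread.
value = datetime.date(2007, 8, 20)

# Without !s, the spec is handed to date.__format__, which reads it
# as a strftime pattern.
own = "{0:%Y/%m/%d}".format(value)

# With !s, the object is converted with str() first, and the spec is
# then interpreted by str.__format__ (left-justify in a field of 20).
padded = "{0!s:<20}|".format(value)

print(repr(own))
print(repr(padded))
```

The !s prefix is only needed because the argument is not already a string; for a str argument the same spec would work without it.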
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From eric+python-dev at trueblade.com Tue Aug 21 03:22:22 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Mon, 20 Aug 2007 21:22:22 -0400 Subject: [Python-3000] Looking for advice on PEP 3101 implementation details In-Reply-To: References: <46C5F112.5050101@trueblade.com> <46C601A7.6050408@trueblade.com> <46CA2F62.8060903@trueblade.com> Message-ID: <46CA3E4E.1020203@trueblade.com> Guido van Rossum wrote: > On 8/20/07, Eric Smith wrote: >> Modules/Setup.dist contains this comment: >> # This only contains the minimal set of modules required to run the >> # setup.py script in the root of the Python source tree. >> >> I think this applies to me, as setup.py indirectly includes string. > You can ignore Makefile* and Modules/Setup*; instead, you should be > editing setup.py at the toplevel. Since your new module doesn't depend > on anything external it should be a one-line change, modeled after > this one: > > exts.append( Extension('_weakref', ['_weakref.c']) ) But if string.py imports _formatter, then setup.py fails with being unable to "import string": $ ./python setup.py object : ImportError('No module named _formatter',) type : ImportError refcount: 4 address : 0xf6f9acac lost sys.stderr That's why I referenced the comment in Setup.dist. 
From guido at python.org Tue Aug 21 04:25:46 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 20 Aug 2007 19:25:46 -0700 Subject: [Python-3000] Looking for advice on PEP 3101 implementation details In-Reply-To: <46CA3E4E.1020203@trueblade.com> References: <46C5F112.5050101@trueblade.com> <46C601A7.6050408@trueblade.com> <46CA2F62.8060903@trueblade.com> <46CA3E4E.1020203@trueblade.com> Message-ID: On 8/20/07, Eric Smith wrote: > Guido van Rossum wrote: > > On 8/20/07, Eric Smith wrote: > > >> Modules/Setup.dist contains this comment: > >> # This only contains the minimal set of modules required to run the > >> # setup.py script in the root of the Python source tree. > >> > >> I think this applies to me, as setup.py indirectly includes string. > > > You can ignore Makefile* and Modules/Setup*; instead, you should be > > editing setup.py at the toplevel. Since your new module doesn't depend > > on anything external it should be a one-line change, modeled after > > this one: > > > > exts.append( Extension('_weakref', ['_weakref.c']) ) > > But if string.py imports _formatter, then setup.py fails with being > unable to "import string": > > $ ./python setup.py > object : ImportError('No module named _formatter',) > type : ImportError > refcount: 4 > address : 0xf6f9acac > lost sys.stderr > > That's why I referenced the comment in Setup.dist. Hm, those damn dependencies. In that case I suggest adding it to sys instead of creating a new internal method. It could be sys._formatparser or whatever useful name you'd like to give it, as long as it starts with an underscore. That should solve it. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From eric+python-dev at trueblade.com Tue Aug 21 04:32:49 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Mon, 20 Aug 2007 22:32:49 -0400 Subject: [Python-3000] Looking for advice on PEP 3101 implementation details In-Reply-To: References: <46C5F112.5050101@trueblade.com> <46C601A7.6050408@trueblade.com> <46CA2F62.8060903@trueblade.com> <46CA3E4E.1020203@trueblade.com> Message-ID: <46CA4ED1.5010606@trueblade.com> Guido van Rossum wrote: > Hm, those damn dependencies. In that case I suggest adding it to sys > instead of creating a new internal method. It could be > sys._formatparser or whatever useful name you'd like to give it, as > long as it starts with an underscore. That should solve it. Okay, that's much easier for me. I'll go in that direction. From greg.ewing at canterbury.ac.nz Tue Aug 21 07:46:42 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 21 Aug 2007 17:46:42 +1200 Subject: [Python-3000] PEP 3101 clarification requests In-Reply-To: References: <46C65006.9030907@acm.org> <46C8E6A3.4070202@trueblade.com> <46C9E75F.2040208@trueblade.com> <46C9EFA1.5050906@trueblade.com> <46CA3659.6010304@canterbury.ac.nz> Message-ID: <46CA7C42.5040007@canterbury.ac.nz> Guido van Rossum wrote: > But how often will you need this? (You only need the !s part if you > don't know that the argument is a string.) Maybe I'm confused. I thought we had agreed that most types would delegate to str if they didn't understand the format, so most of the time there wouldn't be any need to use "!s". Is that still true? If not, I think it will be very inconvenient, as I very frequently format things of all sorts of types using "%s", and rely on it doing something reasonable. -- Greg From ericsmith at windsor.com Tue Aug 21 00:54:51 2007 From: ericsmith at windsor.com (Eric V. 
Smith) Date: Mon, 20 Aug 2007 18:54:51 -0400 Subject: [Python-3000] Looking for advice on PEP 3101 implementation details In-Reply-To: <46C5F112.5050101@trueblade.com> References: <46C5F112.5050101@trueblade.com> Message-ID: <46CA1BBB.6070603@windsor.com> I've completed most of the implementation for PEP 3101. The only thing I have left to do is the Formatter class, which is supposed to live in the string module. My plan is to write this part in Python, and put it in Lib/string.py. Given the complexities and book-keeping involved, writing it in C doesn't seem worth the hassle. In order to talk back to the existing C implementation, I'll create a private module in Modules/_formatter.c. Does this seem reasonable? If so, my question is how to add module in the Modules directory. There is some logic in the top level Makefile.pre.in, but it doesn't look like it applies to all of the code in Modules, just some of the files. Modules/Setup.dist contains this comment: # This only contains the minimal set of modules required to run the # setup.py script in the root of the Python source tree. I think this applies to me, as setup.py indirectly includes string. So, is the right thing to do to insert my _formatter.c into Modules/Setup.dist? Is there anything else I need to do? I googled for help on this, but didn't get anywhere. Thanks again for any assistance. Eric. From g.brandl at gmx.net Tue Aug 21 08:17:30 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 21 Aug 2007 08:17:30 +0200 Subject: [Python-3000] Documentation switch imminent In-Reply-To: <740c3aec0708171731qc9324c3o17debfafe4c1530d@mail.gmail.com> References: <740c3aec0708171731qc9324c3o17debfafe4c1530d@mail.gmail.com> Message-ID: BJ?rn Lindqvist schrieb: > It is fantastic! Totally super work. I just have one small request; > pretty please do not set the font. I'm very happy with my browsers > default (Verdana), and Bitstream Vera Sans renders badly for me. 
Okay, I've changed the stylesheet, it should go live on docs.python.org intermittently. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From eric+python-dev at trueblade.com Tue Aug 21 11:21:09 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Tue, 21 Aug 2007 05:21:09 -0400 Subject: [Python-3000] Looking for advice on PEP 3101 implementation details In-Reply-To: <46CA1BBB.6070603@windsor.com> References: <46C5F112.5050101@trueblade.com> <46CA1BBB.6070603@windsor.com> Message-ID: <46CAAE85.1070206@trueblade.com> Eric V. Smith wrote: [a duplicate message] Please ignore this. I accidentally sent it twice. From eric+python-dev at trueblade.com Tue Aug 21 11:55:23 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Tue, 21 Aug 2007 05:55:23 -0400 Subject: [Python-3000] PEP 3101 clarification requests In-Reply-To: <46CA7C42.5040007@canterbury.ac.nz> References: <46C65006.9030907@acm.org> <46C8E6A3.4070202@trueblade.com> <46C9E75F.2040208@trueblade.com> <46C9EFA1.5050906@trueblade.com> <46CA3659.6010304@canterbury.ac.nz> <46CA7C42.5040007@canterbury.ac.nz> Message-ID: <46CAB68B.9060900@trueblade.com> Greg Ewing wrote: > Guido van Rossum wrote: >> But how often will you need this? (You only need the !s part if you >> don't know that the argument is a string.) > > Maybe I'm confused. I thought we had agreed that most > types would delegate to str if they didn't understand > the format, so most of the time there wouldn't be any > need to use "!s". Is that still true? Yes, it is true. 
Here's a working test case:

    # class with __str__, but no __format__
    class E:
        def __init__(self, x):
            self.x = x
        def __str__(self):
            return 'E(' + self.x + ')'

    self.assertEqual('{0}'.format(E('data')), 'E(data)')
    self.assertEqual('{0:^10}'.format(E('data')), ' E(data)  ')
    self.assertEqual('{0:^10s}'.format(E('data')), ' E(data)  ')

The formatting in all 3 cases is being done by string.__format__()
(actually object.__format__, which calls str(o).__format__).

> If not, I think it will be very inconvenient, as I
> very frequently format things of all sorts of types
> using "%s", and rely on it doing something reasonable.

That will continue to work, for objects that don't provide a __format__
function. The problem is that if an object defines its own __format__,
it either needs to understand all of the string formatting, or at least
recognize a string format and send it along to string.__format__() (or
object.__format__, which will convert to string for you).

Another working test case:

    # class with __format__ that forwards to string,
    # for some format_spec's
    class G:
        def __init__(self, x):
            self.x = x
        def __str__(self):
            return "string is " + self.x
        def __format__(self, format_spec):
            if format_spec == 's':
                return 'G(' + self.x + ')'
            return object.__format__(self, format_spec)

    self.assertEqual('{0:s}'.format(G('data')), 'G(data)')

    # unknown spec, will call object.__format__, which calls str()
    self.assertEqual('{0:>15s}'.format(G('data')), ' string is data')

    # convert to string explicitly, overriding G.__format__
    self.assertEqual('{0!s}'.format(G('data')), 'string is data')

Note the collision with the 's' format_spec in this example. You'd have
to carefully design your object's __format__ specifiers to be able to
recognize string specifiers as being different from its own specifiers
(something that G does not cleanly do).

int is like G: it defines its own __format__. "!s" says: skip the
object's own __format__ function, just convert the object to a string
and call string.__format__. So what Guido is saying is that for int,
instead of having int.__format__ recognize string formatting specifiers
and do the conversion to string, you need to convert it to a string
yourself with "!s".

Whether that's better or not, I leave up to Guido. I personally think
that for int and float, having them recognize "s" format specs is
sufficiently handy that it's worth having, but I understand not
providing that feature.

Eric.

From dalcinl at gmail.com Tue Aug 21 17:00:36 2007
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Tue, 21 Aug 2007 12:00:36 -0300
Subject: [Python-3000] Py3k-buffer branch merged back to py3k branch
In-Reply-To:
References:
Message-ID:

Travis, I have not had much time to follow your work on the py3k-buffer
branch, but now that you have merged it into py3k, I want to make a
small comment for your consideration.

Perhaps the 'PyBuffer' struct could be named differently, something like
'Py_buffer'. The use case has some similarities to the 'Py_complex'
struct. It is not related to any 'PyObject*', but is a public structure
which (if I understand right) can be declared and used in static storage.

In short, I am proposing the naming below. Note that I removed
'bufferinfo' from the typedef line, as it does not seem to be needed (it
only appears there when grepping the sources) and could conflict with
user code.

    /* buffer interface */
    typedef struct {
        .....
    } Py_buffer;

    typedef struct {
        PyObject_HEAD
        PyObject *base;
        Py_buffer view;
    } PyMemoryViewObject;

Again, taking complex as an example, please note the symmetry:

    typedef struct {
        double real;
        double imag;
    } Py_complex;

    typedef struct {
        PyObject_HEAD
        Py_complex cval;
    } PyComplexObject;

Regards,

On 8/18/07, Travis E. Oliphant wrote:
> In preparation for the sprints, I have converted all Python objects to
> use the new buffer protocol PEP and implemented most of the C-API.
> This work took place in the py3k-buffer branch which now passes all
> the tests that py3k does.
>
> So, I merged the changes back to the py3k branch in hopes that others
> can continue working on what I've done. The merge took place after
> fully syncing the py3k-buffer branch with the current trunk.

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From guido at python.org Tue Aug 21 19:06:32 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 21 Aug 2007 10:06:32 -0700
Subject: [Python-3000] PEP 3101 clarification requests
In-Reply-To: <46CA7C42.5040007@canterbury.ac.nz>
References: <46C65006.9030907@acm.org> <46C8E6A3.4070202@trueblade.com> <46C9E75F.2040208@trueblade.com> <46C9EFA1.5050906@trueblade.com> <46CA3659.6010304@canterbury.ac.nz> <46CA7C42.5040007@canterbury.ac.nz>
Message-ID:

On 8/20/07, Greg Ewing wrote:
> Guido van Rossum wrote:
> > But how often will you need this? (You only need the !s part if you
> > don't know that the argument is a string.)
>
> Maybe I'm confused. I thought we had agreed that most
> types would delegate to str if they didn't understand
> the format, so most of the time there wouldn't be any
> need to use "!s". Is that still true?

Yes, by virtue of this being what object.__format__ does (AFAIU).

> If not, I think it will be very inconvenient, as I
> very frequently format things of all sorts of types
> using "%s", and rely on it doing something reasonable.
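
A small sketch of the delegation described above, not part of the original message. It assumes str.format() as it later shipped; the class E mirrors the test case earlier in the thread, and the names plain, centered and delegated are made up for illustration. One caveat that postdates this thread: recent CPython releases make object.__format__ reject a non-empty format spec, so the sketch catches TypeError rather than assuming the spec is forwarded to str.

```python
class E:
    """Has __str__ but no __format__, like the test class in the thread."""

    def __init__(self, x):
        self.x = x

    def __str__(self):
        return "E(" + self.x + ")"


# Empty spec: object.__format__ falls back to str(self).
plain = "{0}".format(E("data"))

# Explicit !s: str() is applied first, then str.__format__ handles '^10'.
centered = "{0!s:^10}".format(E("data"))

# Non-empty spec without !s: the branch discussed in the thread delegated
# to str; recent CPython versions raise TypeError here instead.
try:
    "{0:^10}".format(E("data"))
    delegated = True
except TypeError:
    delegated = False

print(repr(plain), repr(centered), delegated)
```

Writing the !s explicitly gives the same padded result regardless of how object.__format__ treats a non-empty spec.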
--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From oliphant at enthought.com Tue Aug 21 18:16:15 2007
From: oliphant at enthought.com (Travis Oliphant)
Date: Tue, 21 Aug 2007 10:16:15 -0600
Subject: [Python-3000] Py3k-buffer branch merged back to py3k branch
In-Reply-To:
References:
Message-ID: <46CB0FCF.3080100@enthought.com>

Lisandro Dalcin wrote:
> Travis, I have not had much time to follow your work on the
> py3k-buffer branch, but now that you have merged it into py3k, I want
> to make a small comment for your consideration.
>
> Perhaps the 'PyBuffer' struct could be named differently, something
> like 'Py_buffer'. The use case has some similarities to the
> 'Py_complex' struct. It is not related to any 'PyObject*', but is a
> public structure which (if I understand right) can be declared and
> used in static storage.
>
> In short, I am proposing the naming below. Note that I removed
> 'bufferinfo' from the typedef line, as it does not seem to be needed
> (it only appears there when grepping the sources) and could conflict
> with user code.
>

I have no problems with these changes. I will be unable to do them
myself though this week.

-Travis

> /* buffer interface */
> typedef struct {
>     .....
> } Py_buffer;
>
> typedef struct {
>     PyObject_HEAD
>     PyObject *base;
>     Py_buffer view;
> } PyMemoryViewObject;
>
>
> Again, taking complex as an example, please note the symmetry:
>
>
> typedef struct {
>     double real;
>     double imag;
> } Py_complex;
>
> typedef struct {
>     PyObject_HEAD
>     Py_complex cval;
> } PyComplexObject;
>
>
>

From gvanrossum at gmail.com Tue Aug 21 19:56:50 2007
From: gvanrossum at gmail.com (gvanrossum at gmail.com)
Date: Tue, 21 Aug 2007 10:56:50 -0700
Subject: [Python-3000] Py3k Sprint Tasks (Google Docs & Spreadsheets)
Message-ID:

I've shared a document with you called "Py3k Sprint Tasks":
http://spreadsheets.google.com/ccc?key=pBLWM8elhFAmKbrhhh0ApQA&inv=python-3000 at python.org&t=3328567089265242420&guest

It's not an attachment -- it's stored online at Google Docs &
Spreadsheets. To open this document, just click the link above.

(resend, I'm not sure this made it out the first time)

This spreadsheet is where I'm organizing the tasks for the Google
Sprint starting tomorrow.

Feel free to add. If you're coming to the sprint, feel free to claim
ownership of a task.

---
Note: You'll need to sign into Google with this email address. To use a
different email address, just reply to this message and ask me to
invite your other one.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20070821/b44249b1/attachment.htm

From skip at pobox.com Tue Aug 21 20:23:11 2007
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 21 Aug 2007 13:23:11 -0500
Subject: [Python-3000] Py3k Sprint Tasks (Google Docs & Spreadsheets)
In-Reply-To:
References:
Message-ID: <18123.11663.259022.539912@montanaro.dyndns.org>

    Guido> Feel free to add. If you're coming to the sprint, feel free to
    Guido> claim ownership of a task.
I started to edit the spreadsheet but then held off, remembering the edit conflict problems I caused you just a few minutes earlier with the wiki page (sorry about that). Does the Google docs/spreadsheets server do a decent job handling multiple document writers? Skip From guido at python.org Tue Aug 21 20:34:08 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 21 Aug 2007 11:34:08 -0700 Subject: [Python-3000] Py3k Sprint Tasks (Google Docs & Spreadsheets) In-Reply-To: <18123.11663.259022.539912@montanaro.dyndns.org> References: <18123.11663.259022.539912@montanaro.dyndns.org> Message-ID: On 8/21/07, skip at pobox.com wrote: > > Guido> Feel free to add. If you're coming to the sprint, feel free to > Guido> claim ownership of a task. > > I started to edit the spreadsheet but then held off, remembering the edit > conflict problems I caused you just a few minutes earlier with the wiki page > (sorry about that). Does the Google docs/spreadsheets server do a decent > job handling multiple document writers? Yes. I think the only way you can create a conflict is by editing the same cell simultaneously; and it will show who is on which cell (at least if you open "Discuss"). So by all means go ahead! -- --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at python.org Wed Aug 22 00:12:40 2007 From: barry at python.org (Barry Warsaw) Date: Tue, 21 Aug 2007 18:12:40 -0400 Subject: [Python-3000] Py3k Sprint Tasks (Google Docs & Spreadsheets) In-Reply-To: References: Message-ID: <93DBB66F-5D0D-4E46-8480-D2BFC693722A@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 21, 2007, at 1:56 PM, gvanrossum at gmail.com wrote: > I've shared a document with you called "Py3k Sprint Tasks": > http://spreadsheets.google.com/ccc? > key=pBLWM8elhFAmKbrhhh0ApQA&inv=python-3000 at python.org&t=3328567089265 > 242420&guest > > It's not an attachment -- it's stored online at Google Docs & > Spreadsheets. 
To open this document, just click the link above. > > (resend, I'm not sure this made it out the first time) > > This spreadsheet is where I'm organizing the tasks for the Google > Sprint starting tomorrow. > > Feel free to add. If you're coming to the sprint, feel free to claim > ownership of a task. I have approval to spend some official time at this sprint, though I'll be working from home and will be on IRC, Skype, etc. I've been spending hours of my own time on the email package for py3k this week and every time I think I'm nearing success I get defeated again. I think Victor Stinner came to similar conclusions. To put it mildly, the email package is effed up! But I'm determined to solve the worst of the problems this week. I only have Wednesday and Thursday to work on this, with most of my time available on Thursday. I'd really like to find one or two other folks to connect with to help work out the stickiest issues. Please contact me directly or on this list to arrange a time with me. I'm UTC-4 if that helps. I'll be on #python-dev (barry) too. Remember that the current code is in the python sandbox (under emailpkg/5_0-exp). I have some uncommitted code which I'll try to check in tonight, though I don't know if it will make matters better or worse. ;) Cheers, - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRstjWXEjvBPtnXfVAQLQcQP+Lo/D1YH1+w/51kNyQN1+zrzu1Cov7ERk 1xtT5L2LlaPjXGeVMlc6Xz0bbLVc96kSQ4SIrkc5RRNorcYzMf8kID4rLkO6S+kU CXtpOVgmzkX9zotAL9O72v2uOHT6c0fcK8ag44EiAtWei3Tdf+R2rL6lOzo0lHgj qmVPFzlzGCA= =t1nr -----END PGP SIGNATURE----- From janssen at parc.com Wed Aug 22 02:01:28 2007 From: janssen at parc.com (Bill Janssen) Date: Tue, 21 Aug 2007 17:01:28 PDT Subject: [Python-3000] Py3k Sprint Tasks In-Reply-To: References: Message-ID: <07Aug21.170133pdt."57996"@synergy1.parc.xerox.com> I'd like to spend some time during the Sprint doing three things: 1. Doing a code review of the modified SSL C code with 2 or 3 others. 
Can we get a small conference room with a projector to use for an hour? If not, I can provide one at PARC. I also need a few volunteers to be the review group. 2. Working on the test cases and adding them to the standard test suite. 3. Improving the documentation. In particular there needs to be better documentation on certificates, what's in them, how to use them, what certificate validation does and does not provide, where to get standard root certificates. It would be useful to document some standard code patterns, like how to shift into TLS after a STARTTLS request has been received, etc. A fourth thing I'd like to do, which isn't strictly Sprint-related, is to learn more about distutils and packaging. Bill From greg.ewing at canterbury.ac.nz Wed Aug 22 02:11:50 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 22 Aug 2007 12:11:50 +1200 Subject: [Python-3000] PEP 3101 clarification requests In-Reply-To: <46CAB68B.9060900@trueblade.com> References: <46C65006.9030907@acm.org> <46C8E6A3.4070202@trueblade.com> <46C9E75F.2040208@trueblade.com> <46C9EFA1.5050906@trueblade.com> <46CA3659.6010304@canterbury.ac.nz> <46CA7C42.5040007@canterbury.ac.nz> <46CAB68B.9060900@trueblade.com> Message-ID: <46CB7F46.1030200@canterbury.ac.nz> Eric Smith wrote: > The problem is that if an object does its own > __format__, it either needs to understand all of the string formatting, > or at least recognize a string format and send it along to > string.__format__() (or object.__format__, which will convert to string > for you). No, all it needs to do is tell when it *doesn't* recognise the format and call its inherited __format__ method. Eventually that will get to object.__format__ which will delegate to str. > Note the collision with the 's' format_spec in this example. I'd say you should normally design your format specs so that they don't conflict with string formats.
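The "recognise your own codes, pass everything else along" idea can be sketched roughly as follows. This is only an illustration: the Celsius class and its private "F" code are hypothetical, and because object.__format__ in current CPython rejects a non-empty format spec instead of delegating to str, the sketch hands unrecognised specs to str formatting explicitly.

```python
class Celsius:
    """Hypothetical type with one private format code, "F" (Fahrenheit)."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __str__(self):
        return f"{self.degrees:g}C"

    def __format__(self, spec):
        if spec == "F":
            # The one code this class recognises itself.
            return f"{self.degrees * 9 / 5 + 32:g}F"
        # Anything else is not ours: treat it as an ordinary string
        # format spec and let str formatting deal with it.
        return format(str(self), spec)


boiling = Celsius(100)
own_code = format(boiling, "F")       # handled by Celsius.__format__
string_code = format(boiling, ">8")   # handled by str formatting
```

Keeping the private code ("F") distinct from the string format codes means the fallback never has to guess which meaning was intended.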
If you want to implement the string formats your own way, that's okay, but then it's your responsibility to support all of them. -- Greg From guido at python.org Wed Aug 22 16:57:28 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 22 Aug 2007 07:57:28 -0700 Subject: [Python-3000] Py3k Sprint Tasks In-Reply-To: <9106183526398442256@unknownmsgid> References: <9106183526398442256@unknownmsgid> Message-ID: On 8/21/07, Bill Janssen wrote: > I'd like to spend some time during the Sprint doing three things: > > 1. Doing a code review of the modified SSL C code with 2 or 3 others. > Can we get a small conference room with a projector to use for an hour? > If not, I can provide one at PARC. I also need a few volunteers to be > the review group. NP, I'll try to book something. > 2. Working on the test cases and adding them to the standard test suite. > > 3. Improving the documentation. In particular there needs to be better > documentation on certificates, what's in them, how to use them, what > certificate validation does and does not provide, where to get standard > root certificates. It would be useful to document some standard code > patterns, like how to shift into TLS after a STARTTLS request has been > received, etc. > > A fourth thing I'd like to do, which isn't strictly Sprint-related, is to > learn more about distutils and packaging. Did you ever manage to view the task spreadsheet? 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From eric+python-dev at trueblade.com Wed Aug 22 18:48:43 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Wed, 22 Aug 2007 12:48:43 -0400 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <46C2C1A0.4060002@trueblade.com> References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com> Message-ID: <46CC68EB.2030609@trueblade.com> Eric Smith wrote: > Talin wrote: >> A new version is up, incorporating material from the various discussions >> on this list: >> >> http://www.python.org/dev/peps/pep-3101/ > > self.assertEquals('{0[{1}]}'.format('abcdefg', 4), 'e') > self.assertEquals('{foo[{bar}]}'.format(foo='abcdefg', bar=4), 'e') I've been re-reading the PEP, in an effort to make sure everything is working. I realized that these tests should not pass. The PEP says that "Format specifiers can themselves contain replacement fields". The tests above have replacement fields in the field name, which is not allowed. I'm going to remove this functionality. I believe the intent is to support a replacement for: "%.*s" % (4, 'how now brown cow') Which would be: "{0:.{1}}".format('how now brown cow', 4) For this, there's no need for replacement on field name. I've taken it out of the code, and made these tests in to errors. Eric. From jyasskin at gmail.com Wed Aug 22 21:36:31 2007 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Wed, 22 Aug 2007 12:36:31 -0700 Subject: [Python-3000] Updated and simplified PEP 3141: A Type Hierarchy for Numbers In-Reply-To: <5d44f72f0708021153u7ea1f443jfdee3c167b011011@mail.gmail.com> References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com> <5d44f72f0708021153u7ea1f443jfdee3c167b011011@mail.gmail.com> Message-ID: <5d44f72f0708221236k7c3ea054k43eb237f4a3ef577@mail.gmail.com> There are still some open issues here that need answers: * Should __pos__ coerce the argument to be an instance of the type it's defined on? 
* Add Demo/classes/Rat.py to the stdlib? * How many of __trunc__, __floor__, __ceil__, and __round__ should be magic methods? For __round__, when do we want to return an Integral? [__properfraction__ is probably subsumed by divmod(x, 1).] * How to give the removed methods (divmod, etc. on complex) good error messages without having them show up in help(complex)? I'll look into this during the sprint. On 8/2/07, Jeffrey Yasskin wrote: > After some more discussion, I have another version of the PEP with a > draft, partial implementation. Let me know what you think. > > > > PEP: 3141 > Title: A Type Hierarchy for Numbers > Version: $Revision: 56646 $ > Last-Modified: $Date: 2007-08-01 10:11:55 -0700 (Wed, 01 Aug 2007) $ > Author: Jeffrey Yasskin > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 23-Apr-2007 > Post-History: 25-Apr-2007, 16-May-2007, 02-Aug-2007 > > > Abstract > ======== > > This proposal defines a hierarchy of Abstract Base Classes (ABCs) (PEP > 3119) to represent number-like classes. It proposes a hierarchy of > ``Number :> Complex :> Real :> Rational :> Integral`` where ``A :> B`` > means "A is a supertype of B", and a pair of ``Exact``/``Inexact`` > classes to capture the difference between ``floats`` and > ``ints``. These types are significantly inspired by Scheme's numeric > tower [#schemetower]_. > > Rationale > ========= > > Functions that take numbers as arguments should be able to determine > the properties of those numbers, and if and when overloading based on > types is added to the language, should be overloadable based on the > types of the arguments. For example, slicing requires its arguments to > be ``Integrals``, and the functions in the ``math`` module require > their arguments to be ``Real``. > > Specification > ============= > > This PEP specifies a set of Abstract Base Classes, and suggests a > general strategy for implementing some of the methods. 
It uses > terminology from PEP 3119, but the hierarchy is intended to be > meaningful for any systematic method of defining sets of classes. > > The type checks in the standard library should use these classes > instead of the concrete built-ins. > > > Numeric Classes > --------------- > > We begin with a Number class to make it easy for people to be fuzzy > about what kind of number they expect. This class only helps with > overloading; it doesn't provide any operations. :: > > class Number(metaclass=ABCMeta): pass > > > Most implementations of complex numbers will be hashable, but if you > need to rely on that, you'll have to check it explicitly: mutable > numbers are supported by this hierarchy. **Open issue:** Should > __pos__ coerce the argument to be an instance of the type it's defined > on? Why do the builtins do this? :: > > class Complex(Number): > """Complex defines the operations that work on the builtin complex type. > > In short, those are: a conversion to complex, .real, .imag, +, -, > *, /, abs(), .conjugate, ==, and !=. > > If it is given heterogenous arguments, and doesn't have special > knowledge about them, it should fall back to the builtin complex > type as described below. > """ > > @abstractmethod > def __complex__(self): > """Return a builtin complex instance.""" > > def __bool__(self): > """True if self != 0.""" > return self != 0 > > @abstractproperty > def real(self): > """Retrieve the real component of this number. > > This should subclass Real. > """ > raise NotImplementedError > > @abstractproperty > def imag(self): > """Retrieve the imaginary component of this number. > > This should subclass Real.
> """ > raise NotImplementedError > > @abstractmethod > def __add__(self, other): > raise NotImplementedError > > @abstractmethod > def __radd__(self, other): > raise NotImplementedError > > @abstractmethod > def __neg__(self): > raise NotImplementedError > > def __pos__(self): > return self > > def __sub__(self, other): > return self + -other > > def __rsub__(self, other): > return -self + other > > @abstractmethod > def __mul__(self, other): > raise NotImplementedError > > @abstractmethod > def __rmul__(self, other): > raise NotImplementedError > > @abstractmethod > def __div__(self, other): > raise NotImplementedError > > @abstractmethod > def __rdiv__(self, other): > raise NotImplementedError > > @abstractmethod > def __pow__(self, exponent): > """Like division, a**b should promote to complex when necessary.""" > raise NotImplementedError > > @abstractmethod > def __rpow__(self, base): > raise NotImplementedError > > @abstractmethod > def __abs__(self): > """Returns the Real distance from 0.""" > raise NotImplementedError > > @abstractmethod > def conjugate(self): > """(x+y*i).conjugate() returns (x-y*i).""" > raise NotImplementedError > > @abstractmethod > def __eq__(self, other): > raise NotImplementedError > > def __ne__(self, other): > return not (self == other) > > > The ``Real`` ABC indicates that the value is on the real line, and > supports the operations of the ``float`` builtin. Real numbers are > totally ordered except for NaNs (which this PEP basically ignores). :: > > class Real(Complex): > """To Complex, Real adds the operations that work on real numbers. > > In short, those are: a conversion to float, trunc(), divmod, > %, <, <=, >, and >=. > > Real also provides defaults for the derived operations. > """ > > @abstractmethod > def __float__(self): > """Any Real can be converted to a native float object.""" > raise NotImplementedError > > @abstractmethod > def __trunc__(self): > """Truncates self to an Integral. 
> > Returns an Integral i such that: > * i>0 iff self>0 > * abs(i) <= abs(self). > """ > raise NotImplementedError > > def __divmod__(self, other): > """The pair (self // other, self % other). > > Sometimes this can be computed faster than the pair of > operations. > """ > return (self // other, self % other) > > def __rdivmod__(self, other): > """The pair (other // self, other % self). > > Sometimes this can be computed faster than the pair of > operations. > """ > return (other // self, other % self) > > @abstractmethod > def __floordiv__(self, other): > """The floor() of self/other. Integral.""" > raise NotImplementedError > > @abstractmethod > def __rfloordiv__(self, other): > """The floor() of other/self.""" > raise NotImplementedError > > @abstractmethod > def __mod__(self, other): > raise NotImplementedError > > @abstractmethod > def __rmod__(self, other): > raise NotImplementedError > > @abstractmethod > def __lt__(self, other): > """< on Reals defines a total ordering, except perhaps for NaN.""" > raise NotImplementedError > > @abstractmethod > def __le__(self, other): > raise NotImplementedError > > # Concrete implementations of Complex abstract methods. > > def __complex__(self): > return complex(float(self)) > > @property > def real(self): > return self > > @property > def imag(self): > return 0 > > def conjugate(self): > """Conjugate is a no-op for Reals.""" > return self > > > There is no built-in rational type, but it's straightforward to write, > so we provide an ABC for it. **Open issue**: Add Demo/classes/Rat.py > to the stdlib? :: > > class Rational(Real, Exact): > """.numerator and .denominator should be in lowest terms.""" > > @abstractproperty > def numerator(self): > raise NotImplementedError > > @abstractproperty > def denominator(self): > raise NotImplementedError > > # Concrete implementation of Real's conversion to float.
> > def __float__(self): > return self.numerator / self.denominator > > > And finally integers:: > > class Integral(Rational): > """Integral adds a conversion to int and the bit-string operations.""" > > @abstractmethod > def __int__(self): > raise NotImplementedError > > def __index__(self): > return int(self) > > @abstractmethod > def __pow__(self, exponent, modulus): > """self ** exponent % modulus, but maybe faster. > > Implement this if you want to support the 3-argument version > of pow(). Otherwise, just implement the 2-argument version > described in Complex. Raise a TypeError if exponent < 0 or any > argument isn't Integral. > """ > raise NotImplementedError > > @abstractmethod > def __lshift__(self, other): > raise NotImplementedError > > @abstractmethod > def __rlshift__(self, other): > raise NotImplementedError > > @abstractmethod > def __rshift__(self, other): > raise NotImplementedError > > @abstractmethod > def __rrshift__(self, other): > raise NotImplementedError > > @abstractmethod > def __and__(self, other): > raise NotImplementedError > > @abstractmethod > def __rand__(self, other): > raise NotImplementedError > > @abstractmethod > def __xor__(self, other): > raise NotImplementedError > > @abstractmethod > def __rxor__(self, other): > raise NotImplementedError > > @abstractmethod > def __or__(self, other): > raise NotImplementedError > > @abstractmethod > def __ror__(self, other): > raise NotImplementedError > > @abstractmethod > def __invert__(self): > raise NotImplementedError > > # Concrete implementations of Rational and Real abstract methods. > > def __float__(self): > return float(int(self)) > > @property > def numerator(self): > return self > > @property > def denominator(self): > return 1 > > > Exact vs. Inexact Classes > ------------------------- > > Floating point values may not exactly obey several of the properties > you would expect. For example, it is possible for ``(X + -X) + 3 == > 3``, but ``X + (-X + 3) == 0``. 
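A concrete instance of the two groupings disagreeing, as a small sketch. The value of X is an arbitrary choice made here (the PEP text does not name one); it only has to be large enough that adding 3 to -X is lost to rounding in an IEEE 754 double.

```python
X = 1e20  # assumed value; any float far larger than 3 / epsilon behaves the same

# Grouping 1: X and -X cancel exactly first, so the 3 survives.
grouped_left = (X + -X) + 3

# Grouping 2: -X + 3 rounds back to -X, so the 3 is lost before X is added.
grouped_right = X + (-X + 3)

print(grouped_left, grouped_right)
```

The two results differ only because X is big enough to absorb the 3; with operands of ordinary magnitude both groupings agree.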
On the range of values that most > functions deal with this isn't a problem, but it is something to be > aware of. > > Therefore, I define ``Exact`` and ``Inexact`` ABCs to mark whether > types have this problem. Every instance of ``Integral`` and > ``Rational`` should be Exact, but ``Reals`` and ``Complexes`` may or > may not be. (Do we really only need one of these, and the other is > defined as ``not`` the first?) :: > > class Exact(Number): pass > class Inexact(Number): pass > > > Changes to operations and __magic__ methods > ------------------------------------------- > > To support more precise narrowing from float to int (and more > generally, from Real to Integral), I'm proposing the following new > __magic__ methods, to be called from the corresponding library > functions. All of these return Integrals rather than Reals. > > 1. ``__trunc__(self)``, called from a new builtin ``trunc(x)``, which > returns the Integral closest to ``x`` between 0 and ``x``. > > 2. ``__floor__(self)``, called from ``math.floor(x)``, which returns > the greatest Integral ``<= x``. > > 3. ``__ceil__(self)``, called from ``math.ceil(x)``, which returns the > least Integral ``>= x``. > > 4. ``__round__(self)``, called from ``round(x)``, which returns the > Integral closest to ``x``, rounding half toward even. **Open > issue:** We could support the 2-argument version, but then we'd > only return an Integral if the second argument were ``<= 0``. > > 5. ``__properfraction__(self)``, called from a new function, > ``math.properfraction(x)``, which resembles C's ``modf()``: returns > a pair ``(n:Integral, r:Real)`` where ``x == n + r``, both ``n`` > and ``r`` have the same sign as ``x``, and ``abs(r) < 1``. **Open > issue:** Oh, we already have ``math.modf``. What name do we want > for this? Should we use divmod(x, 1) instead? > > Because the ``int()`` conversion from ``float`` is equivalent to but > less explicit than ``trunc()``, let's remove it.
(Or, if that breaks > too much, just add a deprecation warning.) > > ``complex.__{divmod,mod,floordiv,int,float}__`` should also go > away. These should continue to raise ``TypeError`` to help confused > porters, but should not appear in ``help(complex)`` to avoid confusing > more people. **Open issue:** This is difficult to do with the > ``PyNumberMethods`` struct. What's the best way to accomplish it? > > > Notes for type implementors > --------------------------- > > Implementors should be careful to make equal numbers equal and > hash them to the same values. This may be subtle if there are two > different extensions of the real numbers. For example, a complex type > could reasonably implement hash() as follows:: > > def __hash__(self): > return hash(complex(self)) > > but should be careful of any values that fall outside of the built in > complex's range or precision. > > Adding More Numeric ABCs > ~~~~~~~~~~~~~~~~~~~~~~~~ > > There are, of course, more possible ABCs for numbers, and this would > be a poor hierarchy if it precluded the possibility of adding > those. You can add ``MyFoo`` between ``Complex`` and ``Real`` with:: > > class MyFoo(Complex): ... > MyFoo.register(Real) > > Implementing the arithmetic operations > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > We want to implement the arithmetic operations so that mixed-mode > operations either call an implementation whose author knew about the > types of both arguments, or convert both to the nearest built in type > and do the operation there. 
For subtypes of Integral, this means that > __add__ and __radd__ should be defined as:: > > class MyIntegral(Integral): > > def __add__(self, other): > if isinstance(other, MyIntegral): > return do_my_adding_stuff(self, other) > elif isinstance(other, OtherTypeIKnowAbout): > return do_my_other_adding_stuff(self, other) > else: > return NotImplemented > > def __radd__(self, other): > if isinstance(other, MyIntegral): > return do_my_adding_stuff(other, self) > elif isinstance(other, OtherTypeIKnowAbout): > return do_my_other_adding_stuff(other, self) > elif isinstance(other, Integral): > return int(other) + int(self) > elif isinstance(other, Real): > return float(other) + float(self) > elif isinstance(other, Complex): > return complex(other) + complex(self) > else: > return NotImplemented > > > There are 5 different cases for a mixed-type operation on subclasses > of Complex. I'll refer to all of the above code that doesn't refer to > MyIntegral and OtherTypeIKnowAbout as "boilerplate". ``a`` will be an > instance of ``A``, which is a subtype of ``Complex`` (``a : A <: > Complex``), and ``b : B <: Complex``. I'll consider ``a + b``: > > 1. If A defines an __add__ which accepts b, all is well. > 2. If A falls back to the boilerplate code, and it were to return > a value from __add__, we'd miss the possibility that B defines > a more intelligent __radd__, so the boilerplate should return > NotImplemented from __add__. (Or A may not implement __add__ at > all.) > 3. Then B's __radd__ gets a chance. If it accepts a, all is well. > 4. If it falls back to the boilerplate, there are no more possible > methods to try, so this is where the default implementation > should live. > 5. If B <: A, Python tries B.__radd__ before A.__add__. This is > ok, because it was implemented with knowledge of A, so it can > handle those instances before delegating to Complex. 
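The dispatch order in the five cases above can be watched directly with a small sketch. The classes A, B and SubA and the trace() helper are hypothetical stand-ins invented for this illustration (they are not part of the PEP and do not subclass Complex); they only log which special method Python calls.

```python
calls = []  # names of the special methods Python invoked, in order


class A:
    def __add__(self, other):
        calls.append("A.__add__")
        if isinstance(other, A):
            return "A handled it"      # case 1: A knows the other type
        return NotImplemented          # case 2: give the right operand a chance


class B:
    """Unrelated to A, but its author knew about A."""

    def __radd__(self, other):
        calls.append("B.__radd__")
        return "B handled it"          # case 3: reflected method accepts a


class SubA(A):
    """Subclass of A that overrides the reflected method."""

    def __radd__(self, other):
        calls.append("SubA.__radd__")
        return "SubA handled it"       # case 5: tried before A.__add__


def trace(left, right):
    """Evaluate left + right and report which methods ran."""
    calls.clear()
    result = left + right
    return result, list(calls)


print(trace(A(), A()))     # A.__add__ alone
print(trace(A(), B()))     # A.__add__ declines, then B.__radd__
print(trace(A(), SubA()))  # SubA.__radd__ runs first
```

Returning NotImplemented (rather than raising) is what lets the interpreter move on to the reflected method; if both sides return it, Python raises TypeError.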
> > If ``A<:Complex`` and ``B<:Real`` without sharing any other knowledge, > then the appropriate shared operation is the one involving the built > in complex, and both __radd__s land there, so ``a+b == b+a``. > > > Rejected Alternatives > ===================== > > The initial version of this PEP defined an algebraic hierarchy > inspired by a Haskell Numeric Prelude [#numericprelude]_ including > MonoidUnderPlus, AdditiveGroup, Ring, and Field, and mentioned several > other possible algebraic types before getting to the numbers. I had > expected this to be useful to people using vectors and matrices, but > the NumPy community really wasn't interested, and we ran into the > issue that even if ``x`` is an instance of ``X <: MonoidUnderPlus`` > and ``y`` is an instance of ``Y <: MonoidUnderPlus``, ``x + y`` may > still not make sense. > > Then I gave the numbers a much more branching structure to include > things like the Gaussian Integers and Z/nZ, which could be Complex but > wouldn't necessarily support things like division. The community > decided that this was too much complication for Python, so I've now > scaled back the proposal to resemble the Scheme numeric tower much > more closely. > > > References > ========== > > .. [#pep3119] Introducing Abstract Base Classes > (http://www.python.org/dev/peps/pep-3119/) > > .. [#classtree] Possible Python 3K Class Tree?, wiki page created by > Bill Janssen > (http://wiki.python.org/moin/AbstractBaseClasses) > > .. [#numericprelude] NumericPrelude: An experimental alternative > hierarchy of numeric type classes > (http://darcs.haskell.org/numericprelude/docs/html/index.html) > > .. 
[#schemetower] The Scheme numerical tower > (http://www.swiss.ai.mit.edu/ftpdir/scheme-reports/r5rs-html/r5rs_8.html#SEC50) > > > Acknowledgements > ================ > > Thanks to Neal Norwitz for encouraging me to write this PEP in the > first place, to Travis Oliphant for pointing out that the numpy people > didn't really care about the algebraic concepts, to Alan Isaac for > reminding me that Scheme had already done this, and to Guido van > Rossum and lots of other people on the mailing list for refining the > concept. > > Copyright > ========= > > This document has been placed in the public domain. > > > > .. > Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: > > -- Namasté, Jeffrey Yasskin http://jeffrey.yasskin.info/ "Religion is an improper response to the Divine." -- "Skinny Legs and All", by Tom Robbins From skip at pobox.com Wed Aug 22 21:55:06 2007 From: skip at pobox.com (skip at pobox.com) Date: Wed, 22 Aug 2007 14:55:06 -0500 Subject: [Python-3000] Str v. Unicode in C? Message-ID: <18124.38042.622272.863273@montanaro.dyndns.org> If I want to check an object for stringedness in py3k do I use PyString_Check or PyUnicode_Check? Thx, Skip From guido at python.org Wed Aug 22 21:57:32 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 22 Aug 2007 12:57:32 -0700 Subject: [Python-3000] Updated and simplified PEP 3141: A Type Hierarchy for Numbers In-Reply-To: <5d44f72f0708221236k7c3ea054k43eb237f4a3ef577@mail.gmail.com> References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com> <5d44f72f0708021153u7ea1f443jfdee3c167b011011@mail.gmail.com> <5d44f72f0708221236k7c3ea054k43eb237f4a3ef577@mail.gmail.com> Message-ID: On 8/22/07, Jeffrey Yasskin wrote: > There are still some open issues here that need answers: > > * Should __pos__ coerce the argument to be an instance of the type > it's defined on? Yes, I think so.
That's what the built-in types do (in case the object is an instance of a subclass). It makes sense because all other operators do this too (unless overridden). > * Add Demo/classes/Rat.py to the stdlib? Yes, but it needs a makeover. At the very least I'd propose the module name to be rational. The code is really old. > * How many of __trunc__, __floor__, __ceil__, and __round__ should be > magic methods? I'm okay with all of these. > For __round__, when do we want to return an Integral? When the second argument is absent only. > [__properfraction__ is probably subsumed by divmod(x, 1).] Probably, but see PEP 3100, which still lists __mod__ and __divmod__ as to be deleted. > * How to give the removed methods (divmod, etc. on complex) good error > messages without having them show up in help(complex)? If Complex doesn't define them, they'll be TypeErrors, and that's good enough IMO. > I'll look into this during the sprint. > > On 8/2/07, Jeffrey Yasskin wrote: > > After some more discussion, I have another version of the PEP with a > > draft, partial implementation. Let me know what you think. > > > > > > > > PEP: 3141 > > Title: A Type Hierarchy for Numbers > > Version: $Revision: 56646 $ > > Last-Modified: $Date: 2007-08-01 10:11:55 -0700 (Wed, 01 Aug 2007) $ > > Author: Jeffrey Yasskin > > Status: Draft > > Type: Standards Track > > Content-Type: text/x-rst > > Created: 23-Apr-2007 > > Post-History: 25-Apr-2007, 16-May-2007, 02-Aug-2007 > > > > > > Abstract > > ======== > > > > This proposal defines a hierarchy of Abstract Base Classes (ABCs) (PEP > > 3119) to represent number-like classes. It proposes a hierarchy of > > ``Number :> Complex :> Real :> Rational :> Integral`` where ``A :> B`` > > means "A is a supertype of B", and a pair of ``Exact``/``Inexact`` > > classes to capture the difference between ``floats`` and > > ``ints``. These types are significantly inspired by Scheme's numeric > > tower [#schemetower]_. 
> > > > Rationale > > ========= > > > > Functions that take numbers as arguments should be able to determine > > the properties of those numbers, and if and when overloading based on > > types is added to the language, should be overloadable based on the > > types of the arguments. For example, slicing requires its arguments to > > be ``Integrals``, and the functions in the ``math`` module require > > their arguments to be ``Real``. > > > > Specification > > ============= > > > > This PEP specifies a set of Abstract Base Classes, and suggests a > > general strategy for implementing some of the methods. It uses > > terminology from PEP 3119, but the hierarchy is intended to be > > meaningful for any systematic method of defining sets of classes. > > > > The type checks in the standard library should use these classes > > instead of the concrete built-ins. > > > > > > Numeric Classes > > --------------- > > > > We begin with a Number class to make it easy for people to be fuzzy > > about what kind of number they expect. This class only helps with > > overloading; it doesn't provide any operations. :: > > > > class Number(metaclass=ABCMeta): pass > > > > > > Most implementations of complex numbers will be hashable, but if you > > need to rely on that, you'll have to check it explicitly: mutable > > numbers are supported by this hierarchy. **Open issue:** Should > > __pos__ coerce the argument to be an instance of the type it's defined > > on? Why do the builtins do this? :: > > > > class Complex(Number): > > """Complex defines the operations that work on the builtin complex type. > > > > In short, those are: a conversion to complex, .real, .imag, +, -, > > *, /, abs(), .conjugate, ==, and !=. > > > > If it is given heterogenous arguments, and doesn't have special > > knowledge about them, it should fall back to the builtin complex > > type as described below. 
> > """ > > > > @abstractmethod > > def __complex__(self): > > """Return a builtin complex instance.""" > > > > def __bool__(self): > > """True if self != 0.""" > > return self != 0 > > > > @abstractproperty > > def real(self): > > """Retrieve the real component of this number. > > > > This should subclass Real. > > """ > > raise NotImplementedError > > > > @abstractproperty > > def imag(self): > > """Retrieve the imaginary component of this number. > > > > This should subclass Real. > > """ > > raise NotImplementedError > > > > @abstractmethod > > def __add__(self, other): > > raise NotImplementedError > > > > @abstractmethod > > def __radd__(self, other): > > raise NotImplementedError > > > > @abstractmethod > > def __neg__(self): > > raise NotImplementedError > > > > def __pos__(self): > > return self > > > > def __sub__(self, other): > > return self + -other > > > > def __rsub__(self, other): > > return -self + other > > > > @abstractmethod > > def __mul__(self, other): > > raise NotImplementedError > > > > @abstractmethod > > def __rmul__(self, other): > > raise NotImplementedError > > > > @abstractmethod > > def __div__(self, other): > > raise NotImplementedError > > > > @abstractmethod > > def __rdiv__(self, other): > > raise NotImplementedError > > > > @abstractmethod > > def __pow__(self, exponent): > > """Like division, a**b should promote to complex when necessary.""" > > raise NotImplementedError > > > > @abstractmethod > > def __rpow__(self, base): > > raise NotImplementedError > > > > @abstractmethod > > def __abs__(self): > > """Returns the Real distance from 0.""" > > raise NotImplementedError > > > > @abstractmethod > > def conjugate(self): > > """(x+y*i).conjugate() returns (x-y*i).""" > > raise NotImplementedError > > > > @abstractmethod > > def __eq__(self, other): > > raise NotImplementedError > > > > def __ne__(self, other): > > return not (self == other) > > > > > > The ``Real`` ABC indicates that the value is on the real line, and > > supports
the operations of the ``float`` builtin. Real numbers are > > totally ordered except for NaNs (which this PEP basically ignores). :: > > > > class Real(Complex): > > """To Complex, Real adds the operations that work on real numbers. > > > > In short, those are: a conversion to float, trunc(), divmod, > > %, <, <=, >, and >=. > > > > Real also provides defaults for the derived operations. > > """ > > > > @abstractmethod > > def __float__(self): > > """Any Real can be converted to a native float object.""" > > raise NotImplementedError > > > > @abstractmethod > > def __trunc__(self): > > """Truncates self to an Integral. > > > > Returns an Integral i such that: > > * i>0 iff self>0 > > * abs(i) <= abs(self). > > """ > > raise NotImplementedError > > > > def __divmod__(self, other): > > """The pair (self // other, self % other). > > > > Sometimes this can be computed faster than the pair of > > operations. > > """ > > return (self // other, self % other) > > > > def __rdivmod__(self, other): > > """The pair (other // self, other % self). > > > > Sometimes this can be computed faster than the pair of > > operations. > > """ > > return (other // self, other % self) > > > > @abstractmethod > > def __floordiv__(self, other): > > """The floor() of self/other. Integral.""" > > raise NotImplementedError > > > > @abstractmethod > > def __rfloordiv__(self, other): > > """The floor() of other/self.""" > > raise NotImplementedError > > > > @abstractmethod > > def __mod__(self, other): > > raise NotImplementedError > > > > @abstractmethod > > def __rmod__(self, other): > > raise NotImplementedError > > > > @abstractmethod > > def __lt__(self, other): > > """< on Reals defines a total ordering, except perhaps for NaN.""" > > raise NotImplementedError > > > > @abstractmethod > > def __le__(self, other): > > raise NotImplementedError > > > > # Concrete implementations of Complex abstract methods.
> > > > def __complex__(self): > > return complex(float(self)) > > > > @property > > def real(self): > > return self > > > > @property > > def imag(self): > > return 0 > > > > def conjugate(self): > > """Conjugate is a no-op for Reals.""" > > return self > > > > > > There is no built-in rational type, but it's straightforward to write, > > so we provide an ABC for it. **Open issue**: Add Demo/classes/Rat.py > > to the stdlib? :: > > > > class Rational(Real, Exact): > > """.numerator and .denominator should be in lowest terms.""" > > > > @abstractproperty > > def numerator(self): > > raise NotImplementedError > > > > @abstractproperty > > def denominator(self): > > raise NotImplementedError > > > > # Concrete implementation of Real's conversion to float. > > > > def __float__(self): > > return self.numerator / self.denominator > > > > > > And finally integers:: > > > > class Integral(Rational): > > """Integral adds a conversion to int and the bit-string operations.""" > > > > @abstractmethod > > def __int__(self): > > raise NotImplementedError > > > > def __index__(self): > > return int(self) > > > > @abstractmethod > > def __pow__(self, exponent, modulus): > > """self ** exponent % modulus, but maybe faster. > > > > Implement this if you want to support the 3-argument version > > of pow(). Otherwise, just implement the 2-argument version > > described in Complex. Raise a TypeError if exponent < 0 or any > > argument isn't Integral. 
> > """ > > raise NotImplementedError > > > > @abstractmethod > > def __lshift__(self, other): > > raise NotImplementedError > > > > @abstractmethod > > def __rlshift__(self, other): > > raise NotImplementedError > > > > @abstractmethod > > def __rshift__(self, other): > > raise NotImplementedError > > > > @abstractmethod > > def __rrshift__(self, other): > > raise NotImplementedError > > > > @abstractmethod > > def __and__(self, other): > > raise NotImplementedError > > > > @abstractmethod > > def __rand__(self, other): > > raise NotImplementedError > > > > @abstractmethod > > def __xor__(self, other): > > raise NotImplementedError > > > > @abstractmethod > > def __rxor__(self, other): > > raise NotImplementedError > > > > @abstractmethod > > def __or__(self, other): > > raise NotImplementedError > > > > @abstractmethod > > def __ror__(self, other): > > raise NotImplementedError > > > > @abstractmethod > > def __invert__(self): > > raise NotImplementedError > > > > # Concrete implementations of Rational and Real abstract methods. > > > > def __float__(self): > > return float(int(self)) > > > > @property > > def numerator(self): > > return self > > > > @property > > def denominator(self): > > return 1 > > > > > > Exact vs. Inexact Classes > > ------------------------- > > > > Floating point values may not exactly obey several of the properties > > you would expect. For example, it is possible for ``(X + -X) + 3 == > > 3``, but ``X + (-X + 3) == 0``. On the range of values that most > > functions deal with this isn't a problem, but it is something to be > > aware of. > > > > Therefore, I define ``Exact`` and ``Inexact`` ABCs to mark whether > > types have this problem. Every instance of ``Integral`` and > > ``Rational`` should be Exact, but ``Reals`` and ``Complexes`` may or > > may not be. (Do we really only need one of these, and the other is > > defined as ``not`` the first?) 
:: > > > > class Exact(Number): pass > > class Inexact(Number): pass > > > > > > Changes to operations and __magic__ methods > > ------------------------------------------- > > > > To support more precise narrowing from float to int (and more > > generally, from Real to Integral), I'm proposing the following new > > __magic__ methods, to be called from the corresponding library > > functions. All of these return Integrals rather than Reals. > > > > 1. ``__trunc__(self)``, called from a new builtin ``trunc(x)``, which > > returns the Integral closest to ``x`` between 0 and ``x``. > > > > 2. ``__floor__(self)``, called from ``math.floor(x)``, which returns > > the greatest Integral ``<= x``. > > > > 3. ``__ceil__(self)``, called from ``math.ceil(x)``, which returns the > > least Integral ``>= x``. > > > > 4. ``__round__(self)``, called from ``round(x)``, which returns the > > Integral closest to ``x``, rounding half toward even. **Open > > issue:** We could support the 2-argument version, but then we'd > > only return an Integral if the second argument were ``<= 0``. > > > > 5. ``__properfraction__(self)``, called from a new function, > > ``math.properfraction(x)``, which resembles C's ``modf()``: returns > > a pair ``(n:Integral, r:Real)`` where ``x == n + r``, both ``n`` > > and ``r`` have the same sign as ``x``, and ``abs(r) < 1``. **Open > > issue:** Oh, we already have ``math.modf``. What name do we want > > for this? Should we use divmod(x, 1) instead? > > > > Because the ``int()`` conversion from ``float`` is equivalent to but > > less explicit than ``trunc()``, let's remove it. (Or, if that breaks > > too much, just add a deprecation warning.) > > > > ``complex.__{divmod,mod,floordiv,int,float}__`` should also go > > away. These should continue to raise ``TypeError`` to help confused > > porters, but should not appear in ``help(complex)`` to avoid confusing > > more people. **Open issue:** This is difficult to do with the > > ``PyNumberMethods`` struct.
What's the best way to accomplish it? > > > > > > Notes for type implementors > > --------------------------- > > > > Implementors should be careful to make equal numbers equal and > > hash them to the same values. This may be subtle if there are two > > different extensions of the real numbers. For example, a complex type > > could reasonably implement hash() as follows:: > > > > def __hash__(self): > > return hash(complex(self)) > > > > but should be careful of any values that fall outside of the built in > > complex's range or precision. > > > > Adding More Numeric ABCs > > ~~~~~~~~~~~~~~~~~~~~~~~~ > > > > There are, of course, more possible ABCs for numbers, and this would > > be a poor hierarchy if it precluded the possibility of adding > > those. You can add ``MyFoo`` between ``Complex`` and ``Real`` with:: > > > > class MyFoo(Complex): ... > > MyFoo.register(Real) > > > > Implementing the arithmetic operations > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > We want to implement the arithmetic operations so that mixed-mode > > operations either call an implementation whose author knew about the > > types of both arguments, or convert both to the nearest built in type > > and do the operation there. 
For subtypes of Integral, this means that > > __add__ and __radd__ should be defined as:: > > > > class MyIntegral(Integral): > > > > def __add__(self, other): > > if isinstance(other, MyIntegral): > > return do_my_adding_stuff(self, other) > > elif isinstance(other, OtherTypeIKnowAbout): > > return do_my_other_adding_stuff(self, other) > > else: > > return NotImplemented > > > > def __radd__(self, other): > > if isinstance(other, MyIntegral): > > return do_my_adding_stuff(other, self) > > elif isinstance(other, OtherTypeIKnowAbout): > > return do_my_other_adding_stuff(other, self) > > elif isinstance(other, Integral): > > return int(other) + int(self) > > elif isinstance(other, Real): > > return float(other) + float(self) > > elif isinstance(other, Complex): > > return complex(other) + complex(self) > > else: > > return NotImplemented > > > > > > There are 5 different cases for a mixed-type operation on subclasses > > of Complex. I'll refer to all of the above code that doesn't refer to > > MyIntegral and OtherTypeIKnowAbout as "boilerplate". ``a`` will be an > > instance of ``A``, which is a subtype of ``Complex`` (``a : A <: > > Complex``), and ``b : B <: Complex``. I'll consider ``a + b``: > > > > 1. If A defines an __add__ which accepts b, all is well. > > 2. If A falls back to the boilerplate code, and it were to return > > a value from __add__, we'd miss the possibility that B defines > > a more intelligent __radd__, so the boilerplate should return > > NotImplemented from __add__. (Or A may not implement __add__ at > > all.) > > 3. Then B's __radd__ gets a chance. If it accepts a, all is well. > > 4. If it falls back to the boilerplate, there are no more possible > > methods to try, so this is where the default implementation > > should live. > > 5. If B <: A, Python tries B.__radd__ before A.__add__. This is > > ok, because it was implemented with knowledge of A, so it can > > handle those instances before delegating to Complex. 
> > > > If ``A<:Complex`` and ``B<:Real`` without sharing any other knowledge, > > then the appropriate shared operation is the one involving the built > > in complex, and both __radd__s land there, so ``a+b == b+a``. > > > > > > Rejected Alternatives > > ===================== > > > > The initial version of this PEP defined an algebraic hierarchy > > inspired by a Haskell Numeric Prelude [#numericprelude]_ including > > MonoidUnderPlus, AdditiveGroup, Ring, and Field, and mentioned several > > other possible algebraic types before getting to the numbers. I had > > expected this to be useful to people using vectors and matrices, but > > the NumPy community really wasn't interested, and we ran into the > > issue that even if ``x`` is an instance of ``X <: MonoidUnderPlus`` > > and ``y`` is an instance of ``Y <: MonoidUnderPlus``, ``x + y`` may > > still not make sense. > > > > Then I gave the numbers a much more branching structure to include > > things like the Gaussian Integers and Z/nZ, which could be Complex but > > wouldn't necessarily support things like division. The community > > decided that this was too much complication for Python, so I've now > > scaled back the proposal to resemble the Scheme numeric tower much > > more closely. > > > > > > References > > ========== > > > > .. [#pep3119] Introducing Abstract Base Classes > > (http://www.python.org/dev/peps/pep-3119/) > > > > .. [#classtree] Possible Python 3K Class Tree?, wiki page created by > > Bill Janssen > > (http://wiki.python.org/moin/AbstractBaseClasses) > > > > .. [#numericprelude] NumericPrelude: An experimental alternative > > hierarchy of numeric type classes > > (http://darcs.haskell.org/numericprelude/docs/html/index.html) > > > > .. 
[#schemetower] The Scheme numerical tower > > (http://www.swiss.ai.mit.edu/ftpdir/scheme-reports/r5rs-html/r5rs_8.html#SEC50) > > > > > > Acknowledgements > > ================ > > > > Thanks to Neal Norwitz for encouraging me to write this PEP in the > > first place, to Travis Oliphant for pointing out that the numpy people > > didn't really care about the algebraic concepts, to Alan Isaac for > > reminding me that Scheme had already done this, and to Guido van > > Rossum and lots of other people on the mailing list for refining the > > concept. > > > > Copyright > > ========= > > > > This document has been placed in the public domain. > > > > > > > > .. > > Local Variables: > > mode: indented-text > > indent-tabs-mode: nil > > sentence-end-double-space: t > > fill-column: 70 > > coding: utf-8 > > End: > > > > > > > -- > Namasté, > Jeffrey Yasskin > http://jeffrey.yasskin.info/ > > "Religion is an improper response to the Divine." - "Skinny Legs and > All", by Tom Robbins > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Wed Aug 22 22:28:09 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 22 Aug 2007 22:28:09 +0200 Subject: [Python-3000] Str v. Unicode in C? In-Reply-To: <18124.38042.622272.863273@montanaro.dyndns.org> References: <18124.38042.622272.863273@montanaro.dyndns.org> Message-ID: <46CC9C59.1000306@v.loewis.de> skip at pobox.com wrote: > If I want to check an object for stringedness in py3k do I use > PyString_Check or PyUnicode_Check? In the medium term, you should use PyUnicode_Check. In the short term, additionally, do PyString_Check as well if you want to support str8 (your choice).
In the long term, it might be that PyUnicode_Check gets renamed to PyString_Check, provided that str8 is removed from the code base. Regards, Martin From rrr at ronadam.com Thu Aug 23 00:43:03 2007 From: rrr at ronadam.com (Ron Adam) Date: Wed, 22 Aug 2007 17:43:03 -0500 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <46CC68EB.2030609@trueblade.com> References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com> <46CC68EB.2030609@trueblade.com> Message-ID: <46CCBBF7.3060201@ronadam.com> Eric Smith wrote: > Eric Smith wrote: >> Talin wrote: >>> A new version is up, incorporating material from the various discussions >>> on this list: >>> >>> http://www.python.org/dev/peps/pep-3101/ >> self.assertEquals('{0[{1}]}'.format('abcdefg', 4), 'e') >> self.assertEquals('{foo[{bar}]}'.format(foo='abcdefg', bar=4), 'e') > > I've been re-reading the PEP, in an effort to make sure everything is > working. I realized that these tests should not pass. The PEP says > that "Format specifiers can themselves contain replacement fields". The > tests above have replacement fields in the field name, which is not > allowed. I'm going to remove this functionality. > > I believe the intent is to support a replacement for: > "%.*s" % (4, 'how now brown cow') > > Which would be: > "{0:.{1}}".format('how now brown cow', 4) > > For this, there's no need for replacement on field name. I've taken it > out of the code, and made these tests in to errors. > > Eric. I think it should work myself, but it could be added back in later if there is a need to. I'm still concerned about the choice of {{ and }} as escaped brackets. What does the following do? 
"{0:{{^{1}}".format('Python', '12') "{{{{Python}}}}" "{{{0:{{^{1}}}}".format('Python', '12') "{{{{{Python}}}}}" class ShowSpec(str): def __format__(self, spec): return spec ShowSpec("{0:{{{1}}}}").format('abc', 'xyz') "{{xyz}}" "{0}".format('{value:{{^{width}}', width='10', value='Python') "{{Python}}" _RON From john.reese at gmail.com Thu Aug 23 00:43:35 2007 From: john.reese at gmail.com (John Reese) Date: Wed, 22 Aug 2007 15:43:35 -0700 Subject: [Python-3000] proposed fix for test_xmlrpc.py in py3k Message-ID: Good afternoon. I'm in the Google Python Sprint working on getting the test_xmlrpc unittest to pass. The following patch was prepared by Jacques Frechet and me. We'd appreciate feedback on the attached patch. What was broken: 1. BaseHTTPServer attempts to parse the http headers with an rfc822.Message class. This was changed in r56905 by Jeremy Hylton to use the new io library instead of stringio as before. Unfortunately Jeremy's change resulted in TextIOWrapper stealing part of the HTTP request body, due to its buffering quantum. This was not seen in normal tests because GET requests have no body, but xmlrpc uses POSTs. We fixed this by doing the equivalent of what was done before, but using io.StringIO instead of the old cStringIO class: we pull out just the header using a sequence of readlines. 2. Once this was fixed, a second error asserted: test_xmlrpc.test_with{_no,}_info call .get on the headers object from xmlrpclib.ProtocolError. This fails because the headers object became a list in r57194. 
The story behind this is somewhat complicated: - xmlrpclib used to use httplib.HTTP, which is old and deprecated - r57024 Jeremy Hylton switched py3k to use more modern httplib infrastructure, but broke xmlrpclib.Transport.request; the "headers" variable was now referenced without being set - r57194 Hyeshik Chang fixed xmlrpclib.Transport.request to get the headers in a way that didn't explode; unfortunately, it now returned a list instead of a dict, but there were no tests to catch this - r57221 Guido integrated xmlrpc changes from the trunk, including r57158, which added tests that relied on headers being a dict. Unfortunately, it no longer was. 3. test_xmlrpc.test_fail_with_info was failing because the ValueError string of int('nonintegralstring') in py3k currently has an "s". This is presumably going away soon; the test now uses a regular expression with an optional leading "s", which is a little silly, but r56209 is prior art. >>> int('z') Traceback (most recent call last): File "", line 1, in ValueError: invalid literal for int() with base 10: s'z' -------------- next part -------------- A non-text attachment was scrubbed... Name: xmlrpc.patch Type: application/octet-stream Size: 3372 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070822/aa87a546/attachment.obj From python3now at gmail.com Thu Aug 23 01:23:07 2007 From: python3now at gmail.com (James Thiele) Date: Wed, 22 Aug 2007 16:23:07 -0700 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <46C2809C.3000806@acm.org> References: <46C2809C.3000806@acm.org> Message-ID: <8f01efd00708221623w1694fc18if330452c64e76bea@mail.gmail.com> In the section "Explicit Conversion Flag" of PEP 3101 it says: Currently, two explicit conversion flags are recognized: !r - convert the value to a string using repr(). !s - convert the value to a string using str(). -- It does not say what action is taken if an unrecognized explicit conversion flag is found. 
Later in the PEP the pseudocode for vformat() silently ignores the case of an unrecognized explicit conversion flag. This seems unPythonic to me but if this is the desired behavior please make it clear in the PEP. On 8/14/07, Talin wrote: > A new version is up, incorporating material from the various discussions > on this list: > > http://www.python.org/dev/peps/pep-3101/ > > Diffs are here: > > http://svn.python.org/view/peps/trunk/pep-3101.txt?rev=57044&r1=56535&r2=57044 > > > -- Talin > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/python3now%40gmail.com > From guido at python.org Thu Aug 23 01:42:39 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 22 Aug 2007 16:42:39 -0700 Subject: [Python-3000] proposed fix for test_xmlrpc.py in py3k In-Reply-To: References: Message-ID: Thanks! I've checked the bulk of this in, excepting the fix for #3, which I fixed at the source in longobject.c. Also, I changed the call to io.StringIO() to first convert the bytes to characters, using the same encoding as used for the HTTP request header line (Latin-1). --Guido On 8/22/07, John Reese wrote: > Good afternoon. I'm in the Google Python Sprint working on getting > the test_xmlrpc unittest to pass. The following patch was prepared by > Jacques Frechet and me. We'd appreciate feedback on the attached > patch. > > What was broken: > > > 1. BaseHTTPServer attempts to parse the http headers with an > rfc822.Message class. This was changed in r56905 by Jeremy Hylton to > use the new io library instead of stringio as before. Unfortunately > Jeremy's change resulted in TextIOWrapper stealing part of the HTTP > request body, due to its buffering quantum. This was not seen in > normal tests because GET requests have no body, but xmlrpc uses POSTs. 
> We fixed this by doing the equivalent of what was done before, but > using io.StringIO instead of the old cStringIO class: we pull out just > the header using a sequence of readlines. > > > 2. Once this was fixed, a second error asserted: > test_xmlrpc.test_with{_no,}_info call .get on the headers object from > xmlrpclib.ProtocolError. This fails because the headers object became > a list in r57194. The story behind this is somewhat complicated: > - xmlrpclib used to use httplib.HTTP, which is old and deprecated > - r57024 Jeremy Hylton switched py3k to use more modern httplib > infrastructure, but broke xmlrpclib.Transport.request; the "headers" > variable was now referenced without being set > - r57194 Hyeshik Chang fixed xmlrpclib.Transport.request to get the > headers in a way that didn't explode; unfortunately, it now returned a > list instead of a dict, but there were no tests to catch this > - r57221 Guido integrated xmlrpc changes from the trunk, including > r57158, which added tests that relied on headers being a dict. > Unfortunately, it no longer was. > > > 3. test_xmlrpc.test_fail_with_info was failing because the ValueError > string of int('nonintegralstring') in py3k currently has an "s". This > is presumably going away soon; the test now uses a regular expression > with an optional leading "s", which is a little silly, but r56209 is > prior art. 
> > >>> int('z') > Traceback (most recent call last): > File "", line 1, in > ValueError: invalid literal for int() with base 10: s'z' > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From eric+python-dev at trueblade.com Thu Aug 23 01:46:34 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Wed, 22 Aug 2007 19:46:34 -0400 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <8f01efd00708221623w1694fc18if330452c64e76bea@mail.gmail.com> References: <46C2809C.3000806@acm.org> <8f01efd00708221623w1694fc18if330452c64e76bea@mail.gmail.com> Message-ID: <46CCCADA.1060206@trueblade.com> James Thiele wrote: > In the section "Explicit Conversion Flag" of PEP 3101 it says: > > Currently, two explicit conversion flags are recognized: > > !r - convert the value to a string using repr(). > !s - convert the value to a string using str(). > -- > It does not say what action is taken if an unrecognized explicit > conversion flag is found. My implementation raises a ValueError, which I think is the desired behavior: >>> "{0!x}".format(1) Traceback (most recent call last): File "", line 1, in ValueError: Unknown converion specifier x I agree the PEP should be explicit about this. 
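Editor's note: the behaviour Eric describes can be checked with a short sketch. It assumes a current CPython 3 interpreter, not the 2007 sandbox implementation under discussion, and the helper name conversion_result is hypothetical, introduced only for this illustration. The recognized flags !r and !s are applied normally, while an unrecognized flag such as !x raises ValueError rather than being silently ignored.

```python
def conversion_result(template, value):
    """Format value with template; return the text, or the ValueError raised.

    Hypothetical helper for illustration only; it is not part of PEP 3101.
    """
    try:
        return template.format(value)
    except ValueError as exc:
        return exc


# Recognized explicit conversion flags.
recognized_r = conversion_result("{0!r}", "abc")  # repr() is applied first
recognized_s = conversion_result("{0!s}", "abc")  # str() is applied first

# Unrecognized explicit conversion flag: str.format raises ValueError.
unknown = conversion_result("{0!x}", 1)

print(recognized_r)
print(recognized_s)
print(type(unknown).__name__)
```

Raising here, rather than falling back to !r, is what leaves room to add further conversion flags later without changing the meaning of existing format strings.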
From guido at python.org Thu Aug 23 01:52:20 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 22 Aug 2007 16:52:20 -0700 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <46CCCADA.1060206@trueblade.com> References: <46C2809C.3000806@acm.org> <8f01efd00708221623w1694fc18if330452c64e76bea@mail.gmail.com> <46CCCADA.1060206@trueblade.com> Message-ID: On 8/22/07, Eric Smith wrote: > James Thiele wrote: > > In the section "Explicit Conversion Flag" of PEP 3101 it says: > > > > Currently, two explicit conversion flags are recognized: > > > > !r - convert the value to a string using repr(). > > !s - convert the value to a string using str(). > > -- > > It does not say what action is taken if an unrecognized explicit > > conversion flag is found. > > My implementation raises a ValueError, which I think is the desired > behavior: > > >>> "{0!x}".format(1) > Traceback (most recent call last): > File "", line 1, in > ValueError: Unknown converion specifier x You raise ValueErrors for other errors with the format, right? If there's a reason to be more lenient, the best approach would probably be to interpret it as !r. > I agree the PEP should be explicit about this. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From eric+python-dev at trueblade.com Thu Aug 23 02:00:57 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Wed, 22 Aug 2007 20:00:57 -0400 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: References: <46C2809C.3000806@acm.org> <8f01efd00708221623w1694fc18if330452c64e76bea@mail.gmail.com> <46CCCADA.1060206@trueblade.com> Message-ID: <46CCCE39.5060301@trueblade.com> Guido van Rossum wrote: > On 8/22/07, Eric Smith wrote: >> James Thiele wrote: >>> In the section "Explicit Conversion Flag" of PEP 3101 it says: >>> >>> Currently, two explicit conversion flags are recognized: >>> >>> !r - convert the value to a string using repr(). >>> !s - convert the value to a string using str(). 
>>> -- >>> It does not say what action is taken if an unrecognized explicit >>> conversion flag is found. >> My implementation raises a ValueError, which I think is the desired >> behavior: >> >> >>> "{0!x}".format(1) >> Traceback (most recent call last): >> File "", line 1, in >> ValueError: Unknown converion specifier x > > You raise ValueErrors for other errors with the format, right? If > there's a reason to be more lenient, the best approach would probably > be to interpret it as !r. Yes, ValueError gets raised for other errors with the format specifiers. My concern is that if we silently treat unknown conversion specifiers as !r, we can't add other specifiers in the future without breaking existing code. From mierle at gmail.com Thu Aug 23 01:58:28 2007 From: mierle at gmail.com (Keir Mierle) Date: Wed, 22 Aug 2007 16:58:28 -0700 Subject: [Python-3000] [PATCH] Fix math.ceil() behaviour for PEP3141 Message-ID: The attached patch fixes math.ceil to delegate to x.__ceil__() if it is defined, according to PEP 3141, and adds tests to cover the new cases. Patch is against r57303. No new test failures are introduced. Keir -------------- next part -------------- A non-text attachment was scrubbed... Name: ceil.diff Type: text/x-patch Size: 1880 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070822/6654ccee/attachment-0001.bin From eric+python-dev at trueblade.com Thu Aug 23 02:10:11 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Wed, 22 Aug 2007 20:10:11 -0400 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <46CCBBF7.3060201@ronadam.com> References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com> <46CC68EB.2030609@trueblade.com> <46CCBBF7.3060201@ronadam.com> Message-ID: <46CCD063.2060007@trueblade.com> Ron Adam wrote: >> I've been re-reading the PEP, in an effort to make sure everything is >> working. I realized that these tests should not pass. 
The PEP says >> that "Format specifiers can themselves contain replacement fields". >> The tests above have replacement fields in the field name, which is >> not allowed. I'm going to remove this functionality. >> >> I believe the intent is to support a replacement for: >> "%.*s" % (4, 'how now brown cow') >> >> Which would be: >> "{0:.{1}}".format('how now brown cow', 4) >> >> For this, there's no need for replacement on field name. I've taken >> it out of the code, and made these tests in to errors. > > I think it should work myself, but it could be added back in later if > there is a need to. > > > I'm still concerned about the choice of {{ and }} as escaped brackets. > > What does the following do? > > > "{0:{{^{1}}".format('Python', '12') >>> "{0:{{^{1}}".format('Python', '12') Traceback (most recent call last): File "", line 1, in ValueError: unterminated replacement field But: >>> "{0:^{1}}".format('Python', '12') ' Python ' > "{{{0:{{^{1}}}}".format('Python', '12') >>> "{{{0:{{^{1}}}}".format('Python', '12') Traceback (most recent call last): File "", line 1, in ValueError: Unknown conversion type } But, >>> "{{{0:^{1}}".format('Python', '12') '{ Python ' > class ShowSpec(str): > > return spec > > ShowSpec("{0:{{{1}}}}").format('abc', 'xyz') > >>> ShowSpec("{0:{{{1}}}}").format('abc', 'xyz') Traceback (most recent call last): File "", line 1, in ValueError: Invalid conversion specification I think you mean: ShowSpec("{0:{1}}").format('abc', 'xyz') But I have some error with that. I'm looking into it. > "{0}".format('{value:{{^{width}}', width='10', value='Python') >>> "{0}".format('{value:{{^{width}}', width='10', value='Python') '{value:{{^{width}}' From greg at electricrain.com Thu Aug 23 01:59:29 2007 From: greg at electricrain.com (Gregory P. 
Smith) Date: Wed, 22 Aug 2007 16:59:29 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46B7FACC.8030503@v.loewis.de> References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> Message-ID: <20070822235929.GA12780@electricrain.com> On Tue, Aug 07, 2007 at 06:53:32AM +0200, "Martin v. Löwis" wrote: > > I guess we have to rethink our use of these databases somewhat. > > Ok. In the interest of progress, I'll be looking at coming up with > some fixes for the code base right now; as we agree that the > underlying semantics is bytes:bytes, any encoding wrappers on > top of it can be added later. The underlying Modules/_bsddb.c today uses PyArg_Parse(..., "s#", ...) which, if I read Python/getargs.c correctly, is very lenient on the input types it accepts. It appears to accept anything with a buffer API, auto-converting unicode to the default encoding as needed. IMHO all of that is desirable in many situations but it is not strict. bytes:bytes or int:bytes (depending on the database type) are fundamentally all the C berkeleydb library knows. Attaching meaning to the keys and values is up to the user. I'm about to try a _bsddb.c that strictly enforces bytes as values for the underlying bsddb.db API provided by _bsddb in my sandbox, under the assumption that being strict about bytes is desired at that level. I predict lots of Lib/bsddb/test/ edits. > My concern is that people need to access existing databases. It's > all fine that the code accessing them breaks, and that they have > to actively port to Py3k. However, telling them that they have to > represent the keys in their dbm disk files in a different manner > might cause a revolt... Agreed. Thus the importance of allowing bytes:bytes. From greg at electricrain.com Thu Aug 23 02:41:55 2007 From: greg at electricrain.com (Gregory P.
Smith) Date: Wed, 22 Aug 2007 17:41:55 -0700 Subject: [Python-3000] Move to a "py3k" branch *DONE* In-Reply-To: References: Message-ID: <20070823004155.GB12780@electricrain.com> > > There are currently about 7 failing unit tests left: > > > > test_bsddb > > test_bsddb3 ... fyi these two pass for me on the current py3k branch on ubuntu linux and mac os x 10.4.9. -greg From rrr at ronadam.com Thu Aug 23 03:08:35 2007 From: rrr at ronadam.com (Ron Adam) Date: Wed, 22 Aug 2007 20:08:35 -0500 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <46CCD063.2060007@trueblade.com> References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com> <46CC68EB.2030609@trueblade.com> <46CCBBF7.3060201@ronadam.com> <46CCD063.2060007@trueblade.com> Message-ID: <46CCDE13.60502@ronadam.com> Eric Smith wrote: > Ron Adam wrote: >>> I've been re-reading the PEP, in an effort to make sure everything is >>> working. I realized that these tests should not pass. The PEP says >>> that "Format specifiers can themselves contain replacement fields". >>> The tests above have replacement fields in the field name, which is >>> not allowed. I'm going to remove this functionality. >>> >>> I believe the intent is to support a replacement for: >>> "%.*s" % (4, 'how now brown cow') >>> >>> Which would be: >>> "{0:.{1}}".format('how now brown cow', 4) >>> >>> For this, there's no need for replacement on field name. I've taken >>> it out of the code, and made these tests in to errors. >> >> I think it should work myself, but it could be added back in later if >> there is a need to. >> >> >> I'm still concerned about the choice of {{ and }} as escaped brackets. >> >> What does the following do? >> >> >> "{0:{{^{1}}".format('Python', '12') > > >>> "{0:{{^{1}}".format('Python', '12') > Traceback (most recent call last): > File "", line 1, in > ValueError: unterminated replacement field When are the "{{" and "}}" escape characters replaced with '{' and '}'? 
> But: > >>> "{0:^{1}}".format('Python', '12') > ' Python ' > >> "{{{0:{{^{1}}}}".format('Python', '12') > >>> "{{{0:{{^{1}}}}".format('Python', '12') > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > ValueError: Unknown conversion type } > > But, > >>> "{{{0:^{1}}".format('Python', '12') > '{ Python ' So escaping '{' with '{{' and '}' with '}}' doesn't work inside of format expressions? That would mean there is no way to pass a brace to a __format__ method. >> class ShowSpec(str): >> def __format__(self, spec): >> return spec >> >> ShowSpec("{0:{{{1}}}}").format('abc', 'xyz') >> > > >>> ShowSpec("{0:{{{1}}}}").format('abc', 'xyz') > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > ValueError: Invalid conversion specification > I think you mean: > ShowSpec("{0:{1}}").format('abc', 'xyz') No, because you may need to be able to pass the '{' and '}' characters to the format specifier in some way. The standard specifiers don't use them, but custom specifiers may need them. > But I have some error with that. I'm looking into it. > >> "{0}".format('{value:{{^{width}}', width='10', value='Python') > > >>> "{0}".format('{value:{{^{width}}', width='10', value='Python') > '{value:{{^{width}}' Depending on whether or not the evaluation is recursive this may or may not be correct. I think it's actually easier to do it recursively and not put limits on where format specifiers can be used or not.
_RON From eric+python-dev at trueblade.com Thu Aug 23 03:33:19 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Wed, 22 Aug 2007 21:33:19 -0400 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <46CCDE13.60502@ronadam.com> References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com> <46CC68EB.2030609@trueblade.com> <46CCBBF7.3060201@ronadam.com> <46CCD063.2060007@trueblade.com> <46CCDE13.60502@ronadam.com> Message-ID: <46CCE3DF.1090004@trueblade.com> Ron Adam wrote: > > > Eric Smith wrote: >> Ron Adam wrote: >>>> I've been re-reading the PEP, in an effort to make sure everything >>>> is working. I realized that these tests should not pass. The PEP >>>> says that "Format specifiers can themselves contain replacement >>>> fields". The tests above have replacement fields in the field name, >>>> which is not allowed. I'm going to remove this functionality. >>>> >>>> I believe the intent is to support a replacement for: >>>> "%.*s" % (4, 'how now brown cow') >>>> >>>> Which would be: >>>> "{0:.{1}}".format('how now brown cow', 4) >>>> >>>> For this, there's no need for replacement on field name. I've taken >>>> it out of the code, and made these tests in to errors. >>> >>> I think it should work myself, but it could be added back in later if >>> there is a need to. >>> >>> >>> I'm still concerned about the choice of {{ and }} as escaped brackets. >>> >>> What does the following do? >>> >>> >>> "{0:{{^{1}}".format('Python', '12') >> >> >>> "{0:{{^{1}}".format('Python', '12') >> Traceback (most recent call last): >> File "", line 1, in >> ValueError: unterminated replacement field > > When are the "{{" and "}}" escape characters replaced with '{' and '}'? While parsing for the starting '{'. I'm not saying this is the best or only or even PEP-specified way of doing it, but that's how the sample implementation does it (and the way the sandbox version has done it for many months). 
>> But, >> >>> "{{{0:^{1}}".format('Python', '12') >> '{ Python ' > > So escaping '{' with '{{' and '}' with '}}' doesn't work inside of > format expressions? As I have it implemented, yes. > That would mean there is no way to pass a brace to a __format__ method. No way using string.format, correct. You could pass it in using the builtin format(), or by calling __format__ directly. But you're correct, for the most part if string.format doesn't accept it, it's not practical. >> I think you mean: >> ShowSpec("{0:{1}}").format('abc', 'xyz') > > No, because you may need to be able to pass the '{' and '}' character to > the format specifier in some way. The standard specifiers don't use > them, but custom specifiers may need them. Also true. >>> "{0}".format('{value:{{^{width}}', width='10', value='Python') >> >> >>> "{0}".format('{value:{{^{width}}', width='10', value='Python') >> '{value:{{^{width}}' > > Depending on weather or not the evaluation is recursive this may or may > not be correct. > > I think it's actually easier to do it recursively and not put limits on > where format specifiers can be used or not. But then you'd always have to worry that some replaced string looks like something that could be interpreted as a field, even if that's not what you want. What if "{value}" came from user supplied input? I don't think you'd want (or expect) any string you output that contains braces to be expanded. From guido at python.org Thu Aug 23 04:06:59 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 22 Aug 2007 19:06:59 -0700 Subject: [Python-3000] Move to a "py3k" branch *DONE* In-Reply-To: <20070823004155.GB12780@electricrain.com> References: <20070823004155.GB12780@electricrain.com> Message-ID: For me too. Great work whoever fixed these (and other tests, like xmlrpc). All I've got left failing is the three email tests, and I know Barry Warsaw is working on those. (Although he used some choice swearwords to describe the current state. 
:-) --Guido On 8/22/07, Gregory P. Smith wrote: > > > There are currently about 7 failing unit tests left: > > > > > > test_bsddb > > > test_bsddb3 > ... > > fyi these two pass for me on the current py3k branch on ubuntu linux > and mac os x 10.4.9. > > -greg > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Aug 23 04:07:51 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 22 Aug 2007 19:07:51 -0700 Subject: [Python-3000] proposed fix for test_xmlrpc.py in py3k In-Reply-To: References: Message-ID: My mistake, out of habit I limited the submit to the Lib subdirectory. Will do later. On 8/22/07, John Reese wrote: > Thanks, sounds good. I'm curious why you left out the change to > Doc/library/xmlrpclib.rst -- the documentation of the type of the > parameter was out-of-date, if it was ever right. > > On 8/22/07, Guido van Rossum wrote: > > Thanks! I've checked the bulk of this in, excepting the fix for #3, > > which I fixed at the source in longobject.c. Also, I changed the call > > to io.StringIO() to first convert the bytes to characters, using the > > same encoding as used for the HTTP request header line (Latin-1). > > > > --Guido > > > > On 8/22/07, John Reese wrote: > > > Good afternoon. I'm in the Google Python Sprint working on getting > > > the test_xmlrpc unittest to pass. The following patch was prepared by > > > Jacques Frechet and me. We'd appreciate feedback on the attached > > > patch. > > > > > > What was broken: > > > > > > > > > 1. BaseHTTPServer attempts to parse the http headers with an > > > rfc822.Message class. This was changed in r56905 by Jeremy Hylton to > > > use the new io library instead of stringio as before. Unfortunately > > > Jeremy's change resulted in TextIOWrapper stealing part of the HTTP > > > request body, due to its buffering quantum. This was not seen in > > > normal tests because GET requests have no body, but xmlrpc uses POSTs. 
> > > We fixed this by doing the equivalent of what was done before, but > > > using io.StringIO instead of the old cStringIO class: we pull out just > > > the header using a sequence of readlines. > > > > > > > > > 2. Once this was fixed, a second error asserted: > > > test_xmlrpc.test_with{_no,}_info call .get on the headers object from > > > xmlrpclib.ProtocolError. This fails because the headers object became > > > a list in r57194. The story behind this is somewhat complicated: > > > - xmlrpclib used to use httplib.HTTP, which is old and deprecated > > > - r57024 Jeremy Hylton switched py3k to use more modern httplib > > > infrastructure, but broke xmlrpclib.Transport.request; the "headers" > > > variable was now referenced without being set > > > - r57194 Hyeshik Chang fixed xmlrpclib.Transport.request to get the > > > headers in a way that didn't explode; unfortunately, it now returned a > > > list instead of a dict, but there were no tests to catch this > > > - r57221 Guido integrated xmlrpc changes from the trunk, including > > > r57158, which added tests that relied on headers being a dict. > > > Unfortunately, it no longer was. > > > > > > > > > 3. test_xmlrpc.test_fail_with_info was failing because the ValueError > > > string of int('nonintegralstring') in py3k currently has an "s". This > > > is presumably going away soon; the test now uses a regular expression > > > with an optional leading "s", which is a little silly, but r56209 is > > > prior art. 
> > > > > > >>> int('z') > > > Traceback (most recent call last): > > > File "", line 1, in > > > ValueError: invalid literal for int() with base 10: s'z' > > > > > > _______________________________________________ > > > Python-3000 mailing list > > > Python-3000 at python.org > > > http://mail.python.org/mailman/listinfo/python-3000 > > > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > > > > > > > > > > > > > -- > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From andrew.j.wade at gmail.com Thu Aug 23 05:15:56 2007 From: andrew.j.wade at gmail.com (Andrew James Wade) Date: Wed, 22 Aug 2007 23:15:56 -0400 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <46CCE3DF.1090004@trueblade.com> References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com> <46CC68EB.2030609@trueblade.com> <46CCBBF7.3060201@ronadam.com> <46CCD063.2060007@trueblade.com> <46CCDE13.60502@ronadam.com> <46CCE3DF.1090004@trueblade.com> Message-ID: <20070822231556.c1f9f647.ajwade+py3k@andrew.wade.networklinux.net> On Wed, 22 Aug 2007 21:33:19 -0400 Eric Smith wrote: > Ron Adam wrote: ... > > That would mean there is no way to pass a brace to a __format__ method. > > No way using string.format, correct. You could pass it in using the > builtin format(), or by calling __format__ directly. But you're > correct, for the most part if string.format doesn't accept it, it's not > practical. What about: >>> "{0:{lb}{1}{lb}}".format(ShowSpec(), 'abc', lb='{', rb='}') '{abc}' Ugly, but better than nothing. > > I think it's actually easier to do it recursively and not put limits on > > where format specifiers can be used or not. > > But then you'd always have to worry that some replaced string looks like > something that could be interpreted as a field, even if that's not what > you want. > > What if "{value}" came from user supplied input? 
I don't think you'd > want (or expect) any string you output that contains braces to be expanded. Not a problem with recursion: $ echo $(echo $(pwd)) /home/ajwade $ a='echo $(pwd)' $ echo $a echo $(pwd) $ echo $($a) $(pwd) $ echo $($($a)) bash: $(pwd): command not found The key is to do substitution only once at each level of recursion; which is what a naive recursive algorithm would do anyway. And I'd do the recursive substitution before even starting to parse the field: it's simple and powerful. -- Andrew From andrew.j.wade at gmail.com Thu Aug 23 06:56:19 2007 From: andrew.j.wade at gmail.com (Andrew James Wade) Date: Thu, 23 Aug 2007 00:56:19 -0400 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <20070822231556.c1f9f647.ajwade+py3k@andrew.wade.networklinux.net> References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com> <46CC68EB.2030609@trueblade.com> <46CCBBF7.3060201@ronadam.com> <46CCD063.2060007@trueblade.com> <46CCDE13.60502@ronadam.com> <46CCE3DF.1090004@trueblade.com> <20070822231556.c1f9f647.ajwade+py3k@andrew.wade.networklinux.net> Message-ID: <20070823005619.bdc3c1e1.ajwade+py3k@andrew.wade.networklinux.net> On Wed, 22 Aug 2007 23:15:56 -0400 Andrew James Wade wrote: > And I'd do > the recursive substitution before even starting to parse the field: > it's simple and powerful. Scratch that suggestion; the implications need to be thought through. If we allow recursive substitution only in the format specifier that decision can always be re-visited at a later date. -- Andrew From jtr at miskatonic.nu Thu Aug 23 03:28:50 2007 From: jtr at miskatonic.nu (John Reese) Date: Wed, 22 Aug 2007 18:28:50 -0700 Subject: [Python-3000] proposed fix for test_xmlrpc.py in py3k In-Reply-To: References: Message-ID: Thanks, sounds good. I'm curious why you left out the change to Doc/library/xmlrpclib.rst -- the documentation of the type of the parameter was out-of-date, if it was ever right. On 8/22/07, Guido van Rossum wrote: > Thanks! 
I've checked the bulk of this in, excepting the fix for #3, > which I fixed at the source in longobject.c. Also, I changed the call > to io.StringIO() to first convert the bytes to characters, using the > same encoding as used for the HTTP request header line (Latin-1). > > --Guido > > On 8/22/07, John Reese wrote: > > Good afternoon. I'm in the Google Python Sprint working on getting > > the test_xmlrpc unittest to pass. The following patch was prepared by > > Jacques Frechet and me. We'd appreciate feedback on the attached > > patch. > > > > What was broken: > > > > > > 1. BaseHTTPServer attempts to parse the http headers with an > > rfc822.Message class. This was changed in r56905 by Jeremy Hylton to > > use the new io library instead of stringio as before. Unfortunately > > Jeremy's change resulted in TextIOWrapper stealing part of the HTTP > > request body, due to its buffering quantum. This was not seen in > > normal tests because GET requests have no body, but xmlrpc uses POSTs. > > We fixed this by doing the equivalent of what was done before, but > > using io.StringIO instead of the old cStringIO class: we pull out just > > the header using a sequence of readlines. > > > > > > 2. Once this was fixed, a second error asserted: > > test_xmlrpc.test_with{_no,}_info call .get on the headers object from > > xmlrpclib.ProtocolError. This fails because the headers object became > > a list in r57194. 
The story behind this is somewhat complicated: > > - xmlrpclib used to use httplib.HTTP, which is old and deprecated > > - r57024 Jeremy Hylton switched py3k to use more modern httplib > > infrastructure, but broke xmlrpclib.Transport.request; the "headers" > > variable was now referenced without being set > > - r57194 Hyeshik Chang fixed xmlrpclib.Transport.request to get the > > headers in a way that didn't explode; unfortunately, it now returned a > > list instead of a dict, but there were no tests to catch this > > - r57221 Guido integrated xmlrpc changes from the trunk, including > > r57158, which added tests that relied on headers being a dict. > > Unfortunately, it no longer was. > > > > > > 3. test_xmlrpc.test_fail_with_info was failing because the ValueError > > string of int('nonintegralstring') in py3k currently has an "s". This > > is presumably going away soon; the test now uses a regular expression > > with an optional leading "s", which is a little silly, but r56209 is > > prior art. > > > > >>> int('z') > > Traceback (most recent call last): > > File "", line 1, in > > ValueError: invalid literal for int() with base 10: s'z' > > > > _______________________________________________ > > Python-3000 mailing list > > Python-3000 at python.org > > http://mail.python.org/mailman/listinfo/python-3000 > > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > > > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > From keir at google.com Thu Aug 23 03:34:20 2007 From: keir at google.com (Keir Mierle) Date: Wed, 22 Aug 2007 18:34:20 -0700 Subject: [Python-3000] [PATCH] Fix broken round and truncate behaviour (PEP3141) Message-ID: This patch fixes the previously added truncate, and also fixes round behavior. The two argument version of round is not currently handling the round toward even case. Keir -------------- next part -------------- A non-text attachment was scrubbed... 
Name: round_truncate_fix.diff Type: text/x-patch Size: 6257 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070822/757bdb72/attachment.bin From rrr at ronadam.com Thu Aug 23 07:31:04 2007 From: rrr at ronadam.com (Ron Adam) Date: Thu, 23 Aug 2007 00:31:04 -0500 Subject: [Python-3000] PEP 3101 Updated In-Reply-To: <46CCE3DF.1090004@trueblade.com> References: <46C2809C.3000806@acm.org> <46C2C1A0.4060002@trueblade.com> <46CC68EB.2030609@trueblade.com> <46CCBBF7.3060201@ronadam.com> <46CCD063.2060007@trueblade.com> <46CCDE13.60502@ronadam.com> <46CCE3DF.1090004@trueblade.com> Message-ID: <46CD1B98.1090008@ronadam.com> Eric Smith wrote: > Ron Adam wrote: >> >> >> Eric Smith wrote: >>> Ron Adam wrote: >>>>> I've been re-reading the PEP, in an effort to make sure everything >>>>> is working. I realized that these tests should not pass. The PEP >>>>> says that "Format specifiers can themselves contain replacement >>>>> fields". The tests above have replacement fields in the field >>>>> name, which is not allowed. I'm going to remove this functionality. >>>>> >>>>> I believe the intent is to support a replacement for: >>>>> "%.*s" % (4, 'how now brown cow') >>>>> >>>>> Which would be: >>>>> "{0:.{1}}".format('how now brown cow', 4) >>>>> >>>>> For this, there's no need for replacement on field name. I've >>>>> taken it out of the code, and made these tests in to errors. >>>> >>>> I think it should work myself, but it could be added back in later >>>> if there is a need to. >>>> >>>> >>>> I'm still concerned about the choice of {{ and }} as escaped brackets. >>>> >>>> What does the following do? >>>> >>>> >>>> "{0:{{^{1}}".format('Python', '12') >>> >>> >>> "{0:{{^{1}}".format('Python', '12') >>> Traceback (most recent call last): >>> File "", line 1, in >>> ValueError: unterminated replacement field >> >> When are the "{{" and "}}" escape characters replaced with '{' and '}'? > > While parsing for the starting '{'. 
I'm not saying this is the best or > only or even PEP-specified way of doing it, but that's how the sample > implementation does it (and the way the sandbox version has done it for > many months). Any problems can be fixed, of course, once the desired behavior is decided on. These are just some loose ends that still need to be spelled out in the PEP. >>> But, >>> >>> "{{{0:^{1}}".format('Python', '12') >>> '{ Python ' >> >> So escaping '{' with '{{' and '}' with '}}' doesn't work inside of >> format expressions? > > As I have it implemented, yes. > >> That would mean there is no way to pass a brace to a __format__ method. > > No way using string.format, correct. You could pass it in using the > builtin format(), or by calling __format__ directly. But you're > correct, for the most part if string.format doesn't accept it, it's not > practical. See the suggestions below. >>>> "{0}".format('{value:{{^{width}}', width='10', value='Python') >>> >>> >>> "{0}".format('{value:{{^{width}}', width='10', value='Python') >>> '{value:{{^{width}}' >> >> Depending on whether or not the evaluation is recursive this may or >> may not be correct. >> >> I think it's actually easier to do it recursively and not put limits >> on where format specifiers can be used or not. > > But then you'd always have to worry that some replaced string looks like > something that could be interpreted as a field, even if that's not what > you want. > > What if "{value}" came from user supplied input? I don't think you'd > want (or expect) any string you output that contains braces to be expanded. Ok, after thinking about it for a while... Then maybe it's best not to use any recursion, not even at the top level. The above expressions would then need to be spelled: "{{0:.{0}}}".format(4).format('how now brown cow') "{{value:^{width}}}".format(width='10').format(value='Python') "{{0:{{^{0}}}".format(12).format('Python') Do those work? This is not that different to how '%' formatting already works.
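A minimal sketch of the two-pass spelling next to the nested-field spelling, assuming the behaviour of str.format in released Python 3 (the implementation under discussion may differ). The first pass fills in the precision or width and turns the doubled braces into single ones; the second pass formats the value.

```python
# Two-pass spelling: pass one builds the template, pass two fills it.
template = "{{0:.{0}}}".format(4)                  # becomes "{0:.4}"
truncated = template.format('how now brown cow')

centred = "{{value:^{width}}}".format(width='10').format(value='Python')

# Single-pass spelling with a replacement field nested in the spec.
truncated_nested = "{0:.{1}}".format('how now brown cow', 4)
centred_nested = "{value:^{width}}".format(value='Python', width=10)

print(repr(template), repr(truncated), repr(centred))
```

Both spellings give the same result here. The two-pass form never re-scans the substituted value, so braces inside the value are left alone.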
Either way I'd like to see an unambiguous escapes used for braces. For both ease of implementation and ease of reading. Maybe we can re-use the '%' for escaping characters. ['%%', '%{', '%}'] "%{0:.{0}%}".format('4').format('how now brown cow') "%{value:^{width}%}".format(width='10').format(value='Python') "%{0:%%%{^{0}%}".format('12').format('Python') The only draw back of this is the '%' specifier type needs to be expressed as either '%%' or maybe by another letter, 'p'? Which is a minor issue I think. Reasons for doing this... - It makes determining where fields start and stop easier than using '{{' and '}}' when other braces are in the string. (Both for humans and for code.) - It's a better alternative than '\{' and '\}' because it doesn't have any issues with back slashes and raw strings or cause excessive '\'s in strings. - It doesn't collide with regular expressions. - It looks familiar in the context that it's used in. - Everyone is already familiar with '%%'. So adding '%{' and '%}' would not seem out of place. Although a recursive solution is neat and doesn't require typing '.format' as much, these two suggestions together are easy to understand and avoid all of the above issues. (As near as I can tell.) Cheers, _RON From martin at v.loewis.de Thu Aug 23 07:58:33 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 23 Aug 2007 07:58:33 +0200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <20070822235929.GA12780@electricrain.com> References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> <20070822235929.GA12780@electricrain.com> Message-ID: <46CD2209.8000408@v.loewis.de> > IMHO all of that is desirable in many situations but it is not strict. > bytes:bytes or int:bytes (depending on the database type) are > fundamentally all the C berkeleydb library knows. Attaching meaning > to the keys and values is up to the user. 
I'm about to try a _bsddb.c > that strictly enforces bytes as values for the underlying bsddb.db API > provided by _bsddb in my sandbox under the assumption that being > strict about bytes is desired at that level there. I predict lots of > Lib/bsddb/test/ edits. I fixed it all a few weeks ago, in revisions r56754, r56840, r56890, r56892, r56914. I predict you'll find that most of the edits are already committed. Regards, Martin From greg at electricrain.com Thu Aug 23 08:12:32 2007 From: greg at electricrain.com (Gregory P. Smith) Date: Wed, 22 Aug 2007 23:12:32 -0700 Subject: [Python-3000] easy int to bytes conversion similar to chr? Message-ID: <20070823061232.GA4405@electricrain.com> Is there anything similar to chr(65) for creating a single byte string that doesn't involve creating an intermediate string or tuple object? bytes(chr(65)) bytes((65,)) both seem slightly weird. Greg From greg at electricrain.com Thu Aug 23 08:54:08 2007 From: greg at electricrain.com (Gregory P. Smith) Date: Wed, 22 Aug 2007 23:54:08 -0700 Subject: [Python-3000] easy int to bytes conversion similar to chr? In-Reply-To: <20070823061232.GA4405@electricrain.com> References: <20070823061232.GA4405@electricrain.com> Message-ID: <20070823065408.GB4405@electricrain.com> On Wed, Aug 22, 2007 at 11:12:32PM -0700, Gregory P. Smith wrote: > Is there anything similar to chr(65) for creating a single byte string > that doesn't involve creating an intermediate string or tuple object? > > bytes(chr(65)) > bytes((65,)) > > both seem slightly weird. > > Greg yes i know.. bad example. b'\x41' works for that. pretend i used an integer variable not an up front constant. bytes(chr(my_int)) # not strictly correct unless 0<=my_int<=255 bytes((my_int,)) struct.pack('B', my_int) This came up as being useful in unittests for the bsddb bytes:bytes changes i'm making but at the moment I'm not coming up with practical examples where it's important. maybe this is a nonissue.
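A minimal sketch of the spellings for turning an integer variable into a single byte, assuming released Python 3, where bytes is immutable and bytearray is the mutable type; the py3k branch at the time of this thread behaved differently. The int_to_byte helper is hypothetical and exists only to make the range check explicit.

```python
import struct


def int_to_byte(value):
    """Hypothetical helper: return a length-1 bytes object for 0 <= value <= 255."""
    if not 0 <= value <= 255:
        raise ValueError("byte must be in range(0, 256)")
    return bytes((value,))


my_int = 65

via_tuple = int_to_byte(my_int)            # one-element tuple
via_struct = struct.pack('B', my_int)      # unsigned char
via_to_bytes = my_int.to_bytes(1, 'big')   # int method, length 1

# Mutable buffer filled in place, with no intermediate str or tuple.
buf = bytearray(1)
buf[0] = my_int

print(via_tuple, via_struct, via_to_bytes, bytes(buf))
```

All four spellings reject values outside 0..255, though each raises its own exception type.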
-gps From martin at v.loewis.de Thu Aug 23 09:27:37 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Thu, 23 Aug 2007 09:27:37 +0200 Subject: [Python-3000] easy int to bytes conversion similar to chr? In-Reply-To: <20070823061232.GA4405@electricrain.com> References: <20070823061232.GA4405@electricrain.com> Message-ID: <46CD36E9.1090609@v.loewis.de> Gregory P. Smith wrote: > Is there anything similar to chr(65) for creating a single byte string > that doesn't involve creating an intermediate string or tuple object? > > bytes(chr(65)) > bytes((65,)) > > both seem slightly weird. b = bytes(1) b[0] = 65 doesn't create an intermediate string or tuple object. Regards, Martin From greg at electricrain.com Thu Aug 23 09:38:38 2007 From: greg at electricrain.com (Gregory P. Smith) Date: Thu, 23 Aug 2007 00:38:38 -0700 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <46CD2209.8000408@v.loewis.de> References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> <20070822235929.GA12780@electricrain.com> <46CD2209.8000408@v.loewis.de> Message-ID: <20070823073837.GA14725@electricrain.com> On Thu, Aug 23, 2007 at 07:58:33AM +0200, "Martin v. Löwis" wrote: > > IMHO all of that is desirable in many situations but it is not strict. > > bytes:bytes or int:bytes (depending on the database type) are > > fundamentally all the C berkeleydb library knows. Attaching meaning > > to the keys and values is up to the user. I'm about to try a _bsddb.c > > that strictly enforces bytes as values for the underlying bsddb.db API > > provided by _bsddb in my sandbox under the assumption that being > > strict about bytes is desired at that level there. I predict lots of > > Lib/bsddb/test/ edits. > > I fixed it all a few weeks ago, in revisions r56754, r56840, r56890, > r56892, r56914. I predict you'll find that most of the edits are > already committed. > > Regards, > Martin Yeah you did the keys (good!).
I just checked in a change to require values to also be bytes. Maybe that goes so far as to be inconvenient? It's accurate. All retrieved data comes back as bytes. Greg From martin at v.loewis.de Thu Aug 23 09:49:19 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Thu, 23 Aug 2007 09:49:19 +0200 Subject: [Python-3000] Immutable bytes type and dbm modules In-Reply-To: <20070823073837.GA14725@electricrain.com> References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> <20070822235929.GA12780@electricrain.com> <46CD2209.8000408@v.loewis.de> <20070823073837.GA14725@electricrain.com> Message-ID: <46CD3BFF.5080904@v.loewis.de> > Yeah you did the keys (good!). I just checked in a change to require > values to also be bytes. Maybe that goes so far as to be inconvenient? Ah, ok. I think it is fine. We still need to discuss what the best way is to do string:string databases, or string:bytes databases. I added StringKeys and StringValues to allow for such cases, and I also changed shelve to use string keys (not bytes keys), as this is really a dictionary-like application; this all needs to be discussed. Regards, Martin From barry at python.org Thu Aug 23 13:16:52 2007 From: barry at python.org (Barry Warsaw) Date: Thu, 23 Aug 2007 07:16:52 -0400 Subject: [Python-3000] Move to a "py3k" branch *DONE* In-Reply-To: References: <20070823004155.GB12780@electricrain.com> Message-ID: <352F453B-81BD-4AE1-AA1B-08B325601172@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 22, 2007, at 10:06 PM, Guido van Rossum wrote: > For me too. Great work whoever fixed these (and other tests, like > xmlrpc). > > All I've got left failing is the three email tests, and I know Barry > Warsaw is working on those. (Although he used some choice swearwords > to describe the current state. :-) I plan to spend some Real Time on those today and I think Bill is going to meet up with me on #python-dev when the Californians wake up.
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRs1spHEjvBPtnXfVAQLepwQArE3MEL1ygNiEvfHa1uBShfUyRwjT/JyI WzPv8pVWUumwdSqgzj0CW1iyAqV1dUtm9MoRgImyJQu7rowtPnDyOutdJJSyo9xN y/oUSj6pRPftu785u6ZcbOWA34ROjmbv8R4wFvfFHs2fBnX18OosfSLoR9rWqSlM ae2kv0maDFw= =WzNl -----END PGP SIGNATURE----- From aahz at pythoncraft.com Thu Aug 23 16:28:59 2007 From: aahz at pythoncraft.com (Aahz) Date: Thu, 23 Aug 2007 07:28:59 -0700 Subject: [Python-3000] [PATCH] Fix math.ceil() behaviour for PEP3141 In-Reply-To: References: Message-ID: <20070823142859.GA16448@panix.com> On Wed, Aug 22, 2007, Keir Mierle wrote: > > The attached patch fixes math.ceil to delegate to x.__ceil__() if it > is defined, according to PEP 3141, and adds tests to cover the new > cases. Patch is against r57303. Please wait until the new python.org bug tracker is up and post the patch there. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "If you don't know what your program is supposed to do, you'd better not start writing it." --Dijkstra From jeremy at alum.mit.edu Thu Aug 23 16:37:35 2007 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Thu, 23 Aug 2007 10:37:35 -0400 Subject: [Python-3000] proposed fix for test_xmlrpc.py in py3k In-Reply-To: References: Message-ID: On 8/22/07, John Reese wrote: > Good afternoon. I'm in the Google Python Sprint working on getting > the test_xmlrpc unittest to pass. The following patch was prepared by > Jacques Frechet and me. We'd appreciate feedback on the attached > patch. > > What was broken: > > > 1. BaseHTTPServer attempts to parse the http headers with an > rfc822.Message class. This was changed in r56905 by Jeremy Hylton to > use the new io library instead of stringio as before. Unfortunately > Jeremy's change resulted in TextIOWrapper stealing part of the HTTP > request body, due to its buffering quantum. This was not seen in > normal tests because GET requests have no body, but xmlrpc uses POSTs. 
> We fixed this by doing the equivalent of what was done before, but > using io.StringIO instead of the old cStringIO class: we pull out just > the header using a sequence of readlines. Thanks for the fix. Are there any tests you can add to test_xmlrpc_net that would have caught this error? There was some non-unittest test code in xmlrpc but it seemed to use servers or requests that don't work anymore. I couldn't find any xmlrpc servers that I could use to test more than the getCurrentTime() test that we currently have. Jeremy > > > 2. Once this was fixed, a second error asserted: > test_xmlrpc.test_with{_no,}_info call .get on the headers object from > xmlrpclib.ProtocolError. This fails because the headers object became > a list in r57194. The story behind this is somewhat complicated: > - xmlrpclib used to use httplib.HTTP, which is old and deprecated > - r57024 Jeremy Hylton switched py3k to use more modern httplib > infrastructure, but broke xmlrpclib.Transport.request; the "headers" > variable was now referenced without being set > - r57194 Hyeshik Chang fixed xmlrpclib.Transport.request to get the > headers in a way that didn't explode; unfortunately, it now returned a > list instead of a dict, but there were no tests to catch this > - r57221 Guido integrated xmlrpc changes from the trunk, including > r57158, which added tests that relied on headers being a dict. > Unfortunately, it no longer was. > > > 3. test_xmlrpc.test_fail_with_info was failing because the ValueError > string of int('nonintegralstring') in py3k currently has an "s". This > is presumably going away soon; the test now uses a regular expression > with an optional leading "s", which is a little silly, but r56209 is > prior art. 
> > >>> int('z') > Traceback (most recent call last): > File "", line 1, in > ValueError: invalid literal for int() with base 10: s'z' > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/jeremy%40alum.mit.edu > > > From guido at python.org Thu Aug 23 16:38:05 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 23 Aug 2007 07:38:05 -0700 Subject: [Python-3000] [PATCH] Fix math.ceil() behaviour for PEP3141 In-Reply-To: <20070823142859.GA16448@panix.com> References: <20070823142859.GA16448@panix.com> Message-ID: Aahz, While the sprint is going on (and because I knew about the tracker move) I've encouraged people at the sprint to post their patches directly to python-3000 -- most likely the review feedback will be given in person and then the patch will be checked in. This is quicker than using a tracker for this particular purpose. Patches that are left hanging past the sprint will have to be uploaded to the (new) tracker. --Guido On 8/23/07, Aahz wrote: > On Wed, Aug 22, 2007, Keir Mierle wrote: > > > > The attached patch fixes math.ceil to delegate to x.__ceil__() if it > > is defined, according to PEP 3141, and adds tests to cover the new > > cases. Patch is against r57303. > > Please wait until the new python.org bug tracker is up and post the > patch there. > -- > Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ > > "If you don't know what your program is supposed to do, you'd better not > start writing it." 
--Dijkstra > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From p.f.moore at gmail.com Thu Aug 23 17:36:35 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 23 Aug 2007 16:36:35 +0100 Subject: [Python-3000] Is __cmp__ deprecated? Message-ID: <79990c6b0708230836i78fdd6e9w23f75eb7be639371@mail.gmail.com> Can I just check - is __cmp__ due for removal in Py3K? There's no mention of it in PEP 3100, but its status seems unclear from references I've found. Actually, is __coerce__ still around, as well? Again, I can't see a clear answer in the PEPs or list discussions. Paul. From guido at python.org Thu Aug 23 17:43:52 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 23 Aug 2007 08:43:52 -0700 Subject: [Python-3000] Is __cmp__ deprecated? In-Reply-To: <79990c6b0708230836i78fdd6e9w23f75eb7be639371@mail.gmail.com> References: <79990c6b0708230836i78fdd6e9w23f75eb7be639371@mail.gmail.com> Message-ID: Coerce is definitely dead. cmp() is still alive and __cmp__ is used to overload it; on the one hand I'd like to get rid of it but OTOH it's occasionally useful. So it'll probably stay. However, to overload <, == etc. you *have* to overload __lt__ and friends. On 8/23/07, Paul Moore wrote: > Can I just check - is __cmp__ due for removal in Py3K? There's no > mention of it in PEP 3100, but its status seems unclear from > references I've found. > > Actually, is __coerce__ still around, as well? Again, I can't see a > clear answer in the PEPs or list discussions. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From p.f.moore at gmail.com Thu Aug 23 17:55:08 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 23 Aug 2007 16:55:08 +0100 Subject: [Python-3000] Is __cmp__ deprecated? 
In-Reply-To: References: <79990c6b0708230836i78fdd6e9w23f75eb7be639371@mail.gmail.com> Message-ID: <79990c6b0708230855k720ecff7n45a2b570a823daa3@mail.gmail.com> On 23/08/07, Guido van Rossum wrote: > Coerce is definitely dead. > > cmp() is still alive and __cmp__ is used to overload it; on the one > hand I'd like to get rid of it but OTOH it's occasionally useful. So > it'll probably stay. However, to overload <, == etc. you *have* to > overload __lt__ and friends. Thanks. In particular, thanks for the comment about overloading < etc - that's what I was looking at, and I was wondering about using __cmp__ to save some boilerplate. You saved me some headaches! Paul. From p.f.moore at gmail.com Thu Aug 23 18:20:27 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 23 Aug 2007 17:20:27 +0100 Subject: [Python-3000] Updated and simplified PEP 3141: A Type Hierarchy for Numbers In-Reply-To: References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com> <5d44f72f0708021153u7ea1f443jfdee3c167b011011@mail.gmail.com> <5d44f72f0708221236k7c3ea054k43eb237f4a3ef577@mail.gmail.com> Message-ID: <79990c6b0708230920x323dd369h9369e33f29989517@mail.gmail.com> On 22/08/07, Guido van Rossum wrote: > > * Add Demo/classes/Rat.py to the stdlib? > > Yes, but it needs a makeover. At the very least I'd propose the module > name to be rational. If no-one else gets to this, I might take a look. But I'm not likely to make fast progress as I don't have a lot of free time... (And I don't have a Windows compiler, so I'll need to set up a Linux VM and find out how to build Py3K on that!) > The code is really old. Too right - it's riddled with "isinstance" calls which probably aren't very flexible, and it seems to try to handle mixed-mode operations with float and complex, which I can't see the use for... Given that the basic algorithms are pretty trivial, the major part of any makeover would be rewriting the special method implementations to conform to the Rational ABC. 
A makeover is probably more or less a rewrite. I wrote a rational number class in C++ for Boost once; it wouldn't be too hard to port. Paul. From greg at electricrain.com Thu Aug 23 19:18:38 2007 From: greg at electricrain.com (Gregory P. Smith) Date: Thu, 23 Aug 2007 10:18:38 -0700 Subject: [Python-3000] Immutable bytes type and bsddb or other IO In-Reply-To: <46CD3BFF.5080904@v.loewis.de> References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> <20070822235929.GA12780@electricrain.com> <46CD2209.8000408@v.loewis.de> <20070823073837.GA14725@electricrain.com> <46CD3BFF.5080904@v.loewis.de> Message-ID: <20070823171837.GI24059@electricrain.com> On Thu, Aug 23, 2007 at 09:49:19AM +0200, "Martin v. Löwis" wrote: > > Yeah you did the keys (good!). I just checked in a change to require > > values to also be bytes. Maybe that goes so far as to be inconvenient? > > Ah, ok. I think it is fine. We still need to discuss what the best > way is to do string:string databases, or string:bytes databases. > > I added StringKeys and StringValues to allow for such cases, and I > also changed shelve to use string keys (not bytes keys), as this > is really a dictionary-like application; this all needs to be > discussed. > > Regards, > Martin Alright, regarding bytes being mutable. I realized this morning that things just won't work with the database libraries that way. PyBytes_AS_STRING() returns the bytes object's char *ob_bytes pointer. But database operations occur with the GIL released, so that mutable string is free to change out from underneath it. I -detest- the idea of making another temporary copy of the data just to allow the GIL to be released during IO. data copies == bad. Wasn't a past mailing list thread claiming the bytes type was supposed to be great for IO? How's that possible unless we add a lock to the bytesobject?
(It's not -likely- that bytes objects will be modified while in use for IO in most circumstances but just the possibility that it could be is a problem) I don't have much sprint time available today but I'll stop by to talk about this one a bit. -greg From janssen at parc.com Thu Aug 23 20:41:09 2007 From: janssen at parc.com (Bill Janssen) Date: Thu, 23 Aug 2007 11:41:09 PDT Subject: [Python-3000] Move to a "py3k" branch *DONE* In-Reply-To: <352F453B-81BD-4AE1-AA1B-08B325601172@python.org> References: <20070823004155.GB12780@electricrain.com> <352F453B-81BD-4AE1-AA1B-08B325601172@python.org> Message-ID: <07Aug23.114119pdt."57996"@synergy1.parc.xerox.com> > I plan to spend some Real Time on those today and I think Bill is > going to meet up with me on #python-dev when the Californians wake up. That's not working real well right now... IRC seems wedged for me. Probably a firewall issue. I've got to try a different location, and we'll try to connect again when I'm there. Bill From g.brandl at gmx.net Thu Aug 23 20:44:11 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 23 Aug 2007 20:44:11 +0200 Subject: [Python-3000] Move to a "py3k" branch *DONE* In-Reply-To: <07Aug23.114119pdt."57996"@synergy1.parc.xerox.com> References: <20070823004155.GB12780@electricrain.com> <352F453B-81BD-4AE1-AA1B-08B325601172@python.org> <07Aug23.114119pdt."57996"@synergy1.parc.xerox.com> Message-ID: Bill Janssen schrieb: >> I plan to spend some Real Time on those today and I think Bill is >> going to meet up with me on #python-dev when the Californians wake up. > > That's not working real well right now... IRC seems wedged for me. > > Probably a firewall issue. > > I've got to try a different location, and we'll try to connect again > when I'm there. Note that if it's a simple blocked port issue, you can also connect to freenode on port 8000. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From collinw at gmail.com Thu Aug 23 20:47:24 2007 From: collinw at gmail.com (Collin Winter) Date: Thu, 23 Aug 2007 11:47:24 -0700 Subject: [Python-3000] Move to a "py3k" branch *DONE* In-Reply-To: <20070823004155.GB12780@electricrain.com> References: <20070823004155.GB12780@electricrain.com> Message-ID: <43aa6ff70708231147h673bf14evbb87ef262094a7af@mail.gmail.com> On 8/22/07, Gregory P. Smith wrote: > > > There are currently about 7 failing unit tests left: > > > > > > test_bsddb > > > test_bsddb3 > ... > > fyi these two pass for me on the current py3k branch on ubuntu linux > and mac os x 10.4.9. test_bsddb works for me on Ubuntu, and test_bsddb3 was working for me yesterday, but now fails with python: /home/collinwinter/src/python/py3k/Modules/_bsddb.c:388: make_dbt: Assertion `((((PyObject*)(obj))->ob_type) == (&PyBytes_Type) || PyType_IsSubtype((((PyObject*)(obj))->ob_type), (&PyBytes_Type)))' failed. The failure occurs after this line is emitted: test02_cursors (bsddb.test.test_dbshelve.EnvThreadHashShelveTestCase) ... ok Collin Winter From martin at v.loewis.de Thu Aug 23 21:18:59 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Thu, 23 Aug 2007 21:18:59 +0200 Subject: [Python-3000] Immutable bytes type and bsddb or other IO In-Reply-To: <20070823171837.GI24059@electricrain.com> References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> <20070822235929.GA12780@electricrain.com> <46CD2209.8000408@v.loewis.de> <20070823073837.GA14725@electricrain.com> <46CD3BFF.5080904@v.loewis.de> <20070823171837.GI24059@electricrain.com> Message-ID: <46CDDDA3.1050906@v.loewis.de> > I -detest- the idea of making another temporary copy of the data just > to allow the GIL to be released during IO.
data copies == bad. > Wasn't a past mailing list thread claiming the bytes type was supposed > to be great for IO? How's that possible unless we add a lock to the > bytesobject? (It's not -likely- that bytes objects will be modified > while in use for IO in most circumstances but just the possibility > that it could be is a problem) I agree. There must be a way to lock a bytes object from modification, preferably not by locking an attempt to modify it, but by raising an exception when a locked bytes object is modified. (I do realise that this gives something very close to immutable bytes objects). Regards, Martin From greg at electricrain.com Thu Aug 23 21:24:08 2007 From: greg at electricrain.com (Gregory P. Smith) Date: Thu, 23 Aug 2007 12:24:08 -0700 Subject: [Python-3000] Immutable bytes type and bsddb or other IO In-Reply-To: <46CDDDA3.1050906@v.loewis.de> References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> <20070822235929.GA12780@electricrain.com> <46CD2209.8000408@v.loewis.de> <20070823073837.GA14725@electricrain.com> <46CD3BFF.5080904@v.loewis.de> <20070823171837.GI24059@electricrain.com> <46CDDDA3.1050906@v.loewis.de> Message-ID: <20070823192408.GJ24059@electricrain.com> On Thu, Aug 23, 2007 at 09:18:59PM +0200, "Martin v. Löwis" wrote: > > I -detest- the idea of making another temporary copy of the data just > > to allow the GIL to be released during IO. data copies == bad. > > Wasn't a past mailing list thread claiming the bytes type was supposed > > to be great for IO? How's that possible unless we add a lock to the > > bytesobject? (It's not -likely- that bytes objects will be modified > > while in use for IO in most circumstances but just the possibility > > that it could be is a problem) > > I agree. There must be a way to lock a bytes object from modification, > preferably not by locking an attempt to modify it, but by raising an > exception when a locked bytes object is modified.
> > (I do realise that this gives something very close to immutable bytes > objects). > > Regards, > Martin I like that idea. It's simple and leaves any actual locking up to a subclass or other wrapper. -gps From pfdubois at gmail.com Thu Aug 23 23:06:40 2007 From: pfdubois at gmail.com (Paul Dubois) Date: Thu, 23 Aug 2007 14:06:40 -0700 Subject: [Python-3000] document processing tools conversion report Message-ID: FYI: docutils will require some modification of at least the io module, to the extent that the ideal mode of fixing the current sources so that a subsequent pass of 2to3 will do the job, is probably not possible (but may be outside of this one file). I've made a report to the docutils tracker to that effect; will see what dialog ensues. There are things to handle images etc. so I think fixing this is above my paygrade. pygments converts ok but uses cPickle; changing to pickle is easy. Since pygments uses docutils.io, it isn't possible to run it further. sphinx converts ok, and can be imported. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070823/e64731f8/attachment.htm From g.brandl at gmx.net Thu Aug 23 23:59:50 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 23 Aug 2007 23:59:50 +0200 Subject: [Python-3000] document processing tools conversion report In-Reply-To: References: Message-ID: Paul Dubois schrieb: > FYI: docutils will require some modification of at least the io module, > to the extent that the ideal mode of fixing the current sources so that > a subsequent pass of 2to3 will do the job, is probably not possible (but > may be outside of this one file). I've made a report to the docutils > tracker to that effect; will see what dialog ensues. There are things to > handle images etc. so I think fixing this is above my paygrade. Thanks for looking into this!
If just the one file is problematic and the rest can be handled by 2to3, we might be able to set up a way to do this automatically once people want to build the docs with 3k. I'll monitor the docutils tracker issue in any case. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From guido at python.org Fri Aug 24 00:07:55 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 23 Aug 2007 15:07:55 -0700 Subject: [Python-3000] [PATCH] Fix broken round and truncate behaviour (PEP3141) In-Reply-To: References: Message-ID: Alex and I did a bunch more work based on this patch and checked it in: Committed revision 57359. On 8/22/07, Keir Mierle wrote: > This patch fixes the previously added truncate, and also fixes round > behavior. The two argument version of round is not currently handling > the round toward even case. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From mierle at gmail.com Fri Aug 24 01:45:15 2007 From: mierle at gmail.com (Keir Mierle) Date: Thu, 23 Aug 2007 16:45:15 -0700 Subject: [Python-3000] [PATCH] Fix rich set comparison Message-ID: This patch fixes rich set comparison so that x < y works when x is a set and y is something which implements the corresponding comparison. Keir -------------- next part -------------- A non-text attachment was scrubbed... 
Name: richsetcmp.diff Type: text/x-patch Size: 2070 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070823/989ac442/attachment.bin From larry at hastings.org Fri Aug 24 02:03:21 2007 From: larry at hastings.org (Larry Hastings) Date: Thu, 23 Aug 2007 17:03:21 -0700 Subject: [Python-3000] [PATCH] Fix dumbdbm, which fixes test_shelve (for me); instrument other tests so we catch this sooner (and more directly) Message-ID: <46CE2049.8020802@hastings.org> Attached is a patch for review. As of revision 57341 (only a couple hours old as of this writing), test_shelve was failing on my machine. This was because I didn't have any swell databases available, so anydbm was falling back to dumbdbm, and dumbdbm had a bug. In Py3k, dumbdbm's dict-like interface now requires byte objects, which it internally encodes to "latin-1" then uses with a real dict. But dumbdbm.__contains__ was missing the conversion, so it was trying to use a bytes object with a real dict, and that failed with an error (as bytes objects are not hashable). This patch fixes dumbdbm.__contains__ so it encodes the key, fixing test_shelve on my machine. But there's more! Neal Norwitz pointed out that test_shelve /didn't/ fail on his machine. That's because dumbdbm is the last resort of anydbm, and he had a superior database module available. So the regression test suite was really missing two things: * test_dumbdbm should test dumbdbm.__contains__. * test_anydbm should test all the database modules available, not merely its first choice. So this patch also adds test_write_contains() to test_dumbdbm, and a new external function to test_anydbm: dbm_iterate(), which returns an iterator over all database modules available to anydbm, /and/ internally forces anydbm to use that database module, restoring anydbm to its first choice when it finishes iteration. I also renamed _delete_files() to delete_files() so it could be the canonical dbm cleanup function for other tests.
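The __contains__ fix described above can be illustrated with a small sketch. Everything here is hypothetical (the class ToyDumbDbm and its _convert helper are invented for illustration and are not the real dumbdbm code), and it runs on a modern Python 3, where bytes are hashable and the bytes-to-str direction is spelled decode(). The point it shows is only that every dict-like entry point has to apply the same key conversion before touching the underlying dict.

```python
class ToyDumbDbm:
    """Hypothetical miniature of a dbm-like store; not the real dumbdbm."""

    def __init__(self):
        # The real dict behind the store is keyed by str, not bytes.
        self._index = {}

    @staticmethod
    def _convert(key):
        # Public interface takes bytes; the internal dict wants str.
        if not isinstance(key, (bytes, bytearray)):
            raise TypeError("keys must be bytes")
        return bytes(key).decode("latin-1")

    def __setitem__(self, key, value):
        self._index[self._convert(key)] = value

    def __getitem__(self, key):
        return self._index[self._convert(key)]

    def __contains__(self, key):
        # The reported bug amounts to skipping this conversion and
        # handing the raw bytes key straight to the underlying dict.
        return self._convert(key) in self._index
```

A regression test in the spirit of test_write_contains() would then write a key and check membership with the same bytes object.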
While I was at it, I noticed that test_whichdbm.py did a good job of testing all the databases available, but with a slightly odd approach: it iterated over all the possible databases, and created new test methods--inserting them into the class object--for each one that was available. I changed it to use dbm_iterate() and delete_files() from test.test_anydbm, so that that logic can live in only one place. I didn't preserve the setattr() approach; I simply iterate over all the modules and run the tests inside one conventional method. One final thought, for the folks who defined this "in Py3k, database-backed dict-like objects use byte objects as keys" interface. dumbdbm.keys() returns self._index.keys(), which means that it's serving up real strings, not byte objects. Shouldn't it return [k.encode("latin-1") for k in self._index.keys()] ? (Or perhaps change iterkeys to return that as a generator expression, and change keys() to return list(self.iterkeys())?) Thanks, /larry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070823/d08bb467/attachment-0001.htm -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: lch.py3k.dumbdb.contains.diff.1.txt Url: http://mail.python.org/pipermail/python-3000/attachments/20070823/d08bb467/attachment-0001.txt From mierle at gmail.com Fri Aug 24 02:23:47 2007 From: mierle at gmail.com (Keir Mierle) Date: Thu, 23 Aug 2007 17:23:47 -0700 Subject: [Python-3000] Strange method resolution problem with __trunc__, __round__ on floats Message-ID: The newly introduced trunc() and round() have the following odd behavior: $ ./python Python 3.0x (py3k, Aug 23 2007, 17:15:22) [GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> trunc(3.14) Traceback (most recent call last): File "", line 1, in TypeError: type float doesn't define __trunc__ method [36040 refs] >>> 3.14.__trunc__ [36230 refs] >>> trunc(3.14) 3 [36230 refs] >>> It looks like builtin_trunc() is failing at the call to _PyType_Lookup(), which must be returning NULL to get the above behavior. I'm not sure what's causing this; perhaps someone more experienced than me has an idea? Keir From larry at hastings.org Fri Aug 24 02:50:31 2007 From: larry at hastings.org (Larry Hastings) Date: Thu, 23 Aug 2007 17:50:31 -0700 Subject: [Python-3000] [PATCH] Fix dumbdbm, which fixes test_shelve (for me); instrument other tests so we catch this sooner (and more directly) In-Reply-To: <46CE2049.8020802@hastings.org> References: <46CE2049.8020802@hastings.org> Message-ID: <46CE2B57.2040909@hastings.org> Patch submitted to Roundup; it's issue #1007: http://bugs.python.org/issue1007 (It's listed under Python 2.6 as there's currently no appropriate choice in the "Versions" list.) /larry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070823/b9c2aa29/attachment.htm From ero.carrera at gmail.com Fri Aug 24 02:55:03 2007 From: ero.carrera at gmail.com (Ero Carrera) Date: Thu, 23 Aug 2007 17:55:03 -0700 Subject: [Python-3000] String to unicode fixes in time and datetime Message-ID: <883A2C41-5CCB-4C2A-97D6-E5ACE5DEA46F@gmail.com> Hi, I'm attaching a small patch result of attempting to tackle part of one of the tasks in the Google Sprint. The patch removes most of the references of PyString_* calls in the "time" and "datetime" modules and adds Unicode support instead. There's a problem in "datetime" with "_PyUnicode_AsDefaultEncodedString". As there's no current equivalent that would provide an object of type "bytes", there are two occurrences of PyString_* functions to handle the returned "default encoded string" and convert it into bytes. 
cheers, -- Ero -------------- next part -------------- A non-text attachment was scrubbed... Name: time_datetime_pystring_patch.diff Type: application/octet-stream Size: 9162 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070823/8b0f0b34/attachment.obj From guido at python.org Fri Aug 24 04:02:23 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 23 Aug 2007 19:02:23 -0700 Subject: [Python-3000] Strange method resolution problem with __trunc__, __round__ on floats In-Reply-To: References: Message-ID: I figured it out by stepping through builtin_trunc and into _PyType_Lookup for a bit. The type is not completely initialized; apparently fundamental types like float get initialized lazily *really* late. Inserting this block of code before the _PyType_Lookup call fixes things: if (Py_Type(number)->tp_dict == NULL) { if (PyType_Ready(Py_Type(number)) < 0) return NULL; } I'll check in a change ASAP. (Eric: this applies to the code I mailed you for format() earlier too!) --Guido On 8/23/07, Keir Mierle wrote: > The newly introduced trunc() and round() have the following odd behavior: > > $ ./python > Python 3.0x (py3k, Aug 23 2007, 17:15:22) > [GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> trunc(3.14) > Traceback (most recent call last): > File "", line 1, in > TypeError: type float doesn't define __trunc__ method > [36040 refs] > >>> 3.14.__trunc__ > > [36230 refs] > >>> trunc(3.14) > 3 > [36230 refs] > >>> > > It looks like builtin_trunc() is failing at the call to > _PyType_Lookup(), which must be returning NULL to get the above > behavior. I'm not sure what's causing this; perhaps someone more > experienced than me has an idea? 
> > Keir > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From adrian at holovaty.com Fri Aug 24 04:50:16 2007 From: adrian at holovaty.com (Adrian Holovaty) Date: Thu, 23 Aug 2007 21:50:16 -0500 Subject: [Python-3000] Should 2to3 point out *possible*, but not definite changes? Message-ID: As part of the Python 3000 sprint (at Google's Chicago office), I've been working on the documentation for 2to3. I'm publishing updates at http://red-bean.com/~adrian/2to3.rst and will submit this as a documentation patch when it's completed. (I didn't get as much done today as I would have liked, but I'll be back at it Friday.) In my research of the 2to3 utility, I've been thinking about whether it should be expanded to include the equivalent of "warnings." I know one of its design goals has been to be "dumb but correct," but I propose that including optional warnings would be a bit smarter/helpful, without risking the tool's correctness. Specifically, I propose: * 2to3 gains either an "--include-warnings" option or an "--exclude-warnings" option, depending on which behavior is decided to be default. * If this option is set, the utility would search for an *additional* set of fixes -- fixes that *might* need to be made to the code but cannot be determined with certainty. An example of this is noted in the "Limitations" section of the 2to3 README: a = apply a(f, *args) (2to3 cannot handle this because it cannot detect reassignment.) Under my proposal, the utility would notice that "apply" is a builtin whose behavior is changing, and that this is a situation in which the correct 2to3 porting is ambiguous.
The utility would designate this in the output with a Python comment on the previous line: # 2to3note: The semantics of apply() have changed. a = apply a(f, *args) Each comment would have a common prefix such as "2to3note" for easy grepping. Given the enormity of the Python 3000 syntax change, I think that the 2to3 utility should provide as much guidance as possible. What it does currently is extremely cool (I daresay miraculous), but I think we can get closer to 100% coverage if we take into account the ambiguous changes. Oh, and I'm happy to (attempt to) write this addition to the tool, as long as the powers that be deem it worthwhile. Thoughts? Adrian -- Adrian Holovaty holovaty.com | djangoproject.com From yginsburg at gmail.com Fri Aug 24 04:24:02 2007 From: yginsburg at gmail.com (Yuri Ginsburg) Date: Thu, 23 Aug 2007 19:24:02 -0700 Subject: [Python-3000] make uuid.py creation threadsafe Message-ID: <3343b3d90708231924rac129a5p2da2cd03a274dfed@mail.gmail.com> The attached small patch makes the output buffer local, thus making uuid.py thread-safe. -- Yuri Ginsburg (YG10) -------------- next part -------------- A non-text attachment was scrubbed... Name: uuid.py.diff Type: application/octet-stream Size: 1570 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070823/3d6eabfd/attachment.obj From greg.ewing at canterbury.ac.nz Fri Aug 24 05:36:25 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 24 Aug 2007 15:36:25 +1200 Subject: [Python-3000] Is __cmp__ deprecated? In-Reply-To: References: <79990c6b0708230836i78fdd6e9w23f75eb7be639371@mail.gmail.com> Message-ID: <46CE5239.1000506@canterbury.ac.nz> Guido van Rossum wrote: > cmp() is still alive and __cmp__ is used to overload it; on the one > hand I'd like to get rid of it but OTOH it's occasionally useful. Maybe you could keep cmp() but implement it in terms of, say, __lt__ and __eq__?
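A minimal sketch of that idea, assuming nothing beyond the == and < operators: the helper name cmp_via_rich is hypothetical, and this is only one plausible reading of the suggestion, not how any Python release defines cmp().

```python
from functools import cmp_to_key


def cmp_via_rich(a, b):
    """Three-way compare built only from == and < (hypothetical helper)."""
    if a == b:
        return 0
    if a < b:
        return -1
    if b < a:
        return 1
    # Neither equal nor ordered either way (e.g. disjoint sets, NaN).
    raise TypeError("operands are unordered")


# A three-way function like this can still drive sorting via cmp_to_key.
ordered = sorted(["pear", "apple", "fig"], key=cmp_to_key(cmp_via_rich))
```

One design question the sketch makes visible is what to do with partially ordered types, where none of the three tests succeeds; raising is just one choice.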
-- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From greg.ewing at canterbury.ac.nz Fri Aug 24 05:40:54 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 24 Aug 2007 15:40:54 +1200 Subject: [Python-3000] Immutable bytes type and bsddb or other IO In-Reply-To: <20070823171837.GI24059@electricrain.com> References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> <20070822235929.GA12780@electricrain.com> <46CD2209.8000408@v.loewis.de> <20070823073837.GA14725@electricrain.com> <46CD3BFF.5080904@v.loewis.de> <20070823171837.GI24059@electricrain.com> Message-ID: <46CE5346.10301@canterbury.ac.nz> Gregory P. Smith wrote: > Wasn't a past mailing list thread claiming the bytes type was supposed > to be great for IO? How's that possible unless we add a lock to the > bytesobject? Doesn't the new buffer protocol provide something for getting a locked view of the data? If so, it seems like bytes should implement that. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From eric+python-dev at trueblade.com Fri Aug 24 05:57:04 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Thu, 23 Aug 2007 23:57:04 -0400 Subject: [Python-3000] PEP 3101 implementation uploaded to the tracker. Message-ID: <46CE5710.2030907@trueblade.com> There are a handful of remaining issues, but it works for the most part. http://bugs.python.org/issue1009 Thanks to Guido and Talin for all of their help the last few days, and thanks to Patrick Maupin for help with the initial implementation. Known issues: Better error handling, per the PEP. 
Need to write Formatter class. test_long is failing, but I don't think it's my doing. Need to fix this warning that I introduced when compiling Python/formatter_unicode.c: Objects/stringlib/unicodedefs.h:26: warning: `STRINGLIB_CMP' defined but not used Need more tests for sign handling for int and float. It still supports "()" sign formatting from an earlier PEP version. Eric. From mierle at gmail.com Fri Aug 24 06:03:39 2007 From: mierle at gmail.com (Keir Mierle) Date: Thu, 23 Aug 2007 21:03:39 -0700 Subject: [Python-3000] [PATCH] Implement remaining rich comparison operations on dictviews Message-ID: This patch implements rich comparisons with dict views such that dict().keys() can be compared like a set (i.e. < is subset, etc). Keir -------------- next part -------------- A non-text attachment was scrubbed... Name: dictview_richcompare.diff Type: text/x-patch Size: 4420 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070823/9623468b/attachment.bin From guido at python.org Fri Aug 24 06:10:25 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 23 Aug 2007 21:10:25 -0700 Subject: [Python-3000] String to unicode fixes in time and datetime In-Reply-To: <883A2C41-5CCB-4C2A-97D6-E5ACE5DEA46F@gmail.com> References: <883A2C41-5CCB-4C2A-97D6-E5ACE5DEA46F@gmail.com> Message-ID: Hi Ero, Thanks for these! I checked them in. The datetime patch had a few problems (did you run the unit test?) that I got rid of. The function you were looking for does exist, it's PyUnicode_AsUTF8String(). (which returns a new object instead of a borrowed reference). I changed your code to use this. I changed a few places from PyBuffer_FromStringAndSize("", 1) to ("", 0) -- the bytes object always allocates an extra null byte that isn't included in the count. I changed a few places from using strlen() to using the PyBuffer_GET_SIZE() macro. 
PyBuffer_AS_STRING() can be NULL if the size is 0; I rearranged some code to avoid asserts triggering in this case. There are still two remaining problems: test_datetime leaks a bit (49 references) and test_strptime and test_strftime leak a lot (over 2000 references!). We can hunt these down tomorrow. --Guido On 8/23/07, Ero Carrera wrote: > > Hi, > > I'm attaching a small patch result of attempting to tackle part of > one of the tasks in the Google Sprint. > The patch removes most of the references of PyString_* calls in the > "time" and "datetime" modules and adds Unicode support instead. > > There's a problem in "datetime" with > "_PyUnicode_AsDefaultEncodedString". As there's no current equivalent > that would provide an object of type "bytes", there are two > occurrences of PyString_* functions to handle the returned "default > encoded string" and convert it into bytes. > > cheers, > -- > Ero > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Aug 24 06:13:51 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 23 Aug 2007 21:13:51 -0700 Subject: [Python-3000] make uuid.py creation threadsafe In-Reply-To: <3343b3d90708231924rac129a5p2da2cd03a274dfed@mail.gmail.com> References: <3343b3d90708231924rac129a5p2da2cd03a274dfed@mail.gmail.com> Message-ID: Thanks! Committed revision 57375. On 8/23/07, Yuri Ginsburg wrote: > The attached small patch makes the output buffer local, thus making uuid.py thread-safe.
> > -- > Yuri Ginsburg (YG10) > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From janssen at parc.com Fri Aug 24 06:15:26 2007 From: janssen at parc.com (Bill Janssen) Date: Thu, 23 Aug 2007 21:15:26 PDT Subject: [Python-3000] sprint patch for server-side SSL Message-ID: <46CE5B5E.8030005@parc.com> Here's the final form of the SSL patch. Now includes a test file. All bugs discovered on Wednesday have been fixed. Bill -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: ssl-update-diff Url: http://mail.python.org/pipermail/python-3000/attachments/20070823/0c1c0c37/attachment-0001.txt From guido at python.org Fri Aug 24 06:17:04 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 23 Aug 2007 21:17:04 -0700 Subject: [Python-3000] Immutable bytes type and bsddb or other IO In-Reply-To: <46CE5346.10301@canterbury.ac.nz> References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> <20070822235929.GA12780@electricrain.com> <46CD2209.8000408@v.loewis.de> <20070823073837.GA14725@electricrain.com> <46CD3BFF.5080904@v.loewis.de> <20070823171837.GI24059@electricrain.com> <46CE5346.10301@canterbury.ac.nz> Message-ID: On 8/23/07, Greg Ewing wrote: > Gregory P. Smith wrote: > > Wasn't a past mailing list thread claiming the bytes type was supposed > > to be great for IO? How's that possible unless we add a lock to the > > bytesobject? > > Doesn't the new buffer protocol provide something for > getting a locked view of the data? If so, it seems like > bytes should implement that. It *does* implement that! So there's the solution: these APIs should not insist on bytes but use the buffer API.
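The "locked view" idea can be seen from Python code in a present-day Python 3, which is an assumption on my part since the branch under discussion predates the final API. In this sketch (the function name is hypothetical) a memoryview stands in for the buffer an I/O call would hold: while it is exported, a bytearray refuses to be resized.

```python
def try_resize_while_exported():
    """Show that an exported buffer blocks resizing of a bytearray."""
    buf = bytearray(b"payload")
    view = memoryview(buf)      # export the buffer, as an I/O call would
    try:
        buf.extend(b"!")        # a resize could move the underlying memory
    except BufferError:
        blocked = True
    else:
        blocked = False
    finally:
        view.release()          # drop the export
    buf.extend(b"!")            # allowed again once the view is released
    return blocked, bytes(buf)
```

Note that only resizing is blocked; item assignment such as buf[0] = 65 still works while the view exists, so this protects against the memory moving rather than against every modification.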
It's quite a bit of work I suspect (especially since you can't use PyArg_ParseTuple with y# any more) but worth it. BTW PyUnicode should *not* support the buffer API. I'll add both of these to the task spreadsheet. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Aug 24 06:36:45 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 23 Aug 2007 21:36:45 -0700 Subject: [Python-3000] Should 2to3 point out *possible*, but not definite changes? In-Reply-To: References: Message-ID: Yes, I think this would be way cool! I believe there are already a few fixers that print messages about things they know are wrong but don't know how to fix, those could also be integrated (although arguably you'd want those messages to be treated as more severe). Does this mean that Django is committing to converting to Py3k? :-) --Guido On 8/23/07, Adrian Holovaty wrote: > As part of the Python 3000 sprint (at Google's Chicago office), I've > been working on the documentation for 2to3. I'm publishing updates at > http://red-bean.com/~adrian/2to3.rst and will submit this as a > documentation patch when it's completed. (I didn't get as much done > today as I would have liked, but I'll be back at it Friday.) > > In my research of the 2to3 utility, I've been thinking about whether > it should be expanded to include the equivalent of "warnings." I know > one of its design goals has been to be "dumb but correct," but I > propose that including optional warnings would be a bit > smarter/helpful, without risking the tool's correctness. > > Specifically, I propose: > > * 2to3 gains either an "--include-warnings" option or an > "--exclude-warnings" option, depending on which behavior is decided to > be default. > > * If this option is set, the utility would search for an *additional* > set of fixes -- fixes that *might* need to be made to the code but > cannot be determined with certainty. 
An example of this is noted in > the "Limitations" section of the 2to3 README: > > a = apply > a(f, *args) > > (2to3 cannot handle this because it cannot detect reassignment.) > > Under my proposal, the utility would notice that "apply" is a builtin > whose behavior is changing, and that this is a situation in which the > correct 2to3 porting is ambiguous. The utility would designate this in > the output with a Python comment on the previous line: > > # 2to3note: The semantics of apply() have changed. > a = apply > a(f, *args) > > Each comment would have a common prefix such as "2to3note" for easy grepping. > > Given the enormity of the Python 3000 syntax change, I think that the > 2to3 utility should provide as much guidance as possible. What it does > currently is extremely cool (I daresay miraculous), but I think we can > get closer to 100% coverage if we take into account the ambiguous > changes. > > Oh, and I'm happy to (attempt to) write this addition to the tool, as > long as the powers at be deem it worthwhile. > > Thoughts? > > Adrian > > -- > Adrian Holovaty > holovaty.com | djangoproject.com > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Aug 24 07:03:57 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 23 Aug 2007 22:03:57 -0700 Subject: [Python-3000] uuid creation not thread-safe? In-Reply-To: <1185671069.839769.274620@z28g2000prd.googlegroups.com> References: <1185671069.839769.274620@z28g2000prd.googlegroups.com> Message-ID: This was now fixed in 3.0. Somebody might want to backport. On 7/28/07, lcaamano wrote: > > On Jul 20, 1:52 pm, "Guido van Rossum" wrote: > > I discovered what appears to be a thread-unsafety inuuid.py. 
This is > > in the trunk as well as in 3.x; I'm using the trunk here for easy > > reference. There's some code around line 395: > > > > import ctypes, ctypes.util > > _buffer = ctypes.create_string_buffer(16) > > > > This creates a *global* buffer which is used as the output parameter > > to later calls to _uuid_generate_random() and _uuid_generate_time(). > > For example, around line 481, in uuid1(): > > > > _uuid_generate_time(_buffer) > > return UUID(bytes=_buffer.raw) > > > > Clearly if two threads do this simultaneously they are overwriting > > _buffer in unpredictable order. There are a few other occurrences of > > this too. > > > > I find it somewhat disturbing that what seems a fairly innocent > > function that doesn't *appear* to have global state is nevertheless > > not thread-safe. Would it be wise to fix this, e.g. by allocating a > > fresh output buffer inside uuid1() and other callers? > > > > > I didn't find any reply to this, which is odd, so forgive me if it's > old news. > > I agree with you that it's not thread safe and that a local buffer in > the stack should fix it. > > Just for reference, the thread-safe uuid extension we've been using > since python 2.1, which I don't recall where we borrow it from, uses a > local buffer in the stack.
It looks like this: > > -----begin uuid.c-------------- > > static char uuid__doc__ [] = > "DCE compatible Universally Unique Identifier module"; > > #include "Python.h" > #include > > static char uuidgen__doc__ [] = > "Create a new DCE compatible UUID value"; > > static PyObject * > uuidgen(void) > { > uuid_t out; > char buf[48]; > > uuid_generate(out); > uuid_unparse(out, buf); > return PyString_FromString(buf); > } > > static PyMethodDef uuid_methods[] = { > {"uuidgen", uuidgen, 0, uuidgen__doc__}, > {NULL, NULL} /* Sentinel */ > }; > > DL_EXPORT(void) > inituuid(void) > { > Py_InitModule4("uuid", > uuid_methods, > uuid__doc__, > (PyObject *)NULL, > PYTHON_API_VERSION); > } > > -----end uuid.c-------------- > > > It also seems that using uuid_generate()/uuid_unparse() should be > faster than using uuid_generate_random() and then creating a python > object to call its __str__ method. If so, it would be nice if the > uuid.py module also provided equivalent fast versions that returned > strings instead of objects. > > > -- > Luis P Caamano > Atlanta, GA, USA > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Aug 24 07:08:20 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 23 Aug 2007 22:08:20 -0700 Subject: [Python-3000] Move to a "py3k" branch *DONE* In-Reply-To: <43aa6ff70708231147h673bf14evbb87ef262094a7af@mail.gmail.com> References: <20070823004155.GB12780@electricrain.com> <43aa6ff70708231147h673bf14evbb87ef262094a7af@mail.gmail.com> Message-ID: That looks like a simple logic bug in the routine where the assert is failing (should return 1 when detecting None). I'll check in a fix momentarily. --Guido On 8/23/07, Collin Winter wrote: > On 8/22/07, Gregory P. Smith wrote: > > > > There are currently about 7 failing unit tests left: > > > > > > > > test_bsddb > > > > test_bsddb3 > > ... > > > > fyi these two pass for me on the current py3k branch on ubuntu linux > > and mac os x 10.4.9. 
> > test_bsddb works for me on Ubuntu, and test_bsddb3 was working for me > yesterday, but now fails with > > python: /home/collinwinter/src/python/py3k/Modules/_bsddb.c:388: > make_dbt: Assertion `((((PyObject*)(obj))->ob_type) == (&PyBytes_Type) > || PyType_IsSubtype((((PyObject*)(obj))->ob_type), (&PyBytes_Type)))' > failed. > > The failure occurs after this line is emitted > > test02_cursors (bsddb.test.test_dbshelve.EnvThreadHashShelveTestCase) ... ok > > Collin Winter > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Fri Aug 24 07:09:28 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 24 Aug 2007 07:09:28 +0200 Subject: [Python-3000] Immutable bytes type and bsddb or other IO In-Reply-To: References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> <20070822235929.GA12780@electricrain.com> <46CD2209.8000408@v.loewis.de> <20070823073837.GA14725@electricrain.com> <46CD3BFF.5080904@v.loewis.de> <20070823171837.GI24059@electricrain.com> <46CE5346.10301@canterbury.ac.nz> Message-ID: <46CE6808.1070007@v.loewis.de> > It *does* implement that! So there's the solution: these APIs should > not insist on bytes but use the buffer API. It's quite a bit of work I > suspect (especially since you can't use PyArg_ParseTuple with y# any > more) but worth it. I think there could be another code for PyArg_ParseTuple (or the meaning of y# be changed): that code would not only return char* and Py_ssize_t, but also a PyObject* and fill a PyBuffer b to be passed to PyObject_ReleaseBuffer(o, &b). > BTW PyUnicode should *not* support the buffer API. Why not? It should set readonly to 1, and format to "u" or "w". 
Regards, Martin From guido at python.org Fri Aug 24 07:19:49 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 23 Aug 2007 22:19:49 -0700 Subject: [Python-3000] Immutable bytes type and bsddb or other IO In-Reply-To: <46CE6808.1070007@v.loewis.de> References: <46B7FACC.8030503@v.loewis.de> <20070822235929.GA12780@electricrain.com> <46CD2209.8000408@v.loewis.de> <20070823073837.GA14725@electricrain.com> <46CD3BFF.5080904@v.loewis.de> <20070823171837.GI24059@electricrain.com> <46CE5346.10301@canterbury.ac.nz> <46CE6808.1070007@v.loewis.de> Message-ID: On 8/23/07, "Martin v. Löwis" wrote: > > It *does* implement that! So there's the solution: these APIs should > > not insist on bytes but use the buffer API. It's quite a bit of work I > > suspect (especially since you can't use PyArg_ParseTuple with y# any > > more) but worth it. > > I think there could be another code for PyArg_ParseTuple (or the meaning > of y# be changed): that code would not only return char* and Py_ssize_t, > but also a PyObject* and fill a PyBuffer b to be passed to > PyObject_ReleaseBuffer(o, &b). That hardly saves any work compared to O though. > > BTW PyUnicode should *not* support the buffer API. > > Why not? It should set readonly to 1, and format to "u" or "w". Because the read() method of binary files (and similar places, like socket.send() and in the future probably various database objects) accept anything that supports the buffer API, but writing a (text) string to these is almost certainly a bug. Not supporting the buffer API in PyUnicode is IMO preferable to making explicit exceptions for PyUnicode in all those places. I don't think that the savings possible when writing to a text file using the UTF-16 or -32 encoding (whichever matches Py_UNICODE_SIZE) in the native byte order are worth leaving that bug unchecked.
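[Editorial illustration, not part of the original message.] The failure mode argued about above can be sketched in a few lines. This assumes a released Python 3 interpreter, where text strings ended up not exposing the buffer API, and uses io.BytesIO as a stand-in for a binary file; the names binary_sink and rejected are made up for the example.

```python
import io

binary_sink = io.BytesIO()

# Bytes-like objects expose the buffer API and are accepted as-is.
binary_sink.write(b"abc")
binary_sink.write(bytearray(b"def"))
binary_sink.write(memoryview(b"ghi"))

# A text string does not expose the buffer API, so the mistake is caught
# at the call site instead of silently writing the internal representation.
try:
    binary_sink.write("text")
except TypeError:
    rejected = True
else:
    rejected = False

# Text has to be encoded explicitly before it can be written.
binary_sink.write("text".encode("utf-8"))
```

No per-call special case for text is needed here: the write simply fails because str is not a buffer provider.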
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz at pythoncraft.com Fri Aug 24 15:02:52 2007 From: aahz at pythoncraft.com (Aahz) Date: Fri, 24 Aug 2007 06:02:52 -0700 Subject: [Python-3000] Is __cmp__ deprecated? In-Reply-To: <46CE5239.1000506@canterbury.ac.nz> References: <79990c6b0708230836i78fdd6e9w23f75eb7be639371@mail.gmail.com> <46CE5239.1000506@canterbury.ac.nz> Message-ID: <20070824130252.GB18456@panix.com> On Fri, Aug 24, 2007, Greg Ewing wrote: > Guido van Rossum wrote: >> >> cmp() is still alive and __cmp__ is used to overload it; on the one >> hand I'd like to get rid of it but OTOH it's occasionally useful. > > Maybe you could keep cmp() but implement it in terms of, say, __lt__ > and __eq__? No! The whole point of cmp() is to be able to make *one* call; this is especially important for things like Decimal and NumPy. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "If you don't know what your program is supposed to do, you'd better not start writing it." --Dijkstra From thomas at python.org Fri Aug 24 16:33:47 2007 From: thomas at python.org (Thomas Wouters) Date: Fri, 24 Aug 2007 16:33:47 +0200 Subject: [Python-3000] Removing simple slicing Message-ID: <9e804ac0708240733r1f1781e2m18902834e2218777@mail.gmail.com> I did some work at last year's Google sprint on removing the simple slicing API (__getslice__, tp_as_sequence->sq_slice) in favour of the more flexible sliceobject API (__getitem__ and tp_as_mapping->mp_subscript using slice objects as index.) For some more detail, see the semi-PEP below. (I hesitate to call it a PEP because it's way past the Py3k PEP deadline, but the email I was originally going to send on this subject grew in such a size that I figured I might as well use PEP layout and use the opportunity to record some best practices and behaviour. And the change should probably be recorded in a PEP anyway, even though it has never been formally proposed, just taken as a given.) 
If anyone is bored and/or interested in doing some complicated work, there is still a bit of (optional) work to be done in this area: I uploaded patches to be applied to the trunk SF 8 months ago -- extended slicing support for a bunch of types. Some of that extended slicing support is limited to step-1 slices, though, most notably UserString.MutableString and ctypes. I can guarantee adding non-step-1 support to them is a challenging and fulfilling exercise, having done it for several types, but I can't muster the intellectual stamina to do it for these (to me) fringe types. The patches can be found in Roundup: http://bugs.python.org/issue?%40search_text=&title=&%40columns=title&id=&%40columns=id&creation=&creator=twouters&activity=&%40columns=activity&%40sort=activity&actor=&type=&components=&versions=&severity=&dependencies=&assignee=&keywords=&priority=&%40group=priority&status=1&%40columns=status&resolution=&%40pagesize=50&%40startwith=0&%40action=search(there doesn't seem to be a shorter URL; just search for issues created by 'twouters' instead.) If nobody cares, I will be checking these patches into the trunk this weekend (after updating them), and then update and check in the rest of the p3yk-noslice branch into the py3k branch. Abstract ======== This proposal discusses getting rid of the two types of slicing Python uses, ``simple`` and ``extended``. Extended slicing was added later, and uses a different API at both the C and the Python level for backward compatibility. Extended slicing can express everything simple slicing can express, however, making the simple slicing API practically redundant. 
A Tale of Two APIs ================== Simple slicing is a slice operation without a step, Ellipsis or tuple of slices -- the archetypical slice of just `start` and/or `stop`, with a single colon separating them and both sides being optional:: L[1:3] L[2:] L[:-5] L[:] An extended slice is any slice that isn't simple:: L[1:5:2] L[1:3, 8:10] L[1, ..., 5:-2] L[1:3:] (Note that the presence of an extra colon in the last example makes the very first simple slice an extended slice, but otherwise expresses the exact same slicing operation.) In applying a simple slice, Python does the work of translating omitted, out of bounds or negative indices into the appropriate actual indices, based on the length of the sequence. The normalized ``start`` and ``stop`` indices are then passed to the appropriate method: ``__getslice__``, ``__setslice__`` or ``__delslice__`` for Python classes, ``tp_as_sequence``'s ``sq_slice`` or ``sq_ass_slice`` for C types. For extended slicing, no special handling of slice indices is done. The indices in ``start:stop:step`` are wrapped in a ``slice`` object, with missing indices represented as None. The indices are otherwise taken as-is. The sequence object is then indexed with the slice object as if it were a mapping: ``__getitem__``,`` __setitem__`` or ``__delitem__`` for Python classes, ``tp_as_mapping``'s ``mp_subscript`` or ``mp_ass_subscript``. It is entirely up to the sequence to interpret the meaning of missing, out of bounds or negative indices, let alone non-numerical indices like tuples or Ellipsis or arbitrary objects. Since at least Python 2.1, applying a simple slice to an object that does not implement the simple slicing API will fall back to using extended slicing, calling __getitem__ (or mp_subscript) instead of __getslice__ (or sq_slice), and similarly for slice assignment/deletion. 
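[Editorial illustration, not part of the original message.] The extended-slicing path described above can be observed from Python with a small probe class. This is a minimal sketch assuming a Python 3 interpreter, where only the slice-object path exists; Probe is a hypothetical helper that just returns whatever index object it is handed.

```python
class Probe:
    """Hypothetical helper: returns the index object it receives."""

    def __getitem__(self, index):
        return index


p = Probe()

# A plain start:stop slice arrives as a slice object; omitted parts are None.
assert p[1:3] == slice(1, 3, None)
assert p[2:] == slice(2, None, None)
assert p[:] == slice(None, None, None)

# Negative indices are passed through untouched; the sequence has to
# interpret them itself.
assert p[:-5] == slice(None, -5, None)

# Extended forms: a step, a tuple of slices, and Ellipsis.
assert p[1:5:2] == slice(1, 5, 2)
assert p[1:3, 8:10] == (slice(1, 3, None), slice(8, 10, None))
assert p[1, ..., 5:-2] == (1, Ellipsis, slice(5, -2, None))
```

Nothing is normalised on the way in, which is why the type itself has to deal with missing, negative and out-of-bounds indices.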
Problems ======== Aside from the obvious disadvantage of having two ways to do the same thing, simple slicing is an inconvenient wart for several reasons: 1) It (passively) promotes supporting only simple slicing, as observed by the builtin types only supporting extended slicing many years after extended slicing was introduced. 2) The Python VM dedicates 12 of its opcodes, about 11%, to support simple slicing, and effectively reserves another 13 for code convenience. Reducing the Big Switch in the bytecode interpreter would certainly not hurt Python performance. 3) The same goes for the number of functions, macros and function-pointers supporting simple slicing, although the impact would be maintainability and readability of the source rather than performance. Proposed Solution ================= The proposed solution, as implemented in the p3yk-noslice SVN branch, gets rid of the simple slicing methods and PyType entries. The simple C API (using ``Py_ssize_t`` for start and stop) remains, but creates a slice object as necessary instead. Various types had to be updated to support slice objects, or improve the simple slicing case of extended slicing. The result is that ``__getslice__``, ``__setslice__`` and ``__delslice__`` are no longer called in any situation. Classes that delegate ``__getitem__`` (or the C equivalent) to a sequence type get any slicing behaviour of that type for free. Classes that implement their own slicing will have to be modified to accept slice objects and process the indices themselves. This means that at the C level, as is already the case at the Python level, the same method is used for mapping-like access as for slicing. C types will still want to implement ``tp_as_sequence->sq_item``, but that function will only be called when using the ``PySequence_*Item()`` API. Those API functions do not (yet) fall back to using ``tp_as_mapping->mp_subscript``, although they possibly should. A casualty of this change is ``PyMapping_Check()``.
It used to check for ``tp_as_mapping`` being available, and was modified to check for ``tp_as_mapping`` but *not* ``tp_as_sequence->sq_slice`` when extended slicing was added to the builtin types. It could conceivably check for ``tp_as_sequence->sq_item`` instead of ``sq_slice``, but the added value is unclear (especially considering ABCs.) In the standard library and CPython itself, ``PyMapping_Check()`` is used mostly to provide early errors, for instance by checking the arguments to ``exec()``. Alternate Solution ------------------ A possible alternative to removing simple slicing completely, would be to introduce a new typestruct hook, with the same signature as ``tp_as_mapping->mp_subscript``, which would be called for slicing operations. All as-mapping index operations would have to fall back to this new ``sq_extended_slice`` hook, in order for ``seq[slice(...)]`` to work as expected. For some added efficiency and error-checking, expressions using actual slice syntax could compile into bytecodes specific for slicing (of which there would only be three, instead of twelve.) This approach would simplify C types wanting to support extended slicing but not arbitrary-object indexing (and vice-versa) somewhat, but the benefit seems too small to warrant the added complexity in the CPython runtime itself. Implementing Extended Slicing ============================= Supporting extended slicing in C types is not as easily done as supporting simple slicing. There are a number of edgecases in interpreting the odder combinations of ``start``, ``stop`` and ``step``. This section tries to give some explanations and best practices. Extended Slicing in C --------------------- Because the mapping API takes precedence over the sequence API, any ``tp_as_mapping->mp_subscript`` and ``tp_as_mapping->mp_ass_subscript`` functions need to do proper typechecks on their argument.
In Python 2.5 and later, this is best done using ``PyIndex_Check()`` and ``PySlice_Check()`` (and possibly ``PyTuple_Check()`` and comparison against ``Py_Ellipsis``.) For compatibility with Python 2.4 and earlier, ``PyIndex_Check()`` would have to be replaced with ``PyInt_Check()`` and ``PyLong_Check()``. Indices that pass ``PyIndex_Check()`` should be converted to a ``Py_ssize_t`` using ``PyIndex_AsSsizeT()`` and delegated to ``tp_as_sequence->sq_item``. (For compatibility with Python 2.4, use ``PyNumber_AsLong()`` and downcast to an ``int`` instead.) The exact meaning of tuples of slices, and of Ellipsis, is up to the type, as no standard-library types support it. It may be useful to use the same convention as the Numpy package. Slices inside tuples, if supported, should probably follow the same rules as direct slices. From slice objects, correct indices can be extracted with ``PySlice_GetIndicesEx()``. Negative and out-of-bounds indices will be adjusted based on the provided length, but a negative ``step``, and a ``stop`` before a ``start`` are kept as-is. This means that, for a getslice operation, a simple for-loop can be used to visit the correct items in the correct order:: for (cur = start, i = 0; i < slicelength; cur += step, i++) dest[i] = src[cur]; If ``PySlice_GetIndicesEx()`` is not appropriate, the individual indices can be extracted from the ``PySlice`` object. If the indices are to be converted to C types, that should be done using ``PyIndex_Check()``, ``PyIndex_AsSsizeT()`` and the ``Py_ssize_t`` type, except that ``None`` should be accepted as the default value for the index.
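[Editorial illustration, not part of the original message.] In Python terms, the type checks, index normalisation and getslice loop described above can be sketched as follows. This is only a sketch under stated assumptions: Window is a hypothetical read-only wrapper, slice.indices() stands in for the clipping done by PySlice_GetIndicesEx(), and the slice length is derived with range() because slice.indices() does not report it.

```python
import operator


class Window:
    """Hypothetical read-only sequence wrapper, for illustration only."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Clip start/stop against the length; step is kept as given.
            start, stop, step = index.indices(len(self))
            # The slice length has to be computed separately.
            slicelength = len(range(start, stop, step))
            result = []
            cur = start
            for _ in range(slicelength):
                result.append(self._items[cur])
                cur += step
            return result
        # Anything else must behave like an integer (provide __index__);
        # operator.index() raises TypeError otherwise.
        i = operator.index(index)
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError("index out of range")
        return self._items[i]


w = Window(range(10))
assert w[2:5] == [2, 3, 4]
assert w[5:2] == []            # stop before start: empty, not an error
assert w[5:2:-1] == [5, 4, 3]  # negative step walks backwards
assert w[::-10] == [9]
assert w[-1] == 9
```

Tuples and Ellipsis fall through to operator.index() here and are rejected with TypeError, which matches a type that does not support multi-dimensional indexing.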
For deleting slices (``mp_ass_subscript`` called with ``NULL`` as value) where the order does not matter, a reverse slice can be turned into the equivalent forward slice with:: if (step < 0) { stop = start + 1; start = stop + step*(slicelength - 1) - 1; step = -step; } For slice assignment with a ``step`` other than 1, it's usually necessary to require the source iterable to have the same length as the slice. When assigning to a slice of length 0, care needs to be taken to select the right insertion point. For a slice S[5:2], the correct insertion point is before index 5, not before index 2. For both deleting slice and slice assignment, it is important to remember arbitrary Python code may be executed when calling Py_DECREF() or otherwise interacting with arbitrary objects. Because of that, it's important your datatype stays consistent throughout the operation. Either operate on a copy of your datatype, or delay (for instance) Py_DECREF() calls until the datatype is updated. The latter is usually done by keeping a scratchpad of to-be-DECREF'ed items. Extended slicing in Python -------------------------- The simplest way to support extended slicing in Python is by delegating to an underlying type that already supports extended slicing. The class can simply index the underlying type with the slice object (or tuple) it was indexed with. Barring that, the Python code will have to pretty much apply the same logic as the C type. ``PyIndex_AsSsizeT()`` is available as ``operator.index()``, with a ``try/except`` block replacing ``PyIndex_Check()``. ``isinstance(o, slice)`` and ``sliceobj.indices()`` replace ``PySlice_Check()`` and ``PySlice_GetIndices()``, but the slicelength (which is provided by ``PySlice_GetIndicesEx()``) has to be calculated manually. Testing extended slicing ------------------------ Proper tests of extended slicing capabilities should at least include the following (if the operations are supported), assuming a sequence of length 10. 
Triple-colon notation is used everywhere so it uses extended slicing even in Python 2.5 and earlier:: S[2:5:] (same as S[2:5]) S[5:2:] (same as S[5:2], an empty slice) S[::] (same as S[:], a copy of the sequence) S[:2:] (same as S[:2]) S[:11:] (same as S[:11], a copy of the sequence) S[5::] (same as S[5:]) S[-11::] (same as S[-11:], a copy of the sequence) S[-5:2:1] (same as S[5:2], an empty slice) S[-5:-2:2] ([ S[5], S[7] ]) S[5:2:-1] (the reverse of S[3:6]) S[-2:-5:-1] (the reverse of S[-4:-1]) S[:5:2] ([ S[0], S[2], S[4] ]) S[9::2] ([ S[9] ]) S[8::2] ([ S[8] ]) S[7::2] ([ S[7], S[9]]) S[1::-1] ([ S[1], S[0] ]) S[1:0:-1] ([ S[1] ], does not include S[0]!) S[1:-1:-1] (an empty slice) S[::10] ([ S[0] ]) S[::-10] ([ S[9] ]) S[2:5:] = [1, 2, 3] ([ S[2], S[3], S[4] ] become [1, 2, 3]) S[2:5:] = [1] (S[2] becomes 1, S[3] and S[4] are deleted) S[5:2:] = [1, 2, 3] ([1, 2, 3] inserted before S[5]) S[2:5:2] = [1, 2] ([ S[2], S[4] ] become [1, 2]) S[5:2:-2] = [1, 2] ([ S[3], S[5] ] become [2, 1]) S[3::3] = [1, 2, 3] ([ S[3], S[6], S[9] ] become [1, 2, 3]) S[:-5:-2] = [1, 2] ([ S[7], S[9] ] become [2, 1]) S[::-1] = S (reverse S in-place awkwardly) S[:5:] = S (replaces S[:5] with a copy of S) S[2:5:2] = [1, 2, 3] (error: assigning length-3 to slicelength-2) S[2:5:2] = None (error: need iterable) From collinw at gmail.com Fri Aug 24 18:46:29 2007 From: collinw at gmail.com (Collin Winter) Date: Fri, 24 Aug 2007 09:46:29 -0700 Subject: [Python-3000] Should 2to3 point out *possible*, but not definite changes? In-Reply-To: References: Message-ID: <43aa6ff70708240946s4fb506a6o53bf1de54b4c96d6@mail.gmail.com> On 8/23/07, Guido van Rossum wrote: > Yes, I think this would be way cool!
I believe there are already a few > fixers that print messages about things they know are wrong but don't > know how to fix, those could also be integrated (although arguably > you'd want those messages to be treated as more severe). Adrian and I talked about this this morning, and he said he's going to go ahead with an implementation. The original warning messages were a good idea, but they tend to get lost when converting large projects. Collin Winter > On 8/23/07, Adrian Holovaty wrote: > > As part of the Python 3000 sprint (at Google's Chicago office), I've > > been working on the documentation for 2to3. I'm publishing updates at > > http://red-bean.com/~adrian/2to3.rst and will submit this as a > > documentation patch when it's completed. (I didn't get as much done > > today as I would have liked, but I'll be back at it Friday.) > > > > In my research of the 2to3 utility, I've been thinking about whether > > it should be expanded to include the equivalent of "warnings." I know > > one of its design goals has been to be "dumb but correct," but I > > propose that including optional warnings would be a bit > > smarter/helpful, without risking the tool's correctness. > > > > Specifically, I propose: > > > > * 2to3 gains either an "--include-warnings" option or an > > "--exclude-warnings" option, depending on which behavior is decided to > > be default. > > > > * If this option is set, the utility would search for an *additional* > > set of fixes -- fixes that *might* need to be made to the code but > > cannot be determined with certainty. An example of this is noted in > > the "Limitations" section of the 2to3 README: > > > > a = apply > > a(f, *args) > > > > (2to3 cannot handle this because it cannot detect reassignment.) > > > > Under my proposal, the utility would notice that "apply" is a builtin > > whose behavior is changing, and that this is a situation in which the > > correct 2to3 porting is ambiguous. 
The utility would designate this in > > the output with a Python comment on the previous line: > > > > # 2to3note: The semantics of apply() have changed. > > a = apply > > a(f, *args) > > > > Each comment would have a common prefix such as "2to3note" for easy grepping. > > > > Given the enormity of the Python 3000 syntax change, I think that the > > 2to3 utility should provide as much guidance as possible. What it does > > currently is extremely cool (I daresay miraculous), but I think we can > > get closer to 100% coverage if we take into account the ambiguous > > changes. > > > > Oh, and I'm happy to (attempt to) write this addition to the tool, as > > long as the powers at be deem it worthwhile. > > > > Thoughts? > > > > Adrian > > > > -- > > Adrian Holovaty > > holovaty.com | djangoproject.com > > _______________________________________________ > > Python-3000 mailing list > > Python-3000 at python.org > > http://mail.python.org/mailman/listinfo/python-3000 > > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/collinw%40gmail.com > From fumanchu at aminus.org Fri Aug 24 18:35:10 2007 From: fumanchu at aminus.org (Robert Brewer) Date: Fri, 24 Aug 2007 09:35:10 -0700 Subject: [Python-3000] Removing simple slicing References: <9e804ac0708240733r1f1781e2m18902834e2218777@mail.gmail.com> Message-ID: <9BBC2D2B2CCF7E4DA0E212D1BD0CC6FA1FEECA@ex10.hostedexchange.local> Thomas Wouters wrote: > 1) It (passively) promotes supporting only simple slicing, > as observed by the builtin types only supporting extended > slicing many years after extended slicing was introduced Should that read "...only supporting simple slicing..."? 
> The proposed solution, as implemented in the p3yk-noslice > SVN branch, gets rid of the simple slicing methods and > PyType entries. The simple C API (using ``Py_ssize_t`` > for start and stop) remains, but creates a slice object > as necessary instead. Various types had to be updated to > support slice objects, or improve the simple slicing case > of extended slicing. Am I reading this correctly, that: since the "simple C API remains", one can still write S[3:8] with only one colon and have it work as before? Or would it have to be rewritten to include two colons? Robert Brewer fumanchu at aminus.org -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070824/1aa6ddd6/attachment.htm From guido at python.org Fri Aug 24 18:51:02 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Aug 2007 09:51:02 -0700 Subject: [Python-3000] Removing simple slicing In-Reply-To: <9BBC2D2B2CCF7E4DA0E212D1BD0CC6FA1FEECA@ex10.hostedexchange.local> References: <9e804ac0708240733r1f1781e2m18902834e2218777@mail.gmail.com> <9BBC2D2B2CCF7E4DA0E212D1BD0CC6FA1FEECA@ex10.hostedexchange.local> Message-ID: On 8/24/07, Robert Brewer wrote: > Thomas Wouters wrote: > > The proposed solution, as implemented in the p3yk-noslice > > SVN branch, gets rid of the simple slicing methods and > > PyType entries. The simple C API (using ``Py_ssize_t`` > > for start and stop) remains, but creates a slice object > > as necessary instead. Various types had to be updated to > > support slice objects, or improve the simple slicing case > > of extended slicing. > > Am I reading this correctly, that: since the "simple C API > remains", one can still write S[3:8] with only one colon > and have it work as before? Or would it have to be rewritten > to include two colons? Don't worry, this syntax won't go away; it will be executed differently. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at electricrain.com Fri Aug 24 18:58:24 2007 From: greg at electricrain.com (Gregory P. Smith) Date: Fri, 24 Aug 2007 09:58:24 -0700 Subject: [Python-3000] Immutable bytes type and bsddb or other IO In-Reply-To: References: <46B7EA06.5040106@v.loewis.de> <46B7FACC.8030503@v.loewis.de> <20070822235929.GA12780@electricrain.com> <46CD2209.8000408@v.loewis.de> <20070823073837.GA14725@electricrain.com> <46CD3BFF.5080904@v.loewis.de> <20070823171837.GI24059@electricrain.com> <46CE5346.10301@canterbury.ac.nz> Message-ID: <20070824165823.GM24059@electricrain.com> On Thu, Aug 23, 2007 at 09:17:04PM -0700, Guido van Rossum wrote: > On 8/23/07, Greg Ewing wrote: > > Gregory P. Smith wrote: > > > Wasn't a past mailing list thread claiming the bytes type was supposed > > > to be great for IO? How's that possible unless we add a lock to the > > > bytesobject? > > > > Doesn't the new buffer protocol provide something for > > getting a locked view of the data? If so, it seems like > > bytes should implement that. > > It *does* implement that! So there's the solution: these APIs should > not insist on bytes but use the buffer API. It's quite a bit of work I > suspect (especially since you can't use PyArg_ParseTuple with y# any > more) but worth it. > > BTW PyUnicode should *not* support the buffer API. > > I'll add both of these to the task spreadsheet. this sounds good, i'll work on it today for bsddb and hashlib. 
-greg From guido at python.org Fri Aug 24 19:02:26 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Aug 2007 10:02:26 -0700 Subject: [Python-3000] Removing simple slicing In-Reply-To: <9e804ac0708240733r1f1781e2m18902834e2218777@mail.gmail.com> References: <9e804ac0708240733r1f1781e2m18902834e2218777@mail.gmail.com> Message-ID: On 8/24/07, Thomas Wouters wrote: > If nobody cares, I will be checking these patches into the trunk this > weekend (after updating them), and then update and check in the rest of the > p3yk-noslice branch into the py3k branch. In the trunk? I'm concerned that that might make it (ever so slightly) incompatible with 2.5, and we're trying to make it as easy as possible to migrate to 2.6. Or perhaps you're just proposing to change the standard builtin types to always use the extended API, without removing the possibility of user types (either in C or in Python) using the simple API, at least in 2.6? I think in 2.6, if a class defines __{get,set,del}slice__, that should still be called when simple slice syntax is used in preference of __{get,set,del}item__. I'm less sure that this is relevant for the C API; perhaps someone more familiar with numpy could comment. In 3.0 this should all be gone of course. Apart from that, I'm looking forward to getting this over with, and checked in to both 2.6 and 3.0! 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Aug 24 19:26:16 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Aug 2007 10:26:16 -0700 Subject: [Python-3000] Immutable bytes type and bsddb or other IO In-Reply-To: References: <20070822235929.GA12780@electricrain.com> <46CD2209.8000408@v.loewis.de> <20070823073837.GA14725@electricrain.com> <46CD3BFF.5080904@v.loewis.de> <20070823171837.GI24059@electricrain.com> <46CE5346.10301@canterbury.ac.nz> <46CE6808.1070007@v.loewis.de> Message-ID: On 8/23/07, Guido van Rossum wrote: > > > BTW PyUnicode should *not* support the buffer API. > On 8/23/07, "Martin v. Löwis" wrote: > > Why not? It should set readonly to 1, and format to "u" or "w". [me again] > Because the read() method of binary files (and similar places, like > socket.send() and in the future probably various database objects) > accept anything that supports the buffer API, but writing a (text) > string to these is almost certainly a bug. Not supporting the buffer > API in PyUnicode is IMO preferable to making explicit exceptions for > PyUnicode in all those places. > > I don't think that the savings possible when writing to a text file > using the UTF-16 or -32 encoding (whichever matches Py_UNICODE_SIZE) > in the native byte order are worth leaving that bug unchecked. I looked at the code, and it's even more complicated than that. The new buffer API continues to make a distinction between binary and character data, and there's collusion between the bytes and unicode types so that this works: b = b"abc" b[1:2] = "X" even though these things all fail: b.extend("XYZ") b += "ZYX" Unfortunately taking the buffer API away from unicode makes things fail early (before sys.std{in,out,err} are set), so apparently the I/O library or something else somehow depends on this. I'll investigate.
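For comparison, a hedged sketch of how the same operations behave in released Python 3 interpreters, which is not the 2007 py3k branch being described here. It assumes `bytearray` as the mutable byte type (the mutable "bytes" of this thread later took that name) and uses a hypothetical helper, `rejects_text`, to report whether an operation raises `TypeError`.

```python
def rejects_text(operation):
    """Return True if calling operation() raises TypeError."""
    try:
        operation()
    except TypeError:
        return True
    return False


buf = bytearray(b"abc")


def assign_text_slice():
    # The slice assignment that still worked in the 2007 branch.
    buf[1:2] = "X"


def extend_with_text():
    buf.extend("XYZ")


def export_text_buffer():
    # str does not expose the buffer API in released Python 3.
    memoryview("abc")


results = {
    "slice_assign": rejects_text(assign_text_slice),
    "extend": rejects_text(extend_with_text),
    "memoryview": rejects_text(export_text_buffer),
}

# Text has to be encoded explicitly before it can be mixed with bytes.
buf[1:2] = "X".encode("ascii")
```

Under these assumptions all three text operations are rejected, and only the explicitly encoded assignment goes through.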
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at python.org Fri Aug 24 19:39:54 2007 From: thomas at python.org (Thomas Wouters) Date: Fri, 24 Aug 2007 19:39:54 +0200 Subject: [Python-3000] Removing simple slicing In-Reply-To: References: <9e804ac0708240733r1f1781e2m18902834e2218777@mail.gmail.com> Message-ID: <9e804ac0708241039x502989f5tfe01d4f1318b6082@mail.gmail.com> On 8/24/07, Guido van Rossum wrote: > > On 8/24/07, Thomas Wouters wrote: > > If nobody cares, I will be checking these patches into the trunk this > > weekend (after updating them), and then update and check in the rest of > the > > p3yk-noslice branch into the py3k branch. > > In the trunk? I'm concerned that that might make it (ever so slightly) > incompatible with 2.5, and we're trying to make it as easy as possible > to migrate to 2.6. Or perhaps you're just proposing to change the > standard builtin types to always use the extended API, without > removing the possibility of user types (either in C or in Python) > using the simple API, at least in 2.6? The changes I uploaded only implement (and in some cases, fix some bugs in) extended slicing support in various builtin types. None of the API changes would be backported (although 2.6 in py3k-warning-mode should obviously tell people to not define __getslice__, and instead accept slice objects in __getitem__. Perhaps even when not in py3k-warnings-mode.) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-3000/attachments/20070824/df1d8e14/attachment.htm From guido at python.org Fri Aug 24 19:46:46 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Aug 2007 10:46:46 -0700 Subject: [Python-3000] Removing simple slicing In-Reply-To: <9e804ac0708241039x502989f5tfe01d4f1318b6082@mail.gmail.com> References: <9e804ac0708240733r1f1781e2m18902834e2218777@mail.gmail.com> <9e804ac0708241039x502989f5tfe01d4f1318b6082@mail.gmail.com> Message-ID: Oh, good! Forget what I said about 2.6 then. :-) On 8/24/07, Thomas Wouters wrote: > > > On 8/24/07, Guido van Rossum wrote: > > On 8/24/07, Thomas Wouters wrote: > > > If nobody cares, I will be checking these patches into the trunk this > > > weekend (after updating them), and then update and check in the rest of > the > > > p3yk-noslice branch into the py3k branch. > > > > In the trunk? I'm concerned that that might make it (ever so slightly) > > incompatible with 2.5, and we're trying to make it as easy as possible > > to migrate to 2.6. Or perhaps you're just proposing to change the > > standard builtin types to always use the extended API, without > > removing the possibility of user types (either in C or in Python) > > using the simple API, at least in 2.6? > > The changes I uploaded only implement (and in some cases, fix some bugs in) > extended slicing support in various builtin types. None of the API changes > would be backported (although 2.6 in py3k-warning-mode should obviously tell > people to not define __getslice__, and instead accept slice objects in > __getitem__. Perhaps even when not in py3k-warnings-mode.) > > -- > Thomas Wouters > > Hi! I'm a .signature virus! copy me into your .signature file to help me > spread! 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at python.org Fri Aug 24 19:47:42 2007 From: thomas at python.org (Thomas Wouters) Date: Fri, 24 Aug 2007 19:47:42 +0200 Subject: [Python-3000] Removing simple slicing In-Reply-To: <9BBC2D2B2CCF7E4DA0E212D1BD0CC6FA1FEECA@ex10.hostedexchange.local> References: <9e804ac0708240733r1f1781e2m18902834e2218777@mail.gmail.com> <9BBC2D2B2CCF7E4DA0E212D1BD0CC6FA1FEECA@ex10.hostedexchange.local> Message-ID: <9e804ac0708241047s970fe75m9d2310686b5db096@mail.gmail.com> On 8/24/07, Robert Brewer wrote: > > Thomas Wouters wrote: > > > 1) It (passively) promotes supporting only simple slicing, > > as observed by the builtin types only supporting extended > > slicing many years after extended slicing was introduced > > Should that read "...only supporting simple slicing..."? > Yes :) > The proposed solution, as implemented in the p3yk-noslice > > SVN branch, gets rid of the simple slicing methods and > > PyType entries. The simple C API (using ``Py_ssize_t`` > > for start and stop) remains, but creates a slice object > > as necessary instead. Various types had to be updated to > > support slice objects, or improve the simple slicing case > > of extended slicing. > > Am I reading this correctly, that: since the "simple C API > remains", one can still write S[3:8] with only one colon > and have it work as before? Or would it have to be rewritten > to include two colons? > No. We're just talking about the underlying object API. The methods on objects that get called. The changes just mean that S[3:8] will behave exactly like S[3:8:]. Currently, the former calls __getslice__ or __getitem__ (if __getslice__ does not exist), the latter always calls __getitem__. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-3000/attachments/20070824/5da6a97e/attachment.htm From skip at pobox.com Fri Aug 24 23:00:41 2007 From: skip at pobox.com (skip at pobox.com) Date: Fri, 24 Aug 2007 16:00:41 -0500 Subject: [Python-3000] Removal of PyArg_Parse() Message-ID: <18127.18169.547364.692145@montanaro.dyndns.org> I started in looking at removing PyArg_Parse. The first module I tackled was the time module. That was harder than I thought it would be (PyArg_Parse is only called from one place), in large part I think because it can take a number of different types of arguments. Is there some recommended way of getting rid of it? I think I can simply replace it with PyArg_ParseTuple if the format string is enclosed in parens, but is there a reasonably mechanical approach if the format string doesn't state that the argument must be a tuple? Thx, Skip From skip at pobox.com Fri Aug 24 23:20:50 2007 From: skip at pobox.com (skip at pobox.com) Date: Fri, 24 Aug 2007 16:20:50 -0500 Subject: [Python-3000] Removal of PyArg_Parse() In-Reply-To: <18127.18169.547364.692145@montanaro.dyndns.org> References: <18127.18169.547364.692145@montanaro.dyndns.org> Message-ID: <18127.19378.582174.753256@montanaro.dyndns.org> skip> I started in looking at removing PyArg_Parse. Before I go any farther, perhaps I should ask: Is PyArg_Parse going away or just its use as the argument parser for METH_OLDARGS functions? Skip From mierle at gmail.com Fri Aug 24 23:22:30 2007 From: mierle at gmail.com (Keir Mierle) Date: Fri, 24 Aug 2007 14:22:30 -0700 Subject: [Python-3000] Fwd: [issue1015] [PATCH] Updated patch for rich dict view (dict().keys()) comparisons In-Reply-To: <1187990408.37.0.755658114648.issue1015@psf.upfronthosting.co.za> References: <1187990408.37.0.755658114648.issue1015@psf.upfronthosting.co.za> Message-ID: I'm sending this to the py3k list to make sure the old patch is not used. 
Keir ---------- Forwarded message ---------- From: Keir Mierle Date: Aug 24, 2007 2:20 PM Subject: [issue1015] [PATCH] Updated patch for rich dict view (dict().keys()) comparisons To: mierle at gmail.com New submission from Keir Mierle: This is an updated version of the patch I submitted earlier to python-3000; it is almost identical except it extends the test case to cover more of the code. ---------- components: Interpreter Core files: dictview_richcompare_ver2.diff messages: 55275 nosy: keir severity: normal status: open title: [PATCH] Updated patch for rich dict view (dict().keys()) comparisons versions: Python 3.0 __________________________________ Tracker __________________________________ -------------- next part -------------- A non-text attachment was scrubbed... Name: dictview_richcompare_ver2.diff Type: text/x-patch Size: 4662 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070824/06740548/attachment.bin From guido at python.org Fri Aug 24 23:48:04 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Aug 2007 14:48:04 -0700 Subject: [Python-3000] Removal of PyArg_Parse() In-Reply-To: <18127.19378.582174.753256@montanaro.dyndns.org> References: <18127.18169.547364.692145@montanaro.dyndns.org> <18127.19378.582174.753256@montanaro.dyndns.org> Message-ID: I think that's a question for Martin von Loewis. Are there any existing uses (in the core) that are hard to replace with PyArg_ParseTuple()? On 8/24/07, skip at pobox.com wrote: > > skip> I started in looking at removing PyArg_Parse. > > Before I go any farther, perhaps I should ask: Is PyArg_Parse going away or > just its use as the argument parser for METH_OLDARGS functions?
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From nnorwitz at gmail.com Sat Aug 25 00:32:01 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Fri, 24 Aug 2007 15:32:01 -0700 Subject: [Python-3000] what to do with profilers in the stdlib Message-ID: We ought to clean up the profiling modules. There was a long discussion about this here: http://mail.python.org/pipermail/python-dev/2005-November/058212.html Much of the discussion revolved around whether to add lsprof in the stdlib. That's been resolved. It was added. Now what do we do? I suggest merging profile and cProfile (which uses _lsprof) similar to how stringio and pickle are being merged. This leaves hotshot as odd man out. We should remove it. If we don't remove it, we should try to merge these modules so they have the same API and capabilities as much as possible, even if they work in different ways. The hotshot doc states: Note The hotshot module focuses on minimizing the overhead while profiling, at the expense of long data post-processing times. For common usages it is recommended to use cProfile instead. hotshot is not maintained and might be removed from the standard library in the future. Caveat The hotshot profiler does not yet work well with threads. It is useful to use an unthreaded script to run the profiler over the code you're interested in measuring if at all possible. n From guido at python.org Sat Aug 25 01:05:30 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Aug 2007 16:05:30 -0700 Subject: [Python-3000] what to do with profilers in the stdlib In-Reply-To: References: Message-ID: I'm still a happy user of profile.py, so I'm probably not the right one to drive this discussion. :-) On 8/24/07, Neal Norwitz wrote: > We ought to clean up the profiling modules. 
There was a long > discussion about this here: > > http://mail.python.org/pipermail/python-dev/2005-November/058212.html > > Much of the discussion revolved around whether to add lsprof in the > stdlib. That's been resolved. It was added. Now what do we do? > > I suggest merging profile and cProfile (which uses _lsprof) similar to > how stringio and pickle are being merged. This leaves hotshot as odd > man out. We should remove it. If we don't remove it, we should try > to merge these modules so they have the same API and capabilities as > much as possible, even if they work in different ways. > > The hotshot doc states: > > Note > > The hotshot module focuses on minimizing the overhead while profiling, > at the expense of long data post-processing times. For common usages > it is recommended to use cProfile instead. hotshot is not maintained > and might be removed from the standard library in the future. > > Caveat > > The hotshot profiler does not yet work well with threads. It is useful > to use an unthreaded script to run the profiler over the code you're > interested in measuring if at all possible. > > n > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Sat Aug 25 02:14:32 2007 From: skip at pobox.com (skip at pobox.com) Date: Fri, 24 Aug 2007 19:14:32 -0500 Subject: [Python-3000] Removal of PyArg_Parse() In-Reply-To: References: <18127.18169.547364.692145@montanaro.dyndns.org> <18127.19378.582174.753256@montanaro.dyndns.org> Message-ID: <18127.29800.339928.420183@montanaro.dyndns.org> Guido> Are there any existing uses (in the core) that are hard to Guido> replace with PyArg_ParseTuple()? There are lots of uses where the arguments aren't tuples. 
I was particularly vexed by the time module because it was used to extract arguments both from tuples and from time.struct_time objects. I suspect most of the low-hanging fruit (PyArg_Parse used to parse tuples) has already been plucked. Skip From skip at pobox.com Sat Aug 25 02:16:53 2007 From: skip at pobox.com (skip at pobox.com) Date: Fri, 24 Aug 2007 19:16:53 -0500 Subject: [Python-3000] what to do with profilers in the stdlib In-Reply-To: References: Message-ID: <18127.29941.900987.118459@montanaro.dyndns.org> Neal> The hotshot doc states: Neal> Note Neal> The hotshot module focuses on minimizing the overhead while Neal> profiling, at the expense of long data post-processing times. For Neal> common usages it is recommended to use cProfile instead. hotshot Neal> is not maintained and might be removed from the standard library Neal> in the future. Neal> Caveat Neal> The hotshot profiler does not yet work well with threads. It is Neal> useful to use an unthreaded script to run the profiler over the Neal> code you're interested in measuring if at all possible. The cProfile module has the same benefit as hotshot (low run-time cost), without the downside of long post-processing times. On that basis alone I would argue that hotshot be dropped. Skip From alexandre at peadrop.com Sat Aug 25 02:45:53 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Fri, 24 Aug 2007 20:45:53 -0400 Subject: [Python-3000] what to do with profilers in the stdlib In-Reply-To: References: Message-ID: On 8/24/07, Neal Norwitz wrote: > I suggest merging profile and cProfile (which uses _lsprof) similar to > how stringio and pickle are being merged. cProfile and profile.py are on my merge to-do. I was supposed to merge cProfile/profile.py as part of my GSoC, but stringio and pickle have taken most of my time. So, I will merge the profile modules in my free time. > This leaves hotshot as odd man out. We should remove it. 
If we don't > remove it, we should try to merge these modules so they have the same API > and capabilities as much as possible, even if they work in different > ways. I don't think hotshot has any features that cProfile or profile don't (but I haven't checked thoroughly yet). So, I agree that it should be removed. -- Alexandre From nnorwitz at gmail.com Sat Aug 25 03:32:04 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Fri, 24 Aug 2007 18:32:04 -0700 Subject: [Python-3000] what to do with profilers in the stdlib In-Reply-To: References: Message-ID: On 8/24/07, Alexandre Vassalotti wrote: > On 8/24/07, Neal Norwitz wrote: > > I suggest merging profile and cProfile (which uses _lsprof) similar to > > how stringio and pickle are being merged. > > cProfile and profile.py are on my merge to-do. I was supposed to merge > cProfile/profile.py as part of my GSoC, but stringio and pickle have > taken most of my time. So, I will merge the profile modules in my free > time. Awesome! I was hoping you would volunteer. It looks like you've made a ton of progress on stringio and pickle so far. They are more important to get done. After they are completed, we can finish off the profile modules. n From greg.ewing at canterbury.ac.nz Sat Aug 25 03:31:07 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 25 Aug 2007 13:31:07 +1200 Subject: [Python-3000] Immutable bytes type and bsddb or other IO In-Reply-To: References: <20070822235929.GA12780@electricrain.com> <46CD2209.8000408@v.loewis.de> <20070823073837.GA14725@electricrain.com> <46CD3BFF.5080904@v.loewis.de> <20070823171837.GI24059@electricrain.com> <46CE5346.10301@canterbury.ac.nz> <46CE6808.1070007@v.loewis.de> Message-ID: <46CF865B.2050508@canterbury.ac.nz> Guido van Rossum wrote: > there's collusion between the bytes and unicode > types so that this works: > > b = b"abc" > b[1:2] = "X" Is this intentional? Doesn't it run counter to the idea that text and bytes should be clearly separated?
> Unfortunately taking the buffer API away from unicode makes things > fail early If the buffer API distinguishes between text and binary buffers, then the binary streams can just accept binary buffers only, and unicode can keep its buffer API. -- Greg From guido at python.org Sat Aug 25 04:15:49 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Aug 2007 19:15:49 -0700 Subject: [Python-3000] Removal of PyArg_Parse() In-Reply-To: <18127.29800.339928.420183@montanaro.dyndns.org> References: <18127.18169.547364.692145@montanaro.dyndns.org> <18127.19378.582174.753256@montanaro.dyndns.org> <18127.29800.339928.420183@montanaro.dyndns.org> Message-ID: On 8/24/07, skip at pobox.com wrote: > > Guido> Are there any existing uses (in the core) that are hard to > Guido> replace with PyArg_ParseTuple()? > > There are lots of uses where the arguments aren't tuples. I was > particularly vexed by the time module because it was used to extract > arguments both from tuples and from time.struct_time objects. > > I suspect most of the low-hanging fruit (PyArg_Parse used to parse tuples) > has already been plucked. Then I don't think it's a priority to try to get rid of it, and maybe it should just stay. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Aug 25 04:19:12 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Aug 2007 19:19:12 -0700 Subject: [Python-3000] Immutable bytes type and bsddb or other IO In-Reply-To: <46CF865B.2050508@canterbury.ac.nz> References: <20070823073837.GA14725@electricrain.com> <46CD3BFF.5080904@v.loewis.de> <20070823171837.GI24059@electricrain.com> <46CE5346.10301@canterbury.ac.nz> <46CE6808.1070007@v.loewis.de> <46CF865B.2050508@canterbury.ac.nz> Message-ID: On 8/24/07, Greg Ewing wrote: > Guido van Rossum wrote: > > there's collusion between the bytes and unicode > > types so that this works: > > > > b = b"abc" > > b[1:2] = "X" > > Is this intentional? 
Doesn't it run counter to the idea > that text and bytes should be clearly separated? Sorry, I wasn't clear. I was describing the status quo, which I am as unhappy about as you are. > > Unfortunately taking the buffer API away from unicode makes things > > fail early > > If the buffer API distinguishes between text and binary > buffers, then the binary streams can just accept binary > buffers only, and unicode can keep its buffer API. Yes, but the bytes object is the one doing the work, and for some reason that I don't yet fathom it asks for character buffers. Probably because there are tons of places where str and bytes are still being mixed. :-( I tried to change the bytes constructor so that bytes(s) is invalid if isinstance(s, str), forcing one to use bytes(s, ). This caused many failures, some of which I could fix, others which seem to hinge on a fundamental problem (asserting that bytes objects support the string API). More work to do... :-( -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Sat Aug 25 04:23:02 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 25 Aug 2007 14:23:02 +1200 Subject: [Python-3000] Removal of PyArg_Parse() In-Reply-To: References: <18127.18169.547364.692145@montanaro.dyndns.org> <18127.19378.582174.753256@montanaro.dyndns.org> <18127.29800.339928.420183@montanaro.dyndns.org> Message-ID: <46CF9286.7040207@canterbury.ac.nz> Guido van Rossum wrote: > Then I don't think it's a priority to try to get rid of it, and maybe > it should just stay. Maybe it should be renamed to reflect the fact that it's now general-purpose and no longer used at all for argument parsing? Perhaps PyObject_Parse? -- Greg From eric+python-dev at trueblade.com Sat Aug 25 04:36:54 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Fri, 24 Aug 2007 22:36:54 -0400 Subject: [Python-3000] PEP 3101 implementation uploaded to the tracker. 
In-Reply-To: <46CE5710.2030907@trueblade.com> References: <46CE5710.2030907@trueblade.com> Message-ID: <46CF95C6.3020606@trueblade.com> Per Guido, I've checked a slightly different version of this patch in to the py3k branch as revision 57444. The primary difference is that I modified sysmodule.c and unicodeobject.c to start implementing the string.Formatter class. Should I mark the original patch as closed in the tracker? Eric Smith wrote: > There are a handful of remaining issues, but it works for the most part. > > http://bugs.python.org/issue1009 > > Thanks to Guido and Talin for all of their help the last few days, and > thanks to Patrick Maupin for help with the initial implementation. > > Known issues: > Better error handling, per the PEP. > > Need to write Formatter class. > > test_long is failing, but I don't think it's my doing. > > Need to fix this warning that I introduced when compiling > Python/formatter_unicode.c: > Objects/stringlib/unicodedefs.h:26: warning: `STRINGLIB_CMP' defined but > not used > > Need more tests for sign handling for int and float. > > It still supports "()" sign formatting from an earlier PEP version. > > Eric. From guido at python.org Sat Aug 25 05:08:15 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Aug 2007 20:08:15 -0700 Subject: [Python-3000] PEP 3101 implementation uploaded to the tracker. In-Reply-To: <46CF95C6.3020606@trueblade.com> References: <46CE5710.2030907@trueblade.com> <46CF95C6.3020606@trueblade.com> Message-ID: On 8/24/07, Eric Smith wrote: > Per Guido, I've checked a slightly different version of this patch in to > the py3k branch as revision 57444.
The primary difference is that I > modified sysmodule.c and unicodeobject.c to start implementing the > string.Formatter class. Great! I'm looking forward to taking it for a spin. > Should I mark the original patch as closed in the tracker? Sure, it's served its purpose. --Guido > Eric Smith wrote: > > There are a handful of remaining issues, but it works for the most part. > > > > http://bugs.python.org/issue1009 > > > > Thanks to Guido and Talin for all of their help the last few days, and > > thanks to Patrick Maupin for help with the initial implementation. > > > > Known issues: > > Better error handling, per the PEP. > > > > Need to write Formatter class. > > > > test_long is failing, but I don't think it's my doing. > > > > Need to fix this warning that I introduced when compiling > > Python/formatter_unicode.c: > > Objects/stringlib/unicodedefs.h:26: warning: `STRINGLIB_CMP' defined but > > not used > > > > Need more tests for sign handling for int and float. > > > > It still supports "()" sign formatting from an earlier PEP version. > > > > Eric. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From nnorwitz at gmail.com Sat Aug 25 05:30:48 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Fri, 24 Aug 2007 20:30:48 -0700 Subject: [Python-3000] Removal of PyArg_Parse() In-Reply-To: References: <18127.18169.547364.692145@montanaro.dyndns.org> <18127.19378.582174.753256@montanaro.dyndns.org> <18127.29800.339928.420183@montanaro.dyndns.org> Message-ID: On 8/24/07, Guido van Rossum wrote: > On 8/24/07, skip at pobox.com wrote: > > > > Guido> Are there any existing uses (in the core) that are hard to > > Guido> replace with PyArg_ParseTuple()? > > > > There are lots of uses where the arguments aren't tuples. I was > > particularly vexed by the time module because it was used to extract > > arguments both from tuples and from time.struct_time objects. There are 45 uses in */*.c spread across 9 modules: arraymodule.c, posixmodule.c, _hashopenssl.c (2), dbmmodule.c (4), gdbmmodule.c (2), mactoolboxglue.c (5), stringobject.c (2) with Python/mactoolboxglue.c looking like it's low hanging fruit, and stringobject.c will hopefully go away. Some of the others don't look bad. The bulk of the uses are in array and posixmodules. I'm not sure if those are easy to change. The remaining 65 uses are in Mac modules. I'm not sure if all of them are sticking around. (That's a separate discussion we should have--which of the mac modules should go.)
> > I suspect most of the low-hanging fruit (PyArg_Parse used to parse tuples) > > has already been plucked. I think this is mostly true, but there are still some that are low-hanging. Maybe just kill the low hanging fruit for now. > Then I don't think it's a priority to try to get rid of it, and maybe > it should just stay. I agree it's not the biggest priority, but it would be nice if it was done. There's still over 500 uses of PyString which is higher priority, but also probably harder in many cases. n From nnorwitz at gmail.com Sat Aug 25 05:35:47 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Fri, 24 Aug 2007 20:35:47 -0700 Subject: [Python-3000] marshalling bytes objects Message-ID: I see in PEP 358 (bytes) http://www.python.org/dev/peps/pep-0358/ that marshalling bytes is an open issue and needs to be specified. I'm converting code objects to use bytes for the bytecode and lnotab. Is there anything special to be aware of here? It seems like it can be treated like a non-interned string. n From nnorwitz at gmail.com Sat Aug 25 05:49:10 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Fri, 24 Aug 2007 20:49:10 -0700 Subject: [Python-3000] marshalling bytes objects In-Reply-To: References: Message-ID: On 8/24/07, Neal Norwitz wrote: > I see in PEP 358 (bytes) http://www.python.org/dev/peps/pep-0358/ that > marshalling bytes is an open issue and needs to be specified. I'm > converting code objects to use bytes for the bytecode and lnotab. Is > there anything special to be aware of here? By "here" I was originally thinking about the marshaling aspect. But clearly the mutability of bytes isn't particularly good for code objects. :-) This goes back to the question of whether bytes should be able to be immutable (frozen).
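As a hedged illustration of where this question ended up, the sketch below shows the behaviour of released Python 3 interpreters rather than the 2007 branch: `bytes` became the immutable type, code objects store their bytecode in it, and `marshal` round-trips it. The expression and payload are arbitrary examples chosen for this sketch.

```python
import marshal

# Compile a trivial expression to obtain a code object.
code = compile("x + 1", "<example>", "eval")
bytecode = code.co_code

# Attempting to modify the bytecode in place is rejected.
try:
    bytecode[0] = 0
    bytecode_is_mutable = True
except TypeError:
    bytecode_is_mutable = False

# marshal handles bytes directly and returns an equal bytes object.
payload = b"\x00\x01\xff"
round_tripped = marshal.loads(marshal.dumps(payload))

# A whole code object survives the same round trip.
restored = marshal.loads(marshal.dumps(code))
result = eval(restored, {"x": 41})
```

Anyone who needs a modifiable copy of the bytecode in that model can build one with `bytearray(bytecode)`, leaving the code object itself untouched.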
n From martin at v.loewis.de Sat Aug 25 06:02:32 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sat, 25 Aug 2007 06:02:32 +0200 Subject: [Python-3000] Removal of PyArg_Parse() In-Reply-To: References: <18127.18169.547364.692145@montanaro.dyndns.org> <18127.19378.582174.753256@montanaro.dyndns.org> <18127.29800.339928.420183@montanaro.dyndns.org> Message-ID: <46CFA9D8.4020603@v.loewis.de> > Then I don't think it's a priority to try to get rid of it, and maybe > it should just stay. I think it would be desirable to get rid of METH_OLDARGS. Ideally, this should already be possible, as all modules should have been changed to be explicit about their usage of METH_OLDARGS (rather than relying on the struct field defaulting to 0); this can be "verified" by running the test suite once and checking that all ml_flags have one of METH_VARARGS, METH_NOARGS or METH_O set. Then it would be possible to drop METH_VARARGS, declaring that a 0 value of ml_flags means the default, which is "arguments are passed as a tuple". As for the remaining 50 or so PyArg_Parse calls: most of them convert a single object to some C representation; it should be possible to use the proper underlying conversion function. For example: - dbm/gdbm convert using s#; this can be replaced with the buffer API. - the array module converts the values on setitem using PyArg_Parse; these can be replaced with PyInt_AsLong, except that PyArg_Parse also does a range check, which could be moved into a range-checking function in arraymodule. As for the case of timemodule: the surprising feature is that "(ii)" uses PySequence_GetItem to access the fields, whereas PyArg_ParseTuple uses PyTuple_GET_ITEM, so it won't work for StructSequences. Regards, Martin From adrian at holovaty.com Sat Aug 25 07:58:57 2007 From: adrian at holovaty.com (Adrian Holovaty) Date: Sat, 25 Aug 2007 00:58:57 -0500 Subject: [Python-3000] [patch] Should 2to3 point out *possible*, but not definite changes?
In-Reply-To: <43aa6ff70708240946s4fb506a6o53bf1de54b4c96d6@mail.gmail.com> References: <43aa6ff70708240946s4fb506a6o53bf1de54b4c96d6@mail.gmail.com> Message-ID: On 8/24/07, Collin Winter wrote: > Adrian and I talked about this this morning, and he said he's going to > go ahead with an implementation. The original warning messages were a > good idea, but they tend to get lost when converting large projects. (I assume this is the place to post patches for the 2to3 utility, but please set me straight if I should use bugs.python.org instead...) I've attached two patches that implement the 2to3 change discussed in this thread. In 2to3_insert_comment.diff -- * fixes/util.py gets an insert_comment() function. Give it a Node/Leaf and a comment message, and it will insert a Python comment before the given Node/Leaf. This takes indentation into account, such that the comment will be indented to fix the indentation of the line it is commenting. For example: if foo: # comment about bar() bar() It also handles existing comments gracefully. If a line already has a comment above it, the new comment will be added on a new line under the old one. * pytree.Base gets two new methods: get_previous_sibling() and get_previous_in_tree(). These just made it easier and clearer to implement insert_comment(). * tests/test_util.py has unit tests for insert_comment(), and tests/test_pytree.py has tests for the two new pytree.Base methods. The other patch, 2to3_comment_warnings.diff, is an example of how we could integrate this new insert_comment() method to replace the current functionality of fixes.basefix.BaseFix.warning(). To see this in action, apply these two patches and run the 2to3 script (refactor.py) on the following input: foo() map(f, x) The resulting output should display a Python comment above the map() call instead of outputting a warning to stdout, which was the previous behavior. 
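The indentation and existing-comment behaviour described above can be sketched on plain source lines. This is only a line-based stand-in written for illustration: the patch itself works on 2to3 Node/Leaf objects, and the `insert_comment` signature below (a list of lines, an index, a message) is an assumption of this sketch, not the function from fixes/util.py.

```python
def insert_comment(lines, index, message):
    """Return a copy of lines with '# message' inserted above lines[index].

    The comment copies the leading whitespace of the target line, so it is
    indented to match the code it describes. Because it is inserted directly
    above the target, it lands below any comment that was already there.
    """
    target = lines[index]
    indent = target[:len(target) - len(target.lstrip())]
    return lines[:index] + [indent + "# " + message] + lines[index:]


source = [
    "if foo:",
    "    # existing note",
    "    bar()",
]

# Annotate the bar() call; the original list is left unmodified.
annotated = insert_comment(source, 2, "comment about bar()")
annotated_text = "\n".join(annotated)
```

A tree-based version would also have to deal with comments stored in token prefixes, which this sketch sidesteps by working on whole lines.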
If these patches are accepted, the next steps would be to change the behavior of warns() and warns_unchanged() in tests/test_fixers.py, so that the tests can catch the new behavior. Adrian -- Adrian Holovaty holovaty.com | djangoproject.com -------------- next part -------------- A non-text attachment was scrubbed... Name: 2to3_comment_warnings.diff Type: application/octet-stream Size: 741 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070825/c87ff631/attachment.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: 2to3_insert_comment.diff Type: application/octet-stream Size: 11031 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070825/c87ff631/attachment-0001.obj From stephen at xemacs.org Sat Aug 25 08:10:04 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 25 Aug 2007 15:10:04 +0900 Subject: [Python-3000] Py3k Sprint Tasks (Google Docs & Spreadsheets) In-Reply-To: <93DBB66F-5D0D-4E46-8480-D2BFC693722A@python.org> References: <93DBB66F-5D0D-4E46-8480-D2BFC693722A@python.org> Message-ID: <87y7g0401v.fsf@uwakimon.sk.tsukuba.ac.jp> Barry Warsaw writes: > I've been spending hours of my own time on the email package for py3k > this week and every time I think I'm nearing success I get defeated > again. I'm ankle deep in the Big Muddy (daughter tested positive for TB as expected -- the Japanese inoculate all children against it because of the sins of their fathers -- and school starts on Tuesday, so we need to make a bunch of extra trips to doctors and whatnot), so what thin hope I had of hanging out with the big boys at the Python-3000 sprint has long since evaporated. However, starting next week I should have a day a week or so I can devote to email stuff -- if you want to send any thoughts or requisitions my way (or a URL to sprint IRC transcripts), I'd love to help. Of course you'll get it all done and leave none for me, right?
> But I'm determined to solve the worst of the problems this week. Bu-wha-ha-ha! Steve From skip at pobox.com Sat Aug 25 15:31:50 2007 From: skip at pobox.com (skip at pobox.com) Date: Sat, 25 Aug 2007 08:31:50 -0500 Subject: [Python-3000] Removal of PyArg_Parse() In-Reply-To: <46CFA9D8.4020603@v.loewis.de> References: <18127.18169.547364.692145@montanaro.dyndns.org> <18127.19378.582174.753256@montanaro.dyndns.org> <18127.29800.339928.420183@montanaro.dyndns.org> <46CFA9D8.4020603@v.loewis.de> Message-ID: <18128.12102.439038.376077@montanaro.dyndns.org> Martin> As for the case of timemodule: the surprising feature is that Martin> "(ii)" uses PySequence_Getitem to access the fields, whereas Martin> PyArg_ParseTuple uses PyTuple_GET_ITEM, so it won't work for Martin> StructSequences. I believe I've already fixed this (r57416) by inserting an intermediate function to convert time.struct_time objects to tuples before PyArgParseTuple sees them. It would be nice if someone could take a minute or two and review that change. Skip From guido at python.org Sat Aug 25 15:34:01 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 25 Aug 2007 06:34:01 -0700 Subject: [Python-3000] marshalling bytes objects In-Reply-To: References: Message-ID: Can we put this decision off till after the a1 release? At this point I don't expect PyString to be removed in time for the release, which I want to be done by August 31. On 8/24/07, Neal Norwitz wrote: > On 8/24/07, Neal Norwitz wrote: > > I see in PEP 358 (bytes) http://www.python.org/dev/peps/pep-0358/ that > > marshalling bytes is an open issue and needs to be specified. I'm > > converting code objects to use bytes for the bytecode and lnotab. Is > > there anything special to be aware of here? > > By "here" I was originally thinking about the marshaling aspect. But > clearly the mutability of bytes isn't particularly good for code > objects. 
:-) This goes back to the question of whether bytes should > be able to be immutable (frozen). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Aug 25 15:36:01 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 25 Aug 2007 06:36:01 -0700 Subject: [Python-3000] Removing email package until it's fixed Message-ID: FYI, I'm removing the email package from the py3k branch for now. If/when Barry has a working version we'll add it back. Given that it's so close to the release I'd rather release without the email package than with a broken one. If Barry finishes it after the a1 release, people who need it can always download his version directly. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Sat Aug 25 15:44:58 2007 From: skip at pobox.com (skip at pobox.com) Date: Sat, 25 Aug 2007 08:44:58 -0500 Subject: [Python-3000] Removal of PyArg_Parse() In-Reply-To: References: <18127.18169.547364.692145@montanaro.dyndns.org> <18127.19378.582174.753256@montanaro.dyndns.org> <18127.29800.339928.420183@montanaro.dyndns.org> Message-ID: <18128.12890.680091.978246@montanaro.dyndns.org> Neal> with Python/mactoolboxglue.c looking like it's low hanging fruit, I already took care of the easy cases there, though I haven't checked it in yet. Neal> The remaining 65 uses are in Mac modules. I'm not sure if all of Neal> them are sticking around. (That's a separate discussion we should Neal> have--which of the mac modules should go.) As I understand it, these are generated by bgen. Presumably we could change that code and regenerate those modules. Skip From lists at cheimes.de Sat Aug 25 16:47:43 2007 From: lists at cheimes.de (Christian Heimes) Date: Sat, 25 Aug 2007 16:47:43 +0200 Subject: [Python-3000] [patch] roman.py Message-ID: The patch fixes roman.py for Py3k (<> and raise fixes). Christian -------------- next part -------------- A non-text attachment was scrubbed... 
Name: roman.diff Type: text/x-patch Size: 1213 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070825/f8883817/attachment.bin From guido at python.org Sat Aug 25 16:56:17 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 25 Aug 2007 07:56:17 -0700 Subject: [Python-3000] [patch] roman.py In-Reply-To: References: Message-ID: Thanks, applied. There's a lot more to being able to run "make html PYTHON=python3.0" successfully, isn't there? On 8/25/07, Christian Heimes wrote: > The patch fixes roman.py for Py3k (<> and raise fixes). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From lists at cheimes.de Sat Aug 25 18:27:53 2007 From: lists at cheimes.de (Christian Heimes) Date: Sat, 25 Aug 2007 18:27:53 +0200 Subject: [Python-3000] [patch] io.py improvements Message-ID: The patch improves io.py and socket.py's SocketIO: * I've removed all asserts and replaced them with explicit raises * I've added four convenience methods _check_readable, _check_writable, _check_seekable and _check_closed. The methods take an optional msg argument for future usage. * unit tests for the stdin.name ... and io.__all__. open problems: The io.__all__ tuple contains a reference to SocketIO but SocketIO is in socket.py. from io import * fails. Should the facade SocketIO class be moved to io.py or should SocketIO be removed from io.__all__? The predecessor of the patch was discussed in the sf.net bug tracker http://bugs.python.org/issue1771364 Christian -------------- next part -------------- A non-text attachment was scrubbed...
Name: io_assert.patch Type: text/x-patch Size: 10330 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070825/f76066fb/attachment-0001.bin -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 252 bytes Desc: OpenPGP digital signature Url : http://mail.python.org/pipermail/python-3000/attachments/20070825/f76066fb/attachment-0001.pgp From nnorwitz at gmail.com Sat Aug 25 18:30:58 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Sat, 25 Aug 2007 09:30:58 -0700 Subject: [Python-3000] marshalling bytes objects In-Reply-To: References: Message-ID: On 8/25/07, Guido van Rossum wrote: > Can we put this decision off till after the a1 release? Yes. > At this point > I don't expect PyString to be removed in time for the release, which I > want to be done by August 31. Agreed. I plan to make a patch for this and upload it. All tests except modulefinder pass (I'm not sure why). There is a hack in marshal to convert co_code and co_lnotab to a bytes object after reading in a string. n -- > > On 8/24/07, Neal Norwitz wrote: > > On 8/24/07, Neal Norwitz wrote: > > > I see in PEP 358 (bytes) http://www.python.org/dev/peps/pep-0358/ that > > > marshalling bytes is an open issue and needs to be specified. I'm > > > converting code objects to use bytes for the bytecode and lnotab. Is > > > there anything special to be aware of here? > > > > By "here" I was originally thinking about the marshaling aspect. But > > clearly the mutability of bytes isn't particularly good for code > > objects. :-) This goes back to the question of whether bytes should > > be able to be immutable (frozen). 
> > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > From guido at python.org Sat Aug 25 18:40:06 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 25 Aug 2007 09:40:06 -0700 Subject: [Python-3000] [patch] io.py improvements In-Reply-To: References: Message-ID: Would you mind uploading this to the new tracker at bugs.python.org? And you can close the predecessor of the patch there (unless you want to reuse that one). (If you're having trouble using the tracker, you may need to reset your password -- it'll send an email with a new password to your SF account. You can then edit your profile to change the password once again and reset the email.) --Guido On 8/25/07, Christian Heimes wrote: > The patch improves io.py and socket.py's SocketIO: > > * I've removed all asserts and replaces them by explict raises > * I've added four convenient methods _check_readable, _check_writable, > _check_seekable and _check_closed. The methods take an optional msg > argument for future usage. > * unit tests for the stdin.name ... and io.__all__. > > open problems: > > The io.__all__ tuple contains a reference to SocketIO but SocketIO is in > socket.py. from io import * fails. Should the facade SocketIO class > moved to io.py or should SocketIO be removed from io.__all__? 
> > The predecessor of the patch was discussed in the sf.net bug tracker > http://bugs.python.org/issue1771364 > > Christian > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From lists at cheimes.de Sat Aug 25 18:59:05 2007 From: lists at cheimes.de (Christian Heimes) Date: Sat, 25 Aug 2007 18:59:05 +0200 Subject: [Python-3000] [patch] io.py improvements In-Reply-To: References: Message-ID: <46D05FD9.7080607@cheimes.de> Guido van Rossum wrote: > Would you mind uploading this to the new tracker at bugs.python.org? > And you can close the predecessor of the patch there (unless you want > to reuse that one). > > (If you're having trouble using the tracker, you may need to reset > your password -- it'll send an email with a new password to your SF > account. You can then edit your profile to change the password once > again and reset the email.) I'd like to close the bug but I'm not the owner of the bug any more. In fact all my bug reports and patches aren't assigned to me any more. I thought that I'd keep the assignments after the migration. Is it a bug? Christian From fdrake at acm.org Sat Aug 25 21:12:05 2007 From: fdrake at acm.org (Fred Drake) Date: Sat, 25 Aug 2007 15:12:05 -0400 Subject: [Python-3000] Removing email package until it's fixed In-Reply-To: References: Message-ID: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org> On Aug 25, 2007, at 9:36 AM, Guido van Rossum wrote: > FYI, I'm removing the email package from the py3k branch for now. > If/when Barry has a working version we'll add it back. Given that it's > so close to the release I'd rather release without the email package > than with a broken one. 
If Barry finishes it after the a1 release, > people who need it can always download his version directly. Alternately, we could move toward separate libraries for such components; this allows separate packages to have separate maintenance cycles, and makes it easier for applications to pick up bug fixes. -Fred -- Fred Drake From greg at electricrain.com Sat Aug 25 21:38:15 2007 From: greg at electricrain.com (Gregory P. Smith) Date: Sat, 25 Aug 2007 12:38:15 -0700 Subject: [Python-3000] Removal of PyArg_Parse() In-Reply-To: References: <18127.18169.547364.692145@montanaro.dyndns.org> <18127.19378.582174.753256@montanaro.dyndns.org> <18127.29800.339928.420183@montanaro.dyndns.org> Message-ID: <20070825193815.GO24059@electricrain.com> On Fri, Aug 24, 2007 at 08:30:48PM -0700, Neal Norwitz wrote: > On 8/24/07, Guido van Rossum wrote: > > On 8/24/07, skip at pobox.com wrote: > > > > > > Guido> Are there any existing uses (in the core) that are hard to > > > Guido> replace with PyArg_ParseTuple()? > > > > > > There are lots of uses where the arguments aren't tuples. I was > > > particularly vexed by the time module because it was used to extract > > > arguments both from tuples and from time.struct_time objects. > > There are 45 uses in */*.c spread across 9 modules: > arraymodule.c, posixmodule.c, > _hashopenssl.c (2), dbmmodule.c (4), gdbmmodule.c (2), > mactoolboxglue.c (5), stringobject.c (2) _hashopenssl.c will stop using it soon enough as I modify it to take objects supporting the buffer api. -greg From guido at python.org Sat Aug 25 22:50:06 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 25 Aug 2007 13:50:06 -0700 Subject: [Python-3000] Removing email package until it's fixed In-Reply-To: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org> References: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org> Message-ID: Works for me. Barry? 
On 8/25/07, Fred Drake wrote: > On Aug 25, 2007, at 9:36 AM, Guido van Rossum wrote: > > FYI, I'm removing the email package from the py3k branch for now. > > If/when Barry has a working version we'll add it back. Given that it's > > so close to the release I'd rather release without the email package > > than with a broken one. If Barry finishes it after the a1 release, > > people who need it can always download his version directly. > > Alternately, we could move toward separate libraries for such > components; this allows separate packages to have separate > maintenance cycles, and makes it easier for applications to pick up > bug fixes. > > > -Fred > > -- > Fred Drake > > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From baranguren at gmail.com Sat Aug 25 20:09:20 2007 From: baranguren at gmail.com (Benjamin Aranguren) Date: Sat, 25 Aug 2007 11:09:20 -0700 Subject: [Python-3000] backported ABC Message-ID: Worked with Alex Martelli at the Google Python Sprint. -------------- next part -------------- A non-text attachment was scrubbed... Name: pyABC_backport_to_2_6.patch Type: text/x-patch Size: 1521 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070825/6486578b/attachment.bin From brett at python.org Sun Aug 26 00:00:15 2007 From: brett at python.org (Brett Cannon) Date: Sat, 25 Aug 2007 15:00:15 -0700 Subject: [Python-3000] Removing email package until it's fixed In-Reply-To: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org> References: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org> Message-ID: On 8/25/07, Fred Drake wrote: > On Aug 25, 2007, at 9:36 AM, Guido van Rossum wrote: > > FYI, I'm removing the email package from the py3k branch for now. > > If/when Barry has a working version we'll add it back.
Given that it's > > so close to the release I'd rather release without the email package > > than with a broken one. If Barry finishes it after the a1 release, > > people who need it can always download his version directly. > > Alternately, we could move toward separate libraries for such > components; this allows separate packages to have separate > maintenance cycles, and makes it easier for applications to pick up > bug fixes. Are you suggesting of just leaving email out of the core then and just have people download it as necessary? Or just having it developed externally and thus have its own release schedule, but then pull in the latest stable release when we do a new Python release? I don't like the former, but the latter is intriguing. If we could host large packages (e.g., email, sqlite, ctypes, etc.) on python.org by providing tracker, svn, and web space they could be developed and released on their own schedule. Then the Python release would then become a sumo release of these various packages. People could release code that still depends on a specific Python version flatly (and thus not have external dependencies), or say it needs support of Python 2.6 + email 42.2 or something if some feature is really needed). But obviously this ups the resource needs on Python's infrastructure so I don't know how reasonable it really is in the end. -Brett From greg at krypto.org Sun Aug 26 00:26:14 2007 From: greg at krypto.org (Gregory P. 
Smith) Date: Sat, 25 Aug 2007 15:26:14 -0700 Subject: [Python-3000] Removing email package until it's fixed In-Reply-To: References: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org> Message-ID: <20070825222614.GQ24059@electricrain.com> On Sat, Aug 25, 2007 at 03:00:15PM -0700, Brett Cannon wrote: > On 8/25/07, Fred Drake wrote: > > Alternately, we could move toward separate libraries for such > > components; this allows separate packages to have separate > > maintenance cycles, and makes it easier for applications to pick up > > bug fixes. > > Are you suggesting of just leaving email out of the core then and just > have people download it as necessary? Or just having it developed > externally and thus have its own release schedule, but then pull in > the latest stable release when we do a new Python release? > > I don't like the former, but the latter is intriguing. If we could > host large packages (e.g., email, sqlite, ctypes, etc.) on python.org > by providing tracker, svn, and web space they could be developed and > released on their own schedule. Then the Python release would then > become a sumo release of these various packages. People could release > code that still depends on a specific Python version flatly (and thus > not have external dependencies), or say it needs support of Python 2.6 > + email 42.2 or something if some feature is really needed). But > obviously this ups the resource needs on Python's infrastructure so I > don't know how reasonable it really is in the end. > > -Brett Agreed, the latter of still pulling in the latest stable release when doing a new Python release is preferred. Libraries not included with the standard library set in python distributions are much less likely to be used because not all python installs will include them by default. I think something better than 'latest stable release' of any given module would make sense. 
Presumably we'd want to keep the policy of no feature/API changes within a given Python 3.x release's standard library? Or would that just become "no backwards-incompatible API changes" to allow for new features? All such modules would then need to include their own version info. In that case we should make it easy to specify an API version at import time, raising an ImportError if the API version is not met. brainstorm: import spam api 3.0 from spam api 3.0 import eggs as chickens import spam(3.0) from spam(3.0) import eggs as chickens It could get annoying to need to think much about package versions in import statements. It's much less casual; it should not be required. -gps From p.f.moore at gmail.com Sun Aug 26 00:33:16 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 25 Aug 2007 23:33:16 +0100 Subject: [Python-3000] Removing email package until it's fixed In-Reply-To: References: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org> Message-ID: <79990c6b0708251533tf0a2dd0k3c07e94fd52736bc@mail.gmail.com> On 25/08/07, Brett Cannon wrote: > On 8/25/07, Fred Drake wrote: > > On Aug 25, 2007, at 9:36 AM, Guido van Rossum wrote: > > > FYI, I'm removing the email package from the py3k branch for now. > > > If/when Barry has a working version we'll add it back. Given that it's > > > so close to the release I'd rather release without the email package > > > than with a broken one. If Barry finishes it after the a1 release, > > > people who need it can always download his version directly. > > > > Alternately, we could move toward separate libraries for such > > components; this allows separate packages to have separate > > maintenance cycles, and makes it easier for applications to pick up > > bug fixes. > > Are you suggesting of just leaving email out of the core then and just > have people download it as necessary? Or just having it developed > externally and thus have its own release schedule, but then pull in > the latest stable release when we do a new Python release?
FWIW, I'm very much against moving email out of the core. This has been discussed a number of times before, and as far as I am aware, no conclusion reached. However, the "batteries included" approach of Python is a huge benefit for me. Every time I have to endure writing Perl, I find some module that I don't have available as standard. I can download it, sure, but I can't *rely* on it. No matter how good eggs and/or PyPI get, please let's keep the standard library with the "batteries included" philosophy. (Apologies if removing email permanently was never the intention - you just touched a nerve there!) Paul. From janssen at parc.com Sun Aug 26 01:00:27 2007 From: janssen at parc.com (Bill Janssen) Date: Sat, 25 Aug 2007 16:00:27 PDT Subject: [Python-3000] Removing email package until it's fixed In-Reply-To: <79990c6b0708251533tf0a2dd0k3c07e94fd52736bc@mail.gmail.com> References: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org> <79990c6b0708251533tf0a2dd0k3c07e94fd52736bc@mail.gmail.com> Message-ID: <07Aug25.160033pdt."57996"@synergy1.parc.xerox.com> > FWIW, I'm very much against moving email out of the core. This has > been discussed a number of times before, and as far as I am aware, no > conclusion reached. However, the "batteries included" approach of > Python is a huge benefit for me. I agree. But if the current code doesn't work with 3K, not sure what else to do. I guess it could just be labelled a "show-stopper" till it's fixed. Bill From greg at krypto.org Sun Aug 26 01:30:30 2007 From: greg at krypto.org (Gregory P. 
Smith) Date: Sat, 25 Aug 2007 16:30:30 -0700 Subject: [Python-3000] Removing email package until it's fixed In-Reply-To: <7829056871282917102@unknownmsgid> References: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org> <79990c6b0708251533tf0a2dd0k3c07e94fd52736bc@mail.gmail.com> <7829056871282917102@unknownmsgid> Message-ID: <52dc1c820708251630u17e3e96by1c447747a5896d10@mail.gmail.com> On 8/25/07, Bill Janssen wrote: > > FWIW, I'm very much against moving email out of the core. This has > > been discussed a number of times before, and as far as I am aware, no > > conclusion reached. However, the "batteries included" approach of > > Python is a huge benefit for me. > > I agree. But if the current code doesn't work with 3K, not sure what > else to do. I guess it could just be labelled a "show-stopper" till > it's fixed. yeah, relax. its not as if its going away for good. just to get 3.0a1 out. though by the time py3k is popular maybe sms and jabber libraries would be more useful. ;) From greg at krypto.org Sun Aug 26 02:54:13 2007 From: greg at krypto.org (Gregory P. Smith) Date: Sat, 25 Aug 2007 17:54:13 -0700 Subject: [Python-3000] PyBuffer ndim unsigned Message-ID: <52dc1c820708251754w467f207amf09c5d6deea89cb0@mail.gmail.com> Anyone mind if I do this? --- Include/object.h (revision 57412) +++ Include/object.h (working copy) @@ -148,7 +148,7 @@ Py_ssize_t itemsize; /* This is Py_ssize_t so it can be pointed to by strides in simple case.*/ int readonly; - int ndim; + unsigned int ndim; char *format; Py_ssize_t *shape; Py_ssize_t *strides; PEP 3118 and all reality as I know it says ndim must be >= 0 so it makes sense to me. 
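A small sketch of one practical consequence of the signed-to-unsigned change proposed in the diff above, assuming some code subtracts one ndim value from another. It uses ctypes only to emulate how C would store the result in an int versus an unsigned int; the function names are hypothetical and nothing here is taken from the CPython sources.

```python
import ctypes


def ndim_difference_signed(a, b):
    """Emulate `int diff = a - b;` in C (ctypes does not range-check)."""
    return ctypes.c_int(a - b).value


def ndim_difference_unsigned(a, b):
    """Emulate `unsigned int diff = a - b;` in C, which wraps around."""
    return ctypes.c_uint(a - b).value


# Largest value an unsigned int can hold on this platform.
UINT_MAX = 2 ** (8 * ctypes.sizeof(ctypes.c_uint)) - 1

signed_diff = ndim_difference_signed(2, 3)
unsigned_diff = ndim_difference_unsigned(2, 3)

# With a signed field, a "is the difference negative?" check works.
# With an unsigned field, the same subtraction wraps to UINT_MAX.
print(signed_diff < 0, unsigned_diff == UINT_MAX)
```

So although ndim itself is never negative, arithmetic on an unsigned field can silently produce a very large positive number where a signed field would produce a small negative one.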
From barry at python.org Sun Aug 26 03:51:23 2007 From: barry at python.org (Barry Warsaw) Date: Sat, 25 Aug 2007 21:51:23 -0400 Subject: [Python-3000] Removing email package until it's fixed In-Reply-To: <07Aug25.160033pdt."57996"@synergy1.parc.xerox.com> References: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org> <79990c6b0708251533tf0a2dd0k3c07e94fd52736bc@mail.gmail.com> <07Aug25.160033pdt."57996"@synergy1.parc.xerox.com> Message-ID: <70E776FF-416F-4F40-A86C-7CDD1D76C26A@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 25, 2007, at 7:00 PM, Bill Janssen wrote: >> FWIW, I'm very much against moving email out of the core. This has >> been discussed a number of times before, and as far as I am aware, no >> conclusion reached. However, the "batteries included" approach of >> Python is a huge benefit for me. > > I agree. But if the current code doesn't work with 3K, not sure what > else to do. I guess it could just be labelled a "show-stopper" till > it's fixed. Just a quick reply for right now, more later. email /will/ be made to work with py3k and I'm against removing it permanently unless as part of a py3k-wide policy to detach large parts of the stdlib. I made a lot of progress last week and I intend to continue working on it until it passes all the tests. Please check out the temporary sandbox version until then. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRtDcnHEjvBPtnXfVAQJlnQQAlQORbGXzqnhw6z+5PGkTrb2p3kpHE2rf AN/MYQ+sF7ASMHiNE9ZqKvbOjsNi7HW49LdBcJ6ySOYolzo8k1+pjh0HJCt6ROST T4hPSFBIHtOtlBtg3LAo8q+y5fAynviSE2r7jn+LyezdVD9vTJnTGJlGWtYoIHZt +LDF5uY4arc= =Z1gN -----END PGP SIGNATURE----- From guido at python.org Sun Aug 26 03:55:31 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 25 Aug 2007 18:55:31 -0700 Subject: [Python-3000] Limitations of "batteries included" Message-ID: [Subject was: Removing email package until it's fixed] I find there are pluses and minuses to the "batteries included" philosophy. Not so much in the case of the email package (which I'm sure will be back before 3.0 final is released), but in general, in order for a package to be suitable for inclusion in the core, it must have pretty much stopped evolving, or at least its evolution rate must have slowed down to the same 18-24 month feature release cycle that the core language and library experiences. Take for example GUI packages. Tkinter is far from ideal, but there are many competitors, none of them perfect (not even those packages specifically designed to be platform-neutral). We can't very well include all of the major packages (PyQt, PyGtk, wxPython, anygui) -- the release would just bloat tremendously, and getting stable versions of all of these would just be a maintenance nightmare. (I don't know how Linux distros do it, but they tend to have a large group of people *just* devoted to *bundling* stuff, and their release cycles are even slower. I don't think Python should be in that business.) Database wrappers are in the same boat, and IMO the approach of separately downloadable 3rd party wrappers (sometimes multiple competing wrappers for the same database) has served the users well. Or consider the major pain caused by PyXML (xmlplus), which tried to pre-empt the namespace of the built-in xml package, causing endless confusion and breakage. 
Would anyone seriously consider including something like Django, TurboGears or Pylons in a Python release? I hope not -- these all evolve at a rate about 10x that of Python, and the version included with a core distribution would be out of date (and a nuisance to replace) within months of the core release. I believe the only reasonable solution is to promote the use of package managers, and to let go of the "batteries included" philosophy when it comes to major external functionality. When it links to something that requires me to install a pre-built external non-Python bundle anyway (e.g. Berkeley DB, SQLite, and others), the included battery is useless until it is "charged" by installing that dependency; the Python wrapper might as well be managed by the same package manager. Now, there's plenty of pure Python (or Python-specific) functionality for which "batteries included" makes total sense, including the email package, wsgiref, XML processing, and more; it's often a judgement call. But I want to warn against the desire to include everything -- it's not going to happen, and it shouldn't. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Aug 26 04:08:03 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 25 Aug 2007 19:08:03 -0700 Subject: [Python-3000] PyBuffer ndim unsigned In-Reply-To: <52dc1c820708251754w467f207amf09c5d6deea89cb0@mail.gmail.com> References: <52dc1c820708251754w467f207amf09c5d6deea89cb0@mail.gmail.com> Message-ID: I look at it from another POV -- does anyone care about not being able to represent dimensionalities over 2 billion? I don't see the advantage of saying unsigned int here; it just means that we'll get more compiler warnings in code that is otherwise fine. After all, the previous line says 'int readonly' -- I'm sure that's meant to be a bool as well. Hey, Python sequences use Py_ssize_t to express their length, and I've never seen a string with a negative length either.
:-) I could even see code computing the difference between two dimensions and checking if it is negative; don't some compilers actively work against making such code work correctly? --Guido On 8/25/07, Gregory P. Smith wrote: > Anyone mind if I do this? > > --- Include/object.h (revision 57412) > +++ Include/object.h (working copy) > @@ -148,7 +148,7 @@ > Py_ssize_t itemsize; /* This is Py_ssize_t so it can be > pointed to by strides in simple case.*/ > int readonly; > - int ndim; > + unsigned int ndim; > char *format; > Py_ssize_t *shape; > Py_ssize_t *strides; > > > PEP 3118 and all reality as I know it says ndim must be >= 0 so it > makes sense to me. > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Aug 26 04:10:16 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 25 Aug 2007 19:10:16 -0700 Subject: [Python-3000] backported ABC In-Reply-To: References: Message-ID: Um, that patch contains only the C code for overloading isinstance() and issubclass(). Did you do anything about abc.py and _abcoll.py/collections.py and their respective unit tests? Or what about the unit tests for isinstance()/issubclass()? On 8/25/07, Benjamin Aranguren wrote: > Worked with Alex Martelli at the Goolge Python Sprint. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Sun Aug 26 04:54:56 2007 From: skip at pobox.com (skip at pobox.com) Date: Sat, 25 Aug 2007 21:54:56 -0500 Subject: [Python-3000] A couple 2to3 questions Message-ID: <18128.60288.458934.140003@montanaro.dyndns.org> I ran 2to3 over the Doc/tools directory. This left a number of problems which I initially began replacing manually. I then realized that it would be better to tweak 2to3. 
A couple things I wondered about: 1. How are we supposed to maintain changes to Doc/tools? Running svn status doesn't show any changes. 2. I noticed a couple places where it seems to replace "if isinstance" with "ifinstance". Seems like an output bug of some sort. 3. Here are some obvious transformations (I don't know what to do to make these changes to 2to3): * replace uppercase and lowercase from the string module with their "ascii_"-prefixed names. * replace types.StringType and types.UnicodeType with str and unicode. Skip From greg at krypto.org Sun Aug 26 05:02:08 2007 From: greg at krypto.org (Gregory P. Smith) Date: Sat, 25 Aug 2007 20:02:08 -0700 Subject: [Python-3000] PyBuffer ndim unsigned In-Reply-To: References: <52dc1c820708251754w467f207amf09c5d6deea89cb0@mail.gmail.com> Message-ID: <52dc1c820708252002v3efce97eu869fd46e97e88271@mail.gmail.com> heh good point. ignore that thought. python is a signed language. :) On 8/25/07, Guido van Rossum wrote: > I look at it from another POV -- does anyone care about not being able > to represent dimensionalities over 2 billion? I don't see the > advantage of saying unsigned int here; it just means that we'll get > more compiler warnings in code that is otherwise fine. After all, the > previous line says 'int readonly' -- I'm sure that's meant to be a > bool as well. Hey, Python sequences use Py_ssize_t to express their > length, and I've never seen a string with a negative length either. > :-) > > I could even see code computing the difference between two dimensions > and checking if it is negative; don't some compilers actively work > against making such code work correctly? > > --Guido > > On 8/25/07, Gregory P. Smith wrote: > > Anyone mind if I do this? 
> > > > --- Include/object.h (revision 57412) > > +++ Include/object.h (working copy) > > @@ -148,7 +148,7 @@ > > Py_ssize_t itemsize; /* This is Py_ssize_t so it can be > > pointed to by strides in simple case.*/ > > int readonly; > > - int ndim; > > + unsigned int ndim; > > char *format; > > Py_ssize_t *shape; > > Py_ssize_t *strides; > > > > > > PEP 3118 and all reality as I know it says ndim must be >= 0 so it > > makes sense to me. > > _______________________________________________ > > Python-3000 mailing list > > Python-3000 at python.org > > http://mail.python.org/mailman/listinfo/python-3000 > > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > From jimjjewett at gmail.com Sun Aug 26 05:02:22 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Sat, 25 Aug 2007 23:02:22 -0400 Subject: [Python-3000] Limitations of "batteries included" In-Reply-To: References: Message-ID: On 8/25/07, Guido van Rossum wrote: > I believe the only reasonable solution is to promote the use of > package managers, and to let go of the "batteries included" philosophy > where it comes to major external functionality. When it links to > something that requires me to do install a pre-built external > non-Python bundle anyway (e.g. Berkeley Db, Sqlite, and others), the > included battery is useless until it is "charged" by installing that > dependency; the Python wrapper might as well be managed by the same > package manager. Windows is in a slightly different category; many people can't easily install the external bundle. If it is included in the python binary (sqlite3, tcl), then everything is fine. But excluding them by default on non-windows machines seems to be opening the door to bitrot. (Remember that one of the pushes toward the buildbots was a realization of how long the windows build had stayed broken without anyone noticing.) 
-jJ From nnorwitz at gmail.com Sun Aug 26 05:13:10 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Sat, 25 Aug 2007 20:13:10 -0700 Subject: [Python-3000] A couple 2to3 questions In-Reply-To: <18128.60288.458934.140003@montanaro.dyndns.org> References: <18128.60288.458934.140003@montanaro.dyndns.org> Message-ID: On 8/25/07, skip at pobox.com wrote: > I ran 2to3 over the Doc/tools directory. This left a number of problems > which I initially began replacing manually. I then realized that it would > be better to tweak 2to3. A couple things I wondered about: > > 1. How are we supposed to maintain changes to Doc/tools? Running svn > status doesn't show any changes. Dunno, Georg will have to answer this one. > 2. I noticed a couple places where it seems to replace "if isinstance" > with "ifinstance". Seems like an output bug of some sort. That bug was probably me. I did some large changes and broke somethings a while back. I've since learned my lesson and just use 2to3 to automate the task. :-) > 3. Here are some obvious transformations (I don't know what to do to > make these changes to 2to3): > > * replace uppercase and lowercase from the string module with > their "ascii_"-prefixed names. This should be an easy fixer. Typically the easiest thing to do is copy an existing fixer that is similar and replace the pattern and transform method. To figure out the pattern, use 2to3/find_pattern.py . 1) Pass the python expression on the command line. 2) Press return until it shows you the expression you are interested in. 3) Then type y and you have your pattern. Here's an example: $ python find_pattern.py 'string.letters' '.letters' 'string.letters' y power< 'string' trailer< '.' 'letters' > > That last line is the pattern to use. Use that string in the fixer as the PATTERN. You may want to add names so you can pull out pieces. For example, if we want to pull out letters, modify the pattern to add my_name: power< 'string' trailer< '.' 
my_name='letters' > > Then modify the transform method to get my_name, clone the node, and set the new node to what you want. (See another fixer for the details.) > * replace types.StringType and types.UnicodeType with str and > unicode. This one is already done. I checked in a fixer for this a few days ago (during the sprint). See 2to3/fixes/fix_types.py . It also handles other builtin types that were aliased in the types module. HTH, n From janssen at parc.com Sun Aug 26 05:53:33 2007 From: janssen at parc.com (Bill Janssen) Date: Sat, 25 Aug 2007 20:53:33 PDT Subject: [Python-3000] Limitations of "batteries included" In-Reply-To: References: Message-ID: <07Aug25.205334pdt."57996"@synergy1.parc.xerox.com> > On 8/25/07, Guido van Rossum wrote: > > I believe the only reasonable solution is to promote the use of > package managers, and to let go of the "batteries included" philosophy It's important to realize that most operating systems (Windows, OS X) don't really support the use of package managers. Installers, yes; package managers, no. And installers don't do dependencies. And most users (and probably most developers) are running one of these package-manager-less systems. Even with package managers, installing an external extension is out of bounds for most users. Many work for companies where the IT department controls what can and can't be installed, and the IT department does the installs. I do this myself, out of sheer laziness -- I don't want to understand the system of dependencies for each Linux variant and I don't want to work as a sysop, so when I need some package on some Fedora box that isn't there, I don't fire up "yum"; instead, I call our internal tech support to make it happen. This means a turn-around time that varies from an hour to several days. This can be a killer if you just want to try something out -- the energy barrier is too high. So as soon as you require an install of something, you lose 80% of your potential users.
I do agree with some of your other points, though: those about the fast-moving unstable frameworks, and about the packages that depend on an external non-Python non-standard resource. Aside from that, though, I believe "batteries included" is really effective. I'd like to see more API-based work, like the DB-API work, and the WSGI work, both of which have been very effective. I'd like to see something like PyGUI added as a standard UI API, with a default binding for each platform (GTK+ for Windows and Linux, Cocoa for OS X, Swing for Jython, HTML5 for Web apps, perhaps a Tk binding for legacy systems, etc.). I think a standard image-processing API, perhaps based on PIL, would be another interesting project. Bill From nas at arctrix.com Sun Aug 26 05:57:55 2007 From: nas at arctrix.com (Neil Schemenauer) Date: Sun, 26 Aug 2007 03:57:55 +0000 (UTC) Subject: [Python-3000] Limitations of "batteries included" References: Message-ID: Guido van Rossum wrote: > Now, there's plenty of pure Python (or Python-specific) functionality > for which "batteries included" makes total sense, including the email > package, wsgiref, XML processing, and more; it's often a judgement > call. But I want to warn against the desire to include everything -- > it's not going to happen, and it shouldn't. It sounds like we basically agree as to what "batteries included" means. Still, I think we should include more batteries. The problem is that, with the current model, the Python development team has to take on too much responsibility in order to include them. The "email" package is a good example. Most people would agree that it should be included in the distribution. It meets the requirements of a battery: it provides widely useful functionality, it has a (relatively) stable API, and it's well documented. However, it should not have to live in the Python source tree and be looked after by the Python developers.
There should be a set of packages that are part of the Python release but are managed by their own teams (e.g. email, ElementTree). In order to make a Python release, we would coordinate with the other teams to pull known good versions of their packages into the final distribution package. There could be a PEP that defines how the package must be organized, making it possible to automate most of the bundling process (e.g. unit test and documentation conventions). Neil From nas at arctrix.com Sun Aug 26 06:10:17 2007 From: nas at arctrix.com (Neil Schemenauer) Date: Sun, 26 Aug 2007 04:10:17 +0000 (UTC) Subject: [Python-3000] Removing email package until it's fixed References: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org> Message-ID: Brett Cannon wrote: > I don't like the former, but the latter is intriguing. If we could > host large packages (e.g., email, sqlite, ctypes, etc.) on python.org > by providing tracker, svn, and web space they could be developed and > released on their own schedule. Then the Python release would then > become a sumo release of these various packages. Hosting them on python.org is a separate decision. We should be able to pull in packages that are hosted anywhere into the "batteries included" distribution. It sounds like most people are supportive of this idea. All that's needed is a little documentation outlining the rules that packages must conform to and a little scripting. We could have another file in the distribution, similar to Modules/Setup, say Modules/Batteries. :-) Something like:

    # ElementTree -- An XML API
    http://effbot.org/downloads/elementtree.tar.gz

    # email -- An email and MIME handling package
    http://www.python.org/downloads/email.tar.gz

There could be a makefile target or script that downloads them and unpacks them into the right places.
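A minimal sketch of what such a script might look like, assuming a Modules/Batteries file in which each entry is a "# name -- description" comment followed by a URL line. That reading of the format is an assumption, and the names Battery, parse_batteries and fetch_and_unpack are hypothetical; nothing like this exists in the tree.

```python
import io
import tarfile
import urllib.request
from typing import NamedTuple


class Battery(NamedTuple):
    """One manifest entry (hypothetical structure)."""
    name: str
    description: str
    url: str


def parse_batteries(text):
    """Parse manifest text into a list of Battery entries.

    Assumed format: a '# name -- description' comment line, followed
    by the URL of the tarball on the next non-blank line.
    """
    batteries = []
    name, description = "", ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Remember the comment; it labels the URL that follows.
            head, _, tail = line.lstrip("#").partition("--")
            name, description = head.strip(), tail.strip()
        else:
            batteries.append(Battery(name, description, line))
            name, description = "", ""
    return batteries


def fetch_and_unpack(battery, dest):
    """Download a gzipped tarball and unpack it under dest.

    No version pinning or checksum verification here; a real tool
    would need both.
    """
    with urllib.request.urlopen(battery.url) as response:
        payload = response.read()
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:gz") as archive:
        archive.extractall(dest)


MANIFEST = """\
# ElementTree -- An XML API
http://effbot.org/downloads/elementtree.tar.gz

# email -- An email and MIME handling package
http://www.python.org/downloads/email.tar.gz
"""

for battery in parse_batteries(MANIFEST):
    print(battery.name, "<-", battery.url)
```

A makefile target could then loop over parse_batteries() and call fetch_and_unpack() for each entry. Keeping the manifest a flat list of URLs is what would let someone build a larger distribution just by appending lines.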
Neil From aahz at pythoncraft.com Sun Aug 26 07:13:02 2007 From: aahz at pythoncraft.com (Aahz) Date: Sat, 25 Aug 2007 22:13:02 -0700 Subject: [Python-3000] Removing email package until it's fixed In-Reply-To: References: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org> Message-ID: <20070826051302.GC24678@panix.com> On Sun, Aug 26, 2007, Neil Schemenauer wrote: > Brett Cannon wrote: >> >> I don't like the former, but the latter is intriguing. If we could >> host large packages (e.g., email, sqlite, ctypes, etc.) on python.org >> by providing tracker, svn, and web space they could be developed and >> released on their own schedule. Then the Python release would then >> become a sumo release of these various packages. > > Hosting them on python.org is a separate decision. We should be able > to pull in packages that are hosted anywhere into the "batteries > included" distribution. > > It sounds like most people are supportive of this idea. Please don't interpret a missing chorus of opposition as support. I'm only -0, but I definitely am negative on the idea based on my guess about the likelihood of problems. (OTOH, I have no opinion about temporarily removing the email package for a1 -- though I'm tempted to suggest we call it a0 instead.) -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "If you don't know what your program is supposed to do, you'd better not start writing it." --Dijkstra From aahz at pythoncraft.com Sun Aug 26 07:17:19 2007 From: aahz at pythoncraft.com (Aahz) Date: Sat, 25 Aug 2007 22:17:19 -0700 Subject: [Python-3000] Limitations of "batteries included" In-Reply-To: References: Message-ID: <20070826051719.GD24678@panix.com> On Sat, Aug 25, 2007, Guido van Rossum wrote: > > I believe the only reasonable solution is to promote the use of > package managers, and to let go of the "batteries included" philosophy > where it comes to major external functionality. 
When it links to > something that requires me to do install a pre-built external > non-Python bundle anyway (e.g. Berkeley Db, Sqlite, and others), the > included battery is useless until it is "charged" by installing that > dependency; the Python wrapper might as well be managed by the same > package manager. > > Now, there's plenty of pure Python (or Python-specific) functionality > for which "batteries included" makes total sense, including the email > package, wsgiref, XML processing, and more; it's often a judgement > call. But I want to warn against the desire to include everything -- > it's not going to happen, and it shouldn't. That overall makes sense and is roughly my understanding of the status for the past while -- it's why we've been pushing PyPI. What I would say is that the Python philosophy stays "batteries included" and does not move closer to a "sumo" philosophy. I do think a separate sumo distribution might make sense if someone wants to drive it. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "If you don't know what your program is supposed to do, you'd better not start writing it." --Dijkstra From skip at pobox.com Sun Aug 26 13:43:37 2007 From: skip at pobox.com (skip at pobox.com) Date: Sun, 26 Aug 2007 06:43:37 -0500 Subject: [Python-3000] Removing email package until it's fixed In-Reply-To: <20070826051302.GC24678@panix.com> References: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org> <20070826051302.GC24678@panix.com> Message-ID: <18129.26473.77328.489985@montanaro.dyndns.org> aahz> Please don't interpret a missing chorus of opposition as support. aahz> I'm only -0, but I definitely am negative on the idea based on my aahz> guess about the likelihood of problems. -0 on the idea of more batteries or fewer batteries? 
Skip From barry at python.org Sun Aug 26 14:03:59 2007 From: barry at python.org (Barry Warsaw) Date: Sun, 26 Aug 2007 08:03:59 -0400 Subject: [Python-3000] Removing email package until it's fixed In-Reply-To: <07Aug25.160033pdt."57996"@synergy1.parc.xerox.com> References: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org> <79990c6b0708251533tf0a2dd0k3c07e94fd52736bc@mail.gmail.com> <07Aug25.160033pdt."57996"@synergy1.parc.xerox.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 25, 2007, at 7:00 PM, Bill Janssen wrote: >> FWIW, I'm very much against moving email out of the core. This has >> been discussed a number of times before, and as far as I am aware, no >> conclusion reached. However, the "batteries included" approach of >> Python is a huge benefit for me. > > I agree. But if the current code doesn't work with 3K, not sure what > else to do. I guess it could just be labelled a "show-stopper" till > it's fixed. Another possibility (which I personally favor) is to leave the email package in a1 as flawed as it is, but to disable the tests. It's an / alpha/ for gosh sakes, so maybe leaving it in and partly broken will help rustle up some volunteers to help get it fixed. If you're agreeable to this, you can just merge the sandbox[1] into the head of the branch. I'm traveling until Monday and won't get a chance to do that until I'm back on the net. The sandbox is up-to- date with all my latest changes. 
- -Barry [1] svn+ssh://pythondev at svn.python.org/sandbox/trunk/emailpkg/5_0-exp -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRtFsL3EjvBPtnXfVAQKRXQQAqUOoafghpcoE5ENEiNDWmzQgQStXe3VP WFsh8QwcCGXXxKTih4dNHK8yLd+ayrwZCxzqFpv4Ie5DFacQ6/d4qq+XPX+vK92Y wWPMIRKXscTK5Ep0n6lfvb/3I+d9E/AJKa+exgXarHhkpSaij1V8FrXxqx1GgNMK Bw/nU5stMjA= =M2tb -----END PGP SIGNATURE----- From barry at python.org Sun Aug 26 14:05:37 2007 From: barry at python.org (Barry Warsaw) Date: Sun, 26 Aug 2007 08:05:37 -0400 Subject: [Python-3000] Removing email package until it's fixed In-Reply-To: References: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 26, 2007, at 12:10 AM, Neil Schemenauer wrote: > Brett Cannon wrote: >> I don't like the former, but the latter is intriguing. If we could >> host large packages (e.g., email, sqlite, ctypes, etc.) on python.org >> by providing tracker, svn, and web space they could be developed and >> released on their own schedule. Then the Python release would then >> become a sumo release of these various packages. > > Hosting them on python.org is a separate decision. We should be > able to pull in packages that are hosted anywhere into the > "batteries included" distribution. > > It sounds like most people are supportive of this idea. All that's > needed is a little documentation outline the rules that packages > must confirm to and a little scripting. > > We could have another file in the distribution, similar to > Modules/Setup, say Modules/Batteries. :-) Something like: > > # ElementTree -- An XML API > http://effbot.org/downloads/elementtree.tar.gz > > # email -- An email and MIME handling package > http://www.python.org/downloads/email.tar.gz > > There could be a makefile target or script that downloads them and > unpacks them into the right places. /IF/ we do this, I would require that the packages be available on the cheese^H^H^H^H^H er, PyPI. 
The key thing is the version number since there will be at least 3 versions of email being maintained. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRtFskXEjvBPtnXfVAQLTxQQAhhmFOkjLUgMPl2Kt6q7yn1anZUhQlagb cxroOsXZ55tScn8KVnQ5oFbv1l5IFg+bzdEZZcNyEsCptFs9WuKqYUB7/hAJ+mF+ Cw/zQUGoZUT2ZyB19pIfb9At1tp6sf2vLZraXztsHh6jib2uQVc0kCKR3HA+Tjef XQKdyUTjVW4= =bM9q -----END PGP SIGNATURE----- From p.f.moore at gmail.com Sun Aug 26 14:33:26 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 26 Aug 2007 13:33:26 +0100 Subject: [Python-3000] Limitations of "batteries included" In-Reply-To: <2088134209622619925@unknownmsgid> References: <2088134209622619925@unknownmsgid> Message-ID: <79990c6b0708260533x105ca70fn3b528a8d632ddb99@mail.gmail.com> On 26/08/07, Bill Janssen wrote: > > On 8/25/07, Guido van Rossum wrote: > > > > I believe the only reasonable solution is to promote the use of > > package managers, and to let go of the "batteries included" philosophy > > It's important to realize that most operating systems (Windows, OS X) > don't really support the use of package managers. [...] > Even with package managers, installing an external extension is out of > bounds for most users. [...] > So as soon as you require an install of something, you lose 80% of your > potential users. These are very good points, and fit exactly with my experience. For my personal use, I happily install and use any package that helps. For deployment, however, I very rarely contemplate relying on anything other than "the essentials" (to me, that covers Python, pywin32, and cx_Oracle - they get installed by default on any of our systems). > Though I agree with some of your other points, those about the > fast-moving unstable frameworks, and about the packages that depend on > an external non-Python non-standard resource. Definitely. 
I think the whole issue of inclusion in the standard library is a delicate balance - but one which Python has so far got just about right. I'd like to see that continue. The improvements in PyPI, and the rise of setuptools and eggs, are great, but shouldn't in themselves be a reason to slim down the standard library. Paul. From aahz at pythoncraft.com Sun Aug 26 16:07:18 2007 From: aahz at pythoncraft.com (Aahz) Date: Sun, 26 Aug 2007 07:07:18 -0700 Subject: [Python-3000] Removing email package until it's fixed In-Reply-To: <18129.26473.77328.489985@montanaro.dyndns.org> References: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org> <20070826051302.GC24678@panix.com> <18129.26473.77328.489985@montanaro.dyndns.org> Message-ID: <20070826140718.GA15100@panix.com> On Sun, Aug 26, 2007, skip at pobox.com wrote: > > aahz> Please don't interpret a missing chorus of opposition as support. > aahz> I'm only -0, but I definitely am negative on the idea based on my > aahz> guess about the likelihood of problems. > > -0 on the idea of more batteries or fewer batteries? -0 on the idea of making "batteries included" include PyPI packages. Anything that is part of "batteries included" IMO should just be part of the standard install. BTW, you snipped too much context, so that I had to go rummaging through my old e-mail to figure out what the context was. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "If you don't know what your program is supposed to do, you'd better not start writing it."
--Dijkstra From guido at python.org Sun Aug 26 16:52:02 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 26 Aug 2007 07:52:02 -0700 Subject: [Python-3000] Removing email package until it's fixed In-Reply-To: References: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org> <79990c6b0708251533tf0a2dd0k3c07e94fd52736bc@mail.gmail.com> Message-ID: On 8/26/07, Barry Warsaw wrote: > Another possibility (which I personally favor) is to leave the email > package in a1 as flawed as it is, but to disable the tests. It's an / > alpha/ for gosh sakes, so maybe leaving it in and partly broken will > help rustle up some volunteers to help get it fixed. > > If you're agreeable to this, you can just merge the sandbox[1] into > the head of the branch. I'm traveling until Monday and won't get a > chance to do that until I'm back on the net. The sandbox is up-to- > date with all my latest changes. No, thanks. The broken package doesn't do people much good. People whose code depends on the email package can't use it anyway until it's fixed; they either have to wait, or they have to help fix it. Instructions for accessing the broken package will of course be included in the README, and as soon as the email package is fixed we'll include it again (hopefully in 3.0a2). BTW I'm surprised nobody else is helping out fixing it. I sent several requests to the email-sig and AFAIK nobody piped up. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Aug 26 16:56:42 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 26 Aug 2007 07:56:42 -0700 Subject: [Python-3000] backported ABC In-Reply-To: References: Message-ID: Thanks! Would it inconvenience you terribly to upload this all to the new tracker (bugs.python.org)? Preferably as a single patch against the svn trunk (to use svn diff, you have to svn add the new files first!) Also, are you planning to work on _abcoll.py and the changes to collections.py? 
--Guido On 8/26/07, Benjamin Aranguren wrote: > We copied abc.py and test_abc.py from py3k svn and modified to work with 2.6. > > After making all the changes we ran all the tests to ensure that no > other modules were affected. > > Attached are abc.py, test_abc.py, and their relevant patches from 3.0 to 2.6. > > On 8/25/07, Guido van Rossum wrote: > > Um, that patch contains only the C code for overloading isinstance() > > and issubclass(). > > > > Did you do anything about abc.py and _abcoll.py/collections.py and > > their respective unit tests? Or what about the unit tests for > > isinstance()/issubclass()? > > > > On 8/25/07, Benjamin Aranguren wrote: > > > Worked with Alex Martelli at the Goolge Python Sprint. > > > > -- > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From baranguren at gmail.com Sun Aug 26 12:29:25 2007 From: baranguren at gmail.com (Benjamin Aranguren) Date: Sun, 26 Aug 2007 03:29:25 -0700 Subject: [Python-3000] backported ABC In-Reply-To: References: Message-ID: We copied abc.py and test_abc.py from py3k svn and modified to work with 2.6. After making all the changes we ran all the tests to ensure that no other modules were affected. Attached are abc.py, test_abc.py, and their relevant patches from 3.0 to 2.6. On 8/25/07, Guido van Rossum wrote: > Um, that patch contains only the C code for overloading isinstance() > and issubclass(). > > Did you do anything about abc.py and _abcoll.py/collections.py and > their respective unit tests? Or what about the unit tests for > isinstance()/issubclass()? > > On 8/25/07, Benjamin Aranguren wrote: > > Worked with Alex Martelli at the Goolge Python Sprint. > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: abc.py Type: text/x-python Size: 7986 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070826/296fbf4b/attachment-0002.py -------------- next part -------------- A non-text attachment was scrubbed... Name: test_abc.py Type: text/x-python Size: 4591 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070826/296fbf4b/attachment-0003.py -------------- next part -------------- A non-text attachment was scrubbed... Name: abc_backport_to_2_6.patch Type: text/x-patch Size: 1867 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070826/296fbf4b/attachment-0002.bin -------------- next part -------------- A non-text attachment was scrubbed... Name: test_abc_backport_to_2_6.patch Type: text/x-patch Size: 3543 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070826/296fbf4b/attachment-0003.bin From nas at arctrix.com Sun Aug 26 19:20:36 2007 From: nas at arctrix.com (Neil Schemenauer) Date: Sun, 26 Aug 2007 17:20:36 +0000 (UTC) Subject: [Python-3000] Removing email package until it's fixed References: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org> <20070826051302.GC24678@panix.com> <18129.26473.77328.489985@montanaro.dyndns.org> <20070826140718.GA15100@panix.com> Message-ID: Aahz wrote: > -0 on the idea of making "batteries included" include PyPI packages. > Anything part of "batteries included" IMO should just be part of the > standard install. I think you misunderstand the proposal. The "batteries" would be included as part of the final Python release. From the end user's point of view there would be no change from the current model. The difference would be from the Python developer's point of view. Some libraries would no longer be part of the SVN checkout and you would have to run a script to pull them into your source tree. IMO, depending on PyPI is not necessary or even desirable.
All that's necessary is that the batteries conform to some standards regarding layout, documentation and unit tests. They could be pulled based on URLs and the hostname of the URL is not important. That scheme would make it easier for someone to make a sumo distribution just by adding more URLs to the list before building it. Neil From baranguren at gmail.com Sun Aug 26 19:47:17 2007 From: baranguren at gmail.com (Benjamin Aranguren) Date: Sun, 26 Aug 2007 10:47:17 -0700 Subject: [Python-3000] backported ABC In-Reply-To: References: Message-ID: I got it now. Both modules need to be backported as well. I'm on it. On 8/26/07, Benjamin Aranguren wrote: > No problem. Created issue 1026 in tracker with a single patch file attached. > > I'm not aware of what changes need to be done with _abcoll.py and > collections.py. If you can point me to the right direction, I would > definitely like to work on it. > > On 8/26/07, Guido van Rossum wrote: > > Thanks! > > > > Would it inconvenience you terribly to upload this all to the new > > tracker (bugs.python.org)? Preferably as a single patch against the > > svn trunk (to use svn diff, you have to svn add the new files first!) > > > > Also, are you planning to work on _abcoll.py and the changes to collections.py? > > > > --Guido > > > > On 8/26/07, Benjamin Aranguren wrote: > > > We copied abc.py and test_abc.py from py3k svn and modified to work with 2.6. > > > > > > After making all the changes we ran all the tests to ensure that no > > > other modules were affected. > > > > > > Attached are abc.py, test_abc.py, and their relevant patches from 3.0 to 2.6. > > > > > > On 8/25/07, Guido van Rossum wrote: > > > > Um, that patch contains only the C code for overloading isinstance() > > > > and issubclass(). > > > > > > > > Did you do anything about abc.py and _abcoll.py/collections.py and > > > > their respective unit tests? Or what about the unit tests for > > > > isinstance()/issubclass()?
> > > > > > > > On 8/25/07, Benjamin Aranguren wrote: > > > > > Worked with Alex Martelli at the Goolge Python Sprint. > > > > > > > > -- > > > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > > > > > > > > > > > > > -- > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > From john.m.camara at comcast.net Sun Aug 26 19:50:48 2007 From: john.m.camara at comcast.net (john.m.camara at comcast.net) Date: Sun, 26 Aug 2007 17:50:48 +0000 Subject: [Python-3000] Limitations of "batteries included" Message-ID: <082620071750.21994.46D1BD7800071D07000055EA22120207840E9D0E030E0CD203D202080106@comcast.net> Sorry. Forgot to change the subject -------------- Original message ---------------------- From: john.m.camara at comcast.net > On 8/25/07, "Guido van Rossum" wrote: > > Take for example GUI packages. Tkinter is far from ideal, but there > > are many competitors, none of them perfect (not even those packages > > specifically designed to be platform-neutral). We can't very well > > include all of the major packages (PyQt, PyGtk, wxPython, anygui) -- > > the release would just bloat tremendously, and getting stable versions > > of all of these would just be a maintenance nightmare. (I don't know > > how Linux distros do it, but they tend to have a large group of people > > *just* devoted to *bundling* stuff, and their release cycles are even > > slower. I don't think Python should be in that business.) > > Python can't include all the major packages but it is necessary for any > language to support a good GUI package in order to be widely adopted > by the masses. Right now this is one of Python's weaknesses that needs > to be corrected. I agree with you that none of the major packages are > perfect and at the current slow rate of progress in this area I doubt any > of them will be perfect any time soon. 
There just doesn't seam like there > is enough motivation out there for this issue to self correct itself unlike the > situation that is currently go on in the web frameworks where significant > progress has been made in the last 2 years. I think its time to just > pronounce a package as it will be good for the community. My vote would > be for wxPython but I'm not someone who truly cares much about GUIs > as I much prefer to write the back ends of systems and stay far away from > the front ends. > > > > Database wrappers are in the same boat, and IMO the approach of > > separately downloadable 3rd party wrappers (sometimes multiple > > competing wrappers for the same database) has served the users well. > > I agree with you at this point in time but SQLAlchemy is something special > and will likely be worthy to be part of the std library in 18-24 months if the > current rate of development continues. In my opinion, it's Python's new > killer library and I expect it will be given a significant amount of positive > press soon and will help Python's user base grow. > > > > > Would anyone seriously consider including something like Django, > > TurboGears or Pylons in a Python release? I hope not -- these all > > evolve at a rate about 10x that of Python, and the version included > > with a core distribution would be out of date (and a nuisance to > > replace) within months of the core release. > > At this point in time none of the web frameworks are worthy to be included > in the standard library. I believe the community has been doing a good > job in this area with great progress being made in the last few years. What > we need in the standard library are some additional low level libraries/api > like WSGI. For example libraries for authentication/authorization, a web > services bus to manage WSGI services (to provide start, stop, reload, > events, scheduler, etc), and a new configuration system so that higher > level frameworks can seamlessly work together. 
> > John From john.m.camara at comcast.net Sun Aug 26 19:48:45 2007 From: john.m.camara at comcast.net (john.m.camara at comcast.net) Date: Sun, 26 Aug 2007 17:48:45 +0000 Subject: [Python-3000] Python-3000 Digest, Vol 18, Issue 116 Message-ID: <082620071748.1124.46D1BCFD000DEB390000046422120207840E9D0E030E0CD203D202080106@comcast.net> On 8/25/07, "Guido van Rossum" wrote: > Take for example GUI packages. Tkinter is far from ideal, but there > are many competitors, none of them perfect (not even those packages > specifically designed to be platform-neutral). We can't very well > include all of the major packages (PyQt, PyGtk, wxPython, anygui) -- > the release would just bloat tremendously, and getting stable versions > of all of these would just be a maintenance nightmare. (I don't know > how Linux distros do it, but they tend to have a large group of people > *just* devoted to *bundling* stuff, and their release cycles are even > slower. I don't think Python should be in that business.) Python can't include all the major packages but it is necessary for any language to support a good GUI package in order to be widely adopted by the masses. Right now this is one of Python's weaknesses that needs to be corrected. I agree with you that none of the major packages are perfect and at the current slow rate of progress in this area I doubt any of them will be perfect any time soon. There just doesn't seam like there is enough motivation out there for this issue to self correct itself unlike the situation that is currently go on in the web frameworks where significant progress has been made in the last 2 years. I think its time to just pronounce a package as it will be good for the community. My vote would be for wxPython but I'm not someone who truly cares much about GUIs as I much prefer to write the back ends of systems and stay far away from the front ends. 
> > Database wrappers are in the same boat, and IMO the approach of > separately downloadable 3rd party wrappers (sometimes multiple > competing wrappers for the same database) has served the users well. I agree with you at this point in time but SQLAlchemy is something special and will likely be worthy to be part of the std library in 18-24 months if the current rate of development continues. In my opinion, it's Python's new killer library and I expect it will be given a significant amount of positive press soon and will help Python's user base grow. > > Would anyone seriously consider including something like Django, > TurboGears or Pylons in a Python release? I hope not -- these all > evolve at a rate about 10x that of Python, and the version included > with a core distribution would be out of date (and a nuisance to > replace) within months of the core release. At this point in time none of the web frameworks are worthy to be included in the standard library. I believe the community has been doing a good job in this area with great progress being made in the last few years. What we need in the standard library are some additional low level libraries/api like WSGI. For example libraries for authentication/authorization, a web services bus to manage WSGI services (to provide start, stop, reload, events, scheduler, etc), and a new configuration system so that higher level frameworks can seamlessly work together. John From barry at python.org Sun Aug 26 20:30:47 2007 From: barry at python.org (Barry Warsaw) Date: Sun, 26 Aug 2007 14:30:47 -0400 Subject: [Python-3000] Py3k Sprint Tasks (Google Docs & Spreadsheets) In-Reply-To: <87y7g0401v.fsf@uwakimon.sk.tsukuba.ac.jp> References: <93DBB66F-5D0D-4E46-8480-D2BFC693722A@python.org> <87y7g0401v.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <9CBCCF2F-B428-4D37-8C18-1EAFB86CD7D9@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 25, 2007, at 2:10 AM, Stephen J. 
Turnbull wrote: > Barry Warsaw writes: > >> I've been spending hours of my own time on the email package for py3k >> this week and every time I think I'm nearing success I get defeated >> again. > > I'm ankle deep in the Big Muddy (daughter tested positive for TB as > expected -- the Japanese innoculate all children against it because of > the sins of their fathers -- and school starts on Tuesday, so we need > to make a bunch of extra trips to doctors and whatnot), so what thin > hope I had of hanging out with the big boys at the Python-3000 sprint > long since evaporated. Stephen, sorry to hear about your daughter and I hope she's going to be okay of course! > However, starting next week I should have a day a week or so I can > devote to email stuff -- if you want to send any thoughts or > requisitions my way (or an URL to sprint IRC transcripts), I'd love to > help. Of course you'll get it all done and leave none for me, right? Unfortunately, we didn't really sprint much on it, but I did get a chance to spend time on the branch. I think I see the light at the end of the tunnel for getting the existing tests to pass, though I haven't even looked at test_email_codecs.py yet. Because of the way things are going to work with in put and output codecs, I'll definitely want to get some sanity checks with Asian codecs. I'll try to put together a list of issues and questions and get those sent out next week. >> But I'm determined to solve the worst of the problems this week. > > Bu-wha-ha-ha! Heh, well I'm getting closer. We're definitely going to have some API changes, so I'll outline those as well. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRtHG13EjvBPtnXfVAQKCngP+PUTm82FjnVpqz7HvPLS/zPXBMelDNhkK AKGIk5hveka180QEbA/DMsu7LZmPK2jXOQJWxufRsLfuzwKL3WtDF1IIyiICkC/I HoR04bHZJzUdEzZuZPL53I704JoO8QBpXEOn/JdauFEaZ6qakueLdnqx1Ab0LbSP RCLiVh9BxtU= =6Ngh -----END PGP SIGNATURE----- From janssen at parc.com Sun Aug 26 20:44:48 2007 From: janssen at parc.com (Bill Janssen) Date: Sun, 26 Aug 2007 11:44:48 PDT Subject: [Python-3000] Limitations of "batteries included" In-Reply-To: <79990c6b0708260533x105ca70fn3b528a8d632ddb99@mail.gmail.com> References: <2088134209622619925@unknownmsgid> <79990c6b0708260533x105ca70fn3b528a8d632ddb99@mail.gmail.com> Message-ID: <07Aug26.114451pdt."57996"@synergy1.parc.xerox.com> > These are very good points, and fit exactly with my experience. For my > personal use, I happily install and use any package that helps. For > deployment, however, I very rarely contemplate relying on anything > other than "the essentials" (to me, that covers Python, pywin32, and > cx_Oracle - they get installed by default on any of our systems). Indeed. I still write everything against Python 2.3.5, just so that OS X users can use my stuff -- few people will install a second Python on their machine. Bill From guido at python.org Sun Aug 26 21:24:51 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 26 Aug 2007 12:24:51 -0700 Subject: [Python-3000] How should the hash digest of a Unicode string be computed? Message-ID: Change r57490 by Gregory P Smith broke a test in test_unicodedata and, on PPC OSX, several tests in test_hashlib. Looking into this it's pretty clear *why* it broke: before, the 's#' format code was used, while Gregory's change changed this into using the buffer API (to ensure the data won't move around). Now, when a (Unicode) string is passed to s#, it uses the UTF-8 encoding. But the buffer API uses the raw bytes in the Unicode object, which is typically UTF-16 or UTF-32. 
(I can't quite figure out why the tests didn't fail on my Linux box; I'm guessing it's an endianness issue, but it can't be that simple. Perhaps that box happens to be falling back on a different implementation of the checksums?) I checked in a fix (because I don't like broken tests :-) which restores the old behavior by passing PyBUF_CHARACTER to PyObject_GetBuffer(), which enables a special case in the buffer API for PyUnicode that returns the UTF-8 encoded bytes instead of the raw bytes. (I still find this questionable, especially since a few random places in bytesobject.c also use PyBUF_CHARACTER, presumably to make tests pass, but for the *bytes* type, requesting *characters* (even encoded ones) is iffy.) But I'm wondering if passing a Unicode string to the various hash digest functions should work at all! Hashes are defined on sequences of bytes, and IMO we should insist that the user pass us bytes, and not second-guess what to do with Unicode. Opinions? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From collinw at gmail.com Sun Aug 26 21:44:54 2007 From: collinw at gmail.com (Collin Winter) Date: Sun, 26 Aug 2007 12:44:54 -0700 Subject: [Python-3000] A couple 2to3 questions In-Reply-To: References: <18128.60288.458934.140003@montanaro.dyndns.org> Message-ID: <43aa6ff70708261244v5ed8b85bj8d62b001bb630134@mail.gmail.com> On 8/25/07, Neal Norwitz wrote: > On 8/25/07, skip at pobox.com wrote: > > 2. I noticed a couple places where it seems to replace "if isinstance" > > with "ifinstance". Seems like an output bug of some sort. > > That bug was probably me. I did some large changes and broke > some things a while back. I've since learned my lesson and just use > 2to3 to automate the task. :-) It wasn't you; it was a bug in fix_type_equality. I've fixed it in r57514.
Collin Winter From amauryfa at gmail.com Sun Aug 26 23:23:37 2007 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Sun, 26 Aug 2007 23:23:37 +0200 Subject: [Python-3000] python 3 closes sys.stdout Message-ID: Hello, It seems that the new I/O system closes the 3 standard descriptors (stdin, stdout and stderr) when the sys module is unloaded. I don't know if it is a good thing on Unix, but on Windows at least, python crashes on exit, when call_ll_exitfuncs calls fflush(stdout) and fflush(stderr). As a quick correction, I changed a test in _fileio.c::internal_close(): Index: Modules/_fileio.c =========================================== --- Modules/_fileio.c (revision 57506) +++ Modules/_fileio.c (working copy) @@ -45,7 +45,7 @@ internal_close(PyFileIOObject *self) { int save_errno = 0; - if (self->fd >= 0) { + if (self->fd >= 3) { int fd = self->fd; self->fd = -1; Py_BEGIN_ALLOW_THREADS OTOH, documentation of io.open() says """ (*) If a file descriptor is given, it is closed when the returned I/O object is closed. If you don't want this to happen, use os.dup() to create a duplicate file descriptor. """ So a more correct change would be to dup the three sys.stdout, sys.stdin, sys.stderr, in site.py: installnewio() (BTW, the -S option is broken. You guess why) What are the consequences of a dup() on the standard descriptors? I don't like the idea of sys.stdout.fileno() to be different than 1. I know some code using the numbers 0,1,2 to refer to the standard files. Or we could change the behaviour to "If a file descriptor is given, it won't be closed". You opened it, you close it. What do you think? 
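[Editorial sketch, not part of the original message.] The two options weighed above (duplicate the descriptor with os.dup(), or leave a caller-supplied descriptor open) can be tried from Python. This is a minimal sketch assuming a released Python 3, where the built-in open() accepts a `closefd` argument; that argument is not something the message above could rely on, and the temporary file and variable names here are purely illustrative.

```python
import os
import tempfile

# Create a raw file descriptor that this code owns.
fd, path = tempfile.mkstemp()
try:
    # Option 1: "you opened it, you close it".
    # With closefd=False, closing the file object flushes its data
    # but leaves the underlying descriptor open.
    f = open(fd, "w", closefd=False)
    f.write("hello")
    f.close()
    # fd is still valid here, so os.fstat() does not raise.
    size_after_close = os.fstat(fd).st_size

    # Option 2: hand the I/O object a duplicate descriptor.
    # Closing the file object closes only the duplicate.
    dup_fd = os.dup(fd)
    g = open(dup_fd, "a")  # closefd defaults to True
    g.write(" world")
    g.close()
    total_size = os.fstat(fd).st_size
finally:
    # The original descriptor is closed exactly once, by its owner.
    os.close(fd)
    os.remove(path)
```

Note that the duplicate has a different descriptor number than the original, which is the concern raised above about sys.stdout.fileno() no longer being 1.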
-- Amaury Forgeot d'Arc From nnorwitz at gmail.com Sun Aug 26 23:36:02 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Sun, 26 Aug 2007 14:36:02 -0700 Subject: [Python-3000] python 3 closes sys.stdout In-Reply-To: References: Message-ID: On 8/26/07, Amaury Forgeot d'Arc wrote: > Hello, > > It seems that the new I/O system closes the 3 standard descriptors > (stdin, stdout and stderr) when the sys module is unloaded. Amaury, Other than this problem, can you report on how py3k is working on Windows? How did you compile it? What version of the compiler? Did you have any problems? Do you have outstanding changes to make it work? Which tests are failing? etc. Thanks, n From victor.stinner at haypocalc.com Mon Aug 27 00:11:21 2007 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Mon, 27 Aug 2007 00:11:21 +0200 Subject: [Python-3000] python 3 closes sys.stdout In-Reply-To: References: Message-ID: <200708270011.21719.victor.stinner@haypocalc.com> On Sunday 26 August 2007 23:23:37 Amaury Forgeot d'Arc wrote: > internal_close(PyFileIOObject *self) > { > int save_errno = 0; > - if (self->fd >= 0) { > + if (self->fd >= 3) { > int fd = self->fd; > self->fd = -1; > Py_BEGIN_ALLOW_THREADS Hum, a before fix would be to add an option to choose if the file should be closed or not on object destruction. Victor Stinner aka haypo http://hachoir.org/ From greg at krypto.org Mon Aug 27 00:54:07 2007 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 26 Aug 2007 15:54:07 -0700 Subject: [Python-3000] How should the hash digest of a Unicode string be computed? In-Reply-To: References: Message-ID: <52dc1c820708261554n6b31e40bya6d885c0f683a633@mail.gmail.com> I'm in favor of not allowing unicode for hash functions. Depending on the system default encoding for a hash will not be portable. another question for hashlib: It uses PyArg_Parse to get a single 's' out of an optional parameter [see the code] and I couldn't figure out what the best thing to do there was. 
It just needs a C string to pass to openssl to lookup a hash function by name. Its C so i doubt it'll ever be anything but ascii. How should that parameter be parsed instead of the old 's' string format? PyBUF_CHARACTER actually sounds ideal in that case assuming it guarantees UTF-8 but I wasn't clear that it did that (is it always utf-8 or the possibly useless as far as APIs expecting C strings are concerned system "default encoding")? Requiring a bytes object would also work but I really don't like the idea of users needing to use a specific type for something so simple. (i consider string constants with their preceding b, r, u, s, type characters ugly in code without a good reason for them to be there) test_hashlib.py passed on the x86 osx system i was using to write the code. I neglected to run the full suite or grep for hashlib in other test suites and run those so i missed the test_unicodedata failure, sorry about the breakage. Is it just me or do unicode objects supporting the buffer api seem like an odd concept given that buffer api consumers (rather than unicode consumers) shouldn't need to know about encodings of the data being received. -gps On 8/26/07, Guido van Rossum wrote: > Change r57490 by Gregory P Smith broke a test in test_unicodedata and, > on PPC OSX, several tests in test_hashlib. > > Looking into this it's pretty clear *why* it broke: before, the 's#' > format code was used, while Gregory's change changed this into using > the buffer API (to ensure the data won't move around). Now, when a > (Unicode) string is passed to s#, it uses the UTF-8 encoding. But the > buffer API uses the raw bytes in the Unicode object, which is > typically UTF-16 or UTF-32. (I can't quite figure out why the tests > didn't fail on my Linux box; I'm guessing it's an endianness issue, > but it can't be that simple. Perhaps that box happens to be falling > back on a different implementation of the checksums?) 
> > I checked in a fix (because I don't like broken tests :-) which > restores the old behavior by passing PyBUF_CHARACTER to > PyObject_GetBuffer(), which enables a special case in the buffer API > for PyUnicode that returns the UTF-8 encoded bytes instead of the raw > bytes. (I still find this questionable, especially since a few random > places in bytesobject.c also use PyBUF_CHARACTER, presumably to make > tests pass, but for the *bytes* type, requesting *characters* (even > encoded ones) is iffy. > > But I'm wondering if passing a Unicode string to the various hash > digest functions should work at all! Hashes are defined on sequences > of bytes, and IMO we should insist on the user to pass us bytes, and > not second-guess what to do with Unicode. > > Opinions? > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/greg%40krypto.org > From adam at hupp.org Mon Aug 27 03:00:44 2007 From: adam at hupp.org (Adam Hupp) Date: Sun, 26 Aug 2007 21:00:44 -0400 Subject: [Python-3000] Support for newline and encoding arguments to open in tempfile module, also mktemp deprecation Message-ID: <766a29bd0708261800y376b65a9n4723910ea27e17f6@mail.gmail.com> It would be useful to support 'newline' and 'encoding' arguments in tempfile.TemporaryFile and friends. These new arguments would be passed directly into io.open. I've uploaded a patch for this to: http://bugs.python.org/issue1033 The 'bufsize' argument to os.fdopen has changed to 'buffering' so I went ahead and made the same change to TemporaryFile etc. Is this a desirable? While in tempfile, I noticed that tempfile.mktemp() has the following comment: "This function is unsafe and should not be used." The docs list it as "Deprecated since release 2.3". Should it be removed in py3k? 
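[Editorial sketch, not part of the original message.] For readers who want to see what the requested arguments look like in use: released Python 3 versions of tempfile.TemporaryFile do accept `encoding` and `newline` and pass them through to the underlying open call. The sketch below assumes such a version; it does not reproduce the patch attached to the issue, and the sample string is illustrative.

```python
import tempfile

# Text-mode temporary file with an explicit encoding and newline policy.
# newline="\n" means "\n" is written as-is and only "\n" ends a line on read.
with tempfile.TemporaryFile(mode="w+", encoding="utf-8", newline="\n") as tmp:
    tmp.write("caf\u00e9\n")
    tmp.seek(0)
    text = tmp.read()
```

Without an explicit `encoding`, the text layer falls back to a platform-dependent default, which is the portability problem the extra arguments address.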
-- Adam Hupp | http://hupp.org/adam/ From oliphant.travis at ieee.org Mon Aug 27 03:13:33 2007 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Sun, 26 Aug 2007 20:13:33 -0500 Subject: [Python-3000] How should the hash digest of a Unicode string be computed? In-Reply-To: <52dc1c820708261554n6b31e40bya6d885c0f683a633@mail.gmail.com> References: <52dc1c820708261554n6b31e40bya6d885c0f683a633@mail.gmail.com> Message-ID: Gregory P. Smith wrote: > I'm in favor of not allowing unicode for hash functions. Depending on > the system default encoding for a hash will not be portable. > > another question for hashlib: It uses PyArg_Parse to get a single 's' > out of an optional parameter [see the code] and I couldn't figure out > what the best thing to do there was. It just needs a C string to pass > to openssl to lookup a hash function by name. Its C so i doubt it'll > ever be anything but ascii. How should that parameter be parsed > instead of the old 's' string format? PyBUF_CHARACTER actually sounds > ideal in that case assuming it guarantees UTF-8 but I wasn't clear > that it did that (is it always utf-8 or the possibly useless as far as > APIs expecting C strings are concerned system "default encoding")? > Requiring a bytes object would also work but I really don't like the > idea of users needing to use a specific type for something so simple. > (i consider string constants with their preceding b, r, u, s, type > characters ugly in code without a good reason for them to be there) > The PyBUF_CHARACTER flag was an add-on after I realized that the old buffer API was being in several places to get Unicode objects to encode their data as a string (in the default encoding of the system, I believe). The unicode object is the only one that I know of that actually does something different when it is called with PyBUF_CHARACTER. > test_hashlib.py passed on the x86 osx system i was using to write the > code. 
I neglected to run the full suite or grep for hashlib in other > test suites and run those so i missed the test_unicodedata failure, > sorry about the breakage. > > Is it just me or do unicode objects supporting the buffer api seem > like an odd concept given that buffer api consumers (rather than > unicode consumers) shouldn't need to know about encodings of the data > being received. I think you have a point. The buffer API does support the concept of "formats" but not "encodings" so having this PyBUF_CHARACTER flag looks rather like a hack. I'd have to look, because I don't even remember what is returned as the "format" from a unicode object if it is requested (it is probably not correct). I would prefer that the notion of encoding a unicode object is separated from the notion of the buffer API, but last week I couldn't see another way to un-tease it. -Travis > > -gps > > On 8/26/07, Guido van Rossum wrote: >> Change r57490 by Gregory P Smith broke a test in test_unicodedata and, >> on PPC OSX, several tests in test_hashlib. >> >> Looking into this it's pretty clear *why* it broke: before, the 's#' >> format code was used, while Gregory's change changed this into using >> the buffer API (to ensure the data won't move around). Now, when a >> (Unicode) string is passed to s#, it uses the UTF-8 encoding. But the >> buffer API uses the raw bytes in the Unicode object, which is >> typically UTF-16 or UTF-32. (I can't quite figure out why the tests >> didn't fail on my Linux box; I'm guessing it's an endianness issue, >> but it can't be that simple. Perhaps that box happens to be falling >> back on a different implementation of the checksums?) >> >> I checked in a fix (because I don't like broken tests :-) which >> restores the old behavior by passing PyBUF_CHARACTER to >> PyObject_GetBuffer(), which enables a special case in the buffer API >> for PyUnicode that returns the UTF-8 encoded bytes instead of the raw >> bytes. 
(I still find this questionable, especially since a few random >> places in bytesobject.c also use PyBUF_CHARACTER, presumably to make >> tests pass, but for the *bytes* type, requesting *characters* (even >> encoded ones) is iffy. >> >> But I'm wondering if passing a Unicode string to the various hash >> digest functions should work at all! Hashes are defined on sequences >> of bytes, and IMO we should insist on the user to pass us bytes, and >> not second-guess what to do with Unicode. >> >> Opinions? >> >> -- >> --Guido van Rossum (home page: http://www.python.org/~guido/) >> _______________________________________________ >> Python-3000 mailing list >> Python-3000 at python.org >> http://mail.python.org/mailman/listinfo/python-3000 >> Unsubscribe: http://mail.python.org/mailman/options/python-3000/greg%40krypto.org >> From guido at python.org Mon Aug 27 03:51:39 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 26 Aug 2007 18:51:39 -0700 Subject: [Python-3000] Support for newline and encoding arguments to open in tempfile module, also mktemp deprecation In-Reply-To: <766a29bd0708261800y376b65a9n4723910ea27e17f6@mail.gmail.com> References: <766a29bd0708261800y376b65a9n4723910ea27e17f6@mail.gmail.com> Message-ID: On 8/26/07, Adam Hupp wrote: > It would be useful to support 'newline' and 'encoding' arguments in > tempfile.TemporaryFile and friends. These new arguments would be > passed directly into io.open. I've uploaded a patch for this to: > > http://bugs.python.org/issue1033 > > The 'bufsize' argument to os.fdopen has changed to 'buffering' so I > went ahead and made the same change to TemporaryFile etc. Is this a > desirable? Hm, why not just create the temporary file in binary mode and wrap an io.TextIOWrapper instance around it? > While in tempfile, I noticed that tempfile.mktemp() has the following comment: > > "This function is unsafe and should not be used." > > The docs list it as "Deprecated since release 2.3". 
Should it be > removed in py3k? I personally think the deprecation was an overreaction to the security concerns. People avoid the warning by calling mkstemp() but then just close the file descriptor and use the filename anyway; that's just as unsafe, but often there's just no other way. I say, remove the deprecation. The attack on mktemp() is much less likely because the name is much more random anyway. (If you haven't heard of the attack: another process could guess the name of the tempfile and quickly replacing it with a symbolic link pointing to a file owned by the user owning the process, e.g. /etc/passwd, which will then get overwritten. This is because /tmp is writable by anyone. It works for non-root users too, to some extent.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Aug 27 03:54:49 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 26 Aug 2007 18:54:49 -0700 Subject: [Python-3000] How should the hash digest of a Unicode string be computed? In-Reply-To: References: <52dc1c820708261554n6b31e40bya6d885c0f683a633@mail.gmail.com> Message-ID: On 8/26/07, Travis Oliphant wrote: > Gregory P. Smith wrote: > > I'm in favor of not allowing unicode for hash functions. Depending on > > the system default encoding for a hash will not be portable. > > > > another question for hashlib: It uses PyArg_Parse to get a single 's' > > out of an optional parameter [see the code] and I couldn't figure out > > what the best thing to do there was. It just needs a C string to pass > > to openssl to lookup a hash function by name. Its C so i doubt it'll > > ever be anything but ascii. How should that parameter be parsed > > instead of the old 's' string format? PyBUF_CHARACTER actually sounds > > ideal in that case assuming it guarantees UTF-8 but I wasn't clear > > that it did that (is it always utf-8 or the possibly useless as far as > > APIs expecting C strings are concerned system "default encoding")? 
> > Requiring a bytes object would also work but I really don't like the > > idea of users needing to use a specific type for something so simple. > > (i consider string constants with their preceding b, r, u, s, type > > characters ugly in code without a good reason for them to be there) > > > > The PyBUF_CHARACTER flag was an add-on after I realized that the old > buffer API was being in several places to get Unicode objects to encode > their data as a string (in the default encoding of the system, I believe). > > The unicode object is the only one that I know of that actually does > something different when it is called with PyBUF_CHARACTER. Aha, I figured something like that. > > test_hashlib.py passed on the x86 osx system i was using to write the > > code. I neglected to run the full suite or grep for hashlib in other > > test suites and run those so i missed the test_unicodedata failure, > > sorry about the breakage. > > > > Is it just me or do unicode objects supporting the buffer api seem > > like an odd concept given that buffer api consumers (rather than > > unicode consumers) shouldn't need to know about encodings of the data > > being received. > > I think you have a point. The buffer API does support the concept of > "formats" but not "encodings" so having this PyBUF_CHARACTER flag looks > rather like a hack. I'd have to look, because I don't even remember > what is returned as the "format" from a unicode object if it is > requested (it is probably not correct). > > I would prefer that the notion of encoding a unicode object is separated > from the notion of the buffer API, but last week I couldn't see another > way to un-tease it. I'll work on this some more. The problem is that it is currently relied on in a number of places (some of which probably don't even know it), and all those places must be changed to explicitly encode the Unicode string instead of passing it to some API that expects bytes. 
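[Editorial sketch, not part of the original message.] "Explicitly encode the Unicode string" looks like this from the caller's side. The sketch assumes a released Python 3, where hashlib rejects str input with a TypeError; the thread above had not yet settled on that behavior. The sample text and the choice of SHA-256 are illustrative.

```python
import hashlib

text = "caf\u00e9"

# The caller picks the encoding, so the digest is well defined and portable.
utf8_digest = hashlib.sha256(text.encode("utf-8")).hexdigest()

# A different encoding yields different bytes and therefore a different digest,
# which is why hashing the "raw" internal representation is not portable.
utf16_digest = hashlib.sha256(text.encode("utf-16-le")).hexdigest()
digests_differ = utf8_digest != utf16_digest

# Passing the str object itself is refused rather than second-guessed.
try:
    hashlib.sha256(text)
    str_rejected = False
except TypeError:
    str_rejected = True
```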
FWIW, this is the only issue that I have with your work so far. Two of your friends made it to the Sprint at least one day, but I have to admit that I don't know if they made any changes. --Guido > -Travis > > > > > > > -gps > > > > On 8/26/07, Guido van Rossum wrote: > >> Change r57490 by Gregory P Smith broke a test in test_unicodedata and, > >> on PPC OSX, several tests in test_hashlib. > >> > >> Looking into this it's pretty clear *why* it broke: before, the 's#' > >> format code was used, while Gregory's change changed this into using > >> the buffer API (to ensure the data won't move around). Now, when a > >> (Unicode) string is passed to s#, it uses the UTF-8 encoding. But the > >> buffer API uses the raw bytes in the Unicode object, which is > >> typically UTF-16 or UTF-32. (I can't quite figure out why the tests > >> didn't fail on my Linux box; I'm guessing it's an endianness issue, > >> but it can't be that simple. Perhaps that box happens to be falling > >> back on a different implementation of the checksums?) > >> > >> I checked in a fix (because I don't like broken tests :-) which > >> restores the old behavior by passing PyBUF_CHARACTER to > >> PyObject_GetBuffer(), which enables a special case in the buffer API > >> for PyUnicode that returns the UTF-8 encoded bytes instead of the raw > >> bytes. (I still find this questionable, especially since a few random > >> places in bytesobject.c also use PyBUF_CHARACTER, presumably to make > >> tests pass, but for the *bytes* type, requesting *characters* (even > >> encoded ones) is iffy. > >> > >> But I'm wondering if passing a Unicode string to the various hash > >> digest functions should work at all! Hashes are defined on sequences > >> of bytes, and IMO we should insist on the user to pass us bytes, and > >> not second-guess what to do with Unicode. > >> > >> Opinions? 
> >> > >> -- > >> --Guido van Rossum (home page: http://www.python.org/~guido/) > >> > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at krypto.org Mon Aug 27 05:43:30 2007 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 26 Aug 2007 20:43:30 -0700 Subject: [Python-3000] How should the hash digest of a Unicode string be computed? In-Reply-To: References: <52dc1c820708261554n6b31e40bya6d885c0f683a633@mail.gmail.com> Message-ID: <52dc1c820708262043s1358ec81mfdf39b309381f249@mail.gmail.com> On 8/26/07, Travis Oliphant wrote: > > Gregory P. Smith wrote: > > I'm in favor of not allowing unicode for hash functions. Depending on > > the system default encoding for a hash will not be portable. > > > > another question for hashlib: It uses PyArg_Parse to get a single 's' > > out of an optional parameter [see the code] and I couldn't figure out > > what the best thing to do there was. It just needs a C string to pass > > to openssl to lookup a hash function by name. Its C so i doubt it'll > > ever be anything but ascii. How should that parameter be parsed > > instead of the old 's' string format? PyBUF_CHARACTER actually sounds > > ideal in that case assuming it guarantees UTF-8 but I wasn't clear > > that it did that (is it always utf-8 or the possibly useless as far as > > APIs expecting C strings are concerned system "default encoding")?
> > Requiring a bytes object would also work but I really don't like the > > idea of users needing to use a specific type for something so simple. > > (i consider string constants with their preceding b, r, u, s, type > > characters ugly in code without a good reason for them to be there) > > > > The PyBUF_CHARACTER flag was an add-on after I realized that the old > buffer API was being in several places to get Unicode objects to encode > their data as a string (in the default encoding of the system, I believe). > > The unicode object is the only one that I know of that actually does > something different when it is called with PyBUF_CHARACTER. > > Is it just me or do unicode objects supporting the buffer api seem > > like an odd concept given that buffer api consumers (rather than > > unicode consumers) shouldn't need to know about encodings of the data > > being received. > > I think you have a point. The buffer API does support the concept of > "formats" but not "encodings" so having this PyBUF_CHARACTER flag looks > rather like a hack. I'd have to look, because I don't even remember > what is returned as the "format" from a unicode object if it is > requested (it is probably not correct). given that utf-8 characters are varying widths i don't see how it could ever practically be correct for unicode. I would prefer that the notion of encoding a unicode object is separated > from the notion of the buffer API, but last week I couldn't see another > way to un-tease it. > > -Travis A thought that just occurred to me... Would a PyBUF_CANONICAL flag be useful instead of CHARACTERS? For unicode that'd mean utf-8 (not just the default encoding) but I could imagine other potential uses such as multi-dimension buffers (PIL image objects?) presenting a defined canonical form of the data useful for either serialization and hashing. Any buffer api implementing object would define its own canonical form. 
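[Editorial sketch, not part of the original message.] The point about UTF-8 characters having varying widths can be checked directly, and it shows why a buffer "format" describing fixed-size items cannot describe UTF-8 text per character. The sketch assumes a released Python 3 (where str does not export a buffer at all) and uses an illustrative sample string; PyBUF_CANONICAL from the message above is a proposal and is not modeled here.

```python
sample = "a\u00e9\u20ac\U0001F600"  # 1-, 2-, 3- and 4-byte characters in UTF-8

# Bytes needed per character once encoded as UTF-8.
widths = [len(ch.encode("utf-8")) for ch in sample]

# The encoded data is a plain sequence of unsigned bytes; the buffer
# describes bytes, not characters.
bytes_view = memoryview(sample.encode("utf-8"))
item_format = bytes_view.format
item_size = bytes_view.itemsize

# The text object itself refuses to act as a buffer.
try:
    memoryview(sample)
    str_exports_buffer = True
except TypeError:
    str_exports_buffer = False
```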
-gps From guido at python.org Mon Aug 27 07:02:16 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 26 Aug 2007 22:02:16 -0700 Subject: [Python-3000] How should the hash digest of a Unicode string be computed? In-Reply-To: <52dc1c820708262043s1358ec81mfdf39b309381f249@mail.gmail.com> References: <52dc1c820708261554n6b31e40bya6d885c0f683a633@mail.gmail.com> <52dc1c820708262043s1358ec81mfdf39b309381f249@mail.gmail.com> Message-ID: On 8/26/07, Gregory P. Smith wrote: > On 8/26/07, Travis Oliphant wrote: > > Gregory P. Smith wrote: > > > I'm in favor of not allowing unicode for hash functions. Depending on > > > the system default encoding for a hash will not be portable. > > > > > > another question for hashlib: It uses PyArg_Parse to get a single 's' > > > out of an optional parameter [see the code] and I couldn't figure out > > > what the best thing to do there was. It just needs a C string to pass > > > to openssl to lookup a hash function by name. Its C so i doubt it'll > > > ever be anything but ascii. How should that parameter be parsed > > > instead of the old 's' string format? PyBUF_CHARACTER actually sounds > > > ideal in that case assuming it guarantees UTF-8 but I wasn't clear > > > that it did that (is it always utf-8 or the possibly useless as far as > > > APIs expecting C strings are concerned system "default encoding")? > > > Requiring a bytes object would also work but I really don't like the > > > idea of users needing to use a specific type for something so simple.
> > > (i consider string constants with their preceding b, r, u, s, type > > > characters ugly in code without a good reason for them to be there) > > > > > > > The PyBUF_CHARACTER flag was an add-on after I realized that the old > > buffer API was being in several places to get Unicode objects to encode > > their data as a string (in the default encoding of the system, I believe). > > > > The unicode object is the only one that I know of that actually does > > something different when it is called with PyBUF_CHARACTER. > > > > > Is it just me or do unicode objects supporting the buffer api seem > > > like an odd concept given that buffer api consumers (rather than > > > unicode consumers) shouldn't need to know about encodings of the data > > > being received. > > > > I think you have a point. The buffer API does support the concept of > > "formats" but not "encodings" so having this PyBUF_CHARACTER flag looks > > rather like a hack. I'd have to look, because I don't even remember > > what is returned as the "format" from a unicode object if it is > > requested (it is probably not correct). > > given that utf-8 characters are varying widths i don't see how it could ever > practically be correct for unicode. Well, *practically*, the unicode object returns UTF-8 for PyBUF_CHARACTER. That is correct (at least until I rip all this out, which I'm in the middle of -- but no time to finish it tonight). > > I would prefer that the notion of encoding a unicode object is separated > > from the notion of the buffer API, but last week I couldn't see another > > way to un-tease it. > > > > -Travis > > A thought that just occurred to me... Would a PyBUF_CANONICAL flag be useful > instead of CHARACTERS? For unicode that'd mean utf-8 (not just the default > encoding) but I could imagine other potential uses such as multi-dimension > buffers (PIL image objects?) presenting a defined canonical form of the data > useful for either serialization and hashing. 
Any buffer api implementing > object would define its own canonical form. Note, the default encoding in 3.0 is fixed to UTF-8. (And it's fixed in a much more permanent way than in 2.x -- it is really hardcoded and there is really no way to change it.) But I'm thinking YAGNI -- the buffer API should always just return the bytes as they already are sitting in memory, not some transformation thereof. The current behavior of the unicode object for PyBUF_CHARACTER violates this. (There are no other violations BTW.) This is why I want to rip it out. I'm close... -- --Guido van Rossum (home page: http://www.python.org/~guido/) From baranguren at gmail.com Sun Aug 26 19:29:18 2007 From: baranguren at gmail.com (Benjamin Aranguren) Date: Sun, 26 Aug 2007 10:29:18 -0700 Subject: [Python-3000] backported ABC In-Reply-To: References: Message-ID: No problem. Created issue 1026 in tracker with a single patch file attached. I'm not aware of what changes need to be done with _abcoll.py and collections.py. If you can point me to the right direction, I would definitely like to work on it. On 8/26/07, Guido van Rossum wrote: > Thanks! > > Would it inconvenience you terribly to upload this all to the new > tracker (bugs.python.org)? Preferably as a single patch against the > svn trunk (to use svn diff, you have to svn add the new files first!) > > Also, are you planning to work on _abcoll.py and the changes to collections.py? > > --Guido > > On 8/26/07, Benjamin Aranguren wrote: > > We copied abc.py and test_abc.py from py3k svn and modified to work with 2.6. > > > > After making all the changes we ran all the tests to ensure that no > > other modules were affected. > > > > Attached are abc.py, test_abc.py, and their relevant patches from 3.0 to 2.6. > > > > On 8/25/07, Guido van Rossum wrote: > > > Um, that patch contains only the C code for overloading isinstance() > > > and issubclass(). 
> > > > > > Did you do anything about abc.py and _abcoll.py/collections.py and > > > their respective unit tests? Or what about the unit tests for > > > isinstance()/issubclass()? > > > > > > On 8/25/07, Benjamin Aranguren wrote: > > > > Worked with Alex Martelli at the Goolge Python Sprint. > > > > > > -- > > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > > > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > From nnorwitz at gmail.com Mon Aug 27 08:57:07 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Sun, 26 Aug 2007 23:57:07 -0700 Subject: [Python-3000] backported ABC In-Reply-To: References: Message-ID: Another thing that needs to be ported are the changes to Lib/test/regrtest.py. Pretty much anything that references ABCs in there needs backporting. You can verify it works properly by running regrtest.py with the -R option on any test that uses an ABC. It should not report leaks. The full command line should look something like: ./python Lib/test/regrtest.py -R 4:3 test_abc n -- On 8/26/07, Benjamin Aranguren wrote: > I got it now. both modules need to be backported as well. I'm on it. > > On 8/26/07, Benjamin Aranguren wrote: > > No problem. Created issue 1026 in tracker with a single patch file attached. > > > > I'm not aware of what changes need to be done with _abcoll.py and > > collections.py. If you can point me to the right direction, I would > > definitely like to work on it. > > > > On 8/26/07, Guido van Rossum wrote: > > > Thanks! > > > > > > Would it inconvenience you terribly to upload this all to the new > > > tracker (bugs.python.org)? Preferably as a single patch against the > > > svn trunk (to use svn diff, you have to svn add the new files first!) > > > > > > Also, are you planning to work on _abcoll.py and the changes to collections.py? 
> > > > > > --Guido > > > > > > On 8/26/07, Benjamin Aranguren wrote: > > > > We copied abc.py and test_abc.py from py3k svn and modified to work with 2.6. > > > > > > > > After making all the changes we ran all the tests to ensure that no > > > > other modules were affected. > > > > > > > > Attached are abc.py, test_abc.py, and their relevant patches from 3.0 to 2.6. > > > > > > > > On 8/25/07, Guido van Rossum wrote: > > > > > Um, that patch contains only the C code for overloading isinstance() > > > > > and issubclass(). > > > > > > > > > > Did you do anything about abc.py and _abcoll.py/collections.py and > > > > > their respective unit tests? Or what about the unit tests for > > > > > isinstance()/issubclass()? > > > > > > > > > > On 8/25/07, Benjamin Aranguren wrote: > > > > > > Worked with Alex Martelli at the Goolge Python Sprint. > > > > > > > > > > -- > > > > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > > > > > > > > > > > > > > > > > > > -- > > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/nnorwitz%40gmail.com > From nnorwitz at gmail.com Mon Aug 27 09:48:52 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Mon, 27 Aug 2007 00:48:52 -0700 Subject: [Python-3000] status (preparing for first alpha) Message-ID: Py3k is progressing nicely. We are planning the first alpha sometime this week. The tests are mostly passing. With all the churn over the last week, I'm sure it's about to change. :-) AFAIK, nearly all the tests pass on Linux and Mac OS X. There was a report that Windows/VC8 was able to build python but it crashed in test_builtin. Can anyone confirm this? 
Here are the tasks that we need help with before the alpha is released: * Verify Windows build works with VC7 (currently the default compiler for 2.5) * Verify Windows build passes all tests * Verify other Unix builds work and pass all tests * Fix reference leaks probably related to IO * Fix problem with signal 32 on old gentoo box (new IO related?) See below for more details about many of these. The string/unicode merge is making good progress. There are less than 400 references to PyString. Most of the references are in about 5-10 modules. Less than 50 modules in the core have any references to PyString. We still need help converting over to use unicode. If you are interested in helping out, the spreadsheet is the best place to look for tasks: http://spreadsheets.google.com/ccc?key=pBLWM8elhFAmKbrhhh0ApQA&hl=en_US&pli=1 Separate sheets exist for C, Python, Writing, and Reading tasks. You can review the 3.0 docs at: http://docs.python.org/dev/3.0/ They are updated every 12 hours. There are many parts which need improvement. There are 4 tests that report reference leaks: test_io leaked [62, 62] references test_urllib leaked [122, 122] references test_urllib2_localnet leaked [3, 3] references test_xmlrpc leaked [26, 26] references On the gentoo machine that builds the docs, I would like to run the tests. 2.x is currently running python without a problem. In 3.0, there is a strange error about receiving an Unknown signal 32. I'm guessing this is related to the new IO library, but that's really a guess. Does anyone have a clue about what's happening here? I don't think I can catch the signal (I tried). Part of the reason I suspect IO is that the problem seems to occur while running various tests that use sockets. test_poplib is often the first one. But even if that's skipped many other tests can cause the problem, including: test_queue test_smtplib test_socket test_socket_ssl test_socketserver. 
Perhaps there's a test that triggers the problem and almost any other test seems to be causing the problem? There are some unexplained oddities. The most recent issue I saw was this strange exception while running the tests: File "Lib/httplib.py", line 1157, in __init__ HTTPConnection.__init__(self, host, port, strict, timeout) TypeError: unbound method __init__() must be called with FakeHTTPConnection instance as first argument (got HTTPSConnection instance instead) I've seen this exactly once. I don't know what happened. Completely unrelated, I also had a problem with using uninitialized memory from test_bytes. That also only happened once. It could have been a problem with an underlying library. n From amauryfa at gmail.com Mon Aug 27 10:22:42 2007 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Mon, 27 Aug 2007 10:22:42 +0200 Subject: [Python-3000] status (preparing for first alpha) In-Reply-To: References: Message-ID: Hello, Neal Norwitz wrote: > There was a report that Windows/VC8 > was able to build python but it crashed in test_builtin. Can anyone > confirm this? After some more digging: - Only the debug build is affected. No crash with a release build. - The crash is a stack overflow. - The failing function is test_cmp() in test_builtin.py, and indeed it tries to "verify that circular objects are not handled", by expecting a RuntimeError. - The debugger stops in PyUnicode_EncodeUTF8. This function defines a somewhat large variable: #define MAX_SHORT_UNICHARS 300 /* largest size we'll do on the stack */ char stackbuf[MAX_SHORT_UNICHARS * 4]; I suspect that the stack requirements for a recursive __cmp__ have increased. It may be lower for a release build thanks to compiler optimizations. I will try to come back later with more precise measurements. -- Amaury Forgeot d'Arc From greg at electricrain.com Mon Aug 27 09:59:25 2007 From: greg at electricrain.com (Gregory P.
Smith) Date: Mon, 27 Aug 2007 00:59:25 -0700 Subject: [Python-3000] Immutable bytes type and bsddb or other IO In-Reply-To: <20070824165823.GM24059@electricrain.com> References: <46B7FACC.8030503@v.loewis.de> <20070822235929.GA12780@electricrain.com> <46CD2209.8000408@v.loewis.de> <20070823073837.GA14725@electricrain.com> <46CD3BFF.5080904@v.loewis.de> <20070823171837.GI24059@electricrain.com> <46CE5346.10301@canterbury.ac.nz> <20070824165823.GM24059@electricrain.com> Message-ID: <20070827075925.GT24059@electricrain.com> On Fri, Aug 24, 2007 at 09:58:24AM -0700, Gregory P. Smith wrote: > On Thu, Aug 23, 2007 at 09:17:04PM -0700, Guido van Rossum wrote: > > On 8/23/07, Greg Ewing wrote: > > > Gregory P. Smith wrote: > > > > Wasn't a past mailing list thread claiming the bytes type was supposed > > > > to be great for IO? How's that possible unless we add a lock to the > > > > bytesobject? > > > > > > Doesn't the new buffer protocol provide something for > > > getting a locked view of the data? If so, it seems like > > > bytes should implement that. > > > > It *does* implement that! So there's the solution: these APIs should > > not insist on bytes but use the buffer API. It's quite a bit of work I > > suspect (especially since you can't use PyArg_ParseTuple with y# any > > more) but worth it. > > > > BTW PyUnicode should *not* support the buffer API. > > > > I'll add both of these to the task spreadsheet. > > this sounds good, i'll work on it today for bsddb and hashlib. So I converted _bsddb.c to use the buffer API everywhere only to find that bytes objects don't support the PyBUF_LOCKDATA option of the buffer API... I should've seen that coming. :) Anyways I opened a bug to track that. Its needed in order to release the GIL while doing I/O from bytes objects. http://bugs.python.org/issue1035 My _bsddb patch is stored for posterity until issue1035 can be fixed in issue1036. 
I'll test it another day ignoring the mutability issues (as the current _bsddb.c does with its direct use of bytes) and update the patch after squashing bugs. -gps From skip at pobox.com Mon Aug 27 13:12:53 2007 From: skip at pobox.com (skip at pobox.com) Date: Mon, 27 Aug 2007 06:12:53 -0500 Subject: [Python-3000] status (preparing for first alpha) In-Reply-To: References: Message-ID: <18130.45493.75756.332057@montanaro.dyndns.org> Neal> The string/unicode merge is making good progress. There are less Neal> than 400 references to PyString. Most of the references are in Neal> about 5-10 modules. Less than 50 modules in the core have any Neal> references to PyString. We still need help converting over to use Neal> unicode. If you are interested in helping out, the spreadsheet is Neal> the best place to look for tasks: Neal> http://spreadsheets.google.com/ccc?key=pBLWM8elhFAmKbrhhh0ApQA&hl=en_US&pli=1 As someone who hasn't participated in the string->unicode conversion up to this point (not even looking at any of the hundreds of related checkins) it's not at all obvious how to correctly replace PyString_* with PyUnicode_* in any given situation. Is there some document somewhere that can be used to at least give some hints? (I think I asked this before. I'm not sure I got an answer.) Also, given that we are close, shouldn't a few buildbots be set up? Skip From theller at ctypes.org Mon Aug 27 13:17:37 2007 From: theller at ctypes.org (Thomas Heller) Date: Mon, 27 Aug 2007 13:17:37 +0200 Subject: [Python-3000] status (preparing for first alpha) In-Reply-To: References: Message-ID: Neal Norwitz schrieb: > Py3k is progressing nicely. We are planning the first alpha sometime > this week. The tests are mostly passing. With all the churn over the > last week, I'm sure it's about to change. :-) AFAIK, nearly all the > tests pass on Linux and Mac OS X. There was a report that Windows/VC8 > was able to build python but it crashed in test_builtin. Can anyone > confirm this?
> > Here are the tasks that we need help with before the alpha is released: > * Verify Windows build works with VC7 (currently the default compiler for 2.5) The build works for me, now that I've fixed PCBuild\build_ssl.py for Python3. > * Verify Windows build passes all tests Hehe. For me Python3 still cannot 'import time', because of umlauts in the _tzname libc variable: c:\svn\py3k\PCbuild>python_d Python 3.0x (py3k:57555, Aug 27 2007, 10:00:25) [MSC v.1310 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import time Traceback (most recent call last): File "", line 1, in UnicodeDecodeError: 'utf8' codec can't decode bytes in position 9-11: invalid data [36397 refs] >>> Setting the environment variable TZ to 'GMT' for example is a workaround for this problem. Running the PCBuild\rt.bat script fails when it compares the expected output with the actual output. Some inspection shows that the comparison fails because there are '\n' linefeeds in the expected and '\r\n' linefeeds in the actual output: c:\svn\py3k\PCbuild>python_d -E -tt ../lib/test/regrtest.py test_grammar test test_grammar produced unexpected output: ********************************************************************** *** mismatch between line 1 of expected output and line 1 of actual output: - test_grammar + test_grammar ? + (['test_grammar\n'], ['test_grammar\r\n']) ... and so on ... (The last line is printed by some code I added to Lib\regrtest.py.) It seems that this behaviour was introduced by r57186: New I/O code from Tony Lownds implement newline feature correctly, and implements .newlines attribute in a 2.x-compatible fashion. Temporarily reverting this change from Lib\io.py I can run the tests without all these comparison failures. What I see is: ...
test test_builtin failed -- Traceback (most recent call last): File "c:\svn\py3k\lib\test\test_builtin.py", line 1473, in test_round self.assertEqual(round(1e20), 1e20) AssertionError: 0 != 1e+020 ... (a lot of failures in test_doctest. Could this also be a line ending problem?) Unicode errors in various tests: test_glob test test_glob failed -- Traceback (most recent call last): File "c:\svn\py3k\lib\test\test_glob.py", line 87, in test_glob_directory_names eq(self.glob('*', '*a'), []) File "c:\svn\py3k\lib\test\test_glob.py", line 41, in glob res = glob.glob(p) File "c:\svn\py3k\lib\glob.py", line 16, in glob return list(iglob(pathname)) File "c:\svn\py3k\lib\glob.py", line 42, in iglob for name in glob_in_dir(dirname, basename): File "c:\svn\py3k\lib\glob.py", line 56, in glob1 names = os.listdir(dirname) UnicodeDecodeError: 'utf8' codec can't decode bytes in position 27-31: unexpected end of data I'll stop the report here. A py3k buildbot on Windows would allow everyone to look at the test outcome. Thomas From theller at ctypes.org Mon Aug 27 13:33:32 2007 From: theller at ctypes.org (Thomas Heller) Date: Mon, 27 Aug 2007 13:33:32 +0200 Subject: [Python-3000] status (preparing for first alpha) In-Reply-To: References: Message-ID: Thomas Heller schrieb: > Neal Norwitz schrieb: >> Py3k is progressing nicely. We are planning the first alpha sometime >> this week. The tests are mostly passing. With all the churn over the >> last week, I'm sure it's about to change. :-) AFAIK, nearly all the >> tests pass on Linux and Mac OS X. There was a report that Windows/VC8 >> was able to build python but it crashed in test_builtin. Can anyone >> confirm this? >> >> Here are the tasks that we need help with before the alpha is released: >> * Verify Windows build works with VC7 (currently the default compiler for 2.5) > > The build works for me, now that I've fixed PCBuild\build_ssl.py for Python3. 
> >> * Verify Windows build passes all tests > Running the PCBuild\rt.bat script fails when it compares the expected output > with the actual output. Some inspection shows that the comparison fails because > there are '\n' linefeeds in the expected and '\n\r' linefeeds in the actual output: > > c:\svn\py3k\PCbuild>python_d -E -tt ../lib/test/regrtest.py > test_grammar > test test_grammar produced unexpected output: > ********************************************************************** > *** mismatch between line 1 of expected output and line 1 of actual output: > - test_grammar > + test_grammar > ? + > (['test_grammar\n'], ['test_grammar\r\n']) > ... and so on ... > > (The last line is printed by some code I added to Lib\regrtest.py.) http://bugs.python.org/issue1029 apparently fixes this problem. Thomas From guido at python.org Mon Aug 27 16:12:59 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 27 Aug 2007 07:12:59 -0700 Subject: [Python-3000] Immutable bytes type and bsddb or other IO In-Reply-To: <20070827075925.GT24059@electricrain.com> References: <20070822235929.GA12780@electricrain.com> <46CD2209.8000408@v.loewis.de> <20070823073837.GA14725@electricrain.com> <46CD3BFF.5080904@v.loewis.de> <20070823171837.GI24059@electricrain.com> <46CE5346.10301@canterbury.ac.nz> <20070824165823.GM24059@electricrain.com> <20070827075925.GT24059@electricrain.com> Message-ID: On 8/27/07, Gregory P. Smith wrote: > So I converted _bsddb.c to use the buffer API everywhere only to find > that bytes objects don't support the PyBUF_LOCKDATA option of the > buffer API... I should've seen that coming. :) Anyways I opened a > bug to track that. Its needed in order to release the GIL while doing > I/O from bytes objects. > > http://bugs.python.org/issue1035 > > My _bsddb patch is stored for posterity until issue1035 can be fixed > in issue1036. 
I'll test it another day ignoring the mutability issues > (as the current _bssdb.c does with its direct use of bytes) and update > the patch after squashing bugs. Adding data locking shouldn't be too complicated, but is it necessary? The bytes object does support locking the buffer in place; isn't that enough? It means someone evil could still produce a phase error by changing the contents while you're looking at it (basically sabotaging their own application) but I don't see how they could cause a segfault that way. Even if you really need the LOCKDATA feature, perhaps you can check in a slight mod of your code that uses SIMPLE for now -- use a macro for the flags that's defined as PyBUF_SIMPLE and add a comment that you'd like it to be LOCKDATA once bytes support that. That way we have less code in the tracker and more in subversion -- always a good thing IMO. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Aug 27 16:21:50 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 27 Aug 2007 07:21:50 -0700 Subject: [Python-3000] status (preparing for first alpha) In-Reply-To: <18130.45493.75756.332057@montanaro.dyndns.org> References: <18130.45493.75756.332057@montanaro.dyndns.org> Message-ID: On 8/27/07, skip at pobox.com wrote: > As someone who hasn't participated in the string->unicode conversion up to > this point (not even looking at any of the hundreds of related checkins) > it's not at all obvious how to correctly replace PyString_* with PyUnicode_* > in any given situation. Is there some document somewhere that can be used > to at least give some hints? (I think I asked this before. I'm not sure I > got an answer.) There isn't one recipe. You first have to decide whether a particular API should use bytes or str. I would like to write something up because I know it will be important for maintainers of extension modules; but I don't have the time right now. 
> Also, given that we are close, shouldn't a few buildbots be set up? Agreed. Neal tried to set up a buildbot on the only machine he can easily use for this, but that's the "old gentoo box" where he keeps getting signal 32. (I suspect this may be a kernel bug and not our fault.) I forget who can set up buildbots -- is it Martin? Can someone else help? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Mon Aug 27 16:32:29 2007 From: skip at pobox.com (skip at pobox.com) Date: Mon, 27 Aug 2007 09:32:29 -0500 Subject: [Python-3000] status (preparing for first alpha) In-Reply-To: References: <18130.45493.75756.332057@montanaro.dyndns.org> Message-ID: <18130.57469.16905.629301@montanaro.dyndns.org> >> Also, given that we are close, shouldn't a few buildbots be set up? Guido> Agreed. Neal tried to set up a buildbot on the only machine he Guido> can easily use for this, but that's the "old gentoo box" where he Guido> keeps getting signal 32. (I suspect this may be a kernel bug and Guido> not our fault.) I forget who can set up buildbots -- is it Guido> Martin? Can someone else help? I run a couple community buildbots on my G5 for SQLAlchemy. I can set one up there for py3k if desired. Just let me know what to do. Skip From aahz at pythoncraft.com Mon Aug 27 18:32:51 2007 From: aahz at pythoncraft.com (Aahz) Date: Mon, 27 Aug 2007 09:32:51 -0700 Subject: [Python-3000] Removing email package until it's fixed In-Reply-To: References: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org> <20070826051302.GC24678@panix.com> <18129.26473.77328.489985@montanaro.dyndns.org> <20070826140718.GA15100@panix.com> Message-ID: <20070827163251.GA9067@panix.com> On Sun, Aug 26, 2007, Neil Schemenauer wrote: > Aahz wrote: >> >> -0 on the idea of making "batteries included" include PyPI packages. >> Anything part of "batteries included" IMO should just be part of the >> standard install. > > I think you misunderstand the proposal. 
The "batteries" would be > included as part of the final Python release. From the end user's > point of view there would be no change from the current model. The > difference would be from the Python developer's point of view. Some > libraries would no longer be part of SVN checkout and you would have > to run a script to pull them into your source tree. Given how little dev I do, I'm not entitled to an opinion, but given the number of messages I see to the mailing lists that end up as being checkout synch problems, I see this as a recipe for trouble, particularly for regression testing. Because this is just an infrastructure/procedure change to the dev process, it should be easy enough to revert if it proves problematic, so I remove my -0. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "If you don't know what your program is supposed to do, you'd better not start writing it." --Dijkstra From rhamph at gmail.com Mon Aug 27 19:21:21 2007 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 27 Aug 2007 11:21:21 -0600 Subject: [Python-3000] Removing email package until it's fixed In-Reply-To: References: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org> <20070826051302.GC24678@panix.com> <18129.26473.77328.489985@montanaro.dyndns.org> <20070826140718.GA15100@panix.com> Message-ID: On 8/26/07, Neil Schemenauer wrote: > Aahz wrote: > > -0 on the idea of making "batteries included" include PyPI packages. > > Anything part of "batteries included" IMO should just be part of the > > standard install. > > I think you misunderstand the proposal. The "batteries" would be > included as part of the final Python release. From the end user's > point of view there would be no change from the current model. The > difference would be from the Python developer's point of view. Some > libraries would no longer be part of SVN checkout and you would have > to run a script to pull them into your source tree. > > IMO, depending on PyPI not necessary or even desirable. 
All that's > necessary is that the batteries conform to some standards regarding > layout, documentation and unit tests. They could be pulled based on > URLs and the hostname of the URL is not important. That scheme > would make is easier for someone to make a sumo distribution just by > adding more URLs to the list before building it. This would complicate the work of various packaging systems. Either they'd need to build their own mechanism to pull the sources from their archives, or they'd split them into separate packages and would no longer distribute with all the "batteries included" packages by default. Or more likely they'd pull them into a single source tarball in advance and ignore the whole mess. We'd probably distribute a single source tarball too, so we'd only burden the developers with this whole architecture. -1. email is a temporary situation. There are no consequences, so no further thought is needed. -- Adam Olsen, aka Rhamphoryncus From jimjjewett at gmail.com Mon Aug 27 19:59:40 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 27 Aug 2007 13:59:40 -0400 Subject: [Python-3000] How should the hash digest of a Unicode string be computed? In-Reply-To: References: Message-ID: On 8/26/07, Guido van Rossum wrote: > But I'm wondering if passing a Unicode string to the various hash > digest functions should work at all! Hashes are defined on sequences > of bytes, and IMO we should insist on the user to pass us bytes, and > not second-guess what to do with Unicode. Conceptually, unicode *by itself* can't be represented as a buffer. What can be represented is a unicode string + an encoding. The question is whether the hash function needs to know the encoding to figure out the hash. If you're hashing arbitrary bytes, then it doesn't really matter -- there is no expectation that a recoding should have the same hash. For hashing as a shortcut to __ne__, it does matter for text. 
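A small sketch of why the encoding matters for digests. It assumes a current Python 3 `hashlib`, whose behaviour may differ from the 2007 branch under discussion, and the sample text and variable names are made up for illustration. The same text yields different digests under different encodings, and `hashlib` refuses a `str` outright, so the caller has to choose the encoding.

```python
import hashlib

text = "na\u00efve"  # hypothetical sample text with one non-ASCII character

# The digest is computed over bytes, so the encoding decides the result.
utf8_digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
utf16_digest = hashlib.sha256(text.encode("utf-16-le")).hexdigest()

# Same text, different byte sequences, therefore different digests.
print(utf8_digest == utf16_digest)

# Passing the str itself is rejected with TypeError; no encoding is guessed.
try:
    hashlib.sha256(text)
    str_rejected = False
except TypeError:
    str_rejected = True
print(str_rejected)
```

Two parties comparing digests of text would need to agree on the encoding beforehand, which is the "unicode string + an encoding" pairing described above.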
Unfortunately, for historical reasons, plenty of code grabs the string buffer expecting text. For dict comparisons, we really ought to specify the equality (and therefore hash) in terms of a canonical equivalent, encoded in X (It isn't clear to me that X should be UTF-8 in particular, but the main thing is to pick something.) The alternative is that defensive code will need to do a (normally useless boilerplate) decode/canonicalize/reencode dance before dictionary checks and insertions. I would rather see that boilerplate done once in the unicode type (and again in any equivalent types, if need be), because (1) most storage type/encodings would be able to take shortcuts. (2) if people don't do the defensive coding, the bugs will be very obscure -jJ From guido at python.org Mon Aug 27 20:05:30 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 27 Aug 2007 11:05:30 -0700 Subject: [Python-3000] How should the hash digest of a Unicode string be computed? In-Reply-To: References: Message-ID: On 8/27/07, Jim Jewett wrote: > On 8/26/07, Guido van Rossum wrote: > > But I'm wondering if passing a Unicode string to the various hash > > digest functions should work at all! Hashes are defined on sequences > > of bytes, and IMO we should insist on the user to pass us bytes, and > > not second-guess what to do with Unicode. > > Conceptually, unicode *by itself* can't be represented as a buffer. > > What can be represented is a unicode string + an encoding. The > question is whether the hash function needs to know the encoding to > figure out the hash. > > If you're hashing arbitrary bytes, then it doesn't really matter -- > there is no expectation that a recoding should have the same hash. > > For hashing as a shortcut to __ne__, it does matter for text. > > Unfortunately, for historical reasons, plenty of code grabs the string > buffer expecting text. Such code is broken, and this will be an error soon. 
I think this handles all the other issues -- as promised, *any* operation that mixes str and bytes (or anything else supporting the buffer API) will fail with a TypeError unless an encoding is specified explicitly. > For dict comparisons, we really ought to specify the equality (and > therefore hash) in terms of a canonical equivalent, encoded in X (It > isn't clear to me that X should be UTF-8 in particular, but the main > thing is to pick something.) No, dict keys can't be bytes or buffers. > The alternative is that defensive code will need to do a (normally > useless boilerplate) decode/canonicalize/reencode dance before > dictionary checks and insertions. > > I would rather see that boilerplate done once in the unicode type (and > again in any equivalent types, if need be), because > (1) most storage type/encodings would be able to take shortcuts. > (2) if people don't do the defensive coding, the bugs will be very obscure There is no dance. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brakhane at googlemail.com Mon Aug 27 20:01:49 2007 From: brakhane at googlemail.com (Dennis Brakhane) Date: Mon, 27 Aug 2007 20:01:49 +0200 Subject: [Python-3000] Will standard library modules comply with PEP 8? Message-ID: <226a19190708271101p10cb55ffrf65d42da0d3f1dd7@mail.gmail.com> Hi, sorry if this has been answered before, I search the mailing list and didn't find anything. I'd like to ask if the modules in the standard library will comply with PEP 8. I've always found it weird that - in the logging module, for example - I have to get the logger via getLogger instead of get_logger. I understand that the logging module is older than PEP 8 and therefore couldn't be changed. So if there's a time to "fix" logging, it'd probably be now. Greetings, Dennis From brett at python.org Mon Aug 27 21:25:35 2007 From: brett at python.org (Brett Cannon) Date: Mon, 27 Aug 2007 12:25:35 -0700 Subject: [Python-3000] Will standard library modules comply with PEP 8? 
In-Reply-To: <226a19190708271101p10cb55ffrf65d42da0d3f1dd7@mail.gmail.com> References: <226a19190708271101p10cb55ffrf65d42da0d3f1dd7@mail.gmail.com> Message-ID: On 8/27/07, Dennis Brakhane wrote: > Hi, > > sorry if this has been answered before, I search the mailing list and > didn't find anything. > > I'd like to ask if the modules in the standard library will comply > with PEP 8. I've always found it weird that - in the logging module, > for example - I have to get the logger via getLogger instead of > get_logger. I understand that the logging module is older than PEP 8 > and therefore couldn't be changed. So if there's a time to "fix" > logging, it'd probably be now. Standard library decisions have not been made yet. But this could definitely be a possibility. -Brett From g.brandl at gmx.net Mon Aug 27 21:31:47 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 27 Aug 2007 21:31:47 +0200 Subject: [Python-3000] [patch] roman.py In-Reply-To: References: Message-ID: Guido van Rossum wrote: > Thanks, applied. > > There's a lot more to being able to run "make html PYTHON=python3.0" > successfully, isn't there? Yes, there is; IMO it won't have to work for alpha1, but I'll work on this during the next few weeks. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From guido at python.org Mon Aug 27 21:37:24 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 27 Aug 2007 12:37:24 -0700 Subject: [Python-3000] [patch] roman.py In-Reply-To: References: Message-ID: On 8/27/07, Georg Brandl wrote: > Guido van Rossum wrote: > > Thanks, applied. > > > > There's a lot more to being able to run "make html PYTHON=python3.0" > > successfully, isn't there?
> > Yes, there is; IMO it won't have to work for alpha1, but I'll work on this > during the next few weeks. Right. That's why I forced PYTHON = python2.5 in the Makefile for now... :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From g.brandl at gmx.net Mon Aug 27 21:44:01 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 27 Aug 2007 21:44:01 +0200 Subject: [Python-3000] A couple 2to3 questions In-Reply-To: <18128.60288.458934.140003@montanaro.dyndns.org> References: <18128.60288.458934.140003@montanaro.dyndns.org> Message-ID: skip at pobox.com wrote: > I ran 2to3 over the Doc/tools directory. This left a number of problems > which I initially began replacing manually. I then realized that it would > be better to tweak 2to3. A couple things I wondered about: > > 1. How are we supposed to maintain changes to Doc/tools? Running svn > status doesn't show any changes. The individual tools are checked out from different repositories on the first "make html". tools/docutils and tools/pygments are fixed versions of the respective libraries, checked out from the svn.python.org/external/ repository. I'll have Pygments 2to3-ready with the next (0.9) release, and I'll probably look at docutils soon too, perhaps creating a branch. tools/sphinx is checked out from svn.python.org/doctools/trunk and maintained there, so if you want to change that code (which you're welcome to do) just check it out from there. (In theory, if you cd to tools/sphinx, you can use svn from there too.) Cheers, Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
From lists at cheimes.de Mon Aug 27 21:59:30 2007 From: lists at cheimes.de (Christian Heimes) Date: Mon, 27 Aug 2007 21:59:30 +0200 Subject: [Python-3000] Will standard library modules comply with PEP 8? In-Reply-To: <226a19190708271101p10cb55ffrf65d42da0d3f1dd7@mail.gmail.com> References: <226a19190708271101p10cb55ffrf65d42da0d3f1dd7@mail.gmail.com> Message-ID: Dennis Brakhane wrote: > I'd like to ask if the modules in the standard library will comply > with PEP 8. I've always found it weird that - in the logging module, > for example - I have to get the logger via getLogger instead of > get_logger. I understand that the logging module is older than PEP 8 > and therefore couldn't be changed. So if there's a time to "fix" > logging, it'd probably be now. If I were in the position to decide I would rather change the PEP than the logging module. I prefer Zope 3 style camel case names for public attributes and methods (http://wiki.zope.org/zope3/ZopePythonNamingConventions point 3) over underscore names. I like to see the camel case style for public names as an alternative in PEP 8. I find it easier to read and less to type. But again it is just my personal and subjective opinion. Provided that a package uses a *single* style I can live with both styles but I'm using the camel case style for my projects. Christian
From nas at arctrix.com Mon Aug 27 22:01:58 2007 From: nas at arctrix.com (Neil Schemenauer) Date: Mon, 27 Aug 2007 14:01:58 -0600 Subject: [Python-3000] Removing email package until it's fixed In-Reply-To: References: <8BB3EAC6-96C3-4EA5-A9C0-7391308E6662@acm.org> <20070826051302.GC24678@panix.com> <18129.26473.77328.489985@montanaro.dyndns.org> <20070826140718.GA15100@panix.com> Message-ID: <20070827200158.GA4566@arctrix.com> On Mon, Aug 27, 2007 at 11:21:21AM -0600, Adam Olsen wrote: > This would complicate the work of various packaging systems. You're not getting it. The tarball that we distribute as a Python release would look basically like it does now (i.e. it would include things like the "email" package). I can't see how that would complicate the life of anyone downstream of the people putting together the Python release. Neil From guido at python.org Mon Aug 27 22:05:13 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 27 Aug 2007 13:05:13 -0700 Subject: [Python-3000] Will standard library modules comply with PEP 8? In-Reply-To: References: <226a19190708271101p10cb55ffrf65d42da0d3f1dd7@mail.gmail.com> Message-ID: On 8/27/07, Christian Heimes wrote: > Dennis Brakhane wrote: > > I'd like to ask if the modules in the standard library will comply > > with PEP 8. I've always found it weird that - in the logging module, > > for example - I have to get the logger via getLogger instead of > > get_logger. I understand that the logging module is older than PEP 8 > > and therefore couldn't be changed. So if there's a time to "fix" > > logging, it'd probably be now. > > If I were in the position to decide I would rather change the PEP than > the logging module.
I prefer Zope 3 style camel case names for public > attributes and methods > (http://wiki.zope.org/zope3/ZopePythonNamingConventions point 3) over > underscore names. I like to see the camel case style for public names as > an alternative in PEP 8. I find it easier to read and less to type. But > again it is just my personal and subjective opinion. Let's not start another bikeshed color debate. The PEP has been discussed, discussed again, and accepted. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From lists at cheimes.de Mon Aug 27 22:15:00 2007 From: lists at cheimes.de (Christian Heimes) Date: Mon, 27 Aug 2007 22:15:00 +0200 Subject: [Python-3000] Will standard library modules comply with PEP 8? In-Reply-To: References: <226a19190708271101p10cb55ffrf65d42da0d3f1dd7@mail.gmail.com> Message-ID: <46D330C4.3060409@cheimes.de> Guido van Rossum wrote: > Let's not start another bikeshed color debate. The PEP has been > discussed, discussed again, and accepted. *g* :] I was on the verge of writing that I don't want to start another bike shed [1] discussion ... Christian [1] http://www.freebsd.org/cgi/getmsg.cgi?fetch=506636+517178+/usr/local/www/db/text/1999/freebsd-hackers/19991003.freebsd-hackers From guido at python.org Mon Aug 27 22:38:54 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 27 Aug 2007 13:38:54 -0700 Subject: [Python-3000] Does bytes() need to support bytes(, )? Message-ID: I'm still working on stricter enforcement of the "don't mix str and bytes" rule. I'm finding a lot of trivial problems, which are relatively easy to fix but time-consuming. While doing this, I realize there are two idioms for converting a str to bytes: s.encode(e) or bytes(s, e). These have identical results. I think we can't really drop s.encode(), for symmetry with b.decode(). So is bytes(s, e) redundant? To make things murkier, str(b, e) is not quite redundant compared to b.decode(e), since str(b, e) also accepts buffer objects.
But this doesn't apply to bytes(s, e) -- that one *only* accepts str. (NB: bytes(x) is a different API and accepts a different set of types.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From john.m.camara at comcast.net Tue Aug 28 00:16:51 2007 From: john.m.camara at comcast.net (john.m.camara at comcast.net) Date: Mon, 27 Aug 2007 22:16:51 +0000 Subject: [Python-3000] Will standard library modules comply with PEP 8? Message-ID: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net> > Message: 10 > Date: Mon, 27 Aug 2007 13:05:13 -0700 > From: "Guido van Rossum" > Subject: Re: [Python-3000] Will standard library modules comply with > PEP 8? > To: "Christian Heimes" > Cc: python-3000 at python.org > Message-ID: > > Content-Type: text/plain; charset=ISO-8859-1 > On 8/27/07, "Guido van Rossum" wrote: > On 8/27/07, Christian Heimes wrote: > > Dennis Brakhane wrote: > > > I'd like to ask if the modules in the standard library will comply > > > with PEP 8. I've always found it weird that - in the logging module, > > > for example - I have to get the logger via getLogger instead of > > > get_logger. I understand that the logging module is older than PEP 8 > > > and therefore couldn't be changed. So if there's a time to "fix" > > > logging, it'd probably be now. > > > > If I were in the position to decide I would rather change the PEP than > > the logging module. I prefer Zope 3 style camel case names for public > > attributes and methods > > (http://wiki.zope.org/zope3/ZopePythonNamingConventions point 3) over > > underscore names. I like to see the camel case style for public names as > > an alternative in PEP 8. I find it easier to read and less to type. But > > again it is just my personal and subjective opinion. > > Let's not start another bikeshed color debate. The PEP has been > discussed, discussed again, and accepted. 
> Not trying to continue the bikeshed debate but just pointing out an area in PEP 8 which could be improved. I would like to see PEP 8 remove the "as necessary to improve readability" in the function and method naming conventions. That way methods like StringIO.getvalue() can be renamed to StringIO.get_value(). from PEP 8 Function Names Function names should be lowercase, with words separated by underscores as necessary to improve readability. ... Method Names and Instance Variables Use the function naming rules: lowercase with words separated by underscores as necessary to improve readability. ... John From greg at krypto.org Tue Aug 28 01:33:45 2007 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 27 Aug 2007 16:33:45 -0700 Subject: [Python-3000] Does bytes() need to support bytes(, )? In-Reply-To: References: Message-ID: <52dc1c820708271633k42b64363ufd64650d1fc02cb8@mail.gmail.com> +1 from me, i don't see a reason for bytes(s, e) to exist when s.encode(e) does the same job and is more symmetric. On 8/27/07, Guido van Rossum wrote: > I'm still working on stricter enforcement of the "don't mix str and > bytes" rule. I'm finding a lot of trivial problems, which are > relatively easy to fix but time-consuming. > > While doing this, I realize there are two idioms for converting a str > to bytes: s.encode(e) or bytes(s, e). These have identical results. I > think we can't really drop s.encode(), for symmetry with b.decode(). > So is bytes(s, e) redundant? > > To make things murkier, str(b, e) is not quite redundant compared to > b.encode(e), since str(b, e) also accepts buffer objects. But this > doesn't apply to bytes(s, e) -- that one *only* accepts str. (NB: > bytes(x) is a different API and accepts a different set of types.) 
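The two spellings compared in the quoted message can be sketched as follows. This is only an illustration, assuming a current Python 3 interpreter (where bytes(s, e) is still accepted); the variable names and the sample string are made up for the example and do not come from the thread.

```python
# Illustrative sketch only; names and sample text are assumptions.
text = "caf\u00e9"

# Two ways to turn a str into bytes; both need an explicit encoding.
via_method = text.encode("utf-8")
via_constructor = bytes(text, "utf-8")

# The reverse direction: b.decode(e) and str(b, e).
back_via_method = via_method.decode("utf-8")
back_via_constructor = str(via_method, "utf-8")

# str(b, e) also accepts other buffer-supporting objects such as bytearray.
back_from_bytearray = str(bytearray(via_method), "utf-8")

# Mixing str and bytes without an encoding is rejected.
try:
    text + via_method
    mixing_failed = False
except TypeError:
    mixing_failed = True
```

Both directions round-trip the same data, which is the sense in which the thread calls bytes(s, e) redundant next to s.encode(e).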
> > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Aug 28 02:16:37 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 27 Aug 2007 17:16:37 -0700 Subject: [Python-3000] Need help enforcing strict str/bytes distinctions Message-ID: As anyone following the py3k checkins should have figured out by now, I'm on a mission to require all code to be consistent about bytes vs. str. For example, binary files will soon refuse str arguments to write(), and vice versa. I have a patch that turns on this enforcement, but I have about 14 failing unit tests that require a lot of attention. I'm hoping a few folks might have time to help out. Here are the unit tests that still need work: test_asynchat test_bsddb3 test_cgi test_cmd_line test_csv test_doctest test_gettext test_httplib test_shelve test_sqlite test_tarfile test_urllib test_urllib2 test_urllib2_localnet Attached is the patch that makes them fail. Note that it forces an error when you use PyBUF_CHARACTERS when calling PyObject_GetBuffer on a str (PyUnicode) object. -- --Guido van Rossum (home page: http://www.python.org/~guido/) -------------- next part -------------- A non-text attachment was scrubbed...
Name: strictbytes.diff Type: text/x-patch Size: 11911 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070827/cce5ffb8/attachment-0001.bin From adam at hupp.org Tue Aug 28 03:58:51 2007 From: adam at hupp.org (Adam Hupp) Date: Mon, 27 Aug 2007 21:58:51 -0400 Subject: [Python-3000] Support for newline and encoding arguments to open in tempfile module, also mktemp deprecation In-Reply-To: References: <766a29bd0708261800y376b65a9n4723910ea27e17f6@mail.gmail.com> Message-ID: <766a29bd0708271858x2d76d15bn4344b7d431e0ac3f@mail.gmail.com> On 8/26/07, Guido van Rossum wrote: > > Hm, why not just create the temporary file in binary mode and wrap an > io.TextIOWrapper instance around it? That works, but leaves TemporaryFile with a text mode that is somewhat crippled. TemporaryFile unconditionally uses the default filesystem encoding when in text mode so it can't be relied upon to hold arbitrary strings. This is error prone and confusing IMO. An additional reason for adding newline and encoding: TemporaryFile has always taken all of the optional arguments open() has, namely 'mode' and 'bufsize'. There is a nice symmetry in adding these new arguments as well. -- Adam Hupp | http://hupp.org/adam/ From guido at python.org Tue Aug 28 04:03:22 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 27 Aug 2007 19:03:22 -0700 Subject: [Python-3000] Support for newline and encoding arguments to open in tempfile module, also mktemp deprecation In-Reply-To: <766a29bd0708271858x2d76d15bn4344b7d431e0ac3f@mail.gmail.com> References: <766a29bd0708261800y376b65a9n4723910ea27e17f6@mail.gmail.com> <766a29bd0708271858x2d76d15bn4344b7d431e0ac3f@mail.gmail.com> Message-ID: OK, I think you've convinced me. Now, how about also making the default mode be text instead of binary? I've got a hunch that text files are used more than binary files, even where temporary files are concerned. 
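A minimal sketch of the text-mode temporary file discussed above, assuming the encoding and newline keyword arguments that tempfile.TemporaryFile accepts in current Python 3 releases. The sample text and variable names are assumptions made for illustration only.

```python
import tempfile

# Illustrative sample text; non-ASCII characters show why an explicit
# encoding matters more than the platform default.
sample = "na\u00efve caf\u00e9\n\u65e5\u672c\u8a9e\n"

# mode="w+" opens the file in text mode for writing and reading.
# encoding= pins the codec; newline="\n" disables newline translation.
with tempfile.TemporaryFile(mode="w+", encoding="utf-8", newline="\n") as tmp:
    tmp.write(sample)
    tmp.seek(0)
    round_tripped = tmp.read()
```

With an explicit encoding the file can hold arbitrary strings regardless of the locale, which is the limitation of the text mode that the thread describes.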
--Guido On 8/27/07, Adam Hupp wrote: > On 8/26/07, Guido van Rossum wrote: > > > > Hm, why not just create the temporary file in binary mode and wrap an > > io.TextIOWrapper instance around it? > > That works, but leaves TemporaryFile with a text mode that is somewhat > crippled. TemporaryFile unconditionally uses the default filesystem > encoding when in text mode so it can't be relied upon to hold > arbitrary strings. This is error prone and confusing IMO. > > An additional reason for adding newline and encoding: TemporaryFile > has always taken all of the optional arguments open() has, namely > 'mode' and 'bufsize'. There is a nice symmetry in adding these new > arguments as well. > > > -- > Adam Hupp | http://hupp.org/adam/ > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From stephen at xemacs.org Tue Aug 28 04:09:00 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 28 Aug 2007 11:09:00 +0900 Subject: [Python-3000] PyBuffer ndim unsigned In-Reply-To: <52dc1c820708252002v3efce97eu869fd46e97e88271@mail.gmail.com> References: <52dc1c820708251754w467f207amf09c5d6deea89cb0@mail.gmail.com> <52dc1c820708252002v3efce97eu869fd46e97e88271@mail.gmail.com> Message-ID: <87ir70mmv7.fsf@uwakimon.sk.tsukuba.ac.jp> Gregory P. Smith writes: > heh good point. ignore that thought. python is a signed language. :) For what little it's worth, I object strongly. The problem isn't Python, it's C. Because the rules give unsigned precedence over signed in implicit conversions, mixed signed/unsigned arithmetic in C is just a world of pain. It's especially dangerous when dealing with Unix-convention stream functions where non-negative returns are lengths and negative returns are error codes. Often the only indication you get is one of those stupid "due to insufficient range of type comparison is always true" warnings. In my experience except when dealing with standard functions of unsigned type, it's best to avoid unsigned like the plague. 
It's worth the effort of doing a range check on an unsigned return and then stuffing it into a signed if you got one big enough. Sign me ... "Escaped from Unsigned Purgatory in XEmacs" From adam at hupp.org Tue Aug 28 04:54:31 2007 From: adam at hupp.org (Adam Hupp) Date: Mon, 27 Aug 2007 22:54:31 -0400 Subject: [Python-3000] Need help enforcing strict str/bytes distinctions In-Reply-To: References: Message-ID: <766a29bd0708271954t49ec448fhaac11c633de447f0@mail.gmail.com> This patch (already in the tracker) fixes test_csv: http://bugs.python.org/issue1033 On 8/27/07, Guido van Rossum wrote: > As anyone following the py3k checkins should have figured out by now, > I'm on a mission to require all code to be consistent about bytes vs. > str. For example binary files will soon refuse str arguments to > write(), and vice versa. > > I have a patch that turns on this enforcement, but I have anout 14 > failing unit tests that require a lot of attention. I'm hoping a few > folks might have time to help out. > > Here are the unit tests that still need work: > > test_asynchat > test_bsddb3 > test_cgi > test_cmd_line > test_csv > test_doctest > test_gettext > test_httplib > test_shelve > test_sqlite > test_tarfile > test_urllib > test_urllib2 > test_urllib2_localnet > > Attached is the patch that makes them fail. Note that it forces an > error when you use PyBUF_CHARACTERS when calling PyObject_GetBuffer on > a str (PyUnicode) object. > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/adam%40hupp.org > > > -- Adam Hupp | http://hupp.org/adam/ From barry at python.org Tue Aug 28 04:57:52 2007 From: barry at python.org (Barry Warsaw) Date: Mon, 27 Aug 2007 22:57:52 -0400 Subject: [Python-3000] Does bytes() need to support bytes(, )? 
In-Reply-To: References: Message-ID: <06E2D54D-1F9A-4B66-ACE8-692A6BF93CA6@python.org> On Aug 27, 2007, at 4:38 PM, Guido van Rossum wrote: > I'm still working on stricter enforcement of the "don't mix str and > bytes" rule. I'm finding a lot of trivial problems, which are > relatively easy to fix but time-consuming. > > While doing this, I realize there are two idioms for converting a str > to bytes: s.encode(e) or bytes(s, e). These have identical results. I > think we can't really drop s.encode(), for symmetry with b.decode(). > So is bytes(s, e) redundant? I think it might be. I've hit this several times while working on the email package and it's certainly confusing. I've also run into situations where I did not like the default e=utf-8 argument for bytes(). Sometimes I was able to work around failures by doing this: "bytes(ord(c) for c in s)" until I found "bytes(s, 'raw-unicode-escape')". I'm probably doing something really dumb to need that, but it does get me farther along. I do intend to go back and look at those (there are only a few) when I get the rest of the package working again. Getting back to the original question, I'd like to see "bytes(s, e)" dropped in favor of "s.encode(e)" and maayyybeee (he says bracing for the shout down) "bytes(s)" to be defined as "bytes(s, 'raw-unicode-escape')". -Barry From barry at python.org Tue Aug 28 05:02:34 2007 From: barry at python.org (Barry Warsaw) Date: Mon, 27 Aug 2007 23:02:34 -0400 Subject: [Python-3000] Will standard library modules comply with PEP 8?
In-Reply-To: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net> References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net> Message-ID: <6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org> On Aug 27, 2007, at 6:16 PM, john.m.camara at comcast.net wrote: > I would like to see PEP 8 remove the "as necessary to improve > readability" in the function and method naming conventions. That > way methods like StringIO.getvalue() can be renamed to > StringIO.get_value(). +1 -Barry From guido at python.org Tue Aug 28 05:20:15 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 27 Aug 2007 20:20:15 -0700 Subject: [Python-3000] Does bytes() need to support bytes(, )? In-Reply-To: <06E2D54D-1F9A-4B66-ACE8-692A6BF93CA6@python.org> References: <06E2D54D-1F9A-4B66-ACE8-692A6BF93CA6@python.org> Message-ID: On 8/27/07, Barry Warsaw wrote: > On Aug 27, 2007, at 4:38 PM, Guido van Rossum wrote: > > > I'm still working on stricter enforcement of the "don't mix str and > > bytes" rule. I'm finding a lot of trivial problems, which are > > relatively easy to fix but time-consuming. > > > > While doing this, I realize there are two idioms for converting a str > > to bytes: s.encode(e) or bytes(s, e). These have identical results. I > > think we can't really drop s.encode(), for symmetry with b.decode(). > > So is bytes(s, e) redundant? > > I think it might be. I've hit this several time while working on the > email package and it's certainly confusing. I've also run into > situations where I did not like the default e=utf-8 argument for bytes > ().
Sometimes I am able to work around failures by doing this: "bytes > (ord(c) for c in s)" until I found "bytes(s, 'raw-unicode-escape')" > > I'm probably doing something really dumb to need that, but it does > get me farther along. I do intend to go back and look at those > (there are only a few) when I get the rest of the package working again. > > Getting back to the original question, I'd like to see "bytes(s, e)" > dropped in favor of "s.encode(e)" and maayyybeee (he says bracing for > the shout down) "bytes(s)" to be defined as "bytes(s, 'raw-unicode- > escape')". I see a consensus developing for dropping bytes(s, e). Start avoiding it like the plague now to help reduce the work needed once it's actually gone. But I don't see the point of defaulting to raw-unicode-escape -- what's the use case for that? I think you should just explicitly say s.encode('raw-unicode-escape') where you need that. Any reason you can't? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Aug 28 05:22:06 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 27 Aug 2007 20:22:06 -0700 Subject: [Python-3000] Will standard library modules comply with PEP 8? In-Reply-To: <6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org> References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net> <6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org> Message-ID: On 8/27/07, Barry Warsaw wrote: > > On Aug 27, 2007, at 6:16 PM, john.m.camara at comcast.net wrote: > > > I would like to see PEP 8 remove the "as necessary to improve > > readability" in the function and method naming conventions. That > > way methods like StringIO.getvalue() can be renamed to > > StringIO.get_value(). > > +1 > - -Barry Sure, but after the 3.0a1 release (slated for 8/31, i.e. this Friday). We've got enough changes coming down the pike already that affect every other file, and IMO this clearly belongs to the library reorg. 
(I'm personally perfectly fine with getvalue(), but I understand others don't see it that way.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From stephen at xemacs.org Tue Aug 28 05:30:20 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 28 Aug 2007 12:30:20 +0900 Subject: [Python-3000] Python-3000 Digest, Vol 18, Issue 116 In-Reply-To: <082620071748.1124.46D1BCFD000DEB390000046422120207840E9D0E030E0CD203D202080106@comcast.net> References: <082620071748.1124.46D1BCFD000DEB390000046422120207840E9D0E030E0CD203D202080106@comcast.net> Message-ID: <87veb0z67n.fsf@uwakimon.sk.tsukuba.ac.jp> john.m.camara at comcast.net writes: > Python can't include all the major packages but it is necessary for any > language to support a good GUI package in order to be widely adopted > by the masses. [...] My vote would > be for wxPython but I'm not someone who truly cares much about GUIs > as I much prefer to write the back ends of systems and stay far away from > the front ends. My experience with wxPython on Mac OS X using the MacPorts (formerly DarwinPorts) distribution has been somewhat annoying. wxPython seems to be closely bound to wxWindows, which in turn has a raft of dependencies making upgrades delicate. It also seems to be quite heavy compared to the more specialized GUIs like PyGTK and PyQt. From guido at python.org Tue Aug 28 05:36:30 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 27 Aug 2007 20:36:30 -0700 Subject: [Python-3000] Need help enforcing strict str/bytes distinctions In-Reply-To: <766a29bd0708271954t49ec448fhaac11c633de447f0@mail.gmail.com> References: <766a29bd0708271954t49ec448fhaac11c633de447f0@mail.gmail.com> Message-ID: So it does! Thanks! (Did you perchance borrow my time machine? 
:-) --Guido On 8/27/07, Adam Hupp wrote: > This patch (already in the tracker) fixes test_csv: > > http://bugs.python.org/issue1033 > > On 8/27/07, Guido van Rossum wrote: > > As anyone following the py3k checkins should have figured out by now, > > I'm on a mission to require all code to be consistent about bytes vs. > > str. For example binary files will soon refuse str arguments to > > write(), and vice versa. > > > > I have a patch that turns on this enforcement, but I have anout 14 > > failing unit tests that require a lot of attention. I'm hoping a few > > folks might have time to help out. > > > > Here are the unit tests that still need work: > > > > test_asynchat > > test_bsddb3 > > test_cgi > > test_cmd_line > > test_csv > > test_doctest > > test_gettext > > test_httplib > > test_shelve > > test_sqlite > > test_tarfile > > test_urllib > > test_urllib2 > > test_urllib2_localnet > > > > Attached is the patch that makes them fail. Note that it forces an > > error when you use PyBUF_CHARACTERS when calling PyObject_GetBuffer on > > a str (PyUnicode) object. > > > > -- > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > _______________________________________________ > > Python-3000 mailing list > > Python-3000 at python.org > > http://mail.python.org/mailman/listinfo/python-3000 > > Unsubscribe: http://mail.python.org/mailman/options/python-3000/adam%40hupp.org > > > > > > > > > -- > Adam Hupp | http://hupp.org/adam/ > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From stephen at xemacs.org Tue Aug 28 05:36:56 2007 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 28 Aug 2007 12:36:56 +0900 Subject: [Python-3000] Py3k Sprint Tasks (Google Docs & Spreadsheets) In-Reply-To: <9CBCCF2F-B428-4D37-8C18-1EAFB86CD7D9@python.org> References: <93DBB66F-5D0D-4E46-8480-D2BFC693722A@python.org> <87y7g0401v.fsf@uwakimon.sk.tsukuba.ac.jp> <9CBCCF2F-B428-4D37-8C18-1EAFB86CD7D9@python.org> Message-ID: <87tzqkz5wn.fsf@uwakimon.sk.tsukuba.ac.jp> Barry Warsaw writes: > Stephen, sorry to hear about your daughter and I hope she's going to > be okay of course! Oh, she's *fine*. There's just a conflict between the Japanese practice of vaccinating all school children against TB, and the U.S. practice of testing for TB antibodies. About 1 in 3 kids coming from Japan to U.S. schools get snagged. Annoying, but I'll trade this for the problems with visas and the like that colleagues have had *any* day. > haven't even looked at test_email_codecs.py yet. Because of the way > things are going to work with in put and output codecs, I'll > definitely want to get some sanity checks with Asian codecs. OK, *that* I can help with! From talin at acm.org Tue Aug 28 07:36:02 2007 From: talin at acm.org (Talin) Date: Mon, 27 Aug 2007 22:36:02 -0700 Subject: [Python-3000] Python-3000 Digest, Vol 18, Issue 116 In-Reply-To: <87veb0z67n.fsf@uwakimon.sk.tsukuba.ac.jp> References: <082620071748.1124.46D1BCFD000DEB390000046422120207840E9D0E030E0CD203D202080106@comcast.net> <87veb0z67n.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <46D3B442.20600@acm.org> Stephen J. Turnbull wrote: > john.m.camara at comcast.net writes: > > > Python can't include all the major packages but it is necessary for any > > language to support a good GUI package in order to be widely adopted > > by the masses. [...] My vote would > > be for wxPython but I'm not someone who truly cares much about GUIs > > as I much prefer to write the back ends of systems and stay far away from > > the front ends. 
> > My experience with wxPython on Mac OS X using the MacPorts (formerly > DarwinPorts) distribution has been somewhat annoying. wxPython seems > to be closely bound to wxWindows, which in turn has a raft of > dependencies making upgrades delicate. It also seems to be quite > heavy compared to the more specialized GUIs like PyGTK and PyQt. Part of the problem is that all GUI toolkits today are heavy, because the set of standard widgets that a GUI toolkit is expected to support has grown enormously. A typical UI programmer today would be very disappointed in a toolkit that didn't support, say, multi-column grids, dynamic layout, tabbed dialogs, toolbars, static HTML rendering, and so on. I myself generally won't bother with a GUI toolkit that doesn't have draggable tabbed document windows, since I tend to design apps that use that style of document management. I know that Greg Ewing was working on a "minimal" python GUI (http://www.cosc.canterbury.ac.nz/greg.ewing/python_gui/), but it hasn't been updated in over a year. And I'm not sure that a minimal toolkit is really all that useful. Even if you restricted it to only those things needed to write IDLE, that still means you have to have a text editor widget which is itself a major component. But I sure would like a completely "Pythonic" GUI that supported all of the features that I need. -- Talin From greg.ewing at canterbury.ac.nz Tue Aug 28 08:06:05 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 28 Aug 2007 18:06:05 +1200 Subject: [Python-3000] How should the hash digest of a Unicode string be computed? In-Reply-To: <52dc1c820708262043s1358ec81mfdf39b309381f249@mail.gmail.com> References: <52dc1c820708261554n6b31e40bya6d885c0f683a633@mail.gmail.com> <52dc1c820708262043s1358ec81mfdf39b309381f249@mail.gmail.com> Message-ID: <46D3BB4D.1000505@canterbury.ac.nz> Gregory P. Smith wrote: > A thought that just occurred to me... Would a PyBUF_CANONICAL flag be > useful instead of CHARACTERS? 
I don't think the buffer API should be allowing for anything that requires, or could require, the provider to convert the data into a different form. It should stick to being a way of getting direct access to the underlying data, whatever form it's in. There could be type codes for various representations of Unicode if desired. But re-encoding should be a separate step. -- Greg From greg.ewing at canterbury.ac.nz Tue Aug 28 08:15:37 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 28 Aug 2007 18:15:37 +1200 Subject: [Python-3000] Immutable bytes type and bsddb or other IO In-Reply-To: References: <20070822235929.GA12780@electricrain.com> <46CD2209.8000408@v.loewis.de> <20070823073837.GA14725@electricrain.com> <46CD3BFF.5080904@v.loewis.de> <20070823171837.GI24059@electricrain.com> <46CE5346.10301@canterbury.ac.nz> <20070824165823.GM24059@electricrain.com> <20070827075925.GT24059@electricrain.com> Message-ID: <46D3BD89.4000909@canterbury.ac.nz> Guido van Rossum wrote: > someone evil could still produce a phase error by > changing the contents while you're looking at it (basically sabotaging > their own application) but I don't see how they could cause a segfault > that way. Maybe not in the same program, but if the data is output from the back end of a compiler, and it gets corrupted, then when you try to run the resulting object file... -- Greg From greg.ewing at canterbury.ac.nz Tue Aug 28 08:27:39 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 28 Aug 2007 18:27:39 +1200 Subject: [Python-3000] Does bytes() need to support bytes(, )? In-Reply-To: References: Message-ID: <46D3C05B.2060706@canterbury.ac.nz> Guido van Rossum wrote: > I think we can't really drop s.encode(), for symmetry with b.decode(). Do we actually need b.decode()? -- Greg From greg at krypto.org Tue Aug 28 08:33:24 2007 From: greg at krypto.org (Gregory P. 
Smith) Date: Mon, 27 Aug 2007 23:33:24 -0700 Subject: [Python-3000] Immutable bytes type and bsddb or other IO In-Reply-To: References: <46CD2209.8000408@v.loewis.de> <20070823073837.GA14725@electricrain.com> <46CD3BFF.5080904@v.loewis.de> <20070823171837.GI24059@electricrain.com> <46CE5346.10301@canterbury.ac.nz> <20070824165823.GM24059@electricrain.com> <20070827075925.GT24059@electricrain.com> Message-ID: <52dc1c820708272333m517bb5d0v3c658f9eb46ea16b@mail.gmail.com> > Adding data locking shouldn't be too complicated, but is it necessary? > The bytes object does support locking the buffer in place; isn't that > enough? It means someone evil could still produce a phase error by > changing the contents while you're looking at it (basically sabotaging > their own application) but I don't see how they could cause a segfault > that way. I'm sure the BerkeleyDB library is not expecting the data passed in as a lookup key to change mid database traversal. No idea if it'll handle that gracefully or not, but I wouldn't expect it to, and I bet it's possible to cause a segfault and/or irreparable database damage that way. The same goes for any other C APIs that you may pass data to that release the GIL. > Even if you really need the LOCKDATA feature, perhaps you can check in > a slight mod of your code that uses SIMPLE for now -- use a macro for > the flags that's defined as PyBUF_SIMPLE and add a comment that you'd > like it to be LOCKDATA once bytes support that. > > That way we have less code in the tracker and more in subversion -- > always a good thing IMO. Yeah, I have it almost working in SIMPLE mode for now; I'll check it in soon. It's no worse in behavior than the existing bytes-object-using code currently checked in. From greg.ewing at canterbury.ac.nz Tue Aug 28 08:39:20 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 28 Aug 2007 18:39:20 +1200 Subject: [Python-3000] Will standard library modules comply with PEP 8?
In-Reply-To: <6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org> References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net> <6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org> Message-ID: <46D3C318.60403@canterbury.ac.nz> On Aug 27, 2007, at 6:16 PM, john.m.camara at comcast.net wrote: > I would like to see PEP 8 remove the "as necessary to improve > readability" in the function and method naming conventions. That would leave a whole bunch of built-in stuff non-conforming with PEP 8, though... -- Greg From greg.ewing at canterbury.ac.nz Tue Aug 28 09:18:13 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 28 Aug 2007 19:18:13 +1200 Subject: [Python-3000] Python-3000 Digest, Vol 18, Issue 116 In-Reply-To: <46D3B442.20600@acm.org> References: <082620071748.1124.46D1BCFD000DEB390000046422120207840E9D0E030E0CD203D202080106@comcast.net> <87veb0z67n.fsf@uwakimon.sk.tsukuba.ac.jp> <46D3B442.20600@acm.org> Message-ID: <46D3CC35.9040203@canterbury.ac.nz> Talin wrote: > I know that Greg Ewing was working on a "minimal" python GUI > (http://www.cosc.canterbury.ac.nz/greg.ewing/python_gui/), but it hasn't > been updated in over a year. And I'm not sure that a minimal toolkit is > really all that useful. Don't worry, I haven't given up! And I plan to support text editing by wrapping the Cocoa and gtk text widgets. My belief is that a Python GUI wrapper can be both lightweight and featureful, provided there is native support on the platform concerned for the desired features. If there isn't such support, that's a weakness of the platform, not of the pure-Python wrapper philosophy. -- Greg From nnorwitz at gmail.com Tue Aug 28 09:23:27 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Tue, 28 Aug 2007 00:23:27 -0700 Subject: [Python-3000] buildbots Message-ID: We got 'em. Let the spam begin! 
:-) This page is not linked from the web anywhere: http://python.org/dev/buildbot/3.0/ I'm not expecting a lot of signal out of them at the beginning. All but one has successfully compiled py3k though. I noticed there were many warnings on windows. I wonder if they are important: pythoncore - 0 error(s), 6 warning(s) _ctypes - 0 error(s), 1 warning(s) bz2 - 0 error(s), 9 warning(s) _ssl - 0 error(s), 23 warning(s) _socket - 0 error(s), 1 warning(s) On trunk, the same machine only has: bz2 - 0 error(s), 2 warning(s) There are several other known warnings on various platforms: Objects/stringobject.c:4104: warning: comparison is always false due to limited range of data type Python/import.c:886: warning: comparison is always true due to limited range of data type Python/../Objects/stringlib/unicodedefs.h:26: warning: 'STRINGLIB_CMP' defined but not used I find it interesting that the gentoo buildbot can run the tests to completion even though I can't run the tests from the command line. There was one error: Traceback (most recent call last): File "/home/buildslave/python-trunk/3.0.norwitz-x86/build/Lib/test/test_normalization.py", line 36, in test_main for line in open_urlresource(TESTDATAURL): File "/home/buildslave/python-trunk/3.0.norwitz-x86/build/Lib/io.py", line 1240, in __next__ line = self.readline() File "/home/buildslave/python-trunk/3.0.norwitz-x86/build/Lib/io.py", line 1319, in readline readahead, pending = self._read_chunk() File "/home/buildslave/python-trunk/3.0.norwitz-x86/build/Lib/io.py", line 1123, in _read_chunk pending = self._decoder.decode(readahead, not readahead) File "/home/buildslave/python-trunk/3.0.norwitz-x86/build/Lib/encodings/ascii.py", line 26, in decode return codecs.ascii_decode(input, self.errors)[0] UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 105: ordinal not in range(128) The alpha has this problem: test_socket sem_init: Too many open files Unhandled exception in thread started by > Traceback (most 
recent call last): File "/net/taipan/scratch1/nnorwitz/python/3.0.norwitz-tru64/build/Lib/test/test_socket.py", line 124, in clientRun self.server_ready.wait() File "/net/taipan/scratch1/nnorwitz/python/3.0.norwitz-tru64/build/Lib/threading.py", line 367, in wait self._cond.wait(timeout) File "/net/taipan/scratch1/nnorwitz/python/3.0.norwitz-tru64/build/Lib/threading.py", line 209, in wait waiter = _allocate_lock() thread.error: can't allocate lock Fatal Python error: UNREF invalid object *** IOT/Abort trap Also test_long failed on the Alpha. ia64 had this problem: test test_builtin failed -- Traceback (most recent call last): File "/home/pybot/buildarea/3.0.klose-debian-ia64/build/Lib/test/test_builtin.py", line 1474, in test_round self.assertEqual(round(1e20), 1e20) AssertionError: 0 != 1e+20 Then: test_tarfile python: Objects/exceptions.c:1392: PyUnicodeDecodeError_Create: Assertion `start < 2147483647' failed. make: *** [buildbottest] Aborted On the amd64 (ubuntu) test_unicode_file fails all 3 tests. The windows buildbot seems to be failing due to line ending issues? Another windows buildbot failed to compile: _tkinter - 3 error(s), 1 warning(s) See the link for more details. Lots of little errors. It doesn't look like any buildbot will pass on the first run. However, it looks like many are pretty close. n PS Sorry about the spam on python-checkins. It looks like there can be only a single mailing list and that it's all or nothing for getting mail. At least I didn't see an obvious way to configure by branch. You'll just have to filter out the stuff to py3k. Since I always seem to recreate the steps necessary for adding a new branch, here are some notes (mostly for me). If anyone else wants to help out with the buildbot, etc, that would be great. To add a new branch for a buildbot: * Add the branch in the buildbot master.cfg file. 2 places need to be updated. * Add new rules in the apache default configuration file (2 lines). 
Make sure to use the same port number in both changes. * Check in the buildbot master config. Apache config too? Remember it takes a while (30-60 seconds) to restart both Apache and the buildbot master. Both need to be restarted for the change to take effect. From lars at gustaebel.de Tue Aug 28 09:44:20 2007 From: lars at gustaebel.de (Lars Gustäbel) Date: Tue, 28 Aug 2007 09:44:20 +0200 Subject: [Python-3000] Need help enforcing strict str/bytes distinctions In-Reply-To: References: Message-ID: <20070828074420.GA15998@core.g33x.de> On Mon, Aug 27, 2007 at 05:16:37PM -0700, Guido van Rossum wrote: > As anyone following the py3k checkins should have figured out by now, > I'm on a mission to require all code to be consistent about bytes vs. > str. For example binary files will soon refuse str arguments to > write(), and vice versa. > > I have a patch that turns on this enforcement, but I have about 14 > failing unit tests that require a lot of attention. I'm hoping a few > folks might have time to help out. > > Here are the unit tests that still need work: > [...] > test_tarfile Fixed in r57608. -- Lars Gustäbel lars at gustaebel.de The direct use of force is such a poor solution to any problem, it is generally employed only by small children and large nations. (David Friedman) From python at rcn.com Tue Aug 28 10:15:07 2007 From: python at rcn.com (Raymond Hettinger) Date: Tue, 28 Aug 2007 01:15:07 -0700 Subject: [Python-3000] Will standard library modules comply with PEP 8? References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net> <6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org> Message-ID: <014101c7e94b$8c6c8650$6701a8c0@RaymondLaptop1> > On Aug 27, 2007, at 6:16 PM, john.m.camara at comcast.net wrote: > >> I would like to see PEP 8 remove the "as necessary to improve >> readability" in the function and method naming conventions.
That >> way methods like StringIO.getvalue() can be renamed to >> StringIO.get_value(). Gratuitous breakage -- for nothing. This is idiotic, pedantic, and counterproductive. (No offense intended, I'm talking about the suggestion, not the suggestor). Ask ten of your programmer friends to write down "result equals object dot get value" and see if more than one in ten uses an underscore (no stacking the deck with Cobol programmers). Raymond From eric+python-dev at trueblade.com Tue Aug 28 11:49:06 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Tue, 28 Aug 2007 05:49:06 -0400 Subject: [Python-3000] buildbots In-Reply-To: References: Message-ID: <46D3EF92.6020406@trueblade.com> Neal Norwitz wrote: > There are several other known warnings on various platforms: ... > Python/../Objects/stringlib/unicodedefs.h:26: warning: 'STRINGLIB_CMP' > defined but not used I fixed this warning in r57613. Unfortunately I had to change from an inline function to a macro, but I don't see another way. From barry at python.org Tue Aug 28 13:40:23 2007 From: barry at python.org (Barry Warsaw) Date: Tue, 28 Aug 2007 07:40:23 -0400 Subject: [Python-3000] Does bytes() need to support bytes(, )? In-Reply-To: References: <06E2D54D-1F9A-4B66-ACE8-692A6BF93CA6@python.org> Message-ID: <7A1473AB-611F-4D53-82EB-E9682F2741CD@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 27, 2007, at 11:20 PM, Guido van Rossum wrote: > But I don't see the point of defaulting to raw-unicode-escape -- > what's the use case for that? I think you should just explicitly say > s.encode('raw-unicode-escape') where you need that. Any reason you > can't? Nope. So what would bytes(s) do? 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRtQJp3EjvBPtnXfVAQIh8AP+KFlZjz8sF40L/6AKZNYiOHn48HBitV8a 29Blv/JhTJlt7ZLEypm+SbudCfRmQTnUoBPfvTxezKhjHzaffaZyjqB308VlPqxv nv3aTGJvxrQNDzJT1GeltddZj/GBG7Pk5ZpsjjejROe0OGHyGwpWXt0py6tfDED/ 2Dk9Fdp8zCU= =ESMM -----END PGP SIGNATURE----- From eric+python-dev at trueblade.com Tue Aug 28 13:48:24 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Tue, 28 Aug 2007 07:48:24 -0400 Subject: [Python-3000] string.Formatter class Message-ID: <46D40B88.4080202@trueblade.com> One of the things that PEP 3101 deliberately under specifies is the Formatter class, leaving decisions up to the implementation. Now that a working implementation exists, I think it's reasonable to tighten it up. I have checked in a Formatter class that specifies the following methods (in addition to the ones already defined in the PEP): parse(format_string) Loops over the format_string and returns an iterable of tuples (literal_text, field_name, format_spec, conversion). This is used by vformat to break the string in to either literal text, or fields that need expanding. If literal_text is None, then expand (field_name, format_spec, conversion) and append it to the output. If literal_text is not None, append it to the output. get_field(field_name, args, kwargs, used_args) Given a field_name as returned by parse, convert it to an object to be formatted. The default version takes strings of the form defined in the PEP, such as "0[name]" or "label.title". It records which args have been used in used_args. args and kwargs are as passed in to vformat. convert_field(value, conversion) Converts the value (returned by get_field) using the conversion (returned by the parse tuple). The default version understands 'r' (repr) and 's' (str). 
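[Editorial sketch] The three hooks described above can be tried against the `string.Formatter` class that later shipped in the standard library. This is a minimal sketch under that assumption: the released signatures differ slightly from the draft in this message (for instance, the released `get_field` takes no `used_args` argument and returns a pair, and `parse` always yields a string for `literal_text`), so it illustrates the idea rather than the exact code checked in at the time. The `person` dictionary is a hypothetical value invented for the illustration.

```python
from string import Formatter

fmt = Formatter()

# parse() splits a format string into
# (literal_text, field_name, format_spec, conversion) tuples.
pieces = list(fmt.parse("a{0!r:>5}b"))

# get_field() resolves a field name such as "0[name]" against the arguments.
# In the released class it returns (object, first_key_used).
person = {"name": "Talin"}  # hypothetical data for the illustration
obj, first = fmt.get_field("0[name]", (person,), {})

# convert_field() applies the conversion character from the parse tuple:
# 'r' means repr(), 's' means str(), None leaves the value untouched.
as_repr = fmt.convert_field("x", "r")
as_str = fmt.convert_field(42, "s")
untouched = fmt.convert_field(42, None)

print(pieces)
print(obj, as_repr, as_str, untouched)
```

Overriding any one of these methods in a subclass changes only that stage of formatting.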
Given these, we can define a formatter that uses the normal syntax, but calls its arguments to get their value: ================= class CallFormatter(Formatter): def format_field(self, value, format_spec): return format(value(), format_spec) fmt = CallFormatter() print(fmt.format('*{0}*', datetime.datetime.now)) ================= which prints: *2007-08-28 07:39:29.946909* Or, something that uses vertical bars for separating markup: ================= class BarFormatter(Formatter): # returns an iterable that contains tuples of the form: # (literal_text, field_name, format_spec, conversion) def parse(self, format_string): for field in format_string.split('|'): if field[0] == '+': # it's markup field_name, _, format_spec = field[1:].partition(':') yield None, field_name, format_spec, None else: yield field, None, None, None fmt = BarFormatter() print(fmt.format('*|+0:^10s|*', 'foo')) ================= which prints: * foo * Or, define your own conversion character: ================= class XFormatter(Formatter): def convert_field(self, value, conversion): if conversion == 'x': return None if conversion == 'r': return repr(value) if conversion == 's': return str(value) return value fmt = XFormatter() print(fmt.format("{0!r}:{0!x}", fmt)) ================= which prints: <__main__.XFormatter object at 0xf6f6d2cc>:None These are obviously contrived examples, without great error checking, but I think they demonstrate the flexibility. I'm not wild about the method names, so any suggestions are appreciated. Any other comments are welcome, too. Eric. From barry at python.org Tue Aug 28 14:22:37 2007 From: barry at python.org (Barry Warsaw) Date: Tue, 28 Aug 2007 08:22:37 -0400 Subject: [Python-3000] Will standard library modules comply with PEP 8? 
In-Reply-To: References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net> <6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 27, 2007, at 11:22 PM, Guido van Rossum wrote: > On 8/27/07, Barry Warsaw wrote: >> >> On Aug 27, 2007, at 6:16 PM, john.m.camara at comcast.net wrote: >> >>> I would like to see PEP 8 remove the "as necessary to improve >>> readability" in the function and method naming conventions. That >>> way methods like StringIO.getvalue() can be renamed to >>> StringIO.get_value(). >> >> +1 >> - -Barry > > Sure, but after the 3.0a1 release (slated for 8/31, i.e. this Friday). > We've got enough changes coming down the pike already that affect > every other file, and IMO this clearly belongs to the library reorg. Yes definitely. I was +1'ing the change to the PEP language. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRtQTjXEjvBPtnXfVAQJ/ygQAoLb1mHJcrIJzZGe3ACb+crVPvtOQ8j/f 7x/LRe3ETklODsegq7+kgy353Nfob8QKLjd+AAlB/44btO6pXMth0AeUQyZ9ZPFz /BwfcDHij1UvdxfSRov9kspnGhd18rPeEfP+mnXsBGKFSgTdiCottB5C5yfmXI8z 2tEQnSQ2FGo= =XI6x -----END PGP SIGNATURE----- From bwinton at latte.ca Tue Aug 28 15:27:01 2007 From: bwinton at latte.ca (Blake Winton) Date: Tue, 28 Aug 2007 09:27:01 -0400 Subject: [Python-3000] Will standard library modules comply with PEP 8? In-Reply-To: <014101c7e94b$8c6c8650$6701a8c0@RaymondLaptop1> References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net> <6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org> <014101c7e94b$8c6c8650$6701a8c0@RaymondLaptop1> Message-ID: <46D422A5.7010501@latte.ca> Raymond Hettinger wrote: >> On Aug 27, 2007, at 6:16 PM, john.m.camara at comcast.net wrote: >>> I would like to see PEP 8 remove the "as necessary to improve >>> readability" in the function and method naming conventions. 
That >>> way methods like StringIO.getvalue() can be renamed to >>> StringIO.get_value(). > > Gratuitous breakage -- for nothing. This is idiotic, pedantic, > and counterproductive. (No offense intended, I'm talking about > the suggestion, not the suggestor). > > Ask ten of your programmer friends to write down "result equals > object dot get value" and see if more than one in ten uses an > underscore (no stacking the deck with Cobol programmers). Sure, but given the rise of Java, how many of them will spell it with a capital 'V'? ;) On the one hand, I really like consistency in my programming languages. On the other hand, a foolish consistency is the hobgoblin of little minds. Later, Blake. From benji at benjiyork.com Tue Aug 28 15:51:31 2007 From: benji at benjiyork.com (Benji York) Date: Tue, 28 Aug 2007 09:51:31 -0400 Subject: [Python-3000] Will standard library modules comply with PEP 8? In-Reply-To: <46D422A5.7010501@latte.ca> References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net> <6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org> <014101c7e94b$8c6c8650$6701a8c0@RaymondLaptop1> <46D422A5.7010501@latte.ca> Message-ID: <46D42863.8010300@benjiyork.com> Blake Winton wrote: > Raymond Hettinger wrote: >>> On Aug 27, 2007, at 6:16 PM, john.m.camara at comcast.net wrote: >>>> I would like to see PEP 8 remove the "as necessary to improve >>>> readability" in the function and method naming conventions. That >>>> way methods like StringIO.getvalue() can be renamed to >>>> StringIO.get_value(). >> Gratuitous breakage -- for nothing. This is idiotic, pedantic, > > and counterproductive. (No offense intended, I'm talking about > > the suggestion, not the suggestor). >> Ask ten of your programmer friends to write down "result equals > > object dot get value" and see if more than one in ten uses an >> underscore (no stacking the deck with Cobol programmers). 
> > Sure, but given the rise of Java, how many of them will spell it with a > capital 'V'? ;) > > On the one hand, I really like consistency in my programming languages. > On the other hand, a foolish consistency is the hobgoblin of little minds. I call quote misapplication. Having predictable identifier names isn't "foolish". Having to divine what is and is not "necessary to improve readability" isn't either, but is perhaps suboptimal. -- Benji York http://benjiyork.com From ncoghlan at gmail.com Tue Aug 28 16:08:45 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 29 Aug 2007 00:08:45 +1000 Subject: [Python-3000] Will standard library modules comply with PEP 8? In-Reply-To: <46D42863.8010300@benjiyork.com> References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net> <6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org> <014101c7e94b$8c6c8650$6701a8c0@RaymondLaptop1> <46D422A5.7010501@latte.ca> <46D42863.8010300@benjiyork.com> Message-ID: <46D42C6D.3050602@gmail.com> Benji York wrote: > Blake Winton wrote: >> On the one hand, I really like consistency in my programming languages. >> On the other hand, a foolish consistency is the hobgoblin of little minds. > > I call quote misapplication. Having predictable identifier names isn't > "foolish". Having to divine what is and is not "necessary to improve > readability" isn't either, but is perhaps suboptimal. On the gripping hand, breaking getattr, getitem, setattr, setitem, delattr and delitem without a *really* good reason would mean seriously annoying a heck of a lot of people for no real gain. Being more consistent in following PEP 8 would be good, particularly for stuff which is going to break (or at least need to be looked at) anyway. The question of whether or not to change things which would otherwise be fine needs to be considered far more carefully. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From guido at python.org Tue Aug 28 17:19:15 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Aug 2007 08:19:15 -0700 Subject: [Python-3000] Does bytes() need to support bytes(, )? In-Reply-To: <46D3C05B.2060706@canterbury.ac.nz> References: <46D3C05B.2060706@canterbury.ac.nz> Message-ID: On 8/27/07, Greg Ewing wrote: > Guido van Rossum wrote: > > I think we can't really drop s.encode(), for symmetry with b.decode(). > > Do we actually need b.decode()? For symmetry with s.encode(). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Aug 28 17:21:50 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Aug 2007 08:21:50 -0700 Subject: [Python-3000] Does bytes() need to support bytes(, )? In-Reply-To: <7A1473AB-611F-4D53-82EB-E9682F2741CD@python.org> References: <06E2D54D-1F9A-4B66-ACE8-692A6BF93CA6@python.org> <7A1473AB-611F-4D53-82EB-E9682F2741CD@python.org> Message-ID: On 8/28/07, Barry Warsaw wrote: > On Aug 27, 2007, at 11:20 PM, Guido van Rossum wrote: > > > But I don't see the point of defaulting to raw-unicode-escape -- > > what's the use case for that? I think you should just explicitly say > > s.encode('raw-unicode-escape') where you need that. Any reason you > > can't? > > Nope. So what would bytes(s) do? Raise TypeError (when s is a str). The argument to bytes() must be either an int (then it creates a zero-filled bytes array of that length) or an iterable of ints (then it creates a bytes array initialized with those ints -- if any int is out of range, an exception is raised, and also if any value is not an int).
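[Editorial sketch] The constructor rules stated above can be checked against the `bytes` type of released Python 3. One assumption to keep in mind: the `bytes` discussed in this thread was a mutable array, which today corresponds more closely to `bytearray`, so this only illustrates the argument handling, not the 2007 implementation. The helper `try_bytes` is a name made up for this sketch.

```python
# int argument: a zero-filled object of that length
zeros = bytes(3)

# iterable of ints: initialized from those values (each must be in 0..255)
from_ints = bytes([72, 105])


def try_bytes(arg):
    """Hypothetical helper: return the bytes object, or the name of the
    exception that bytes() raised for this argument."""
    try:
        return bytes(arg)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


str_result = try_bytes("abc")       # a str without an encoding is rejected
range_result = try_bytes([256])     # an int outside 0..255 is rejected
float_result = try_bytes([1.5])     # a non-int element is rejected

print(zeros, from_ints, str_result, range_result, float_result)
```

Text still has to go through an explicit `s.encode(...)` call, which is the point Guido makes in his reply.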
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Aug 28 17:26:57 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Aug 2007 08:26:57 -0700 Subject: [Python-3000] Will standard library modules comply with PEP 8? In-Reply-To: <46D42C6D.3050602@gmail.com> References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net> <6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org> <014101c7e94b$8c6c8650$6701a8c0@RaymondLaptop1> <46D422A5.7010501@latte.ca> <46D42863.8010300@benjiyork.com> <46D42C6D.3050602@gmail.com> Message-ID: On 8/28/07, Nick Coghlan wrote: > Benji York wrote: > > Blake Winton wrote: > >> On the one hand, I really like consistency in my programming languages. > >> On the other hand, a foolish consistency is the hobgoblin of little minds. > > > > I call quote misapplication. Having predictable identifier names isn't > > "foolish". Having to divine what is and is not "necessary to improve > > readability" isn't either, but is perhaps suboptimal. > > On the gripping hand, breaking getattr, getitem, setattr, setitem, > delattr and delitem without a *really* good reason would mean seriously > annoying a heck of a lot of people for no real gain. > > Being more consistent in following PEP 8 would be good, particularly for > stuff which is going to break (or at least need to be looked at) anyway. > The question of whether or not to change things which would otherwise be > fine needs to be considered far more carefully. The prudent way is to change the PEP (I'll do it) but to be conservative in implementation. The PEP should be used to guide new API design and the occasional grand refactoring; it should not be used as an excuse to change every non-conforming API. That said, I do want to get rid of all module and package names that still use CapitalWords. Module names are relatively easy to fix with 2to3. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at python.org Tue Aug 28 17:42:26 2007 From: thomas at python.org (Thomas Wouters) Date: Tue, 28 Aug 2007 17:42:26 +0200 Subject: [Python-3000] Removing simple slicing In-Reply-To: <9e804ac0708240733r1f1781e2m18902834e2218777@mail.gmail.com> References: <9e804ac0708240733r1f1781e2m18902834e2218777@mail.gmail.com> Message-ID: <9e804ac0708280842p734f0603l89c587dae6631bb7@mail.gmail.com> I updated the patches destined for the trunk (slice-object support for all objects that supported simple slicing, and actual extended slicing support for most of them) and checked them in. Next stop is cleaning up the actual slice-removal bits. I do have two remaining issues: what do we do about PyMapping_Check(), and should I make the post an actual PEP? I'm thinking PyMapping_Check() should be removed, or made to look at tp_as_sequence->sq_item instead of tp_as_sequence->sq_slice and deprecated. I also think this change should be documented in an actual PEP, as it'll be a pretty big change for any C extension implementing simple slices but not slice-object support. On 8/24/07, Thomas Wouters wrote: > > > I did some work at last year's Google sprint on removing the simple > slicing API (__getslice__, tp_as_sequence->sq_slice) in favour of the more > flexible sliceobject API (__getitem__ and tp_as_mapping->mp_subscript using > slice objects as index.) For some more detail, see the semi-PEP below. (I > hesitate to call it a PEP because it's way past the Py3k PEP deadline, but > the email I was originally going to send on this subject grew in such a size > that I figured I might as well use PEP layout and use the opportunity to > record some best practices and behaviour. And the change should probably be > recorded in a PEP anyway, even though it has never been formally proposed, > just taken as a given.) 
> > If anyone is bored and/or interested in doing some complicated work, there > is still a bit of (optional) work to be done in this area: I uploaded > patches to be applied to the trunk SF 8 months ago -- extended slicing > support for a bunch of types. Some of that extended slicing support is > limited to step-1 slices, though, most notably UserString.MutableString and ctypes. I can guarantee adding non-step-1 support to them is a > challenging and fulfilling exercise, having done it for several types, but I > can't muster the intellectual stamina to do it for these (to me) fringe > types. The patches can be found in Roundup: http://bugs.python.org/issue?%40search_text=&title=&%40columns=title&id=&%40columns=id&creation=&creator=twouters&activity=&%40columns=activity&%40sort=activity&actor=&type=&components=&versions=&severity=&dependencies=&assignee=&keywords=&priority=&%40group=priority&status=1&%40columns=status&resolution=&%40pagesize=50&%40startwith=0&%40action=search > (there doesn't seem to be a shorter URL; just search for issues created by > 'twouters' instead.) > > If nobody cares, I will be checking these patches into the trunk this > weekend (after updating them), and then update and check in the rest of the > p3yk-noslice branch into the py3k branch. > > Abstract > ======== > > This proposal discusses getting rid of the two types of slicing Python > uses, > ``simple`` and ``extended``. Extended slicing was added later, and uses a > different API at both the C and the Python level for backward > compatibility. > Extended slicing can express everything simple slicing can express, > however, making the simple slicing API practically redundant.
> > A Tale of Two APIs > ================== > > Simple slicing is a slice operation without a step, Ellipsis or tuple of > slices -- the archetypical slice of just ``start`` and/or ``stop``, with a > single colon separating them and both sides being optional:: > > L[1:3] > L[2:] > L[:-5] > L[:] > > An extended slice is any slice that isn't simple:: > > L[1:5:2] > L[1:3, 8:10] > L[1, ..., 5:-2] > L[1:3:] > > (Note that the presence of an extra colon in the last example makes the > very > first simple slice an extended slice, but otherwise expresses the exact > same > slicing operation.) > > In applying a simple slice, Python does the work of translating omitted, > out > of bounds or negative indices into the appropriate actual indices, based > on > the length of the sequence. The normalized ``start`` and ``stop`` indices > are then passed to the appropriate method: ``__getslice__``, > ``__setslice__`` or ``__delslice__`` for Python classes, > ``tp_as_sequence``'s ``sq_slice`` or ``sq_ass_slice`` for C types. > > For extended slicing, no special handling of slice indices is done. The > indices in ``start:stop:step`` are wrapped in a ``slice`` object, with > missing indices represented as None. The indices are otherwise taken > as-is. > The sequence object is then indexed with the slice object as if it were a > mapping: ``__getitem__``, ``__setitem__`` or ``__delitem__`` for Python > classes, ``tp_as_mapping``'s ``mp_subscript`` or ``mp_ass_subscript`` for C types. > It is entirely up to the sequence to interpret the meaning of missing, out > > of bounds or negative indices, let alone non-numerical indices like tuples > or Ellipsis or arbitrary objects. > > Since at least Python 2.1, applying a simple slice to an object that does > not > implement the simple slicing API will fall back to using extended slicing, > > calling ``__getitem__`` (or ``mp_subscript``) instead of ``__getslice__`` (or > ``sq_slice``), > and similarly for slice assignment/deletion.
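[Editorial sketch] What the extended API hands to ``__getitem__`` can be observed with a small probe class. This is a sketch assuming released Python 3, where only the slice-object API remains; the class name ``Probe`` is invented for the illustration and does not come from the proposal.

```python
class Probe:
    """Hypothetical class that returns whatever index object it receives."""

    def __getitem__(self, index):
        return index


p = Probe()

# The "simple" forms now arrive as slice objects, with omitted
# indices represented as None and negative indices left as-is.
simple = p[1:3]
open_ended = p[2:]
negative = p[:-5]

# Extended forms: a step, a tuple of slices, and Ellipsis.
stepped = p[1:5:2]
multi = p[1:3, 8:10]
with_ellipsis = p[1, ..., 5:-2]

print(simple, open_ended, negative)
print(stepped, multi, with_ellipsis)
```

Nothing is normalized for the receiver: interpreting ``None`` and negative values is left entirely to the object being indexed.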
>
> Problems
> ========
>
> Aside from the obvious disadvantage of having two ways to do the same
> thing, simple slicing is an inconvenient wart for several reasons:
>
> 1) It (passively) promotes supporting only simple slicing, as observed by
>    the builtin types only supporting extended slicing many years after
>    extended slicing was introduced.
>
> 2) The Python VM dedicates 12 of its opcodes, about 11%, to support
>    simple slicing, and effectively reserves another 13 for code
>    convenience. Reducing the Big Switch in the bytecode interpreter
>    would certainly not hurt Python performance.
>
> 3) The same goes for the number of functions, macros and function-pointers
>    supporting simple slicing, although the impact would be on the
>    maintainability and readability of the source rather than on
>    performance.
>
> Proposed Solution
> =================
>
> The proposed solution, as implemented in the p3yk-noslice SVN branch, gets
> rid of the simple slicing methods and PyType entries. The simple C API
> (using ``Py_ssize_t`` for start and stop) remains, but creates a slice
> object as necessary instead. Various types had to be updated to support
> slice objects, or improve the simple slicing case of extended slicing.
>
> The result is that ``__getslice__``, ``__setslice__`` and ``__delslice__``
> are no longer called in any situation. Classes that delegate
> ``__getitem__`` (or the C equivalent) to a sequence type get any slicing
> behaviour of that type for free. Classes that implement their own slicing
> will have to be modified to accept slice objects and process the indices
> themselves. This means that at the C level, as is already the case at the
> Python level, the same method is used for mapping-like access as for
> slicing. C types will still want to implement
> ``tp_as_sequence->sq_item``, but that function will only be called
> when using the ``PySequence_*Item()`` API.
Those API functions do not > (yet) fall > back to using ``tp_as_mapping->mp_subscript``, although they possibly > should. > > A casualty of this change is ``PyMapping_Check()``. It used to check for > ``tp_as_mapping`` being available, and was modified to check for > ``tp_as_mapping`` but *not* ``tp_as_sequence->sq_slice`` when extended > slicing was added to the builtin types. It could conceivably check for > ``tp_as_sequence->sq_item`` instead of ``sq_slice``, but the added value > is > unclear (especially considering ABCs.) In the standard library and CPython > > itself, ``PyMapping_Check()`` is used mostly to provide early errors, for > instance by checking the arguments to ``exec()``. > > Alternate Solution > ------------------ > > A possible alternative to removing simple slicing completely, would be to > introduce a new typestruct hook, with the same signature as > ``tp_as_mapping->mp_subscript``, which would be called for slicing > operations. All as-mapping index operations would have to fall back to > this > new ``sq_extended_slice`` hook, in order for ``seq[slice(...)]`` to work > as > expected. For some added efficiency and error-checking, expressions using > actual slice syntax could compile into bytecodes specific for slicing (of > which there would only be three, instead of twelve.) This approach would > simplify C types wanting to support extended slicing but not > arbitrary-object indexing (and vice-versa) somewhat, but the benefit seems > too small to warrant the added complexity in the CPython runtime itself. > > > Implementing Extended Slicing > ============================= > > Supporting extended slicing in C types is not as easily done as supporting > simple slicing. There are a number of edgecases in interpreting the odder > combinations of ``start``, ``stop`` and ``step``. This section tries to > give > some explanations and best practices. 
>
> Extended Slicing in C
> ---------------------
>
> Because the mapping API takes precedence over the sequence API, any
> ``tp_as_mapping->mp_subscript`` and ``tp_as_mapping->mp_ass_subscript``
> functions need to do proper typechecks on their argument. In Python 2.5
> and later, this is best done using ``PyIndex_Check()`` and
> ``PySlice_Check()`` (and possibly ``PyTuple_Check()`` and comparison
> against ``Py_Ellipsis``.) For compatibility with Python 2.4 and earlier,
> ``PyIndex_Check()`` would have to be replaced with ``PyInt_Check()`` and
> ``PyLong_Check()``.
>
> Indices that pass ``PyIndex_Check()`` should be converted to a
> ``Py_ssize_t`` using ``PyIndex_AsSsizeT()`` and delegated to
> ``tp_as_sequence->sq_item``. (For compatibility with Python 2.4, use
> ``PyNumber_AsLong()`` and downcast to an ``int`` instead.)
>
> The exact meaning of tuples of slices, and of Ellipsis, is up to the type,
> as no standard-library types support them. It may be useful to use the
> same convention as the Numpy package. Slices inside tuples, if supported,
> should probably follow the same rules as direct slices.
>
> From slice objects, correct indices can be extracted with
> ``PySlice_GetIndicesEx()``. Negative and out-of-bounds indices will be
> adjusted based on the provided length, but a negative ``step``, and a
> ``stop`` before a ``start``, are kept as-is. This means that, for a
> getslice operation, a simple for-loop can be used to visit the correct
> items in the correct order::
>
>     for (cur = start, i = 0; i < slicelength; cur += step, i++)
>         dest[i] = src[cur];
>
>
> If ``PySlice_GetIndicesEx()`` is not appropriate, the individual indices
> can be extracted from the ``PySlice`` object. If the indices are to be
> converted to C types, that should be done using ``PyIndex_Check()``,
> ``PyIndex_AsSsizeT()`` and the ``Py_ssize_t`` type, except that ``None``
> should be accepted as the default value for the index.
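
The getslice loop described above can be mirrored in Python to check the arithmetic. This is only a sketch: ``slice.indices()`` stands in for the index adjustment, and ``slice_length()`` and ``get_slice()`` are hypothetical helper names. The length formula is one way to compute the ``slicelength`` value from already-adjusted indices.

```python
def slice_length(start, stop, step):
    """Count the items visited, given indices already adjusted to the length."""
    if step > 0 and start < stop:
        return (stop - start - 1) // step + 1
    if step < 0 and start > stop:
        return (start - stop - 1) // (-step) + 1
    return 0  # stop lies "behind" start for this step direction


def get_slice(src, sliceobj):
    """Copy the selected items with the same loop shape as the C snippet."""
    start, stop, step = sliceobj.indices(len(src))
    slicelength = slice_length(start, stop, step)
    dest = [None] * slicelength
    cur = start
    for i in range(slicelength):
        dest[i] = src[cur]
        cur += step
    return dest


S = list(range(10))
print(get_slice(S, slice(5, 2, -1)))
print(get_slice(S, slice(None, None, -10)))
```

Since ``start`` and ``step`` are used unchanged, a reversed slice needs no special case in the loop; only the length calculation has to look at the sign of ``step``.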
>
> For deleting slices (``mp_ass_subscript`` called with ``NULL`` as
> value) where the order does not matter, a reverse slice can be turned into
> the equivalent forward slice with::
>
>     if (step < 0) {
>         stop = start + 1;
>         start = stop + step*(slicelength - 1) - 1;
>         step = -step;
>     }
>
>
> For slice assignment with a ``step`` other than 1, it's usually necessary
> to require the source iterable to have the same length as the slice. When
> assigning to a slice of length 0, care needs to be taken to select the
> right insertion point. For a slice S[5:2], the correct insertion point is
> before index 5, not before index 2.
>
> For both slice deletion and slice assignment, it is important to remember
> that arbitrary Python code may be executed when calling Py_DECREF() or
> otherwise interacting with arbitrary objects. Because of that, it's
> important that your datatype stays consistent throughout the operation.
> Either operate on a copy of your datatype, or delay (for instance)
> Py_DECREF() calls until the datatype is updated. The latter is usually
> done by keeping a scratchpad of to-be-DECREF'ed items.
>
> Extended slicing in Python
> --------------------------
>
> The simplest way to support extended slicing in Python is by delegating to
> an underlying type that already supports extended slicing. The class can
> simply index the underlying type with the slice object (or tuple) it was
> indexed with.
>
> Barring that, the Python code will have to pretty much apply
> the same logic as the C type. ``PyIndex_AsSsizeT()`` is available as
> ``operator.index()``, with a ``try/except`` block replacing
> ``PyIndex_Check()``. ``isinstance(o, slice)`` and ``sliceobj.indices()``
> replace ``PySlice_Check()`` and ``PySlice_GetIndices()``, but the
> slicelength (which is provided by ``PySlice_GetIndicesEx()``) has to be
> calculated manually.
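
As an illustration of the non-delegating approach, here is a minimal sketch of a class that interprets slice objects and integer-like indices itself. ``Seq`` and its ``_items`` attribute are hypothetical names, and using ``len(range(...))`` for the slice length is a convenience that assumes Python 3's ``range``.

```python
import operator


class Seq:
    """Hypothetical list-backed sequence that interprets its own indices."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, key):
        if isinstance(key, slice):            # stands in for PySlice_Check()
            start, stop, step = key.indices(len(self))
            # indices() does not report the slice length; len(range(...))
            # is one way to calculate it.
            slicelength = len(range(start, stop, step))
            return Seq(self._items[start + i * step]
                       for i in range(slicelength))
        try:                                  # stands in for PyIndex_Check()
            index = operator.index(key)
        except TypeError:
            raise TypeError("indices must be integers or slices") from None
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        return self._items[index]


s = Seq(range(10))
print(s[5:2:-1]._items)
print(s[-1])
```

Tuples and Ellipsis fall through to the ``TypeError`` branch here; a type that wants to give them a meaning would check for them before calling ``operator.index()``.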
>
> Testing extended slicing
> ------------------------
>
> Proper tests of extended slicing capabilities should at least include the
> following (if the operations are supported), assuming a sequence of
> length 10. Every slice is written with two colons so that it uses extended
> slicing even in Python 2.5 and earlier::
>
>     S[2:5:]                (same as S[2:5])
>     S[5:2:]                (same as S[5:2], an empty slice)
>     S[::]                  (same as S[:], a copy of the sequence)
>     S[:2:]                 (same as S[:2])
>     S[:11:]                (same as S[:11], a copy of the sequence)
>     S[5::]                 (same as S[5:])
>     S[-11::]               (same as S[-11:], a copy of the sequence)
>     S[-5:2:1]              (same as S[5:2], an empty slice)
>     S[-5:-2:2]             ([ S[5], S[7] ])
>     S[5:2:-1]              (the reverse of S[3:6])
>     S[-2:-5:-1]            (the reverse of S[-4:-1])
>
>     S[:5:2]                ([ S[0], S[2], S[4] ])
>     S[9::2]                ([ S[9] ])
>     S[8::2]                ([ S[8] ])
>     S[7::2]                ([ S[7], S[9] ])
>     S[1::-1]               ([ S[1], S[0] ])
>     S[1:0:-1]              ([ S[1] ], does not include S[0]!)
>     S[1:-1:-1]             (an empty slice)
>     S[::10]                ([ S[0] ])
>     S[::-10]               ([ S[9] ])
>
>     S[2:5:] = [1, 2, 3]    ([ S[2], S[3], S[4] ] become [1, 2, 3])
>     S[2:5:] = [1]          (S[2] becomes 1, S[3] and S[4] are deleted)
>     S[5:2:] = [1, 2, 3]    ([1, 2, 3] inserted before S[5])
>     S[2:5:2] = [1, 2]      ([ S[2], S[4] ] become [1, 2])
>     S[5:2:-2] = [1, 2]     ([ S[3], S[5] ] become [2, 1])
>     S[3::3] = [1, 2, 3]    ([ S[3], S[6], S[9] ] become [1, 2, 3])
>     S[:-5:-2] = [1, 2]     ([ S[7], S[9] ] become [2, 1])
>
>     S[::-1] = S            (reverse S in-place awkwardly)
>     S[:5:] = S             (replaces S[:5] with a copy of S)
>
>     S[2:5:2] = [1, 2, 3]   (error: assigning length-3 to slicelength-2)
>     S[2:5:2] = None        (error: need iterable)
>
>
>

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20070828/1d5ece79/attachment-0001.htm From guido at python.org Tue Aug 28 17:51:41 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Aug 2007 08:51:41 -0700 Subject: [Python-3000] Immutable bytes type and bsddb or other IO In-Reply-To: <52dc1c820708272333m517bb5d0v3c658f9eb46ea16b@mail.gmail.com> References: <20070823073837.GA14725@electricrain.com> <46CD3BFF.5080904@v.loewis.de> <20070823171837.GI24059@electricrain.com> <46CE5346.10301@canterbury.ac.nz> <20070824165823.GM24059@electricrain.com> <20070827075925.GT24059@electricrain.com> <52dc1c820708272333m517bb5d0v3c658f9eb46ea16b@mail.gmail.com> Message-ID: On 8/27/07, Gregory P. Smith wrote: > I'm sure the BerkeleyDB library is not expecting the data passed in as > a lookup key to change mid database traversal. No idea if it'll > handle that gracefully or not but I wouldn't expect it to and bet its > possible to cause a segfault and/or irrepairable database damage that > way. The same goes for any other C APIs that you may pass data to > that release the GIL. In the case of BerkeleyDB I find this a weak argument -- there are so many other things you can do to that API from Python that might cause it to go beserk, that mutating the bytes while it's looking at them sounds like a rather roundabout approach to sabotage. Now, in general, I'm the first one to worry about techniques that could let "pure Python" code cause a segfault, but when using a 3rd party library, there usually isn't a choice. Yet another thing is malignant *data*, but that's not the case here -- you would have to actively write evil code to trigger this condition. So I don't see this as a security concern (otherwise the mere existence of code probably would qualify as a security concern ;-). IOW I'm not worried. (Though I'm not saying I would reject a patch that adds the data locking facility to the bytes type. 
:-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From bjourne at gmail.com Tue Aug 28 17:56:21 2007 From: bjourne at gmail.com (=?ISO-8859-1?Q?BJ=F6rn_Lindqvist?=) Date: Tue, 28 Aug 2007 17:56:21 +0200 Subject: [Python-3000] Will standard library modules comply with PEP 8? In-Reply-To: <014101c7e94b$8c6c8650$6701a8c0@RaymondLaptop1> References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net> <6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org> <014101c7e94b$8c6c8650$6701a8c0@RaymondLaptop1> Message-ID: <740c3aec0708280856i6da94edfr4ca7274894315421@mail.gmail.com> On 8/28/07, Raymond Hettinger wrote: > > On Aug 27, 2007, at 6:16 PM, john.m.camara at comcast.net wrote: > > > >> I would like to see PEP 8 remove the "as necessary to improve > >> readability" in the function and method naming conventions. That > >> way methods like StringIO.getvalue() can be renamed to > >> StringIO.get_value(). > > Gratuitous breakage -- for nothing. This is idiotic, pedantic, and counterproductive. (No offense intended, I'm talking about the > suggestion, not the suggestor). Scale up. If X is the amount of pain inflicted by breaking the method name and X/10 per year is the amount gained due to improved api predictability, then the investment pays off in only 10 years. Everyone using Python 5k will thank you for it. Besides, with deprecations, changing api isn't that painful. > Ask ten of your programmer friends to write down "result equals object dot get value" and see if more than one in ten uses an > underscore (no stacking the deck with Cobol programmers). I wonder how many will write down "result equals object dot get value".... 
:) -- mvh Bj?rn From martin at v.loewis.de Tue Aug 28 18:22:57 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Aug 2007 18:22:57 +0200 Subject: [Python-3000] status (preparing for first alpha) In-Reply-To: References: <18130.45493.75756.332057@montanaro.dyndns.org> Message-ID: <46D44BE1.6010007@v.loewis.de> > Agreed. Neal tried to set up a buildbot on the only machine he can > easily use for this, but that's the "old gentoo box" where he keeps > getting signal 32. (I suspect this may be a kernel bug and not our > fault.) I forget who can set up buildbots -- is it Martin? Can someone > else help? It's fairly easy to do - I just have to tell the build slaves to build the 3k branch as well. The active branches (2.5, trunk, 3k) will then compete for the slaves, in a FIFO manner (assuming there are concurrent commits). Regards, Martin From martin at v.loewis.de Tue Aug 28 18:52:50 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Aug 2007 18:52:50 +0200 Subject: [Python-3000] status (preparing for first alpha) In-Reply-To: <46D44BE1.6010007@v.loewis.de> References: <18130.45493.75756.332057@montanaro.dyndns.org> <46D44BE1.6010007@v.loewis.de> Message-ID: <46D452E2.3080300@v.loewis.de> > It's fairly easy to do - I just have to tell the build slaves to > build the 3k branch as well. The active branches (2.5, trunk, 3k) > will then compete for the slaves, in a FIFO manner (assuming there > are concurrent commits). Apparently, Neal already did that. The 3k buildbots are at http://www.python.org/dev/buildbot/3.0/ Currently, the tests pass on none of these machines; some fail to build. 
Regards, Martin From theller at ctypes.org Tue Aug 28 18:57:13 2007 From: theller at ctypes.org (Thomas Heller) Date: Tue, 28 Aug 2007 18:57:13 +0200 Subject: [Python-3000] buildbots Message-ID: <46D453E9.4020903@ctypes.org> Unfortunately, I read nearly all my mailing lists through gmane with nntp - and gmane is down currently (it doesn't deliver new messages any more). So I cannot write a reply in the original thread :-( Neal: > > We got 'em. Let the spam begin! :-) > > > > This page is not linked from the web anywhere: > > http://python.org/dev/buildbot/3.0/ > > > > I'm not expecting a lot of signal out of them at the beginning. All > > but one has successfully compiled py3k though. I noticed there were > > many warnings on windows. I wonder if they are important: > > > > pythoncore - 0 error(s), 6 warning(s) > > _ctypes - 0 error(s), 1 warning(s) > > bz2 - 0 error(s), 9 warning(s) > > _ssl - 0 error(s), 23 warning(s) > > _socket - 0 error(s), 1 warning(s) > > > > On trunk, the same machine only has: > > bz2 - 0 error(s), 2 warning(s) Since the tests fail on the trunk (on the windows machines), the 'clean' step is not run. So, the next build is not a complete rebuild, and only some parts are actually compiled. IMO. If you look at later compiler runs (Windows, py3k), you see that there are also less errors. > > The windows buildbot seems to be failing due to line ending issues? Yes. http://bugs.python.org/issue1029 fixes the problem. I've also recorded this failure in http://bugs.python.org/issue1041. Other windows build problems are recorded as issues 1039, 1040, 1041, 1042, 1043. The most severe is http://bugs.python.org/issue1039, since it causes the buildbot test runs to hang (an assertion in the debug MS runtime library displays a messagebox). > > Another windows buildbot failed to compile: > > _tkinter - 3 error(s), 1 warning(s) I'll have to look into that. This is the win64 buildbot. 
Thomas From martin at v.loewis.de Tue Aug 28 19:31:39 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Aug 2007 19:31:39 +0200 Subject: [Python-3000] buildbots In-Reply-To: References: Message-ID: <46D45BFB.6090501@v.loewis.de> > * Add the branch in the buildbot master.cfg file. 2 places need to be updated. > * Add new rules in the apache default configuration file (2 lines). > Make sure to use the same port number in both the changes. > * Check in the buildbot master config. apache config too? * Edit pydotorg:build/data/dev/buildbot/content.ht Regards, Martin From martin at v.loewis.de Tue Aug 28 19:35:06 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Aug 2007 19:35:06 +0200 Subject: [Python-3000] buildbots In-Reply-To: <46D453E9.4020903@ctypes.org> References: <46D453E9.4020903@ctypes.org> Message-ID: <46D45CCA.3050206@v.loewis.de> > Since the tests fail on the trunk (on the windows machines), > the 'clean' step is not run. No. The 'clean' step is run even if the test step failed. The problem must be somewhere else: for some reason, the connection breaks down/times out; this causes the build to abort. Can you check the slave logfile to see what they say? Regards, Martin From theller at ctypes.org Tue Aug 28 21:06:02 2007 From: theller at ctypes.org (Thomas Heller) Date: Tue, 28 Aug 2007 21:06:02 +0200 Subject: [Python-3000] buildbots In-Reply-To: <46D462EB.4070600@ctypes.org> References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> Message-ID: <46D4721A.2040208@ctypes.org> Thomas Heller schrieb: > Martin v. L?wis schrieb: >>> Since the tests fail on the trunk (on the windows machines), >>> the 'clean' step is not run. >> >> No. The 'clean' step is run even if the test step failed. >> >> The problem must be somewhere else: for some reason, the >> connection breaks down/times out; this causes the build >> to abort. 
>> > On the windows buildbot named x86 XP-3 trunk I see: > > An XP firewall message box asking if python_d should be unblocked (which is possibly unrelated). > > A Debug assertion message box. Clicking 'Retry' to debug start Visual Studio, > it points at line 1343 in db-4.4.20\log\log_put.c: > > /* > * If the open failed for reason other than the file > * not being there, complain loudly, the wrong user > * probably started up the application. > */ > if (ret != ENOENT) { > __db_err(dbenv, > "%s: log file unreadable: %s", *namep, db_strerror(ret)); > =>> return (__db_panic(dbenv, ret)); > } > > Now that I have written this I'm not so sure any longer whether this was for the trunk > or the py3k build ;-(. I've checked again: it is in the trunk. Do you know if it is possible to configure windows so that debug assertions do NOT display a message box (it is very convenient for interactive testing, but not so for automatic tests)? Thomas From eric+python-dev at trueblade.com Tue Aug 28 22:33:42 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Tue, 28 Aug 2007 16:33:42 -0400 Subject: [Python-3000] string.Formatter class In-Reply-To: <46D40B88.4080202@trueblade.com> References: <46D40B88.4080202@trueblade.com> Message-ID: <46D486A6.2070701@trueblade.com> Eric Smith wrote: > One of the things that PEP 3101 deliberately under specifies is the > Formatter class, leaving decisions up to the implementation. Now that a > working implementation exists, I think it's reasonable to tighten it up. 
I should also have included the recipe for the 'smart' formatter
mentioned in the PEP, which automatically searches local and global
namespaces:

---------------------
import inspect
from string import Formatter

class SmartFormatter(Formatter):
    def vformat(self, format_string, args, kwargs):
        self.locals = inspect.currentframe(2).f_locals
        return super(SmartFormatter, self).vformat(format_string, args, kwargs)

    def get_value(self, key, args, kwargs):
        if isinstance(key, basestring):
            try:
                # try kwargs first
                return kwargs[key]
            except KeyError:
                try:
                    # try locals next
                    return self.locals[key]
                except KeyError:
                    # try globals last
                    return globals()[key]
        else:
            return args[key]

def func():
    var0 = 'local--0'
    print(fmt.format('in func 0:{var0} 1:{var1}'))

fmt = SmartFormatter()
var0 = 'global-0'
var1 = 'global-1'
func()
print(fmt.format('in module 0:{var0} 1:{var1}'))
---------------------

This code produces:
in func 0:local--0 1:global-1
in module 0:global-0 1:global-1

From theller at ctypes.org Tue Aug 28 18:50:07 2007
From: theller at ctypes.org (Thomas Heller)
Date: Tue, 28 Aug 2007 18:50:07 +0200
Subject: [Python-3000] buildbots
Message-ID: <46D4523F.4040609@ctypes.org>

Unfortunately, I read nearly all my mailing lists through gmane with nntp -
and gmane is down currently (it doesn't deliver new messages any more).
So I cannot write a reply in the original thread :-(

Neal:
> We got 'em. Let the spam begin! :-)
>
> This page is not linked from the web anywhere:
> http://python.org/dev/buildbot/3.0/
>
> I'm not expecting a lot of signal out of them at the beginning. All
> but one has successfully compiled py3k though. I noticed there were
> many warnings on windows.
I wonder if they are important: > > > > pythoncore - 0 error(s), 6 warning(s) > > _ctypes - 0 error(s), 1 warning(s) > > bz2 - 0 error(s), 9 warning(s) > > _ssl - 0 error(s), 23 warning(s) > > _socket - 0 error(s), 1 warning(s) > > > > On trunk, the same machine only has: > > bz2 - 0 error(s), 2 warning(s) Since the tests fail on the trunk (on the windows machines), the 'clean' step is not run. So, the next build is not a complete rebuild, and only some parts are actually compiled. IMO. If you look at later compiler runs (Windows, py3k), you see that there are also less errors. > > The windows buildbot seems to be failing due to line ending issues? Yes. http://bugs.python.org/issue1029 fixes the problem. I've also recorded this failure in http://bugs.python.org/issue1041. Other windows build problems are recorded as issues 1039, 1040, 1041, 1042, 1043. The most severe is http://bugs.python.org/issue1039, since it causes the buildbot test runs to hang (an assertion in the debug MS runtime library displays a messagebox). > > Another windows buildbot failed to compile: > > _tkinter - 3 error(s), 1 warning(s) I'll have to look into that. This is the win64 buildbot. Thomas From john.m.camara at comcast.net Tue Aug 28 23:08:00 2007 From: john.m.camara at comcast.net (john.m.camara at comcast.net) Date: Tue, 28 Aug 2007 21:08:00 +0000 Subject: [Python-3000] Will standard library modules comply with PEP 8? Message-ID: <082820072108.14036.46D48EB000054A84000036D422155934140E9D0E030E0CD203D202080106@comcast.net> On 8/28/07, Nick Coghlan wrote: > On the gripping hand, breaking getattr, getitem, setattr, setitem, > delattr and delitem without a *really* good reason would mean seriously > annoying a heck of a lot of people for no real gain. > Making an exception to the naming convention for builtins seams acceptable as it's just a small list of method names for someone to remember. The bigger issue I see is the numerous inconsistencies that exist in the standard library. 
I know when I was learning Python (8 years ago) I found these inconsistencies annoying. I also see that everyone I convince to use Python also has this issue for the first year or 2 as they learn to master the language. At this point in time a change of names would be a pain for myself as I'm well aware of the names used in the standard library and will likely type them wrong in the begining if the change takes place. But I know it will only take a short time to get used to the new names so I see it as a small price to pay to shorten the learning curve for newbies. I'll even get over the pain of updating 300+ Kloc that I maintain. John From thomas at python.org Tue Aug 28 23:17:37 2007 From: thomas at python.org (Thomas Wouters) Date: Tue, 28 Aug 2007 23:17:37 +0200 Subject: [Python-3000] Merging the trunk SSL changes. Message-ID: <9e804ac0708281417i4a613e6en4a5faf86abd90b7b@mail.gmail.com> I'm trying to merge the trunk into the py3k branch (so I can work on removing simple slices), but the SSL changes in the trunk are in the way. That is to say, the new 'ssl' module depends on the Python 2.x layout in the 'socket' module. Specifically, that socket.socket is a wrapper class around _socket.socket, so it can subclass that and still pass a socket as a separate argument to __init__. Unfortunately, in Python 3.0, socket.socketis a subclass of _socket.socket, so that trick won't work. And there isn't really a way to fake it, either, except by making ssl.sslsocket *not* subclass socket.socket. I'm going to check in this merge despite of the extra breakage, but it would be good if someone could either fix the py3k branch proper (I don't see how), or change the trunk strategy to be more forward-compatible. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-3000/attachments/20070828/a5866f77/attachment.htm From jimjjewett at gmail.com Wed Aug 29 00:47:30 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 28 Aug 2007 18:47:30 -0400 Subject: [Python-3000] Will standard library modules comply with PEP 8? In-Reply-To: <46D42863.8010300@benjiyork.com> References: <082720072216.11186.46D34D530002BA1F00002BB222155670740E9D0E030E0CD203D202080106@comcast.net> <6FB6CAC3-9E78-4EA7-BAD5-48CFCEDD8661@python.org> <014101c7e94b$8c6c8650$6701a8c0@RaymondLaptop1> <46D422A5.7010501@latte.ca> <46D42863.8010300@benjiyork.com> Message-ID: On 8/28/07, Benji York wrote: > Blake Winton wrote: > > Raymond Hettinger wrote: > >>> On Aug 27, 2007, at 6:16 PM, john.m.camara at comcast.net wrote: > >> Ask ten of your programmer friends to write down "result equals > >> object dot get value" ... > > Sure, but given the rise of Java, how many of them will spell it with a > > capital 'V'? ;) > > On the one hand, I really like consistency in my programming languages. > > On the other hand, a foolish consistency is the hobgoblin of little minds. > I call quote misapplication. Having predictable identifier names isn't > "foolish". Agreed; the question is what is predictable. When I worked in Common Lisp, the separators were usually _ or -, and it was a royal pain to remember which. In python, there isn't a consistent separator, because it can be any of runthewordstogether, wordBreaksCapitalize, IncludingFirstWord, or underscore_is_your_friend. Unfortunately, even if we picked a single convention, it still wouldn't always seem right, because sometimes the words represent different kinds of things ("get" and name) or involve acronyms that already have their own capitalization rules (HTTP). So we can't possibly get perfect consistency. 
If we're at (made-up numbers) 85% now, and can only get to 95%, and that extra 10% would break other consistencies that we didn't consider (such as consistency with wrapped code) ... is it really worth incompatibilities? -jJ From jimjjewett at gmail.com Wed Aug 29 01:07:49 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 28 Aug 2007 19:07:49 -0400 Subject: [Python-3000] string.Formatter class In-Reply-To: <46D40B88.4080202@trueblade.com> References: <46D40B88.4080202@trueblade.com> Message-ID: On 8/28/07, Eric Smith wrote: > parse(format_string) >... returns an iterable of tuples > (literal_text, field_name, format_spec, conversion) Which are really either (literal_text, None, None, None) or (None, field_name, format_spec, conversion) I can't help thinking that these two return types shouldn't be alternatives that both pretend to be 4-tuples. At the least, they should be "literal text" vs (field_name, format_spec, conversion) but you might want to take inspiration from the "tail" of an elementtree node, and return the field with the literal next to it as a single object. (literal_text, field_name, format_spec, conversion) Where the consumer should output the literal text followed by the results of formatting the field. And yes, the last tuple would often be (literal_text, None, None, None) to indicate no additional fields need processing. -jJ From eric+python-dev at trueblade.com Wed Aug 29 01:18:24 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Tue, 28 Aug 2007 19:18:24 -0400 Subject: [Python-3000] string.Formatter class In-Reply-To: References: <46D40B88.4080202@trueblade.com> Message-ID: <46D4AD40.9070006@trueblade.com> Jim Jewett wrote: > On 8/28/07, Eric Smith wrote: >> parse(format_string) > >> ... 
returns an iterable of tuples >> (literal_text, field_name, format_spec, conversion) > > Which are really either > > (literal_text, None, None, None) > or > (None, field_name, format_spec, conversion) > > I can't help thinking that these two return types shouldn't be > alternatives that both pretend to be 4-tuples. At the least, they > should be > > "literal text" > vs > (field_name, format_spec, conversion) I agree that it might not be the best interface. It was originally just an internal thing, where it didn't really matter. But since then it's (possibly) become exposed as part of the Formatter API, so rethinking it makes sense. I really didn't want to write: for result in parse(format_string): if isinstance(result, str): # it's a literal else: field_name, format_spec, conversion = result > but you might want to take inspiration from the "tail" of an > elementtree node, and return the field with the literal next to it as > a single object. > > (literal_text, field_name, format_spec, conversion) I think I like that best. Thanks! Eric. From janssen at parc.com Wed Aug 29 01:37:08 2007 From: janssen at parc.com (Bill Janssen) Date: Tue, 28 Aug 2007 16:37:08 PDT Subject: [Python-3000] Merging the trunk SSL changes. In-Reply-To: <9e804ac0708281417i4a613e6en4a5faf86abd90b7b@mail.gmail.com> References: <9e804ac0708281417i4a613e6en4a5faf86abd90b7b@mail.gmail.com> Message-ID: <07Aug28.163713pdt."57996"@synergy1.parc.xerox.com> > I'm trying to merge the trunk into the py3k branch (so I can work on > removing simple slices), but the SSL changes in the trunk are in the way. > That is to say, the new 'ssl' module depends on the Python 2.x layout in the > 'socket' module. Specifically, that socket.socket is a wrapper class around > _socket.socket, so it can subclass that and still pass a socket as a > separate argument to __init__. Unfortunately, in Python 3.0, > socket.socketis a subclass of _socket.socket, so that trick won't > work. 
And there isn't > really a way to fake it, either, except by making ssl.sslsocket *not* > subclass socket.socket. It's on my list -- in 3K, the plan is that ssl.SSLSocket is going to inherit from socket.socket and socket.SocketIO, and (I think) pass fileno instead of _sock. > I'm going to check in this merge despite of the extra breakage, but it would > be good if someone could either fix the py3k branch proper (I don't see > how), or change the trunk strategy to be more forward-compatible. If you can hold off one day before doing the trunk merge, I'm going to post a fix to the Windows SSL breakage this evening (PDT). Bill From thomas at python.org Wed Aug 29 01:48:39 2007 From: thomas at python.org (Thomas Wouters) Date: Wed, 29 Aug 2007 01:48:39 +0200 Subject: [Python-3000] Merging the trunk SSL changes. In-Reply-To: <3730382933471592889@unknownmsgid> References: <9e804ac0708281417i4a613e6en4a5faf86abd90b7b@mail.gmail.com> <3730382933471592889@unknownmsgid> Message-ID: <9e804ac0708281648y70d7ff68nb1a290af793c068b@mail.gmail.com> On 8/29/07, Bill Janssen wrote: > If you can hold off one day before doing the trunk merge, I'm going to > post a fix to the Windows SSL breakage this evening (PDT). Too late, sorry, it's already checked in. You can revert the SSL bits if you want, and take care to merge the proper changes later. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From janssen at parc.com Wed Aug 29 02:15:49 2007 From: janssen at parc.com (Bill Janssen) Date: Tue, 28 Aug 2007 17:15:49 PDT Subject: [Python-3000] Merging the trunk SSL changes. 
In-Reply-To: <9e804ac0708281648y70d7ff68nb1a290af793c068b@mail.gmail.com> References: <9e804ac0708281417i4a613e6en4a5faf86abd90b7b@mail.gmail.com> <3730382933471592889@unknownmsgid> <9e804ac0708281648y70d7ff68nb1a290af793c068b@mail.gmail.com> Message-ID: <07Aug28.171558pdt."57996"@synergy1.parc.xerox.com> > > If you can hold off one day before doing the trunk merge, I'm going to > > post a fix to the Windows SSL breakage this evening (PDT). > > > Too late, sorry, it's already checked in. You can revert the SSL bits if you > want, and take care to merge the proper changes later. No, that's OK. I'll just (eventually) generate a 3K patch against what's in the repo. Probably not this week. Here's my work plan (from yesterday's python-dev): 1) Generate a patch to the trunk to remove all use of socket.ssl in library modules (and elsewhere except for test/test_socket_ssl.py), and switch them to use the ssl module. This would affect httplib, imaplib, poplib, smtplib, urllib, and xmlrpclib. This patch should also deprecate the use of socket.ssl, and particularly the "server" and "issuer" methods on it, which can return bad data. 2) Expand the test suite to exhaustively test edge cases, particularly things like invalid protocol ids, bad cert files, bad key files, etc. 3) Take the threaded server example in test/test_ssl.py, clean it up, and add it to the Demos directory (maybe it should be a HOWTO?). 4) Generate a patch for the Py3K branch. This patch would remove the "ssl" function from the socket module, and would also remove the "server" and "issuer" methods on the SSL context. The ssl.sslsocket class would be renamed to SSLSocket (PEP 8), and would inherit from socket.socket and io.RawIOBase. The current improvements to the Modules/_ssl.c file would be folded in. The patch would also fix all uses of socket.ssl in the other library modules. 5) Generate a package for older Pythons (2.3-2.5). This would install the ssl module, plus the improved version of _ssl.c. 
Needs more design. I've currently got a patch for (1). Sounds like I should switch the order of (3) and (4). Bill From guido at python.org Wed Aug 29 03:29:13 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Aug 2007 18:29:13 -0700 Subject: [Python-3000] Merging the trunk SSL changes. In-Reply-To: <-3823197267807538008@unknownmsgid> References: <9e804ac0708281417i4a613e6en4a5faf86abd90b7b@mail.gmail.com> <3730382933471592889@unknownmsgid> <9e804ac0708281648y70d7ff68nb1a290af793c068b@mail.gmail.com> <-3823197267807538008@unknownmsgid> Message-ID: On 8/28/07, Bill Janssen wrote: > > > If you can hold off one day before doing the trunk merge, I'm going to > > > post a fix to the Windows SSL breakage this evening (PDT). > > > > > > Too late, sorry, it's already checked in. You can revert the SSL bits if you > > want, and take care to merge the proper changes later. > > No, that's OK. I'll just (eventually) generate a 3K patch against > what's in the repo. Probably not this week. > > Here's my work plan (from yesterday's python-dev): > > 1) Generate a patch to the trunk to remove all use of socket.ssl in > library modules (and elsewhere except for > test/test_socket_ssl.py), and switch them to use the ssl module. > This would affect httplib, imaplib, poplib, smtplib, urllib, > and xmlrpclib. > > This patch should also deprecate the use of socket.ssl, and > particularly the "server" and "issuer" methods on it, which can > return bad data. > > 2) Expand the test suite to exhaustively test edge cases, particularly > things like invalid protocol ids, bad cert files, bad key files, > etc. > > 3) Take the threaded server example in test/test_ssl.py, clean it up, > and add it to the Demos directory (maybe it should be a HOWTO?). > > 4) Generate a patch for the Py3K branch. This patch would remove the > "ssl" function from the socket module, and would also remove the > "server" and "issuer" methods on the SSL context. 
The ssl.sslsocket > class would be renamed to SSLSocket (PEP 8), and would inherit > from socket.socket and io.RawIOBase. The current improvements to > the Modules/_ssl.c file would be folded in. The patch would > also fix all uses of socket.ssl in the other library modules. > > 5) Generate a package for older Pythons (2.3-2.5). This would > install the ssl module, plus the improved version of _ssl.c. > Needs more design. > > > I've currently got a patch for (1). Sounds like I should switch the > order of (3) and (4). Until ssl.py is fixed, I've added quick hacks to test_ssl.py and test_socket_ssl.py to disable these tests, so people won't be alarmed by the test failures. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From janssen at parc.com Wed Aug 29 05:27:56 2007 From: janssen at parc.com (Bill Janssen) Date: Tue, 28 Aug 2007 20:27:56 PDT Subject: [Python-3000] Merging the trunk SSL changes. In-Reply-To: References: <9e804ac0708281417i4a613e6en4a5faf86abd90b7b@mail.gmail.com> <3730382933471592889@unknownmsgid> <9e804ac0708281648y70d7ff68nb1a290af793c068b@mail.gmail.com> <-3823197267807538008@unknownmsgid> Message-ID: <07Aug28.202800pdt."57996"@synergy1.parc.xerox.com> > Until ssl.py is fixed, I've added quick hacks to test_ssl.py and > test_socket_ssl.py to disable these tests, so people won't be alarmed > by the test failures. You might just want to configure out SSL support, or have Lib/ssl.py raise an ImportError, for the moment. 
Bill From eric+python-dev at trueblade.com Wed Aug 29 05:33:10 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Tue, 28 Aug 2007 23:33:10 -0400 Subject: [Python-3000] string.Formatter class In-Reply-To: <46D4AD40.9070006@trueblade.com> References: <46D40B88.4080202@trueblade.com> <46D4AD40.9070006@trueblade.com> Message-ID: <46D4E8F6.30508@trueblade.com> Eric Smith wrote: > Jim Jewett wrote: >> but you might want to take inspiration from the "tail" of an >> elementtree node, and return the field with the literal next to it as >> a single object. >> >> (literal_text, field_name, format_spec, conversion) > > I think I like that best. I implemented this in r57641. I think it simplifies things. At least, it's easier to explain. Due to an optimization dealing with escaped braces, it's possible for (literal, None, None, None) to be returned more than once. I don't think that's a problem, as long as it's documented. If you look at string.py's Formatter.vformat, I don't think it complicates the implementation at all. Thanks for the suggestion. From guido at python.org Wed Aug 29 05:32:31 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Aug 2007 20:32:31 -0700 Subject: [Python-3000] Merging the trunk SSL changes. In-Reply-To: <210293633547667201@unknownmsgid> References: <9e804ac0708281417i4a613e6en4a5faf86abd90b7b@mail.gmail.com> <3730382933471592889@unknownmsgid> <9e804ac0708281648y70d7ff68nb1a290af793c068b@mail.gmail.com> <-3823197267807538008@unknownmsgid> <210293633547667201@unknownmsgid> Message-ID: Yes, that makes more sense. Bah, three revisions for one. On 8/28/07, Bill Janssen wrote: > > Until ssl.py is fixed, I've added quick hacks to test_ssl.py and > > test_socket_ssl.py to disable these tests, so people won't be alarmed > > by the test failures. > > You might just want to configure out SSL support, or have Lib/ssl.py > raise an ImportError, for the moment. 
> > Bill > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Aug 29 06:07:44 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Aug 2007 21:07:44 -0700 Subject: [Python-3000] Need help enforcing strict str/bytes distinctions In-Reply-To: <20070828074420.GA15998@core.g33x.de> References: <20070828074420.GA15998@core.g33x.de> Message-ID: On 8/28/07, Lars Gustäbel wrote: > On Mon, Aug 27, 2007 at 05:16:37PM -0700, Guido van Rossum wrote: > > As anyone following the py3k checkins should have figured out by now, > > I'm on a mission to require all code to be consistent about bytes vs. > > str. For example binary files will soon refuse str arguments to > > write(), and vice versa. > > > > I have a patch that turns on this enforcement, but I have about 14 > > failing unit tests that require a lot of attention. I'm hoping a few > > folks might have time to help out. > > > > Here are the unit tests that still need work: > > [...] > > test_tarfile > > Fixed in r57608. Thanks! I fixed most others (I think); we're down to test_asynchat and test_urllib2_localnet failing. But I've checked in the main patch enforcing stricter str/bytes, just to get things rolling. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From eric+python-dev at trueblade.com Wed Aug 29 15:02:57 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Wed, 29 Aug 2007 09:02:57 -0400 Subject: [Python-3000] [Python-checkins] buildbot failure in S-390 Debian 3.0 In-Reply-To: <20070829125131.714341E4002@bag.python.org> References: <20070829125131.714341E4002@bag.python.org> Message-ID: <46D56E81.1030604@trueblade.com> The URL is getting mangled, it should be: http://www.python.org/dev/buildbot/all/S-390%20Debian%203.0/builds/9 buildbot at python.org wrote: > The Buildbot has detected a new failure of S-390 Debian 3.0. 
> Full details are available at: > http://www.python.org/dev/buildbot/all/S-390%2520Debian%25203.0/builds/9 > > Buildbot URL: http://www.python.org/dev/buildbot/all/ > > Build Reason: > Build Source Stamp: [branch branches/py3k] HEAD > Blamelist: eric.smith > > BUILD FAILED: failed compile > > sincerely, > -The Buildbot > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://mail.python.org/mailman/listinfo/python-checkins > From guido at python.org Wed Aug 29 15:20:52 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Aug 2007 06:20:52 -0700 Subject: [Python-3000] [Python-checkins] buildbot failure in S-390 Debian 3.0 In-Reply-To: <46D56E81.1030604@trueblade.com> References: <20070829125131.714341E4002@bag.python.org> <46D56E81.1030604@trueblade.com> Message-ID: I noticed this failed with a traceback from distutils, caused by a bug in an exception handler; which I fixed. I also noticed that there were a *lot* of warnings like this: Objects/object.c:193: warning: format '%d' expects type 'int', but argument 7 has type 'Py_ssize_t' These can be fixed using %zd I believe. Any volunteers? This would tremendously improve 64-bit quality! --Guido On 8/29/07, Eric Smith wrote: > The URL is getting mangled, it should be: > http://www.python.org/dev/buildbot/all/S-390%20Debian%203.0/builds/9 > > buildbot at python.org wrote: > > The Buildbot has detected a new failure of S-390 Debian 3.0. 
> > Full details are available at: > > http://www.python.org/dev/buildbot/all/S-390%2520Debian%25203.0/builds/9 > > > > Buildbot URL: http://www.python.org/dev/buildbot/all/ > > > > Build Reason: > > Build Source Stamp: [branch branches/py3k] HEAD > > Blamelist: eric.smith > > > > BUILD FAILED: failed compile > > > > sincerely, > > -The Buildbot > > > > _______________________________________________ > > Python-checkins mailing list > > Python-checkins at python.org > > http://mail.python.org/mailman/listinfo/python-checkins > > > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at alum.mit.edu Wed Aug 29 16:01:17 2007 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed, 29 Aug 2007 10:01:17 -0400 Subject: [Python-3000] Need help enforcing strict str/bytes distinctions In-Reply-To: References: <20070828074420.GA15998@core.g33x.de> Message-ID: On 8/29/07, Guido van Rossum wrote: > On 8/28/07, Lars Gustäbel wrote: > > On Mon, Aug 27, 2007 at 05:16:37PM -0700, Guido van Rossum wrote: > > > As anyone following the py3k checkins should have figured out by now, > > > I'm on a mission to require all code to be consistent about bytes vs. > > > str. For example binary files will soon refuse str arguments to > > > write(), and vice versa. > > > > > > I have a patch that turns on this enforcement, but I have about 14 > > > failing unit tests that require a lot of attention. I'm hoping a few > > > folks might have time to help out. > > > > > > Here are the unit tests that still need work: > > > [...] > > > test_tarfile > > > > Fixed in r57608. > > Thanks! > > I fixed most others (I think); we're down to test_asynchat and > test_urllib2_localnet failing. 
But I've checked in the main patch > enforcing stricter str/bytes, just to get things rolling. I'm working on test_urllib2_localnet now. Jeremy > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/jeremy%40alum.mit.edu > From martin at v.loewis.de Wed Aug 29 16:05:59 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 29 Aug 2007 16:05:59 +0200 Subject: [Python-3000] buildbots In-Reply-To: <46D4721A.2040208@ctypes.org> References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> Message-ID: <46D57D47.1090709@v.loewis.de> > Do you know if it is possible to configure windows so that debug assertions do NOT > display a message box (it is very convenient for interactive testing, but not so > for automatic tests)? You can use _set_error_mode(_OUT_TO_STDERR) to make assert() go to stderr rather than to a message box. You can use _CrtSetReportMode(_CRT_ASSERT /* or _CRT_WARN or CRT_ERROR */, _CRTDBG_MODE_FILE) to make _ASSERT() go to a file; you need to call _CrtSetReportFile( _CRT_ASSERT, _CRTDBG_FILE_STDERR ) in addition to make the file stderr. Not sure what window precisely you got, so I can't comment which of these (if any) would have made the message go away. Regards, Martin From guido at python.org Wed Aug 29 16:29:36 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Aug 2007 07:29:36 -0700 Subject: [Python-3000] Need help enforcing strict str/bytes distinctions In-Reply-To: References: <20070828074420.GA15998@core.g33x.de> Message-ID: Stop, I already fixed it. Sorry! 
On 8/29/07, Jeremy Hylton wrote: > On 8/29/07, Guido van Rossum wrote: > > On 8/28/07, Lars Gustäbel wrote: > > > On Mon, Aug 27, 2007 at 05:16:37PM -0700, Guido van Rossum wrote: > > > > As anyone following the py3k checkins should have figured out by now, > > > > I'm on a mission to require all code to be consistent about bytes vs. > > > > str. For example binary files will soon refuse str arguments to > > > > write(), and vice versa. > > > > > > > > I have a patch that turns on this enforcement, but I have about 14 > > > > failing unit tests that require a lot of attention. I'm hoping a few > > > > folks might have time to help out. > > > > > > > > Here are the unit tests that still need work: > > > > [...] > > > > test_tarfile > > > > > > Fixed in r57608. > > > > Thanks! > > > > I fixed most others (I think); we're down to test_asynchat and > > test_urllib2_localnet failing. But I've checked in the main patch > > enforcing stricter str/bytes, just to get things rolling. > > I'm working on test_urllib2_localnet now. > > Jeremy > > > > > > > -- > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > _______________________________________________ > > Python-3000 mailing list > > Python-3000 at python.org > > http://mail.python.org/mailman/listinfo/python-3000 > > Unsubscribe: http://mail.python.org/mailman/options/python-3000/jeremy%40alum.mit.edu > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at alum.mit.edu Wed Aug 29 18:49:08 2007 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed, 29 Aug 2007 12:49:08 -0400 Subject: [Python-3000] ctype crashes Message-ID: I'm seeing a bunch of C extensions crash on my box. I'm uncertain about a few issues, but I think I'm running 32-bit binary on a 64-bit linux box. The crash I see in ctypes is the following: #0 0x080a483e in PyUnicodeUCS2_FromString (u=0x5
) at ../Objects/unicodeobject.c:471 #1 0xf7cd4f8e in z_get (ptr=0x0, size=4) at /usr/local/google/home/jhylton/python/py3k/Modules/_ctypes/cfield.c:1380 #2 0xf7ccdbb5 in Simple_get_value (self=0xf7ba8a04) at /usr/local/google/home/jhylton/python/py3k/Modules/_ctypes/_ctypes.c:3976 #3 0x0807f218 in PyObject_GenericGetAttr (obj=0xf7ba8a04, name=0xf7e26ea0) at ../Objects/object.c:1098 #4 0x080b63da in PyEval_EvalFrameEx (f=0x81ca8fc, throwflag=0) at ../Python/ceval.c:1937 I'll look at this again sometime this afternoon, but I'm headed for lunch now. Jeremy From guido at python.org Wed Aug 29 18:52:25 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Aug 2007 09:52:25 -0700 Subject: [Python-3000] [Python-checkins] buildbot failure in S-390 Debian 3.0 In-Reply-To: References: <20070829125131.714341E4002@bag.python.org> <46D56E81.1030604@trueblade.com> Message-ID: Never mind. Amaury pointed out that the code already includes PY_FORMAT_SIZE_T, but that particular platform doesn't support %zd. Maybe PY_FORMAT_SIZE_T should be "l" instead on that platform? (As it's not Windows I'm pretty sure sizeof(long) == sizeof(void*)...) --Guido On 8/29/07, Guido van Rossum wrote: > I noticed this failed with a traceback from distutils, caused by a bug > in an exception handler; which I fixed. > > I also noticed that there were a *lot* of warnings like this: > > Objects/object.c:193: warning: format '%d' expects type 'int', but > argument 7 has type 'Py_ssize_t' > > These can be fixed using %zd I believe. Any volunteers? This would > tremendously improve 64-bit quality! > > --Guido > > On 8/29/07, Eric Smith wrote: > > The URL is getting mangled, it should be: > > http://www.python.org/dev/buildbot/all/S-390%20Debian%203.0/builds/9 > > > > buildbot at python.org wrote: > > > The Buildbot has detected a new failure of S-390 Debian 3.0. 
> > > Full details are available at: > > > http://www.python.org/dev/buildbot/all/S-390%2520Debian%25203.0/builds/9 > > > > > > Buildbot URL: http://www.python.org/dev/buildbot/all/ > > > > > > Build Reason: > > > Build Source Stamp: [branch branches/py3k] HEAD > > > Blamelist: eric.smith > > > > > > BUILD FAILED: failed compile > > > > > > sincerely, > > > -The Buildbot > > > > > > _______________________________________________ > > > Python-checkins mailing list > > > Python-checkins at python.org > > > http://mail.python.org/mailman/listinfo/python-checkins > > > > > > > _______________________________________________ > > Python-3000 mailing list > > Python-3000 at python.org > > http://mail.python.org/mailman/listinfo/python-3000 > > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Aug 29 19:08:44 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Aug 2007 10:08:44 -0700 Subject: [Python-3000] Invalid type for 'u' argument 3 Message-ID: On some buildbots I see this failure to build the datetime module: building 'datetime' extension gcc -pthread -fPIC -fno-strict-aliasing -g -Wall -Wstrict-prototypes -I. -I/home2/buildbot/slave/3.0.loewis-linux/build/./Include -I./Include -I. 
-I/usr/local/include -I/home2/buildbot/slave/3.0.loewis-linux/build/Include -I/home2/buildbot/slave/3.0.loewis-linux/build -c /home2/buildbot/slave/3.0.loewis-linux/build/Modules/datetimemodule.c -o build/temp.linux-i686-3.0/home2/buildbot/slave/3.0.loewis-linux/build/Modules/datetimemodule.o /home2/buildbot/slave/3.0.loewis-linux/build/Modules/datetimemodule.c: In function 'datetime_strptime': /home2/buildbot/slave/3.0.loewis-linux/build/Modules/datetimemodule.c:3791: error: Invalid type for 'u' argument 3 The source line is this: if (!PyArg_ParseTuple(args, "uu:strptime", &string, &format)) I think this is relevant, in pyport.h: #ifdef HAVE_ATTRIBUTE_FORMAT_PARSETUPLE #define Py_FORMAT_PARSETUPLE(func,p1,p2) __attribute__((format(func,p1,p2))) #else #define Py_FORMAT_PARSETUPLE(func,p1,p2) #endif But how does this work? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Wed Aug 29 19:17:31 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 29 Aug 2007 19:17:31 +0200 Subject: [Python-3000] [Python-checkins] buildbot failure in S-390 Debian 3.0 In-Reply-To: References: <20070829125131.714341E4002@bag.python.org> <46D56E81.1030604@trueblade.com> Message-ID: <46D5AA2B.6070409@v.loewis.de> Guido van Rossum wrote: > Never mind. Amaury pointed out that the code already includes > PY_FORMAT_SIZE_T, but that particular platform doesn't support %zd. > Maybe PY_FORMAT_SIZE_T should be "l" instead on that platform? (As > it's not Windows I'm pretty sure sizeof(long) == sizeof(void*)...) Are you still talking about S/390? I see this from configure: checking size of int... 4 checking size of long... 4 checking size of void *... 4 checking size of size_t... 
4 So: a) it's not a 64-bit system (it should then be a 31-bit system), b) Python already would use %ld if sizeof(ssize_t)!=sizeof(int), but sizeof(ssize_t)==sizeof(long) Not sure why gcc is complaining; according to this change to APR http://www.mail-archive.com/dev@apr.apache.org/msg18533.html it might still be that the warning goes away if %ld is used on S390 (similar to what is done for __APPLE__). Interestingly enough, they use this code for OSX :-) + *apple-darwin*) + osver=`uname -r` + case $osver in + [[0-7]].*) + ssize_t_fmt="d" + ;; + *) + ssize_t_fmt="ld" + ;; + esac Regards, Martin From martin at v.loewis.de Wed Aug 29 19:24:16 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 29 Aug 2007 19:24:16 +0200 Subject: [Python-3000] Invalid type for 'u' argument 3 In-Reply-To: References: Message-ID: <46D5ABC0.8040903@v.loewis.de> > On some buildbots I see this failure to build the datetime module: See also bugs.python.org/1055. > error: Invalid type for 'u' argument 3 > > The source line is this: > > if (!PyArg_ParseTuple(args, "uu:strptime", &string, &format)) and string and format are of type char*; for "u", they should be of type Py_UNICODE*. > I think this is relevant, in pyport.h: > > #ifdef HAVE_ATTRIBUTE_FORMAT_PARSETUPLE > #define Py_FORMAT_PARSETUPLE(func,p1,p2) __attribute__((format(func,p1,p2))) > #else > #define Py_FORMAT_PARSETUPLE(func,p1,p2) > #endif > > But how does this work? Also consider this: PyAPI_FUNC(int) PyArg_ParseTuple(PyObject *, const char *, ...) Py_FORMAT_PARSETUPLE(PyArg_ParseTuple, 2, 3); Together, they expand to int PyArg_ParseTuple(PyObject*, const char*, ...) __attribute__((format(PyArg_ParseTuple, 2, 3))); It's exactly one buildbot slave (I hope), the one that runs mvlgcc. I created a patch for GCC to check ParseTuple calls for correctness, and set up a buildbot slave to use this compiler. 
Regards, Martin From guido at python.org Wed Aug 29 19:28:58 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Aug 2007 10:28:58 -0700 Subject: [Python-3000] Invalid type for 'u' argument 3 In-Reply-To: <46D5ABC0.8040903@v.loewis.de> References: <46D5ABC0.8040903@v.loewis.de> Message-ID: Oh wow. I see, very clever and useful. It found a real bug! (Except it was transparent since these variables were only used to pass to another "uu" format, canceling out the type.) Fixed. Committed revision 57665. --Guido On 8/29/07, "Martin v. Löwis" wrote: > > On some buildbots I see this failure to build the datetime module: > > See also bugs.python.org/1055. > > > error: Invalid type for 'u' argument 3 > > > > The source line is this: > > > > if (!PyArg_ParseTuple(args, "uu:strptime", &string, &format)) > > and string and format are of type char*; for "u", they should be of > type Py_UNICODE*. > > > I think this is relevant, in pyport.h: > > > > #ifdef HAVE_ATTRIBUTE_FORMAT_PARSETUPLE > > #define Py_FORMAT_PARSETUPLE(func,p1,p2) __attribute__((format(func,p1,p2))) > > #else > > #define Py_FORMAT_PARSETUPLE(func,p1,p2) > > #endif > > > > But how does this work? > > Also consider this: > > PyAPI_FUNC(int) PyArg_ParseTuple(PyObject *, const char *, ...) > Py_FORMAT_PARSETUPLE(PyArg_ParseTuple, 2, 3); > > Together, they expand to > > int PyArg_ParseTuple(PyObject*, const char*, ...) > __attribute__((format(PyArg_ParseTuple, 2, 3))); > > It's exactly one buildbot slave (I hope), the one that runs mvlgcc. I > created a patch for GCC to check ParseTuple calls for correctness, and > set up a buildbot slave to use this compiler. 
> > Regards, > Martin > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at alum.mit.edu Wed Aug 29 19:31:51 2007 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed, 29 Aug 2007 13:31:51 -0400 Subject: [Python-3000] proposal: comparing bytes and str raises TypeError Message-ID: As I was cleaning up the http libraries, I noticed a lot of code that has comparisons with string literals. As we change code to return bytes instead of strings, these comparisons start to fail silently. When you're lucky, you have a test that catches the failure. In the httplib case, there were a couple places where the code got stuck in a loop, because it was waiting for a socket to return "" before exiting. There are lots of places where we are not so lucky. I made a local change to my bytesobject.c to raise an exception whenever it is compared to a PyUnicode_Object. This has caught a number of real bugs that weren't caught by the test suite. I think we should make this the expected behavior for comparisons of bytes and strings, because users are going to have the same problem and it's hard to track down without changing the interpreter. The obvious downside is that you can't have heterogeneous containers that mix strings and bytes: >>> L = ["1", b"1"] >>> "1" in L True >>> "2" in L Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: can't compare str and bytes But I'm not sure that we actually need to support this case. Jeremy From eric+python-dev at trueblade.com Wed Aug 29 19:34:37 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Wed, 29 Aug 2007 13:34:37 -0400 Subject: [Python-3000] Can a Python object move in memory? Message-ID: <46D5AE2D.7050005@trueblade.com> As part of the PEP 3101 stuff, I have an iterator that I've written in C. It has a PyUnicodeObject* in it, which holds the string I'm parsing. I do some parsing, return a result, do some more parsing on the next call, etc. This code is callable from Python code. 
I keep Py_UNICODE* pointers into this PyUnicodeObject in my iterator object, and I access these pointers on subsequent calls to my next() method. Is this an error? The more I think about it the more convinced I am it's an error. I can change it to use indexes instead of pointers pretty easily, if I need to. From skip at pobox.com Wed Aug 29 19:36:43 2007 From: skip at pobox.com (skip at pobox.com) Date: Wed, 29 Aug 2007 12:36:43 -0500 Subject: [Python-3000] Will Py3K be friendlier to optimization opportunities? Message-ID: <18133.44715.247285.482372@montanaro.dyndns.org> At various times in the past Python's highly dynamic nature has gotten in the way of various optimizations (consider optimizing access to globals which a number of us have taken cracks at). I believe Guido has said on more than one occasion that he could see Python becoming a bit less dynamic to allow some of these sorts of optimizations (I hope I'm not putting words into your virtual mouth, Guido). Another thing that pops up from time-to-time is the GIL and its impact on multithreaded applications. Is Python 3 likely to change in any way so as to make future performance optimization work more fruitful? I realize that it may be more reasonable to expect extreme performance gains to come from Python-like systems like Pyrex or ShedSkin, but it might still be worthwhile to consider what might be possible after 3.0a1 is released. Based on the little reading I've done in the PEPs, the changes I've seen that lean in this direction are: * from ... import * is no longer supported at function scope * None, True and False become keywords * optional function annotations (PEP 3107) I'm sure there must be other changes which, while not strictly done to support further optimization, will allow more to be done in some areas. Is there more than that? 
Skip From martin at v.loewis.de Wed Aug 29 19:41:08 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 29 Aug 2007 19:41:08 +0200 Subject: [Python-3000] Can a Python object move in memory? In-Reply-To: <46D5AE2D.7050005@trueblade.com> References: <46D5AE2D.7050005@trueblade.com> Message-ID: <46D5AFB4.30205@v.loewis.de> > I keep Py_UNICODE* pointers into this PyUnicodeObject in my iterator > object, and I access these pointers on subsequent calls to my next() > method. Is this an error? The more I think about it the more convinced > I am it's an error. Because the pointer may change? There is a (silent) promise that for a given PyUnicodeObject, the Py_UNICODE* will never change. Regards, Martin From eric+python-dev at trueblade.com Wed Aug 29 19:44:13 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Wed, 29 Aug 2007 13:44:13 -0400 Subject: [Python-3000] Can a Python object move in memory? In-Reply-To: <46D5AFB4.30205@v.loewis.de> References: <46D5AE2D.7050005@trueblade.com> <46D5AFB4.30205@v.loewis.de> Message-ID: <46D5B06D.3040002@trueblade.com> Martin v. Löwis wrote: >> I keep Py_UNICODE* pointers into this PyUnicodeObject in my iterator >> object, and I access these pointers on subsequent calls to my next() >> method. Is this an error? The more I think about it the more convinced >> I am it's an error. > > Because the pointer may change? There is a (silent) promise that for > a given PyUnicodeObject, the Py_UNICODE* will never change. Right, it's the pointer changing that I'm worried about. Should I not bother with changing my code, then? From martin at v.loewis.de Wed Aug 29 19:45:33 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 29 Aug 2007 19:45:33 +0200 Subject: [Python-3000] Will Py3K be friendlier to optimization opportunities? 
In-Reply-To: <18133.44715.247285.482372@montanaro.dyndns.org> References: <18133.44715.247285.482372@montanaro.dyndns.org> Message-ID: <46D5B0BD.6010209@v.loewis.de> > Is Python 3 likely to change in any way so as to make future performance > optimization work more fruitful? I think Python 3 is likely what you see in subversion today, plus any PEPs that have been accepted and not yet implemented (there are only few of these). So any higher-reaching goal must wait for Python 4, Python 5, or Python 6. In particular, people have repeatedly requested that the GIL be removed for Python 3. There is nothing remotely resembling a patch implementing such a feature at the moment, so this won't happen. Regards, Martin From amauryfa at gmail.com Wed Aug 29 19:51:06 2007 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Wed, 29 Aug 2007 19:51:06 +0200 Subject: [Python-3000] py3k patches for Windows Message-ID: Hello, I recently created patches that correct some problems in py3k on Windows. They are: - http://bugs.python.org/issue1029 io.StringIO used to transform \n into \r\n. This problem must be fixed, if you want the stdout comparisons and doctests to succeed. - http://bugs.python.org/issue1047 converts PC/subprocess.c to full Unicode (no more PyString...) and test_subprocess passes without a change. - http://bugs.python.org/issue1048 corrects a bogus %zd format used somewhere by test_float, and prevents a crash... - http://bugs.python.org/issue1050 prevents test_marshal from crashing on debug builds where vc8 seems to insert additional items on the stack: reduce the recursion level. Would someone want to review (and discuss) them and apply to the branch? Tonight I plan to have a list of the remaining failing tests (before the buildbots ;-) ) and maybe propose corrections for some of those... 
-- 
Amaury Forgeot d'Arc

From guido at python.org Wed Aug 29 19:53:28 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 29 Aug 2007 10:53:28 -0700
Subject: [Python-3000] proposal: comparing bytes and str raises TypeError
In-Reply-To:
References:
Message-ID:

Thanks! I simply forgot about this. Can you check in the change to
bytesobject.c? We'll deal with the fallout shortly.

On 8/29/07, Jeremy Hylton wrote:
> As I was cleaning up the http libraries, I noticed a lot of code that
> has comparisons with string literals. As we change code to return
> bytes instead of strings, these comparisons start to fail silently.
> When you're lucky, you have a test that catches the failure. In the
> httplib case, there were a couple places where the code got stuck in a
> loop, because it was waiting for a socket to return "" before exiting.
> There are lots of places where we are not so lucky.
>
> I made a local change to my bytesobject.c to raise an exception
> whenever it is compared to a PyUnicode_Object. This has caught a
> number of real bugs that weren't caught by the test suite. I think we
> should make this the expected behavior for comparisons of bytes and
> strings, because users are going to have the same problem and it's
> hard to track down without changing the interpreter.
>
> The obvious downside is that you can't have a heterogeneous containers
> that mix strings and bytes:
> >>> L = ["1", b"1"]
> >>> "1" in L
> True
> >>> "2" in L
> Traceback (most recent call last):
> File "", line 1, in
> TypeError: can't compare str and bytes
>
> But I'm not sure that we actually need to support this case.
>
> Jeremy
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de Wed Aug 29 20:12:46 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 29 Aug 2007 20:12:46 +0200
Subject: [Python-3000] Can a Python object move in memory?
In-Reply-To: <46D5B06D.3040002@trueblade.com>
References: <46D5AE2D.7050005@trueblade.com> <46D5AFB4.30205@v.loewis.de> <46D5B06D.3040002@trueblade.com>
Message-ID: <46D5B71E.3030409@v.loewis.de>

>> Because the pointer may change? There is a (silent) promise that for
>> a given PyUnicodeObject, the Py_UNICODE* will never change.
>
> Right, it's the pointer changing that I'm worried about. Should I not
> bother with changing my code, then?

Correct. If you think this promise should be given explicitly in the
documentation, feel free to propose a documentation patch.

Of course, if the underlying (rather, encapsulating) PyObject goes
away, the pointer becomes invalid. IIUC, you have some guarantee that
the unicode object will stay available all the time.

Regards,
Martin

From theller at ctypes.org Wed Aug 29 20:37:09 2007
From: theller at ctypes.org (Thomas Heller)
Date: Wed, 29 Aug 2007 20:37:09 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <46D57D47.1090709@v.loewis.de>
References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de>
Message-ID:

Martin v. Löwis wrote:
>> Do you know if it is possible to configure windows so that debug assertions do NOT
>> display a message box (it is very convenient for interactive testing, but not so
>> for automatic tests)?
>
> You can use _set_error_mode(_OUT_TO_STDERR) to make assert() go to
> stderr rather than to a message box. You can use
> _CrtSetReportMode(_CRT_ASSERT /* or _CRT_WARN or _CRT_ERROR */,
> _CRTDBG_MODE_FILE) to make _ASSERT() go to a file; you need to
> call _CrtSetReportFile( _CRT_ASSERT, _CRTDBG_FILE_STDERR ) in
> addition to make the file stderr.
>
> Not sure what window precisely you got, so I can't comment which
> of these (if any) would have made the message go away.

Currently, the debug build of py3k fails in test_os.py with an assertion
in the C library inside the execv call. This displays a dialog box from
the MSVC Debug Library:

  Debug Assertion Failed!

  Program: c:\svn\py3k\PCBuild\python_d.exe
  File: execv.c
  Line: 44

  Expression: *argvector != NULL

  For information ....

  (Press Retry to debug the application)

  Abort Retry Ignore

The last line shows the labels of the three buttons that are displayed.

If I insert these statements into Modules\posixmodule.c:

    _CrtSetReportMode(_CRT_WARN, _CRTDBG_MODE_FILE);
    _CrtSetReportFile(_CRT_WARN, _CRTDBG_FILE_STDERR);
    _CrtSetReportMode(_CRT_ERROR, _CRTDBG_MODE_FILE);
    _CrtSetReportFile(_CRT_ERROR, _CRTDBG_FILE_STDERR);
    _CrtSetReportMode(_CRT_ASSERT, _CRTDBG_MODE_FILE);
    _CrtSetReportFile(_CRT_ASSERT, _CRTDBG_FILE_STDERR);

    _set_error_mode(_OUT_TO_STDERR);

then recompile and rerun the test, the dialog box looks like this
(translated from the German Windows dialog):

  The instruction at "0x10..." references memory at "0x00000000". The
  "read" operation could not be performed on the memory.

  Click "OK" to terminate the program.
  Click "Cancel" to debug the program.

  OK Cancel

These message boxes of course hang the tests on the Windows build servers,
so it would probably be good if they could be disabled completely.
Thomas From jeremy at alum.mit.edu Wed Aug 29 21:04:48 2007 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed, 29 Aug 2007 15:04:48 -0400 Subject: [Python-3000] ctype crashes In-Reply-To: References: Message-ID: Never mind. This was an optimized build. When I did a make clean and rebuilt, it went away. Jeremy On 8/29/07, Jeremy Hylton wrote: > I'm seeing a bunch of C extensions crash on my box. I'm uncertain > about a few issues, but I think I'm running 32-bit binary on a 64-bit > linux box. The crash I see in ctypes is the following: > > #0 0x080a483e in PyUnicodeUCS2_FromString (u=0x5
) > at ../Objects/unicodeobject.c:471 > #1 0xf7cd4f8e in z_get (ptr=0x0, size=4) > at /usr/local/google/home/jhylton/python/py3k/Modules/_ctypes/cfield.c:1380 > #2 0xf7ccdbb5 in Simple_get_value (self=0xf7ba8a04) > at /usr/local/google/home/jhylton/python/py3k/Modules/_ctypes/_ctypes.c:3976 > #3 0x0807f218 in PyObject_GenericGetAttr (obj=0xf7ba8a04, name=0xf7e26ea0) > at ../Objects/object.c:1098 > #4 0x080b63da in PyEval_EvalFrameEx (f=0x81ca8fc, throwflag=0) > at ../Python/ceval.c:1937 > > I'll look at this again sometime this afternoon, but I'm headed for lunch now. > > Jeremy > From skip at pobox.com Wed Aug 29 21:21:25 2007 From: skip at pobox.com (skip at pobox.com) Date: Wed, 29 Aug 2007 14:21:25 -0500 Subject: [Python-3000] Will Py3K be friendlier to optimization opportunities? In-Reply-To: <46D5B0BD.6010209@v.loewis.de> References: <18133.44715.247285.482372@montanaro.dyndns.org> <46D5B0BD.6010209@v.loewis.de> Message-ID: <18133.50997.687840.234611@montanaro.dyndns.org> >> Is Python 3 likely to change in any way so as to make future >> performance optimization work more fruitful? Martin> In particular, people have repeatedly requested that the GIL be Martin> removed for Python 3. There is nothing remotely resembling a Martin> patch implementing such a feature at the moment, so this won't Martin> happen. I certainly wasn't expecting something to be available for review now or in the near future. I was actually mostly thinking about language syntax and semantics when I started writing that email. I think those are more likely to be frozen early on in the 3.0 development cycle. I seem to recall some message(s) on python-dev a long time ago about maybe restricting outside modification of a module's globals, e.g.: import a a.x = 1 # proposed as an error? which would have allowed easier optimization of global access. 
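To make the `a.x = 1` case concrete, here is a minimal sketch of why outside modification of a module's globals gets in the way of caching global lookups. The module `a` is a stand-in built in memory with `types.ModuleType`, and `get_x` is a made-up function for the example; nothing here reflects an actual proposal's wording.

```python
import types

# Build a throwaway module in memory so the example is self-contained.
a = types.ModuleType("a")
exec(
    "x = 0\n"
    "def get_x():\n"
    "    return x  # looked up in the module's globals on every call\n",
    a.__dict__,
)

before = a.get_x()  # sees the module's own binding of x
a.x = 1             # rebinding the module's global from the outside
after = a.get_x()   # the same function now sees the new value

if __name__ == "__main__":
    print(before, after)
```

Because any importer can rebind `a.x` at any time, `get_x` cannot simply assume the value (or even the existence) of `x` between calls unless the language forbids the assignment or the implementation tracks such changes.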
Skip From barry at python.org Wed Aug 29 21:33:02 2007 From: barry at python.org (Barry Warsaw) Date: Wed, 29 Aug 2007 15:33:02 -0400 Subject: [Python-3000] proposal: comparing bytes and str raises TypeError In-Reply-To: References: Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 29, 2007, at 1:31 PM, Jeremy Hylton wrote: > I made a local change to my bytesobject.c to raise an exception > whenever it is compared to a PyUnicode_Object. +1. I hit several silent errors in the email package because of this. A TypeError would have been very helpful! - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRtXJ7nEjvBPtnXfVAQLFAwP+MuHX4glrmiapgxdpF9jYxXdvEZ7Bt0sn VPq0KRgwj/t97CyqA15d2oo/ojkiZagk3erCKfVT8LQUHb73P9334gEVVWt6bIQn 2Cz8S40WhpOysr0FyLYbdhoPKTx4XihK1cmOZJ/Odv2G8SEjaKQfHlY5qAeHwAV7 M85o+Rc5U8o= =cT6d -----END PGP SIGNATURE----- From barry at python.org Wed Aug 29 22:03:31 2007 From: barry at python.org (Barry Warsaw) Date: Wed, 29 Aug 2007 16:03:31 -0400 Subject: [Python-3000] Does bytes() need to support bytes(, )? In-Reply-To: References: <06E2D54D-1F9A-4B66-ACE8-692A6BF93CA6@python.org> <7A1473AB-611F-4D53-82EB-E9682F2741CD@python.org> Message-ID: <1ACC173E-77E7-430B-9781-71862A81202E@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 28, 2007, at 11:21 AM, Guido van Rossum wrote: >> Nope. So what would bytes(s) do? > > Raise TypeError (when s is a str). The argument to bytes() must be > either an int (then it creates a zero-filled bytes bytes array of that > length) or an iterable of ints (then it creates a bytes array > initialized with those ints -- if any int is out of range, an > exception is raised, and also if any value is not an int). 
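The constructor rules quoted above can be checked with a small sketch. This is run against a released Python 3, where the mutable type discussed in this thread ended up being called `bytearray`; the immutable `bytes` constructor follows the same rules for these cases. `try_bytes` is a helper name invented for the example.

```python
def try_bytes(arg):
    """Return bytes(arg), or the name of the exception it raises."""
    try:
        return bytes(arg)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


if __name__ == "__main__":
    print(try_bytes(3))          # int: zero-filled, length 3
    print(try_bytes([72, 105]))  # iterable of ints
    print(try_bytes("abc"))      # str without an encoding
    print(try_bytes([256]))      # int out of range
    print(try_bytes(["a"]))      # non-int element
```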
+1

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRtXRFHEjvBPtnXfVAQLY+gP9HsP7Va5ZNdBLEO/yeOU+AQwmjyR+ei4Y
KqRK6PNV+7dOGUPeExgfvZhmKoPmu11Q6EYQMFcCFN1/2xb/OooQYaSrT4nI6P3J
eNxfmYrUu4H49myygC1IezswJWuestJi3KLawS8MFdLUqphQloH5QfZLBQsRIV8/
m2x3CVXfOrY=
=bO41
-----END PGP SIGNATURE-----

From martin at v.loewis.de Wed Aug 29 22:08:12 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 29 Aug 2007 22:08:12 +0200
Subject: [Python-3000] buildbots
In-Reply-To:
References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de>
Message-ID: <46D5D22C.3010003@v.loewis.de>

> If I insert these statements into Modules\posixmodule.c:
>
> _CrtSetReportMode(_CRT_WARN, _CRTDBG_MODE_FILE);
> _CrtSetReportFile(_CRT_WARN, _CRTDBG_FILE_STDERR);
> _CrtSetReportMode(_CRT_ERROR, _CRTDBG_MODE_FILE);
> _CrtSetReportFile(_CRT_ERROR, _CRTDBG_FILE_STDERR);
> _CrtSetReportMode(_CRT_ASSERT, _CRTDBG_MODE_FILE);
> _CrtSetReportFile(_CRT_ASSERT, _CRTDBG_FILE_STDERR);
>
> _set_error_mode(_OUT_TO_STDERR);
>
> and recompile and test then the dialog box looks like this:

Do you get an output to stderr before that next dialog box?

> The instruction at "0x10..." references memory at "0x00000000". The
> "read" operation could not be performed on the memory.
>
> Click "OK" to terminate the program.
> Click "Cancel" to debug the program.

That is not from the C library, but from the operating system.
Apparently, the CRT continues after reporting the assertion failure.

To work around that, it would be possible to install a report hook
(using _CrtSetReportHook(2)). This hook would output the error
message, and then call TerminateProcess.

> These messageboxes of course hang the tests on the windows build servers,
> so probably it would be good if they could be disabled completely.
I think this will be very difficult to achieve. Regards, Martin From guido at python.org Wed Aug 29 22:19:17 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Aug 2007 13:19:17 -0700 Subject: [Python-3000] py3k patches for Windows In-Reply-To: References: Message-ID: I've checked all those in except the change that prevents closing fds 0, 1, 2. Watch the buildbots, I have no Windows access myself. On 8/29/07, Amaury Forgeot d'Arc wrote: > Hello, > > I recently created patches that correct some problems in py3k on Windows. > They are: > > - http://bugs.python.org/issue1029 io.StringIO used to transform \n into \r\n. > This problem must be fixed, if you want the stdout comparisons and > doctests to succeed. > > - http://bugs.python.org/issue1047 converts PC/subprocess.c to full > Unicode (no more PyString...) and test_subprocess passes without a > change. > > - http://bugs.python.org/issue1048 corrects a bogus %zd format used > somewhere by test_float, and prevents a crash... > > - http://bugs.python.org/issue1050 prevents test_marshal from crashing > on debug builds where vc8 seems to insert additional items on the > stack: reduce the recursion level. > > Would someone want to review (and discuss) them and apply to the branch? > Tonight I plan to have a list of the remaining failing tests (before > the buildbots ;-) ) > and maybe propose corrections for some of those... 
> > -- > Amaury Forgeot d'Arc > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From db3l.net at gmail.com Wed Aug 29 22:34:13 2007 From: db3l.net at gmail.com (David Bolen) Date: Wed, 29 Aug 2007 16:34:13 -0400 Subject: [Python-3000] buildbots References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> Message-ID: "Martin v. L?wis" writes: >> These messageboxes of course hang the tests on the windows build servers, >> so probably it would be good if they could be disabled completely. > > I think this will be very difficult to achieve. Could the tests be run beneath a shim process that used SetErrorMode() to disable all the OS-based process failure dialog boxes? If I remember correctly the error mode is inherited, so an independent small exec module could reset the mode, and execute the normal test sequence as a child process. -- David From db3l.net at gmail.com Wed Aug 29 22:56:52 2007 From: db3l.net at gmail.com (David Bolen) Date: Wed, 29 Aug 2007 16:56:52 -0400 Subject: [Python-3000] buildbots References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> Message-ID: David Bolen writes: > "Martin v. L?wis" writes: > >>> These messageboxes of course hang the tests on the windows build servers, >>> so probably it would be good if they could be disabled completely. >> >> I think this will be very difficult to achieve. 
>
> Could the tests be run beneath a shim process that used SetErrorMode()
> to disable all the OS-based process failure dialog boxes? If I
> remember correctly the error mode is inherited, so an independent
> small exec module could reset the mode, and execute the normal test
> sequence as a child process.

Or if using ctypes is ok, perhaps it could be done right in the test
runner.

While I haven't done any local mods to prevent the C RTL boxes,
selecting Ignore on them gets me to the OS level box, and:

Python 3.0x (py3k, Aug 27 2007, 22:44:06) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import test_os
[50256 refs]
>>> test_os.test_main()

dies with popup in test_execvpe_with_bad_program (test_os.ExecTests). But

Python 3.0x (py3k, Aug 27 2007, 22:44:06) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
[39344 refs]
>>> ctypes.windll.kernel32.SetErrorMode(7)
0
[40694 refs]
>>> import test_os
[55893 refs]
>>> test_os.test_main()

doesn't present the OS popup prior to process exit.

-- David

From collinw at gmail.com Wed Aug 29 23:02:39 2007
From: collinw at gmail.com (Collin Winter)
Date: Wed, 29 Aug 2007 14:02:39 -0700
Subject: [Python-3000] Will Py3K be friendlier to optimization opportunities?
In-Reply-To: <18133.50997.687840.234611@montanaro.dyndns.org>
References: <18133.44715.247285.482372@montanaro.dyndns.org> <46D5B0BD.6010209@v.loewis.de> <18133.50997.687840.234611@montanaro.dyndns.org>
Message-ID: <43aa6ff70708291402m29205719of476e8e1d4e7c965@mail.gmail.com>

On 8/29/07, skip at pobox.com wrote:
[snip]
> I certainly wasn't expecting something to be available for review now or in
> the near future. I was actually mostly thinking about language syntax and
> semantics when I started writing that email. I think those are more likely
> to be frozen early on in the 3.0 development cycle.
I seem to recall some > message(s) on python-dev a long time ago about maybe restricting outside > modification of a module's globals, e.g.: > > import a > a.x = 1 # proposed as an error? > > which would have allowed easier optimization of global access. When thinking about these kinds of optimizations and restrictions, keep in mind their effect on testing. For example, I work on code that makes use of the ability to tinker with another module's view of os.path in order to simulate error conditions that would otherwise be hard to test. If you wanted to hide this kind of restriction behind an -O flag, that would be one thing, but having it on by default seems like a bad idea. Collin Winter From martin at v.loewis.de Thu Aug 30 00:15:53 2007 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 30 Aug 2007 00:15:53 +0200 Subject: [Python-3000] buildbots In-Reply-To: References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> Message-ID: <46D5F019.9090706@v.loewis.de> >>> These messageboxes of course hang the tests on the windows build servers, >>> so probably it would be good if they could be disabled completely. >> I think this will be very difficult to achieve. > > Could the tests be run beneath a shim process that used SetErrorMode() > to disable all the OS-based process failure dialog boxes? I did not know about that - it may help. > If I > remember correctly the error mode is inherited, so an independent > small exec module could reset the mode, and execute the normal test > sequence as a child process. It would also be possible to put that into the interpreter itself, at least when running in debug mode. What does "Instead, the system sends the error to the calling process." mean? 
Regards, Martin From db3l.net at gmail.com Thu Aug 30 00:33:09 2007 From: db3l.net at gmail.com (David Bolen) Date: Wed, 29 Aug 2007 18:33:09 -0400 Subject: [Python-3000] buildbots In-Reply-To: <46D5F019.9090706@v.loewis.de> References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> <46D5F019.9090706@v.loewis.de> Message-ID: <9f94e2360708291533h139b3d37y2d9aef068228579c@mail.gmail.com> On 8/29/07, "Martin v. L?wis" wrote: > > If I > > remember correctly the error mode is inherited, so an independent > > small exec module could reset the mode, and execute the normal test > > sequence as a child process. > > It would also be possible to put that into the interpreter itself, > at least when running in debug mode. Yep, although you might want to choose whether or not to do it in interactive mode I suppose. Or, as in my subsequent message, it could just be incorporated into the test runner (such as in regrtest.py). > What does > > "Instead, the system sends the error to the calling process." > > mean? It's somewhat dependent on the type of problem that was going to lead to the dialog box. For a catastrophic failure (e.g., GPF), such as those that only provide OK (or perhaps Cancel for debug) in the dialog, the process is still going to be abruptly terminated, as if OK was pressed with no further execution of code within the process itself. A parent process can detect this based on the exit code of the subprocess. For other less critical failures (like the box that pops up when trying to open a file on a removable device that isn't present), an error is simply returned to the calling process as the result of the original system call that triggered the failure - just like any other failing I/O operation, and equivalent I believe to hitting Cancel on the dialog that would otherwise have popped up. 
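A minimal sketch of how a test runner might apply this, assuming ctypes is acceptable there. The function name `suppress_error_dialogs` is hypothetical (it is not part of regrtest); the flag values are the ones from the Win32 headers, and their combination gives the value 7 used in the interactive session earlier in this thread.

```python
import sys

# Error-mode flags as defined in the Win32 headers.
SEM_FAILCRITICALERRORS = 0x0001      # no critical-error-handler box
SEM_NOGPFAULTERRORBOX = 0x0002       # no crash (GPF) dialog
SEM_NOALIGNMENTFAULTEXCEPT = 0x0004  # fix up alignment faults silently
SEM_NOOPENFILEERRORBOX = 0x8000      # no box when a file cannot be found

QUIET_MODE = (
    SEM_FAILCRITICALERRORS
    | SEM_NOGPFAULTERRORBOX
    | SEM_NOALIGNMENTFAULTEXCEPT
)


def suppress_error_dialogs(mode=QUIET_MODE):
    """Set the process error mode on Windows.

    Returns the previous mode, or None when not running on Windows.
    """
    if sys.platform != "win32":
        return None
    import ctypes  # imported lazily; windll only exists on Windows

    return ctypes.windll.kernel32.SetErrorMode(mode)


if __name__ == "__main__":
    print(QUIET_MODE)
    print(suppress_error_dialogs())
```

Since the error mode is inherited by child processes, calling this once before the tests start should also cover tests that spawn subprocesses. It does not affect the CRT assertion boxes, which are controlled separately.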
-- David From amauryfa at gmail.com Thu Aug 30 00:43:02 2007 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Thu, 30 Aug 2007 00:43:02 +0200 Subject: [Python-3000] py3k patches for Windows In-Reply-To: References: Message-ID: 2007/8/29, Guido van Rossum : > I've checked all those in except the change that prevents closing fds > 0, 1, 2. Watch the buildbots, I have no Windows access myself. Oops, the 0,1,2 fds stuff was not supposed to leave my workspace. I agree that it should be done differently. Note that there is currently only one Windows buildbot to watch - and it is a win64. Most tests that fail there pass on my machine. Nevertheless, the log is much smaller than before. I see additional errors like "can't use str as char buffer". I suppose it is because of the recent stricter distinction between bytes and str. Thanks for all, -- Amaury Forgeot d'Arc From db3l.net at gmail.com Thu Aug 30 00:58:23 2007 From: db3l.net at gmail.com (David Bolen) Date: Wed, 29 Aug 2007 18:58:23 -0400 Subject: [Python-3000] py3k patches for Windows References: Message-ID: "Amaury Forgeot d'Arc" writes: > Note that there is currently only one Windows buildbot to watch - and > it is a win64. Most tests that fail there pass on my machine. > Nevertheless, the log is much smaller than before. There's an offer of mine to host an additional Windows (win32) buildbot, for whatever versions are helpful, in the moderator queue for python-dev. Although in looking at the current 3.0 buildbot status page there would seem to be two others already running, so I wasn't sure if another was really needed. -- David From guido at python.org Thu Aug 30 01:06:40 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Aug 2007 16:06:40 -0700 Subject: [Python-3000] py3k patches for Windows In-Reply-To: References: Message-ID: On 8/29/07, Amaury Forgeot d'Arc wrote: > I see additional errors like "can't use str as char buffer". 
I suppose > it is because of the recent stricter distinction between bytes and > str. Indeed. It typically means that something is using PyObject_AsCharBuffer() instead of PyUnicode_AsStringAndSize(). Please fix occurrences you find and upload patches. It seems that any use of the 't#' format for PyArg_ParseTuple() also triggers this error. I don't understand the code for that format; I've been successful in the short term by changing these to 's#' but I don't think that's the correct solution... -- --Guido van Rossum (home page: http://www.python.org/~guido/) From nnorwitz at gmail.com Thu Aug 30 01:08:50 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Wed, 29 Aug 2007 16:08:50 -0700 Subject: [Python-3000] py3k patches for Windows In-Reply-To: References: Message-ID: On 8/29/07, David Bolen wrote: > > There's an offer of mine to host an additional Windows (win32) > buildbot, for whatever versions are helpful, in the moderator queue > for python-dev. Hmm, that's odd. > Although in looking at the current 3.0 buildbot > status page there would seem to be two others already running, so I > wasn't sure if another was really needed. I think it would help. We have 4 windows buildbots IIRC, but 2 are offline and there are various problems. Given all the slight variations of Windows, I think it would be good to get another Windows bot. Note: that a bot for one branch is a bot for all branches. Only one branch will be run at a time though. Contact me offline with some of the details and I will try to get this setup tonight. Let me know what timezone you are in too. Thanks! n From guido at python.org Thu Aug 30 01:18:46 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Aug 2007 16:18:46 -0700 Subject: [Python-3000] Need Windows build instructions Message-ID: Can someone familiar with building Py3k on Windows add a section on how to build it on Windows to the new README? (Do a svn up first; I've rewritten most of it.) 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.peters at gmail.com Thu Aug 30 01:25:36 2007 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 29 Aug 2007 19:25:36 -0400 Subject: [Python-3000] py3k patches for Windows In-Reply-To: References: Message-ID: <1f7befae0708291625m79f1db4fyf3c3eb55e3b9f283@mail.gmail.com> [Neal Norwitz] > ... > We have 4 windows buildbots IIRC, but 2 are offline and there are > various problems. Given all the slight variations of Windows, I > think it would be good to get another Windows bot. FYI, my bot is offline now just because I'm having major HW problems (endless spontaneous reboots, possibly due to overheating). I'm paying a fellow who knows more about PC HW to come over this weekend, and I hope that will resolve it. If not, I'll just make the PSF buy me a new machine ;-) One way or another, I should get a bot back online within the next couple weeks. From nnorwitz at gmail.com Thu Aug 30 01:28:36 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Wed, 29 Aug 2007 16:28:36 -0700 Subject: [Python-3000] py3k patches for Windows In-Reply-To: <1f7befae0708291625m79f1db4fyf3c3eb55e3b9f283@mail.gmail.com> References: <1f7befae0708291625m79f1db4fyf3c3eb55e3b9f283@mail.gmail.com> Message-ID: On 8/29/07, Tim Peters wrote: > [Neal Norwitz] > > ... > > We have 4 windows buildbots IIRC, but 2 are offline and there are > > various problems. Given all the slight variations of Windows, I > > think it would be good to get another Windows bot. > > FYI, my bot is offline now just because I'm having major HW problems > (endless spontaneous reboots, possibly due to overheating). I'm > paying a fellow who knows more about PC HW to come over this weekend, > and I hope that will resolve it. If not, I'll just make the PSF buy > me a new machine ;-) One way or another, I should get a bot back > online within the next couple weeks. Do you want two machines? It's long past time for the PSF to buy you a machine. 
Getting you a new, faster machine would be great for the PSF. I still have a checkbook, how much do you need. :-) n From greg at krypto.org Thu Aug 30 01:47:28 2007 From: greg at krypto.org (Gregory P. Smith) Date: Wed, 29 Aug 2007 16:47:28 -0700 Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and immutable support Message-ID: <20070829234728.GV24059@electricrain.com> Attached is what I've come up with so far. Only a single field is added to the PyBytesObject struct. This adds support to the bytes object for PyBUF_LOCKDATA buffer API operation. bytes objects can be marked temporarily read-only for use while the buffer api has handed them off to something which may run without the GIL (think IO). Any attempt to modify them during that time will raise an exception as I believe Martin suggested earlier. As an added bonus because its been discussed here, support for setting a bytes object immutable has been added since its pretty trivial once the read only export support was in place. Thats not required but was trivial to include. I'd appreciate any feedback. My TODO list for this patch: 0. Get feedback and make adjustments as necessary. 1. Deciding between PyBUF_SIMPLE and PyBUF_WRITEABLE for the internal uses of the _getbuffer() function. bytesobject.c contains both readonly and read-write uses of the buffers, i'll add boolean parameter for that. 2. More testing: a few tests in the test suite fail after this but the number was low and I haven't had time to look at why or what the failures were. 3. Exporting methods suggested in the TODO at the top of the file. 4. Unit tests for all of the functionality this adds. NOTE: after these changes I had to make clean and rm -rf build before things would not segfault on import. I suspect some things (modules?) were not properly recompiled after the bytesobject.h struct change otherwise. 
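For readers who want to see the export-counting idea from the Python side, here is a small sketch using behaviour that exists in released Python 3, where the mutable type is `bytearray`. This is related to, but not the same as, the read-only locking in the patch: a released interpreter only refuses to resize an object while a buffer export is outstanding, and still allows same-length in-place writes. The function name `resize_while_exported` is made up for the example.

```python
def resize_while_exported():
    """Try to grow a bytearray while a memoryview holds an export on it."""
    data = bytearray(b"abc")
    view = memoryview(data)   # takes a buffer export on data
    try:
        data.append(0x64)     # resizing is refused while exported
        blocked = False
    except BufferError:
        blocked = True
    view.release()            # drops the export
    data.append(0x64)         # now the resize succeeds
    return blocked, bytes(data)


if __name__ == "__main__":
    print(resize_while_exported())
```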
-gps
-------------- next part --------------
Index: Include/bytesobject.h
===================================================================
--- Include/bytesobject.h (revision 57679)
+++ Include/bytesobject.h (working copy)
@@ -17,17 +17,18 @@
  * For the convenience of C programmers, the bytes type is considered
  * to contain a char pointer, not an unsigned char pointer.
  */
 
 /* Object layout */
 typedef struct {
     PyObject_VAR_HEAD
     /* XXX(nnorwitz): should ob_exports be Py_ssize_t? */
-    int ob_exports; /* how many buffer exports */
+    int ob_exports; /* How many buffer exports */
+    int ob_readonly_exports; /* How many buffer exports as readonly */
     Py_ssize_t ob_alloc; /* How many bytes allocated */
     char *ob_bytes;
 } PyBytesObject;
 
 /* Type object */
 PyAPI_DATA(PyTypeObject) PyBytes_Type;
 
 /* Type check macros */
Index: Objects/bytesobject.c
===================================================================
--- Objects/bytesobject.c (revision 57679)
+++ Objects/bytesobject.c (working copy)
@@ -1,16 +1,156 @@
 /* Bytes object implementation */
 
 /* XXX TO DO: optimizations */
 
 #define PY_SSIZE_T_CLEAN
 #include "Python.h"
 #include "structmember.h"
 
+/*
+ * Constants for use with the PyBytesObject.ob_readonly_exports.
+ */
+#define IMMUTABLE (INT_MAX)
+#define MAX_READONLY_EXPORTS (INT_MAX-1)
+
+/*
+ * Should we bounds check PyBytesObject.ob_exports and
+ * ob_readonly_exports when we increment them?
+ */
+#if MAX_READONLY_EXPORTS <= USHRT_MAX
+#define BOUNDS_CHECK_EXPORTS 1
+#else
+#undef BOUNDS_CHECK_EXPORTS
+#endif
+
+/*
+ * XXX(gps): I added support for immutability because it was a trivial
+ * addition to the work I was already doing to add PyBUF_READLOCK
+ * support to bytes objects. It isn't required but is included as an
+ * example to decide if it should stay.
+ *
+ * TODO(gps) Do we want to provide an exported interface for any of
+ * these inlines for use by C code that uses Bytes objects directly
+ * rather than the buffer API? I suggest C code should prefer to use
+ * the buffer API (though it is heavier weight).
+ *
+ * APIs I think should be public in C and Python:
+ * is_readonly
+ * set_immutable & is_immutable
+ */
+
+/*
+ * Set a bytes object to be immutable. If outstanding non-readonly
+ * exports exist this will raise an error instead. Once immutable,
+ * always immutable. This cannot be undone.
+ *
+ * Returns: 0 on success, 1 on failure with an exception set.
+ */
+Py_LOCAL_INLINE(int) set_immutable(PyBytesObject *obj)
+{
+    if (obj->ob_exports > 0) {
+        PyErr_SetString(PyExc_BufferError,
+                        "bytes with outstanding non-readonly exports"
+                        "cannot be made immutable.");
+        return 1;
+    }
+    obj->ob_readonly_exports = IMMUTABLE;
+    return 0;
+}
+
+/*
+ * Is this bytes object immutable? 0: no, 1: yes
+ */
+Py_LOCAL_INLINE(int) is_immutable(PyBytesObject *obj)
+{
+    return obj->ob_readonly_exports == IMMUTABLE;
+}
+
+/*
+ * Is this bytes object currently read only? 0: no, 1: yes
+ */
+Py_LOCAL_INLINE(int) is_readonly(PyBytesObject *obj)
+{
+    assert(is_immutable(obj) || obj->ob_readonly_exports <= obj->ob_exports);
+    return (obj->ob_readonly_exports > 0 && obj->ob_exports == 0);
+}
+
+/*
+ * Increment the export count. For use by getbuffer.
+ *
+ * Returns: 0 on success, -1 on failure with an exception set.
+ * (-1 matches the required buffer API getbuffer return value)
+ */
+Py_LOCAL_INLINE(int) inc_exports(PyBytesObject *obj)
+{
+    obj->ob_exports++;
+#ifdef BOUNDS_CHECK_EXPORTS
+    if (obj->ob_exports <= 0) {
+        PyErr_SetString(PyExc_RuntimeError,
+                        "ob_exports integer overflow");
+        obj->ob_exports--;
+        return -1;
+    }
+#endif
+    return 0;
+}
+
+/*
+ * Decrement the export count. For use by releasebuffer.
+ */
+Py_LOCAL_INLINE(void) dec_exports(PyBytesObject *obj)
+{
+    obj->ob_exports--;
+}
+
+
+/*
+ * Increment the readonly export count if the object is mutable.
+ * Must be called with the GIL held.
+ *
+ * For use by the buffer API to implement PyBUF_LOCKDATA requests.
+ */
+Py_LOCAL_INLINE(void) inc_readonly_exports(PyBytesObject *obj)
+{
+#ifdef BOUNDS_CHECK_EXPORTS
+    if (obj->ob_readonly_exports == MAX_READONLY_EXPORTS) {
+        /* XXX(gps): include object id in this warning? */
+        PyErr_WarnEx(PyExc_RuntimeWarning,
+                     "readonly_exports overflow on bytes object; "
+                     "marking it immutable.", 1);
+        obj->ob_readonly_exports = IMMUTABLE;
+    }
+#endif
+    /*
+     * NOTE: Even if the above check isn't made, the values are such that
+     * incrementing ob_readonly_exports past the max value will cause it
+     * to become immutable (as a partial safety feature).
+     */
+    if (obj->ob_readonly_exports != IMMUTABLE) {
+        obj->ob_readonly_exports++;
+    }
+}
+
+
+/*
+ * Decrement the readonly export count.
+ * Must be called with the GIL held.
+ *
+ * For use by the buffer API to implement PyBUF_LOCKDATA requests.
+ */
+Py_LOCAL_INLINE(void) dec_readonly_exports(PyBytesObject *obj)
+{
+    assert(is_immutable(obj) || obj->ob_readonly_exports <= obj->ob_exports);
+    if (obj->ob_readonly_exports != IMMUTABLE) {
+        obj->ob_readonly_exports--;
+    }
+}
+
+
 /* The nullbytes are used by the stringlib during partition.
  * If partition is removed from bytes, nullbytes and its helper
  * Init/Fini should also be removed.
*/ static PyBytesObject *nullbytes = NULL; void PyBytes_Fini(void) @@ -49,35 +189,44 @@ return 1; } static int bytes_getbuffer(PyBytesObject *obj, PyBuffer *view, int flags) { int ret; void *ptr; + int readonly = 0; if (view == NULL) { - obj->ob_exports++; - return 0; + return inc_exports(obj); } if (obj->ob_bytes == NULL) ptr = ""; else ptr = obj->ob_bytes; - ret = PyBuffer_FillInfo(view, ptr, Py_Size(obj), 0, flags); + if (((flags & PyBUF_LOCKDATA) == PyBUF_LOCKDATA) && + obj->ob_exports == 0) { + inc_readonly_exports(obj); + readonly = -1; + } else { + readonly = is_readonly(obj); + } + ret = PyBuffer_FillInfo(view, ptr, Py_Size(obj), readonly, flags); if (ret >= 0) { - obj->ob_exports++; + return inc_exports(obj); } return ret; } static void bytes_releasebuffer(PyBytesObject *obj, PyBuffer *view) { - obj->ob_exports--; + dec_exports(obj); + if (view && view->readonly == -1) + dec_readonly_exports(obj); } static Py_ssize_t _getbuffer(PyObject *obj, PyBuffer *view) { PyBufferProcs *buffer = Py_Type(obj)->tp_as_buffer; if (buffer == NULL || @@ -85,16 +234,20 @@ buffer->bf_getbuffer == NULL) { PyErr_Format(PyExc_TypeError, "Type %.100s doesn't support the buffer API", Py_Type(obj)->tp_name); return -1; } + /* + * TODO(gps): make this PyBUF_WRITEABLE? or just verify sanity on + * our own before calling if the GIL is not being relinquished? 
+ */ if (buffer->bf_getbuffer(obj, view, PyBUF_SIMPLE) < 0) return -1; return view->len; } /* Direct API functions */ PyObject * @@ -151,26 +304,40 @@ PyBytes_AsString(PyObject *self) { assert(self != NULL); assert(PyBytes_Check(self)); return PyBytes_AS_STRING(self); } +#define SET_RO_ERROR(bo) do { \ + if (is_immutable((PyBytesObject *)(bo))) \ + PyErr_SetString(PyExc_BufferError, \ + "Immutable flag set: object cannot be modified"); \ + else \ + PyErr_SetString(PyExc_BufferError, \ + "Readonly export exists: object cannot be modified"); \ + } while (0); + int PyBytes_Resize(PyObject *self, Py_ssize_t size) { void *sval; Py_ssize_t alloc = ((PyBytesObject *)self)->ob_alloc; assert(self != NULL); assert(PyBytes_Check(self)); assert(size >= 0); + if (is_readonly((PyBytesObject *)self)) { + SET_RO_ERROR(self); + return -1; + } + if (size < alloc / 2) { /* Major downsize; resize down to exact size */ alloc = size + 1; } else if (size < alloc) { /* Within allocated size; quick exit */ Py_Size(self) = size; ((PyBytesObject *)self)->ob_bytes[size] = '\0'; /* Trailing null */ @@ -275,17 +442,23 @@ } mysize = Py_Size(self); size = mysize + vo.len; if (size < 0) { PyObject_ReleaseBuffer(other, &vo); return PyErr_NoMemory(); } + if (size < self->ob_alloc) { + if (is_readonly((PyBytesObject *)self)) { + SET_RO_ERROR(self); + PyObject_ReleaseBuffer(other, &vo); + return NULL; + } Py_Size(self) = size; self->ob_bytes[Py_Size(self)] = '\0'; /* Trailing null byte */ } else if (PyBytes_Resize((PyObject *)self, size) < 0) { PyObject_ReleaseBuffer(other, &vo); return NULL; } memcpy(self->ob_bytes + mysize, vo.buf, vo.len); @@ -327,17 +500,22 @@ Py_ssize_t size; if (count < 0) count = 0; mysize = Py_Size(self); size = mysize * count; if (count != 0 && size / count != mysize) return PyErr_NoMemory(); + if (size < self->ob_alloc) { + if (is_readonly((PyBytesObject *)self)) { + SET_RO_ERROR(self); + return NULL; + } Py_Size(self) = size; self->ob_bytes[Py_Size(self)] = '\0'; /* 
Trailing null byte */ } else if (PyBytes_Resize((PyObject *)self, size) < 0) return NULL; if (mysize == 1) memset(self->ob_bytes, self->ob_bytes[0], size); @@ -487,16 +665,22 @@ "can't set bytes slice from %.100s", Py_Type(values)->tp_name); return -1; } needed = vbytes.len; bytes = vbytes.buf; } + if (is_readonly((PyBytesObject *)self)) { + SET_RO_ERROR(self); + res = -1; + goto finish; + } + if (lo < 0) lo = 0; if (hi < lo) hi = lo; if (hi > Py_Size(self)) hi = Py_Size(self); avail = hi - lo; @@ -553,16 +737,21 @@ if (i < 0 || i >= Py_Size(self)) { PyErr_SetString(PyExc_IndexError, "bytes index out of range"); return -1; } if (value == NULL) return bytes_setslice(self, i, i+1, NULL); + if (is_readonly((PyBytesObject *)self)) { + SET_RO_ERROR(self); + return -1; + } + ival = PyNumber_AsSsize_t(value, PyExc_ValueError); if (ival == -1 && PyErr_Occurred()) return -1; if (ival < 0 || ival >= 256) { PyErr_SetString(PyExc_ValueError, "byte must be in range(0, 256)"); return -1; } @@ -572,16 +761,21 @@ } static int bytes_ass_subscript(PyBytesObject *self, PyObject *item, PyObject *values) { Py_ssize_t start, stop, step, slicelen, needed; char *bytes; + if (is_readonly((PyBytesObject *)self)) { + SET_RO_ERROR(self); + return -1; + } + if (PyIndex_Check(item)) { Py_ssize_t i = PyNumber_AsSsize_t(item, PyExc_IndexError); if (i == -1 && PyErr_Occurred()) return -1; if (i < 0) i += PyBytes_GET_SIZE(self); @@ -1335,16 +1529,18 @@ PyDoc_STRVAR(translate__doc__, "B.translate(table [,deletechars]) -> bytes\n\ \n\ Return a copy of the bytes B, where all characters occurring\n\ in the optional argument deletechars are removed, and the\n\ remaining characters have been mapped through the given\n\ translation table, which must be a bytes of length 256."); +/* XXX(gps): bytes could also use an in place bytes_itranslate method? 
*/ + static PyObject * bytes_translate(PyBytesObject *self, PyObject *args) { register char *input, *output; register const char *table; register Py_ssize_t i, c, changed = 0; PyObject *input_obj = (PyObject*)self; const char *table1, *output_start, *del_table=NULL; @@ -2030,16 +2226,18 @@ PyDoc_STRVAR(replace__doc__, "B.replace (old, new[, count]) -> bytes\n\ \n\ Return a copy of bytes B with all occurrences of subsection\n\ old replaced by new. If the optional argument count is\n\ given, only the first count occurrences are replaced."); +/* XXX(gps): bytes could also use an in place bytes_ireplace method? */ + static PyObject * bytes_replace(PyBytesObject *self, PyObject *args) { Py_ssize_t count = -1; PyObject *from, *to, *res; PyBuffer vfrom, vto; if (!PyArg_ParseTuple(args, "OO|n:replace", &from, &to, &count)) @@ -2382,16 +2580,21 @@ \n\ Reverse the order of the values in bytes in place."); static PyObject * bytes_reverse(PyBytesObject *self, PyObject *unused) { char swap, *head, *tail; Py_ssize_t i, j, n = Py_Size(self); + if (is_readonly((PyBytesObject *)self)) { + SET_RO_ERROR(self); + return NULL; + } + j = n / 2; head = self->ob_bytes; tail = head + n - 1; for (i = 0; i < j; i++) { swap = *head; *head++ = *tail; *tail-- = swap; } @@ -2407,16 +2610,21 @@ bytes_insert(PyBytesObject *self, PyObject *args) { int value; Py_ssize_t where, n = Py_Size(self); if (!PyArg_ParseTuple(args, "ni:insert", &where, &value)) return NULL; + if (is_readonly((PyBytesObject *)self)) { + SET_RO_ERROR(self); + return NULL; + } + if (n == PY_SSIZE_T_MAX) { PyErr_SetString(PyExc_OverflowError, "cannot add more objects to bytes"); return NULL; } if (value < 0 || value >= 256) { PyErr_SetString(PyExc_ValueError, "byte must be in range(0, 256)"); @@ -2472,16 +2680,21 @@ bytes_pop(PyBytesObject *self, PyObject *args) { int value; Py_ssize_t where = -1, n = Py_Size(self); if (!PyArg_ParseTuple(args, "|n:pop", &where)) return NULL; + if (is_readonly((PyBytesObject *)self)) { + 
SET_RO_ERROR(self); + return NULL; + } + if (n == 0) { PyErr_SetString(PyExc_OverflowError, "cannot pop an empty bytes"); return NULL; } if (where < 0) where += Py_Size(self); if (where < 0 || where >= Py_Size(self)) { @@ -2505,16 +2718,21 @@ bytes_remove(PyBytesObject *self, PyObject *arg) { int value; Py_ssize_t where, n = Py_Size(self); if (! _getbytevalue(arg, &value)) return NULL; + if (is_readonly((PyBytesObject *)self)) { + SET_RO_ERROR(self); + return NULL; + } + for (where = 0; where < n; where++) { if (self->ob_bytes[where] == value) break; } if (where == n) { PyErr_SetString(PyExc_ValueError, "value not found in bytes"); return NULL; } @@ -2783,16 +3001,18 @@ error: Py_DECREF(newbytes); return NULL; } PyDoc_STRVAR(reduce_doc, "Return state information for pickling."); +/* XXX(gps): should is_immutable() be in the pickle? */ + static PyObject * bytes_reduce(PyBytesObject *self) { PyObject *latin1; if (self->ob_bytes) latin1 = PyUnicode_DecodeLatin1(self->ob_bytes, Py_Size(self), NULL); else From thomas at python.org Thu Aug 30 01:56:53 2007 From: thomas at python.org (Thomas Wouters) Date: Thu, 30 Aug 2007 01:56:53 +0200 Subject: [Python-3000] refleak in test_io? Message-ID: <9e804ac0708291656r13c6b39cg19545e4f5a17e12d@mail.gmail.com> Am I the only one seeing a refleak in test_io? timberwolf:~/python/python/py3k > ./python -E -tt Lib/test/regrtest.py -R:: test_io test_io beginning 9 repetitions 123456789 ......... test_io leaked [62, 62, 62, 62] references, sum=248 1 test OK. 
It's in this particular piece of code:

    def test_destructor(self):
        record = []
        class MyFileIO(io.FileIO):
            def __del__(self):
                record.append(1)
                io.FileIO.__del__(self)
            def close(self):
                record.append(2)
                io.FileIO.close(self)
            def flush(self):
                record.append(3)
                io.FileIO.flush(self)
        f = MyFileIO(test_support.TESTFN, "w")
        f.write("xxx")
        del f
        self.assertEqual(record, [1, 2, 3])

which you can simplify to:

    def test_destructor(self):
        class MyFileIO(io.FileIO):
            pass
        f = MyFileIO(test_support.TESTFN, "w")
        del f

That leaks 30 references each time it's called. Taking the class definition out of the function stops the leak, so it smells like something, somewhere, is leaking a reference to the MyFileIO class. Instantiating the class is necessary to trigger the leak: the refcount of the class goes up after creating the instance, but does not go down after it's destroyed. However, creating and destroying another instance does not leak another reference; only the single reference is leaked.

I tried recreating the leak with more controllable types, but I haven't got very far. It seems to be caused by some weird interaction between io.FileIO, _fileio._FileIO and io.IOBase, specifically io.IOBase.__del__() calling self.close(), and io.FileIO.close() calling _fileio._FileIO.close() *and* io.RawIOBase.close(). The weird thing is that the contents of RawIOBase.close() don't matter. The mere act of calling RawIOBase.close(self) causes the leak. Remove the call, or change it into an attribute fetch, and the leak is gone.

I'm stumped.

-- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
-------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20070830/39925414/attachment.htm

From guido at python.org Thu Aug 30 02:04:53 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Aug 2007 17:04:53 -0700 Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and immutable support In-Reply-To: <20070829234728.GV24059@electricrain.com> References: <20070829234728.GV24059@electricrain.com> Message-ID:

That's a huge patch to land so close before a release. I'm not sure I like the immutability API -- it won't be useful unless we add a hash method, and then we have all sorts of difficulties again -- the distinction between a hashable and an unhashable object should be made by type, not by value (tuples containing unhashable values notwithstanding).

I don't understand the comment about using PyBUF_WRITEABLE in _getbuffer() -- this is only used for data we're *reading* and I don't think the GIL is even released while we're reading such things.

If you think it's important to get this in the 3.0a1 release, we should pair-program on it ASAP, preferably tomorrow morning. Otherwise, let's do a review next week.

--Guido

On 8/29/07, Gregory P. Smith wrote:
> Attached is what I've come up with so far. Only a single field is
> added to the PyBytesObject struct. This adds support to the bytes
> object for PyBUF_LOCKDATA buffer API operation. bytes objects can be
> marked temporarily read-only for use while the buffer api has handed
> them off to something which may run without the GIL (think IO). Any
> attempt to modify them during that time will raise an exception as I
> believe Martin suggested earlier.
>
> As an added bonus because its been discussed here, support for setting
> a bytes object immutable has been added since its pretty trivial once
> the read only export support was in place. Thats not required but was
> trivial to include.
>
> I'd appreciate any feedback.
>
> My TODO list for this patch:
>
> 0.
Get feedback and make adjustments as necessary. > > 1. Deciding between PyBUF_SIMPLE and PyBUF_WRITEABLE for the internal > uses of the _getbuffer() function. bytesobject.c contains both readonly > and read-write uses of the buffers, i'll add boolean parameter for > that. > > 2. More testing: a few tests in the test suite fail after this but the > number was low and I haven't had time to look at why or what the > failures were. > > 3. Exporting methods suggested in the TODO at the top of the file. > > 4. Unit tests for all of the functionality this adds. > > NOTE: after these changes I had to make clean and rm -rf build before > things would not segfault on import. I suspect some things (modules?) > were not properly recompiled after the bytesobject.h struct change > otherwise. > > -gps > > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From nnorwitz at gmail.com Thu Aug 30 02:07:49 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Wed, 29 Aug 2007 17:07:49 -0700 Subject: [Python-3000] refleak in test_io? In-Reply-To: <9e804ac0708291656r13c6b39cg19545e4f5a17e12d@mail.gmail.com> References: <9e804ac0708291656r13c6b39cg19545e4f5a17e12d@mail.gmail.com> Message-ID: On 8/29/07, Thomas Wouters wrote: > > Am I the only one seeing a refleak in test_io? I know of leaks in 4 modules, but they all may point to the same one you identified: test_io leaked [62, 62] references, sum=124 test_urllib leaked [122, 122] references, sum=244 test_urllib2_localnet leaked [3, 3] references, sum=6 test_xmlrpc leaked [26, 26] references, sum=52 n From greg at krypto.org Thu Aug 30 02:49:45 2007 From: greg at krypto.org (Gregory P. 
Smith) Date: Wed, 29 Aug 2007 17:49:45 -0700 Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and immutable support In-Reply-To: References: <20070829234728.GV24059@electricrain.com> Message-ID: <52dc1c820708291749v60d50326me7684678553ce3cb@mail.gmail.com> I'm inclined to let this one wait for 3.0a2, I'm out of python time for the week and will be out of town (but online) until next Thursday. Pairing up to finish it later on would be nice if needed. I'm happy if the immutable support is dropped, I just figured I'd include it as an example once I realized how easy it was. I don't want hashable bytes objects either (let someone implement that using a subclass in python :). As for the _getbuffer() stuff I left it as a comment because I hadn't looked into it in enough detail yet, you're right about the GIL. -gps On 8/29/07, Guido van Rossum wrote: > > That's a huge patch to land so close before a release. I'm not sure I > like the immutability API -- it won't be useful unless we add a hash > method, and then we have all sorts of difficulties again -- the > distinction between a hashable and an unhashable object should be made > by type, not by value (tuples containing unhashable values > notwithstanding). > > I don't understand the comment about using PyBUF_WRITABLE in > _getbuffer() -- this is only used for data we're *reading* and I don't > think the GIL is even released while we're reading such things. > > If you think it's important to get this in the 3.0a1 release, we > should pair-program on it ASAP, preferable tomorrow morning. > Otherwise, let's do a review next week. > > --Guido > > On 8/29/07, Gregory P. Smith wrote: > > Attached is what I've come up with so far. Only a single field is > > added to the PyBytesObject struct. This adds support to the bytes > > object for PyBUF_LOCKDATA buffer API operation. 
bytes objects can be > > marked temporarily read-only for use while the buffer api has handed > > them off to something which may run without the GIL (think IO). Any > > attempt to modify them during that time will raise an exception as I > > believe Martin suggested earlier. > > > > As an added bonus because its been discussed here, support for setting > > a bytes object immutable has been added since its pretty trivial once > > the read only export support was in place. Thats not required but was > > trivial to include. > > > > I'd appreciate any feedback. > > > > My TODO list for this patch: > > > > 0. Get feedback and make adjustments as necessary. > > > > 1. Deciding between PyBUF_SIMPLE and PyBUF_WRITEABLE for the internal > > uses of the _getbuffer() function. bytesobject.c contains both > readonly > > and read-write uses of the buffers, i'll add boolean parameter for > > that. > > > > 2. More testing: a few tests in the test suite fail after this but the > > number was low and I haven't had time to look at why or what the > > failures were. > > > > 3. Exporting methods suggested in the TODO at the top of the file. > > > > 4. Unit tests for all of the functionality this adds. > > > > NOTE: after these changes I had to make clean and rm -rf build before > > things would not segfault on import. I suspect some things (modules?) > > were not properly recompiled after the bytesobject.h struct change > > otherwise. > > > > -gps > > > > > > _______________________________________________ > > Python-3000 mailing list > > Python-3000 at python.org > > http://mail.python.org/mailman/listinfo/python-3000 > > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > > > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-3000/attachments/20070829/855df381/attachment.htm From barry at python.org Thu Aug 30 03:35:11 2007 From: barry at python.org (Barry Warsaw) Date: Wed, 29 Aug 2007 21:35:11 -0400 Subject: [Python-3000] [Python-3000-checkins] r57691 - python/branches/py3k/Lib/email In-Reply-To: <20070830011514.A50B51E4002@bag.python.org> References: <20070830011514.A50B51E4002@bag.python.org> Message-ID: <5DF36BD5-BD4A-4935-A25A-6900D08194DD@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 29, 2007, at 9:15 PM, guido.van.rossum wrote: > Author: guido.van.rossum > Date: Thu Aug 30 03:15:14 2007 > New Revision: 57691 > > Added: > python/branches/py3k/Lib/email/ > - copied from r57592, sandbox/trunk/emailpkg/5_0-exp/email/ > Log: > Copying the email package back, despite its failings. Oh, okay! I have a few uncommitted changes that improve things a bit, but I'll commit those to the branch and kill off the sandbox branch. Note that there /are/ API changes involved here so the documentation and docstrings need to be updated and NEWS entries need to be written. I'll do all that after I fix the last few failures. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRtYez3EjvBPtnXfVAQKL0AP6A5YGJdQ5vRDk1PaHlD/R6qnlFF4O8omT uFCw/JbWD+FEFfpzEgFGtlJcidPclPbSnhL6xRix1IDOz+O8f6jUHZ/rES+LDjhT 4XkK3cwqH/+qwl/QH92/M0Kz+uS7ADcXvxIKH+cvSZGc1c5W1J4jTb8SZtlAugyd m4deMR8/E5M= =tRyX -----END PGP SIGNATURE----- From guido at python.org Thu Aug 30 04:02:41 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Aug 2007 19:02:41 -0700 Subject: [Python-3000] [Python-3000-checkins] r57691 - python/branches/py3k/Lib/email In-Reply-To: <5DF36BD5-BD4A-4935-A25A-6900D08194DD@python.org> References: <20070830011514.A50B51E4002@bag.python.org> <5DF36BD5-BD4A-4935-A25A-6900D08194DD@python.org> Message-ID: Great! 
The more you can fix up by Friday the better, but I figured it's better to have a little lead time so we can fix up other things depending on it, and have a little test time. On 8/29/07, Barry Warsaw wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Aug 29, 2007, at 9:15 PM, guido.van.rossum wrote: > > > Author: guido.van.rossum > > Date: Thu Aug 30 03:15:14 2007 > > New Revision: 57691 > > > > Added: > > python/branches/py3k/Lib/email/ > > - copied from r57592, sandbox/trunk/emailpkg/5_0-exp/email/ > > Log: > > Copying the email package back, despite its failings. > > Oh, okay! I have a few uncommitted changes that improve things a > bit, but I'll commit those to the branch and kill off the sandbox > branch. > > Note that there /are/ API changes involved here so the documentation > and docstrings need to be updated and NEWS entries need to be > written. I'll do all that after I fix the last few failures. > > - -Barry > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.7 (Darwin) > > iQCVAwUBRtYez3EjvBPtnXfVAQKL0AP6A5YGJdQ5vRDk1PaHlD/R6qnlFF4O8omT > uFCw/JbWD+FEFfpzEgFGtlJcidPclPbSnhL6xRix1IDOz+O8f6jUHZ/rES+LDjhT > 4XkK3cwqH/+qwl/QH92/M0Kz+uS7ADcXvxIKH+cvSZGc1c5W1J4jTb8SZtlAugyd > m4deMR8/E5M= > =tRyX > -----END PGP SIGNATURE----- > _______________________________________________ > Python-3000-checkins mailing list > Python-3000-checkins at python.org > http://mail.python.org/mailman/listinfo/python-3000-checkins > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Thu Aug 30 04:19:46 2007 From: skip at pobox.com (skip at pobox.com) Date: Wed, 29 Aug 2007 21:19:46 -0500 Subject: [Python-3000] Will Py3K be friendlier to optimization opportunities? 
In-Reply-To: <43aa6ff70708291402m29205719of476e8e1d4e7c965@mail.gmail.com> References: <18133.44715.247285.482372@montanaro.dyndns.org> <46D5B0BD.6010209@v.loewis.de> <18133.50997.687840.234611@montanaro.dyndns.org> <43aa6ff70708291402m29205719of476e8e1d4e7c965@mail.gmail.com> Message-ID: <18134.10562.798661.664005@montanaro.dyndns.org> Collin> When thinking about these kinds of optimizations and Collin> restrictions, keep in mind their effect on testing. For example, Collin> I work on code that makes use of the ability to tinker with Collin> another module's view of os.path in order to simulate error Collin> conditions that would otherwise be hard to test. If you wanted Collin> to hide this kind of restriction behind an -O flag, that would Collin> be one thing, but having it on by default seems like a bad idea. You can achieve these sorts of effects by assigning an object to sys.modules[modulename]. Skip From skip at pobox.com Thu Aug 30 04:23:48 2007 From: skip at pobox.com (skip at pobox.com) Date: Wed, 29 Aug 2007 21:23:48 -0500 Subject: [Python-3000] Will Py3K be friendlier to optimization opportunities? In-Reply-To: <18134.10562.798661.664005@montanaro.dyndns.org> References: <18133.44715.247285.482372@montanaro.dyndns.org> <46D5B0BD.6010209@v.loewis.de> <18133.50997.687840.234611@montanaro.dyndns.org> <43aa6ff70708291402m29205719of476e8e1d4e7c965@mail.gmail.com> <18134.10562.798661.664005@montanaro.dyndns.org> Message-ID: <18134.10804.103888.38682@montanaro.dyndns.org> skip> You can achieve these sorts of effects by assigning an object to skip> sys.modules[modulename]. 
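A minimal sketch of the sys.modules approach quoted above, assuming a hypothetical module name `fake_settings` and attribute `DEBUG` that are not part of this thread. It also illustrates one limitation: a name bound by an earlier import keeps referring to the old module object, so the swap only affects imports that run afterwards.

```python
import sys
import types

# Hypothetical stand-in module; the name and attribute are illustrative only.
fake = types.ModuleType("fake_settings")
fake.DEBUG = True

# Install the stand-in before anything imports the name.
sys.modules["fake_settings"] = fake

import fake_settings  # resolved from sys.modules, no file lookup needed

already_bound = fake_settings  # a reference taken before the swap

# Swap in a different object under the same name.
replacement = types.ModuleType("fake_settings")
replacement.DEBUG = False
sys.modules["fake_settings"] = replacement

import fake_settings as later  # a fresh import sees the replacement

print(later.DEBUG)          # attribute of the replacement module
print(already_bound.DEBUG)  # the earlier reference is unaffected

# Clean up so the stand-in does not leak into other code.
del sys.modules["fake_settings"]
```

In a test, the original sys.modules entry (if any) would normally be saved first and restored afterwards so that other tests see the real module.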
I forgot you should always be able to assign to the module's __dict__ attribute as well:

    >>> import os
    >>> os.__dict__['foo'] = 'a'
    >>> os.foo
    'a'

S

From barry at python.org Thu Aug 30 04:24:02 2007 From: barry at python.org (Barry Warsaw) Date: Wed, 29 Aug 2007 22:24:02 -0400 Subject: [Python-3000] [Python-3000-checkins] r57691 - python/branches/py3k/Lib/email In-Reply-To: References: <20070830011514.A50B51E4002@bag.python.org> <5DF36BD5-BD4A-4935-A25A-6900D08194DD@python.org> Message-ID: <8FE35331-8B63-421B-BBB9-044983EA5760@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 29, 2007, at 10:02 PM, Guido van Rossum wrote:
> Great! The more you can fix up by Friday the better, but I figured
> it's better to have a little lead time so we can fix up other things
> depending on it, and have a little test time.

Guido, do you remember which revision from the sandbox you merged/copied? It looks like it was missing some stuff I committed last night. If you know which revision you merged it'll make it easy for me to copy the latest stuff over. Sadly I did not use svnmerge. :(

OTOH, if you get to it before I do, the sandbox branch is now completely up-to-date with all my changes. It needs to be merged to the head of the py3k trunk and then I can kill the sandbox branch.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRtYqQ3EjvBPtnXfVAQLfkgP+LsMc8jST44xecY/G0TGlHDB3CNXSVqoA
kZuVM/8YPjDfWdhTuaLaGHD9o6ld7OMAGQsYi3PYFG5tOgBM1dauCVvHh1ltSRTf
rYCNcoW6hntCiGnCVqECsK2nCLhcMowI7R0FWylEbCY16Vobs3hHsJKAdMrqR120
9jqTgzr9aic=
=fvgZ
-----END PGP SIGNATURE-----

From fdrake at acm.org Thu Aug 30 04:25:47 2007 From: fdrake at acm.org (Fred Drake) Date: Wed, 29 Aug 2007 22:25:47 -0400 Subject: [Python-3000] Will Py3K be friendlier to optimization opportunities?
In-Reply-To: <18134.10562.798661.664005@montanaro.dyndns.org> References: <18133.44715.247285.482372@montanaro.dyndns.org> <46D5B0BD.6010209@v.loewis.de> <18133.50997.687840.234611@montanaro.dyndns.org> <43aa6ff70708291402m29205719of476e8e1d4e7c965@mail.gmail.com> <18134.10562.798661.664005@montanaro.dyndns.org> Message-ID: On Aug 29, 2007, at 10:19 PM, skip at pobox.com wrote: > You can achieve these sorts of effects by assigning an object to > sys.modules[modulename]. Only for imports that haven't happened yet, which tends to be fragile. -Fred -- Fred Drake From guido at python.org Thu Aug 30 03:30:38 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Aug 2007 18:30:38 -0700 Subject: [Python-3000] email package back Message-ID: In preparation of the release Friday, I put the email package back, from Barry's sandbox. This breaks a few things. Please help clean these up! Also a few things were disabled to cope with its temporary demise. If you remember doing one of these, please undo them! -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Aug 30 05:21:44 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Aug 2007 20:21:44 -0700 Subject: [Python-3000] [Python-3000-checkins] r57691 - python/branches/py3k/Lib/email In-Reply-To: <8FE35331-8B63-421B-BBB9-044983EA5760@python.org> References: <20070830011514.A50B51E4002@bag.python.org> <5DF36BD5-BD4A-4935-A25A-6900D08194DD@python.org> <8FE35331-8B63-421B-BBB9-044983EA5760@python.org> Message-ID: On 8/29/07, Barry Warsaw wrote: > Guido, do you remember which revision from the sandbox you merged/ > copied? It looks like it was missing some stuff I committed last > night. If you know which revision you merged it'll make it easy for > me to copy the latest stuff over. Sadly I did not use svnmerge. :( > > OTOH, if you get to it before I do, the sandbox branch is now > completely up-to-date with all my changes. 
It needs to be merged to > the head of the py3k trunk and then I can kill the sandbox branch. Ouch, I didn't make a note of that. I misunderstood how svn copy worked, and assumed it would copy the latest revision -- but it copied my working copy. Fortunately I had done a sync not too long ago. I think I can reconstruct the diffs and will apply them manually to the py3k branch, then you can erase the sandbox. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Aug 30 05:28:16 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Aug 2007 20:28:16 -0700 Subject: [Python-3000] [Python-3000-checkins] r57691 - python/branches/py3k/Lib/email In-Reply-To: References: <20070830011514.A50B51E4002@bag.python.org> <5DF36BD5-BD4A-4935-A25A-6900D08194DD@python.org> <8FE35331-8B63-421B-BBB9-044983EA5760@python.org> Message-ID: No, I don't think I can recover the changes. Would it work to just copy the files over from the sandbox, forcing Lib/email in the py3k branch to be identical to emailpkg/5_0-exp/email in the sandbox? On 8/29/07, Guido van Rossum wrote: > On 8/29/07, Barry Warsaw wrote: > > Guido, do you remember which revision from the sandbox you merged/ > > copied? It looks like it was missing some stuff I committed last > > night. If you know which revision you merged it'll make it easy for > > me to copy the latest stuff over. Sadly I did not use svnmerge. :( > > > > OTOH, if you get to it before I do, the sandbox branch is now > > completely up-to-date with all my changes. It needs to be merged to > > the head of the py3k trunk and then I can kill the sandbox branch. > > Ouch, I didn't make a note of that. I misunderstood how svn copy > worked, and assumed it would copy the latest revision -- but it copied > my working copy. Fortunately I had done a sync not too long ago. > > I think I can reconstruct the diffs and will apply them manually to > the py3k branch, then you can erase the sandbox. 
> > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at python.org Thu Aug 30 05:33:53 2007 From: barry at python.org (Barry Warsaw) Date: Wed, 29 Aug 2007 23:33:53 -0400 Subject: [Python-3000] [Python-3000-checkins] r57691 - python/branches/py3k/Lib/email In-Reply-To: References: <20070830011514.A50B51E4002@bag.python.org> <5DF36BD5-BD4A-4935-A25A-6900D08194DD@python.org> <8FE35331-8B63-421B-BBB9-044983EA5760@python.org> Message-ID: <6AFC1C7C-6BF7-4057-9E53-76030FEB214C@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 29, 2007, at 11:28 PM, Guido van Rossum wrote: > No, I don't think I can recover the changes. Would it work to just > copy the files over from the sandbox, forcing Lib/email in the py3k > branch to be identical to emailpkg/5_0-exp/email in the sandbox? Yes, that /should/ work. I'll lose my last commit to the py3k branch but that will be easy to recover. I'm going to sleep now so if you get to it before I wake up I won't do it in the morning before you wake up. And vice versa (or something like that :). 
Thanks, - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRtY6oXEjvBPtnXfVAQK3hgQAqqyeWk9qive/A/VsP6sQB/DUVZoMlWhV L1VVB133aFxii8TGyk+C8LvZOD0Z31/98vREW5aDMEhxGEhkk9kHQAQkLqSXEEWA a1kS0fJ0YMXOaDA8cbvFpJ2NWPJpFh1ki1wc+PtobO59O+rRnhnOdL5WyxB86ahR 9LIqkF5xol0= =dKTM -----END PGP SIGNATURE----- From guido at python.org Thu Aug 30 05:51:05 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Aug 2007 20:51:05 -0700 Subject: [Python-3000] [Python-3000-checkins] r57691 - python/branches/py3k/Lib/email In-Reply-To: <6AFC1C7C-6BF7-4057-9E53-76030FEB214C@python.org> References: <20070830011514.A50B51E4002@bag.python.org> <5DF36BD5-BD4A-4935-A25A-6900D08194DD@python.org> <8FE35331-8B63-421B-BBB9-044983EA5760@python.org> <6AFC1C7C-6BF7-4057-9E53-76030FEB214C@python.org> Message-ID: On 8/29/07, Barry Warsaw wrote: > On Aug 29, 2007, at 11:28 PM, Guido van Rossum wrote: > > No, I don't think I can recover the changes. Would it work to just > > copy the files over from the sandbox, forcing Lib/email in the py3k > > branch to be identical to emailpkg/5_0-exp/email in the sandbox? > > Yes, that /should/ work. I'll lose my last commit to the py3k branch > but that will be easy to recover. I'm going to sleep now so if you > get to it before I wake up I won't do it in the morning before you > wake up. And vice versa (or something like that :). OK, I did that. However (despite my promise in the checkin msg) I couldn't re-apply the changes which you applied to the py3k branch. Can you reconstruct these yourself when you get up in the morning? I'm afraid I'll just break more stuff.
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Thu Aug 30 07:35:32 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Thu, 30 Aug 2007 07:35:32 +0200 Subject: [Python-3000] Need Windows build instructions In-Reply-To: References: Message-ID: <46D65724.7000602@v.loewis.de> > Can someone familiar with building Py3k on Windows add a section on > how to build it on Windows to the new README? Done, by pointing to PCbuild/readme.txt. Regards, Martin From theller at ctypes.org Thu Aug 30 08:08:40 2007 From: theller at ctypes.org (Thomas Heller) Date: Thu, 30 Aug 2007 08:08:40 +0200 Subject: [Python-3000] buildbots In-Reply-To: References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> Message-ID: David Bolen wrote: > David Bolen writes: > >> "Martin v. Löwis" writes: >> >>>> These messageboxes of course hang the tests on the windows build servers, >>>> so probably it would be good if they could be disabled completely. >>> >>> I think this will be very difficult to achieve. >> >> Could the tests be run beneath a shim process that used SetErrorMode() >> to disable all the OS-based process failure dialog boxes? If I >> remember correctly the error mode is inherited, so an independent >> small exec module could reset the mode, and execute the normal test >> sequence as a child process. > > Or if using ctypes is ok, perhaps it could be done right in the test > runner. > > While I haven't done any local mods to prevent the C RTL boxes, > selecting Ignore on them gets me to the OS level box, and: > > Python 3.0x (py3k, Aug 27 2007, 22:44:06) [MSC v.1310 32 bit (Intel)] on win32 > Type "help", "copyright", "credits" or "license" for more information.
>>>> import test_os > [50256 refs] >>>> test_os.test_main() > > dies with popup in test_execvpe_with_bad_program (test_os.ExecTests). But > > Python 3.0x (py3k, Aug 27 2007, 22:44:06) [MSC v.1310 32 bit (Intel)] on win32 > Type "help", "copyright", "credits" or "license" for more information. >>>> import ctypes > [39344 refs] >>>> ctypes.windll.kernel32.SetErrorMode(7) > 0 > [40694 refs] >>>> import test_os > [55893 refs] >>>> test_os.test_main() > > doesn't present the OS popup prior to process exit. Cool, this works! I suggest to apply this patch, which sets an environment variable in the Tools\buildbot\test.bat script, detects the Windows debug build, and calls SetErrorMode(7) as David suggested: Index: Lib/test/regrtest.py =================================================================== --- Lib/test/regrtest.py (revision 57666) +++ Lib/test/regrtest.py (working copy) @@ -208,6 +208,15 @@ flags on the command line. """ + if sys.platform == "win32": + if "_d.pyd" in [s[0] for s in imp.get_suffixes()]: + # running is a debug build. + if os.environ.get("PYTEST_NONINTERACTIVE", ""): + # If the PYTEST_NONINTERACTIVE environment variable is + # set, we do not want any message boxes. + import ctypes + ctypes.windll.kernel32.SetErrorMode(7) + test_support.record_original_stdout(sys.stdout) try: opts, args = getopt.getopt(sys.argv[1:], 'dhvgqxsS:rf:lu:t:TD:NLR:wM:', Index: Tools/buildbot/test.bat =================================================================== --- Tools/buildbot/test.bat (revision 57666) +++ Tools/buildbot/test.bat (working copy) @@ -1,3 +1,4 @@ @rem Used by the buildbot "test" step. 
cd PCbuild +set PYTEST_NONINTERACTIVE=1 call rt.bat -d -q -uall -rw Thomas From theller at ctypes.org Thu Aug 30 08:21:37 2007 From: theller at ctypes.org (Thomas Heller) Date: Thu, 30 Aug 2007 08:21:37 +0200 Subject: [Python-3000] buildbots In-Reply-To: References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> Message-ID: Thomas Heller wrote: > > I suggest to apply this patch, which sets an environment variable in the > Tools\buildbot\test.bat script, detects the Windows debug build, and calls > SetErrorMode(7) as David suggested: If no one objects, I would like to apply this patch first, see if it avoids the test_os.py test hanging, and afterwards fix the test_os test. Thomas > > Index: Lib/test/regrtest.py > =================================================================== > --- Lib/test/regrtest.py (revision 57666) > +++ Lib/test/regrtest.py (working copy) > @@ -208,6 +208,15 @@ > flags on the command line. > """ > > + if sys.platform == "win32": > + if "_d.pyd" in [s[0] for s in imp.get_suffixes()]: > + # running is a debug build. > + if os.environ.get("PYTEST_NONINTERACTIVE", ""): > + # If the PYTEST_NONINTERACTIVE environment variable is > + # set, we do not want any message boxes. > + import ctypes > + ctypes.windll.kernel32.SetErrorMode(7) > + > test_support.record_original_stdout(sys.stdout) > try: > opts, args = getopt.getopt(sys.argv[1:], 'dhvgqxsS:rf:lu:t:TD:NLR:wM:', > Index: Tools/buildbot/test.bat > =================================================================== > --- Tools/buildbot/test.bat (revision 57666) > +++ Tools/buildbot/test.bat (working copy) > @@ -1,3 +1,4 @@ > @rem Used by the buildbot "test" step.
> cd PCbuild > +set PYTEST_NONINTERACTIVE=1 > call rt.bat -d -q -uall -rw > From martin at v.loewis.de Thu Aug 30 08:26:33 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Thu, 30 Aug 2007 08:26:33 +0200 Subject: [Python-3000] buildbots In-Reply-To: References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> Message-ID: <46D66319.7030209@v.loewis.de> > I suggest to apply this patch, which sets an environment variable in the > Tools\buildbot\test.bat script, detects the Windows debug build, and calls > SetErrorMode(7) as David suggested: Sounds fine with me - although I would leave out the test for debug build, and just check the environment variable. Are you saying that calling SetErrorMode also makes the VC _ASSERT message boxes go away? Regards, Martin From db3l.net at gmail.com Thu Aug 30 08:32:40 2007 From: db3l.net at gmail.com (David Bolen) Date: Thu, 30 Aug 2007 02:32:40 -0400 Subject: [Python-3000] buildbots References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> <46D66319.7030209@v.loewis.de> Message-ID: "Martin v. Löwis" writes: > Are you saying that calling SetErrorMode also makes the VC _ASSERT > message boxes go away? I don't believe it should, no. The assert message boxes are from the VC runtime, whereas the OS error dialogs are from, well, the OS :-) Certainly in my manual tests, I still had to "Ignore" my way through the assert dialogs before checking the results on the OS dialogs.
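
A minimal sketch of the SetErrorMode approach discussed in this thread, assuming the PYTEST_NONINTERACTIVE variable from Thomas's patch. It shows that the literal 7 is the OR of three Win32 error-mode flags, and it only touches the OS-level dialogs, not the VC runtime assertion boxes. The helper name suppress_os_error_dialogs and its parameters are hypothetical and are not part of regrtest.

```python
import ctypes
import os
import sys

# Win32 SetErrorMode flags (values as defined in the Windows SDK headers).
SEM_FAILCRITICALERRORS = 0x0001
SEM_NOGPFAULTERRORBOX = 0x0002
SEM_NOALIGNMENTFAULTEXCEPT = 0x0004

# The literal 7 used in the thread is the combination of these three flags.
ERROR_MODE = (SEM_FAILCRITICALERRORS
              | SEM_NOGPFAULTERRORBOX
              | SEM_NOALIGNMENTFAULTEXCEPT)


def suppress_os_error_dialogs(environ=None, platform=None):
    """Hypothetical helper: call SetErrorMode only on Windows, and only
    when the PYTEST_NONINTERACTIVE environment variable is set.

    Returns True if SetErrorMode was called, False otherwise.
    """
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    if platform != "win32":
        return False
    if not environ.get("PYTEST_NONINTERACTIVE", ""):
        return False
    # ctypes.windll only exists on Windows, so it is touched last.
    # As noted in the thread, child processes inherit the error mode.
    ctypes.windll.kernel32.SetErrorMode(ERROR_MODE)
    return True
```

Passing the environment and platform as arguments is only there so the gate can be exercised off Windows; a test runner would call the helper with no arguments.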
-- David From theller at ctypes.org Thu Aug 30 08:40:05 2007 From: theller at ctypes.org (Thomas Heller) Date: Thu, 30 Aug 2007 08:40:05 +0200 Subject: [Python-3000] buildbots In-Reply-To: <46D66319.7030209@v.loewis.de> References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> <46D66319.7030209@v.loewis.de> Message-ID: Martin v. L?wis schrieb: >> I suggest to apply this patch, which sets an environment variable in the >> Tools\buildbot\test.bat script, detects the Windows debug build, and calls >> SetErrorMode(7) as David suggested: > > Sounds fine with me - although I would leave out the test for debug > build, and just check the environment variable. > > Are you saying that calling SetErrorMode also makes the VC _ASSERT > message boxes go away? No. My mistake - I still had some _CrtSetReport... calls in a patched posixmodule.c. New patch (still detects the debug build because the name of the C runtime dll depends on it): Index: Lib/test/regrtest.py =================================================================== --- Lib/test/regrtest.py (revision 57666) +++ Lib/test/regrtest.py (working copy) @@ -208,6 +208,22 @@ flags on the command line. """ + if sys.platform == "win32": + import imp + if "_d.pyd" in [s[0] for s in imp.get_suffixes()]: + # running is a debug build. + if os.environ.get("PYTEST_NONINTERACTIVE", ""): + # If the PYTEST_NONINTERACTIVE environment variable is + # set, we do not want any message boxes. 
+ import ctypes + # from + _CRT_ASSERT = 2 + _CRTDBG_MODE_FILE = 1 + _CRTDBG_FILE_STDERR = -5 + ctypes.cdll.msvcr71d._CrtSetReportMode(_CRT_ASSERT, _CRTDBG_MODE_FILE); + ctypes.cdll.msvcr71d._CrtSetReportFile(_CRT_ASSERT, _CRTDBG_FILE_STDERR); + ctypes.windll.kernel32.SetErrorMode(7) + test_support.record_original_stdout(sys.stdout) try: opts, args = getopt.getopt(sys.argv[1:], 'dhvgqxsS:rf:lu:t:TD:NLR:wM:', Index: Tools/buildbot/test.bat =================================================================== --- Tools/buildbot/test.bat (revision 57666) +++ Tools/buildbot/test.bat (working copy) @@ -1,3 +1,4 @@ @rem Used by the buildbot "test" step. cd PCbuild +set PYTEST_NONINTERACTIVE=1 call rt.bat -d -q -uall -rw From nnorwitz at gmail.com Thu Aug 30 09:00:59 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Thu, 30 Aug 2007 00:00:59 -0700 Subject: [Python-3000] current status Message-ID: There are 6 tests that fail on all platforms AFAIK: 3 tests failed: test_mailbox test_old_mailbox test_unicode_file 3 skips unexpected on linux2: test_smtplib test_sundry test_ssl I believe test_smtplib, test_sundry fail for the same reason at least partially. They can't import email.base64mime.encode. There are decode functions, but encode is gone from base64mime. I don't know if that's the way it's supposed to be or not. But smtplib can't be imported because encode is missing. Some of the failures in test_mailbox and test_old_mailbox are the same, but I think test_mailbox might have more problems. I hopefully fixed some platform specific problems, but others remain: * test_normalization fails on several boxes (where locale is not C maybe?) * On ia64, test_tarfile.PAXUnicodeTest.test_utf7_filename generates this exception: Objects/exceptions.c:1392: PyUnicodeDecodeError_Create: Assertion `start < 2147483647' failed. 
* On ia64 and Win64 (IIRC), this fails: self.assertEqual(round(1e20), 1e20) AssertionError: 0 != 1e+20 * On PPC64, all the dbm code seems to be crashing * File "Lib/test/test_nis.py", line 27, in test_maps if nis.match(k, nismap) != v: SystemError: can't use str as char buffer * On Solaris, hashlib can't import _md5 which creates a bunch of problems. * On Win64, there's this assert: SystemError: Objects\longobject.c:412: bad argument to internal function I don't see how it's getting triggered based on the traceback though Win64 has a bunch of weird issues: http://python.org/dev/buildbot/3.0/amd64%20XP%203.0/builds/40/step-test/0 n From martin at v.loewis.de Thu Aug 30 09:01:29 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Thu, 30 Aug 2007 09:01:29 +0200 Subject: [Python-3000] buildbots In-Reply-To: References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> <46D66319.7030209@v.loewis.de> Message-ID: <46D66B49.8070600@v.loewis.de> > New patch (still detects the debug build because the name of the C runtime > dll depends on it): I know it will be difficult to talk you into not using ctypes :-), but... I think this should go into the interpreter code itself. One problem with your patch is that it breaks if Python is built with VC8. It should still require an environment variable, say PYTHONNOERRORWINDOW; whether or not it should be considered only in debug releases, I don't know. One place to put it would be Modules/main.c (where all the other environment variables are considered).
Regards, Martin From talin at acm.org Thu Aug 30 09:03:30 2007 From: talin at acm.org (Talin) Date: Thu, 30 Aug 2007 00:03:30 -0700 Subject: [Python-3000] string.Formatter class In-Reply-To: <46D4E8F6.30508@trueblade.com> References: <46D40B88.4080202@trueblade.com> <46D4AD40.9070006@trueblade.com> <46D4E8F6.30508@trueblade.com> Message-ID: <46D66BC2.3060708@acm.org> Eric Smith wrote: > Eric Smith wrote: >> Jim Jewett wrote: > >>> but you might want to take inspiration from the "tail" of an >>> elementtree node, and return the field with the literal next to it as >>> a single object. >>> >>> (literal_text, field_name, format_spec, conversion) >> I think I like that best. > > I implemented this in r57641. I think it simplifies things. At least, > it's easier to explain. Actually...I'm in the middle of writing the docs for the reference manual, and I'm finding this a little harder to explain. Not *much* harder, but a little bit. I would probably have gone with one of the following: # Test for str vs tuple literal_text (field_name, format_spec, conversion) # Test for length of the tuple (literal_text) (field_name, format_spec, conversion) # Test for 'None' format_spec (literal_text, None, None) (field_name, format_spec, conversion) However, I'm not adamant about this - it's up to you what you like best, I'll come up with a way to explain it. Also I recognize that your method is probably more efficient for the nominal use case -- less tuple creation. Also I wanted to ask: How about making the built-in 'format' function have a default value of "" for the second argument? So I can just say: format(x) as a synonym for: str(x) > Due to an optimization dealing with escaped braces, it's possible for > (literal, None, None, None) to be returned more than once. I don't > think that's a problem, as long as it's documented. If you look at > string.py's Formatter.vformat, I don't think it complicates the > implementation at all. 
It's also possible for the literal text to be an empty string if you have several consecutive format fields - correct? > Thanks for the suggestion. From theller at ctypes.org Thu Aug 30 09:25:06 2007 From: theller at ctypes.org (Thomas Heller) Date: Thu, 30 Aug 2007 09:25:06 +0200 Subject: [Python-3000] buildbots In-Reply-To: <46D66B49.8070600@v.loewis.de> References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> <46D66319.7030209@v.loewis.de> <46D66B49.8070600@v.loewis.de> Message-ID: Martin v. Löwis wrote: >> New patch (still detects the debug build because the name of the C runtime >> dll depends on it): > > I know it will be difficult to talk you into not using ctypes :-), > but... > > I think this should go into the interpreter code itself. One problem > with your patch is that it breaks if Python is built with VC8. ctypes isn't perfect - it needs a way to reliably access the currently used C runtime library on windows. But that is off-topic for this thread. > It should still require an environment variable, say > PYTHONNOERRORWINDOW, whether or not it should be considered only > in debug releases, I don't know. One place to put it would be > Modules/main.c (where all the other environment variables are > considered). IMO all this is currently a hack for the buildbot only. Maybe it should be converted into something more useful. About debug release: The _CrtSetReport... functions are only available in the debug library. So, they have to live inside a #ifdef _DEBUG/#endif block. The set_error_mode() function is more useful; AFAIK it also prevents a dialog box from being shown when an extension module cannot be loaded because the extension module depends on a dll that is not found or doesn't have the entry points that the extension links to.
So, an environment variable would be useful, but maybe there should also be a Python function available that calls set_error_mode(). sys.set_error_mode()? Thomas From martin at v.loewis.de Thu Aug 30 09:39:06 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Aug 2007 09:39:06 +0200 Subject: [Python-3000] buildbots In-Reply-To: References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> <46D66319.7030209@v.loewis.de> <46D66B49.8070600@v.loewis.de> Message-ID: <46D6741A.9040801@v.loewis.de> > So, an environment variable would be useful, but maybe there should also be > a Python function available that calls set_error_mode(). sys.set_error_mode()? Even though this would be somewhat lying - I'd put it into msvcrt.set_error_mode. For the _CrtSet functions, one might expose them as-is; they do belong to msvcrt, so the module would be the proper place. For SetErrorMode, still put it into msvcrt - it's at least Windows-specific. Regards, Martin From martin at v.loewis.de Thu Aug 30 10:38:08 2007 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 30 Aug 2007 10:38:08 +0200 Subject: [Python-3000] refleak in test_io? In-Reply-To: <9e804ac0708291656r13c6b39cg19545e4f5a17e12d@mail.gmail.com> References: <9e804ac0708291656r13c6b39cg19545e4f5a17e12d@mail.gmail.com> Message-ID: <46D681F0.2050105@v.loewis.de> > I tried recreating the leak with more controllable types, but I haven't > got very far. It seems to be caused by some weird interaction between > io.FileIO, _fileio._FileIO and io.IOBase, specifically io.IOBase.__del_ > _() calling self.close(), and io.FileIO.close() calling > _fileio._FileIO.close() *and* io.RawIOBase.close(). The weird thing is > that the contents of RawIOBase.close() doesn't matter. The mere act of > calling RawBaseIO.close (self) causes the leak. 
Remove the call, or > change it into an attribute fetch, and the leak is gone. I'm stumped. I think the problem is that the class remains referenced in io.RawIOBase._abc_cache: py> io.RawIOBase._abc_cache set() py> class f(io.RawIOBase):pass ... py> isinstance(f(), io.RawIOBase) True py> io.RawIOBase._abc_cache {} py> del f py> io.RawIOBase._abc_cache {} Each time test_destructor is called, another class will be added to _abc_cache. Regards, Martin From eric+python-dev at trueblade.com Thu Aug 30 12:55:18 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Thu, 30 Aug 2007 06:55:18 -0400 Subject: [Python-3000] string.Formatter class In-Reply-To: <46D66BC2.3060708@acm.org> References: <46D40B88.4080202@trueblade.com> <46D4AD40.9070006@trueblade.com> <46D4E8F6.30508@trueblade.com> <46D66BC2.3060708@acm.org> Message-ID: <46D6A216.9010104@trueblade.com> Talin wrote: > Eric Smith wrote: >> Eric Smith wrote: >>> Jim Jewett wrote: >> >>>> but you might want to take inspiration from the "tail" of an >>>> elementtree node, and return the field with the literal next to it as >>>> a single object. >>>> >>>> (literal_text, field_name, format_spec, conversion) >>> I think I like that best. >> >> I implemented this in r57641. I think it simplifies things. At least, >> it's easier to explain. > > Actually...I'm in the middle of writing the docs for the reference > manual, and I'm finding this a little harder to explain. Not *much* > harder, but a little bit. I think it's easier because it's always: Output the (possibly zero length) literal text then, format and output the field, if field_name is non-None But I'm flexible. 
> I would probably have gone with one of the following: > > # Test for str vs tuple > literal_text > (field_name, format_spec, conversion) > > # Test for length of the tuple > (literal_text) > (field_name, format_spec, conversion) > > # Test for 'None' format_spec > (literal_text, None, None) > (field_name, format_spec, conversion) If you want to change, I'd go with this last one. Actually, I had it working this way, once, but I thought that re-using the first item (which I called literal_or_field_name) was too obscure. > However, I'm not adamant about this - it's up to you what you like best, > I'll come up with a way to explain it. Also I recognize that your method > is probably more efficient for the nominal use case -- less tuple creation. Also, it requires fewer iterations. Instead of 2 iterations per field_name in the string: yield literal yield field_name, format_spec, conversion it's just one: yield literal, field_name, format_spec, conversion Like you, I don't feel strongly about which way it works. But Jim's suggestion that it's how elementtree works sort of convinced me. > Also I wanted to ask: How about making the built-in 'format' function > have a default value of "" for the second argument? So I can just say: > > format(x) > > as a synonym for: > > str(x) Makes sense to me. It would really call x.__format__(''), which the PEP suggests (but does not require) be the same as str(x). >> Due to an optimization dealing with escaped braces, it's possible for >> (literal, None, None, None) to be returned more than once. I don't >> think that's a problem, as long as it's documented. If you look at >> string.py's Formatter.vformat, I don't think it complicates the >> implementation at all. > > It's also possible for the literal text to be an empty string if you > have several consecutive format fields - correct? Correct. The literal text will always be a zero-or-greater length string.
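
A minimal sketch of the four-tuple form discussed above, assuming the (literal_text, field_name, format_spec, conversion) layout, which is also what string.Formatter.parse returns in released Python 3. It shows a literal followed by a field, consecutive fields with empty literal text, and escaped braces. The helper names describe and literal_only are hypothetical and exist only for this illustration.

```python
from string import Formatter


def describe(format_string):
    """Return the tuples produced by Formatter.parse() as a list."""
    return list(Formatter().parse(format_string))


def literal_only(format_string):
    """Join just the literal-text part of every tuple.

    Escaped braces may be split across several tuples, so the pieces
    are joined instead of relying on how many tuples come back.
    """
    return "".join(literal for literal, _, _, _ in Formatter().parse(format_string))


# A literal, then a field with a conversion and a format spec, then a
# trailing literal that has no field attached (field_name is None).
for item in describe("x={0!r:>5} done"):
    print(item)

# Consecutive fields: the literal text of each tuple is an empty string.
for item in describe("{0}{1}"):
    print(item)

# Escaped braces end up in the literal text.
print(literal_only("a{{b}}c"))
```

A consumer in the style of vformat would output the literal text of each tuple and then format the field only when field_name is not None.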
From eric+python-dev at trueblade.com Thu Aug 30 13:05:45 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Thu, 30 Aug 2007 07:05:45 -0400 Subject: [Python-3000] buildbots In-Reply-To: <46D66B49.8070600@v.loewis.de> References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> <46D66319.7030209@v.loewis.de> <46D66B49.8070600@v.loewis.de> Message-ID: <46D6A489.9010206@trueblade.com> Martin v. Löwis wrote: >> New patch (still detects the debug build because the name of the C runtime >> dll depends on it): > > I know it will be difficult to talk you into not using ctypes :-), > but... > > I think this should go into the interpreter code itself. One problem > with your patch is that it breaks if Python is built with VC8. > > It should still require an environment variable, say > PYTHONNOERRORWINDOW, whether or not it should be considered only > in debug releases, I don't know. One place to put it would be > Modules/main.c (where all the other environment variables are > considered). It should also not be used with pythonw.exe, correct? In that case, you want the various dialog boxes. From rrr at ronadam.com Thu Aug 30 13:12:13 2007 From: rrr at ronadam.com (Ron Adam) Date: Thu, 30 Aug 2007 06:12:13 -0500 Subject: [Python-3000] string.Formatter class In-Reply-To: <46D40B88.4080202@trueblade.com> References: <46D40B88.4080202@trueblade.com> Message-ID: <46D6A60D.2070503@ronadam.com> Eric Smith wrote: > One of the things that PEP 3101 deliberately underspecifies is the > Formatter class, leaving decisions up to the implementation. Now that a > working implementation exists, I think it's reasonable to tighten it up.
> > I have checked in a Formatter class that specifies the following methods > (in addition to the ones already defined in the PEP): > > parse(format_string) > Loops over the format_string and returns an iterable of tuples > (literal_text, field_name, format_spec, conversion). This is used by > vformat to break the string into either literal text, or fields that > need expanding. If literal_text is None, then expand (field_name, > format_spec, conversion) and append it to the output. If literal_text > is not None, append it to the output. > > get_field(field_name, args, kwargs, used_args) > Given a field_name as returned by parse, convert it to an object to be > formatted. The default version takes strings of the form defined in the > PEP, such as "0[name]" or "label.title". It records which args have > been used in used_args. args and kwargs are as passed in to vformat. Rather than pass the used_args set out and have it modified in a different method, I think it would be better to pass the arg_used back along with the object. That keeps all the code that is involved in checking used args in one method. The arg_used value may be useful in other ways as well. obj, arg_used = self.get_field(field_name, args, kwargs) used_args.add(arg_used) > convert_field(value, conversion) > Converts the value (returned by get_field) using the conversion > (returned by the parse tuple). The default version understands 'r' > (repr) and 's' (str). > Or, define your own conversion character: > ================= > class XFormatter(Formatter): > def convert_field(self, value, conversion): > if conversion == 'x': > return None > if conversion == 'r': > return repr(value) > if conversion == 's': > return str(value) > return value > fmt = XFormatter() > print(fmt.format("{0!r}:{0!x}", fmt)) > ================= > which prints: > <__main__.XFormatter object at 0xf6f6d2cc>:None I wonder if this is splitting things up a bit too finely?
If the format function takes a conversion argument, it makes it possible to do everything by overriding format_field. def format_field(self, value, format_spec, conversion): return format(value, format_spec, conversion) Adding this to Talin's suggestion, the signature of format could be... format(value, format_spec="", conversion="") Then the above example becomes... class XFormatter(Formatter): def format_field(self, value, format_spec, conversion): if conversion == 'x': return "None" return format(value, format_spec, conversion) It just seems cleaner to me. Cheers, Ron From martin at v.loewis.de Thu Aug 30 13:24:34 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Thu, 30 Aug 2007 13:24:34 +0200 Subject: [Python-3000] buildbots In-Reply-To: <46D6A489.9010206@trueblade.com> References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> <46D66319.7030209@v.loewis.de> <46D66B49.8070600@v.loewis.de> <46D6A489.9010206@trueblade.com> Message-ID: <46D6A8F2.2060106@v.loewis.de> >> It should still require an environment variable, say >> PYTHONNOERRORWINDOW, whether or not it should be considered only >> in debug releases, I don't know. One place to put it would be >> Modules/main.c (where all the other environment variables are >> considered). > > It should also not be used with pythonw.exe, correct? In that case, you > want the various dialog boxes. I'm not sure. If PYTHONNOERRORWINDOW is set, I would expect that it does not create error windows, even if it creates windows otherwise just fine. If you don't want this, don't set PYTHONNOERRORWINDOW.
Regards, Martin From barry at python.org Thu Aug 30 13:27:02 2007 From: barry at python.org (Barry Warsaw) Date: Thu, 30 Aug 2007 07:27:02 -0400 Subject: [Python-3000] [Python-3000-checkins] r57691 - python/branches/py3k/Lib/email In-Reply-To: References: <20070830011514.A50B51E4002@bag.python.org> <5DF36BD5-BD4A-4935-A25A-6900D08194DD@python.org> <8FE35331-8B63-421B-BBB9-044983EA5760@python.org> <6AFC1C7C-6BF7-4057-9E53-76030FEB214C@python.org> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 29, 2007, at 11:51 PM, Guido van Rossum wrote: > On 8/29/07, Barry Warsaw wrote: >> On Aug 29, 2007, at 11:28 PM, Guido van Rossum wrote: >>> No, I don't think I can recover the changes. Would it work to just >>> copy the files over from the sandbox, forcing Lib/email in the py3k >>> branch to be identical to emailpkg/5_0-exp/email in the sandbox? >> >> Yes, that /should/ work. I'll lose my last commit to the py3k branch >> but that will be easy to recover. I'm going to sleep now so if you >> get to it before I wake up I won't do it in the morning before you >> wake up. And vice versa (or something like that :). > > OK, I did that. However (despite my promise in the checkin msg) I > couldn't re-apply the changes which you applied to the py3kbranch. Can > you reconstruct these yourself when you get up in the morning? I'm > afrain I'll just break more stuff. Thanks Guido. Yep, will do. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRtaph3EjvBPtnXfVAQIpxwQAnqjZnL7j7LjYTaSURvkAXdycju+IC3FS p2jDnWAMA4TYEjEsyN/OEhaOMVhkPz7cEa+TYcEDe+toCkNHHq6rdEQH3ouI3y9n mFgzEPHPu1GrhqJUp4hT4prUqU/oDbeRL9ulryTzv4JNCaIrZsmElscbUWWbrp3W z4+LAJFD9mo= =iAFM -----END PGP SIGNATURE----- From barry at python.org Thu Aug 30 13:33:19 2007 From: barry at python.org (Barry Warsaw) Date: Thu, 30 Aug 2007 07:33:19 -0400 Subject: [Python-3000] current status In-Reply-To: References: Message-ID: <5BCDBECB-9509-4F76-A6D8-4DD53AEA5CC7@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 30, 2007, at 3:00 AM, Neal Norwitz wrote: > There are 6 tests that fail on all platforms AFAIK: > > 3 tests failed: > test_mailbox test_old_mailbox test_unicode_file > 3 skips unexpected on linux2: > test_smtplib test_sundry test_ssl > > I believe test_smtplib, test_sundry fail for the same reason at least > partially. They can't import email.base64mime.encode. There are > decode functions, but encode is gone from base64mime. I don't know if > that's the way it's supposed to be or not. But smtplib can't be > imported because encode is missing. For now, I'll restore .encode() for a1 though it may eventually go away. 
-Barry From eric+python-dev at trueblade.com Thu Aug 30 13:38:18 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Thu, 30 Aug 2007 07:38:18 -0400 Subject: [Python-3000] buildbots In-Reply-To: <46D6A8F2.2060106@v.loewis.de> References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> <46D66319.7030209@v.loewis.de> <46D66B49.8070600@v.loewis.de> <46D6A489.9010206@trueblade.com> <46D6A8F2.2060106@v.loewis.de> Message-ID: <46D6AC2A.1080907@trueblade.com> Martin v. Löwis wrote: >>> It should still require an environment variable, say >>> PYTHONNOERRORWINDOW, whether or not it should be considered only >>> in debug releases, I don't know. One place to put it would be >>> Modules/main.c (where all the other environment variables are >>> considered). >> It should also not be used with pythonw.exe, correct? In that case, you >> want the various dialog boxes. > > I'm not sure. If PYTHONNOERRORWINDOW is set, I would expect that it does > not create error windows, even if it creates windows otherwise just > fine. If you don't want this, don't set PYTHONNOERRORWINDOW. But unlike Unix, these text messages are guaranteed to be lost. I don't see the point in setting up a situation where they'd be lost, but I don't feel that strongly about it. As you say, don't set the environment variable.
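For readers following the error-window discussion in this thread, here is a minimal sketch of how a test runner might call the Win32 SetErrorMode function through ctypes. The helper name suppress_error_dialogs and the ERROR_MODE_FLAGS constant are hypothetical and are not taken from the thread or from Python itself; the SEM_* values are the documented Win32 constants. The sketch covers only SetErrorMode, not the CRT assertion dialogs of debug builds, and it does nothing on non-Windows platforms.

```python
import ctypes
import sys

# Documented Win32 error-mode flags.
SEM_FAILCRITICALERRORS = 0x0001
SEM_NOGPFAULTERRORBOX = 0x0002
SEM_NOOPENFILEERRORBOX = 0x8000

# Hypothetical combination chosen for this sketch.
ERROR_MODE_FLAGS = (
    SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX | SEM_NOOPENFILEERRORBOX
)


def suppress_error_dialogs():
    """Ask Windows not to show system error dialogs for this process.

    Returns the previous error mode on Windows and None elsewhere.
    """
    if sys.platform != "win32":
        return None
    # ctypes.windll only exists on Windows, so it is touched only here.
    return ctypes.windll.kernel32.SetErrorMode(ERROR_MODE_FLAGS)


previous_mode = suppress_error_dialogs()
print(previous_mode)
```

The error mode is inherited by child processes, which is why the thread also considers making the call from outside the Python executable under test.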
From theller at ctypes.org Thu Aug 30 13:36:11 2007 From: theller at ctypes.org (Thomas Heller) Date: Thu, 30 Aug 2007 13:36:11 +0200 Subject: [Python-3000] buildbots In-Reply-To: <46D6741A.9040801@v.loewis.de> References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> <46D66319.7030209@v.loewis.de> <46D66B49.8070600@v.loewis.de> <46D6741A.9040801@v.loewis.de> Message-ID: Martin v. Löwis wrote: >> So, an environment variable would be useful, but maybe there should also be >> a Python function available that calls set_error_mode(). sys.set_error_mode()? > > Even though this would be somewhat lying - I'd put it into > msvcrt.set_error_mode. For the _CrtSet functions, one might > expose them as-is; they do belong to msvcrt, so the module > would be the proper place. For SetErrorMode, still put it > into msvcrt - it's at least Windows-specific. These are all great ideas, but I'm afraid they don't fit into the time I have available to spend on this. My primary goal is to care about the buildbots. Ok, so I'll keep clicking 'Abort' on the message boxes whenever I see them, and I will soon try to fix the assertion in test_os.py. Thomas From martin at v.loewis.de Thu Aug 30 13:52:54 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Thu, 30 Aug 2007 13:52:54 +0200 Subject: [Python-3000] buildbots In-Reply-To: <46D6AC2A.1080907@trueblade.com> References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> <46D66319.7030209@v.loewis.de> <46D66B49.8070600@v.loewis.de> <46D6A489.9010206@trueblade.com> <46D6A8F2.2060106@v.loewis.de> <46D6AC2A.1080907@trueblade.com> Message-ID: <46D6AF96.9030404@v.loewis.de> > But unlike Unix, these text messages are guaranteed to be lost.
Not really. We are probably talking about release builds primarily, where the only such message is the system error (as the assertions aren't compiled in, anyway). If such an error occurs, the message is lost - but an error code is returned to the API function that caused the error. This error code should then translate to a Python exception (e.g. an ImportError if the dialog tried to say that a DLL could not be loaded). Whether or not that exception then also gets lost depends on the application. Regards, Martin From db3l.net at gmail.com Thu Aug 30 13:49:10 2007 From: db3l.net at gmail.com (David Bolen) Date: Thu, 30 Aug 2007 07:49:10 -0400 Subject: [Python-3000] buildbots References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> <46D66319.7030209@v.loewis.de> <46D66B49.8070600@v.loewis.de> <46D6741A.9040801@v.loewis.de> Message-ID: "Martin v. Löwis" writes: >> So, an environment variable would be useful, but maybe there should also be >> a Python function available that calls set_error_mode(). sys.set_error_mode()? > > Even though this would be somewhat lying - I'd put it into > msvcrt.set_error_mode. For the _CrtSet functions, one might > expose them as-is; they do belong to msvcrt, so the module > would be the proper place. For SetErrorMode, still put it > into msvcrt - it's at least Windows-specific. For SetErrorMode, if you're just looking for a non-ctypes wrapping, it's already covered by pywin32's win32api, which seems simple enough to obtain (and likely to already be present) if you're working at this level with Win32 calls as a user of Python. Nor is ctypes very complicated as a fallback. I'm not sure this is a common enough a call to need a built-in wrapping.
For this particular case of wanting to use it when developing Python itself, it actually feels a bit more appropriate to me to make the call external to the Python executable under test since it's really a behavior being imposed by the test environment. If a mechanism was implemented to have Python issue the call itself, I'd probably limit it to this specific use case. -- David From thomas at python.org Thu Aug 30 14:17:51 2007 From: thomas at python.org (Thomas Wouters) Date: Thu, 30 Aug 2007 14:17:51 +0200 Subject: [Python-3000] refleak in test_io? In-Reply-To: <46D681F0.2050105@v.loewis.de> References: <9e804ac0708291656r13c6b39cg19545e4f5a17e12d@mail.gmail.com> <46D681F0.2050105@v.loewis.de> Message-ID: <9e804ac0708300517y636fb92bsdee0c798458bea@mail.gmail.com> On 8/30/07, "Martin v. L?wis" wrote: > > > I tried recreating the leak with more controllable types, but I haven't > > got very far. It seems to be caused by some weird interaction between > > io.FileIO, _fileio._FileIO and io.IOBase, specifically io.IOBase.__del_ > > _() calling self.close(), and io.FileIO.close() calling > > _fileio._FileIO.close() *and* io.RawIOBase.close(). The weird thing is > > that the contents of RawIOBase.close() doesn't matter. The mere act of > > calling RawBaseIO.close (self) causes the leak. Remove the call, or > > change it into an attribute fetch, and the leak is gone. I'm stumped. > > I think the problem is that the class remains referenced in > io.RawIOBase._abc_cache: > > py> io.RawIOBase._abc_cache > set() > py> class f(io.RawIOBase):pass > ... > py> isinstance(f(), io.RawIOBase) > True > py> io.RawIOBase._abc_cache > {} > py> del f > py> io.RawIOBase._abc_cache > {} > > Each time test_destructor is called, another class will be added to > _abc_cache. Ahh, thanks, I missed that cache. After browsing the code a bit it seems to me the _abc_cache and _abc_negative_cache need to be turned into weak sets. 
(Since a class can appear in any number of caches, positive and negative, we can't just check refcounts on the items in the caches.) Do we have a weak set implementation anywhere yet? I think I have one lying around I wrote for someone else a while back, it could be added to the weakref module. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070830/400f61d8/attachment.htm From thomas at python.org Thu Aug 30 14:24:39 2007 From: thomas at python.org (Thomas Wouters) Date: Thu, 30 Aug 2007 14:24:39 +0200 Subject: [Python-3000] refleak in test_io? In-Reply-To: References: <9e804ac0708291656r13c6b39cg19545e4f5a17e12d@mail.gmail.com> Message-ID: <9e804ac0708300524l12490ae5j9aa31392c91c5ebd@mail.gmail.com> On 8/30/07, Neal Norwitz wrote: > > On 8/29/07, Thomas Wouters wrote: > > > > Am I the only one seeing a refleak in test_io? > > I know of leaks in 4 modules, but they all may point to the same one > you identified: > > test_io leaked [62, 62] references, sum=124 > test_urllib leaked [122, 122] references, sum=244 > test_urllib2_localnet leaked [3, 3] references, sum=6 > test_xmlrpc leaked [26, 26] references, sum=52 FWIW, they do. Removing the subclass-cache fixes all these refleaks (but it's not really a solution ;) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-3000/attachments/20070830/9541eba3/attachment.htm From martin at v.loewis.de Thu Aug 30 14:33:08 2007 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 30 Aug 2007 14:33:08 +0200 Subject: [Python-3000] buildbots In-Reply-To: References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> <46D66319.7030209@v.loewis.de> <46D66B49.8070600@v.loewis.de> <46D6741A.9040801@v.loewis.de> Message-ID: <46D6B904.1020505@v.loewis.de> > For SetErrorMode, if you're just looking for a non-ctypes wrapping, > it's already covered by pywin32's win32api, which seems simple enough > to obtain (and likely to already be present) if you're working at this > level with Win32 calls as a user of Python. Nor is ctypes very > complicated as a fallback. I'm not sure this is a common enough a > call to need a built-in wrapping. However, we can't use pywin32 on the buildbot slaves - it's not installed. > For this particular case of wanting to use it when developing Python > itself, it actually feels a bit more appropriate to me to make the > call external to the Python executable under test since it's really a > behavior being imposed by the test environment. If a mechanism was > implemented to have Python issue the call itself, I'd probably limit > it to this specific use case. That covers the SetErrorMode case, but not the CRT assertions - their messagebox settings don't get inherited through CreateProcess. Not sure why you want to limit it - I think it's a useful feature on its own to allow Python to run without somebody clicking buttons. (it would be a useful feature for windows as well to stop producing these messages, and report them through some other mechanism). 
Regards, Martin From guido at python.org Thu Aug 30 15:43:21 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Aug 2007 06:43:21 -0700 Subject: [Python-3000] current status In-Reply-To: References: Message-ID: On 8/30/07, Neal Norwitz wrote: > There are 6 tests that fail on all platforms AFAIK: > > 3 tests failed: > test_mailbox test_old_mailbox test_unicode_file > 3 skips unexpected on linux2: > test_smtplib test_sundry test_ssl Martin fixed test_unicode_file (I think he may be the only one who understood that test). test_ssl is not working because the ssl support in socket.py has been disabled -- with the latest merge from the trunk, Bill Janssen's server-side code came in, but it doesn't work yet with 3.0, which has a completely different set of classes in socket.py. I hope it's OK to release 3.0a1 without SSL support. > I believe test_smtplib, test_sundry fail for the same reason at least > partially. They can't import email.base64mime.encode. There are > decode functions, but encode is gone from base64mime. I don't know if > that's the way it's supposed to be or not. But smtplib can't be > imported because encode is missing. Barry said he'd fix this. > Some of the failures in test_mailbox and test_old_mailbox are the > same, but I think test_mailbox might have more problems. > > I hopefully fixed some platform specific problems, but others remain: > > * test_normalization fails on several boxes (where locale is not C maybe?) Oh, good suggestion. I was wondering about this myself. Alas, I have no idea what that test does. > * On ia64, test_tarfile.PAXUnicodeTest.test_utf7_filename generates > this exception: > Objects/exceptions.c:1392: PyUnicodeDecodeError_Create: Assertion > `start < 2147483647' failed. That's probably an uninitialized variable 'startinpos' in one of the functions that calls unicode_decode_call_errorhandler(). It's the 7th parameter. The header of that function is 150 characters wide. Yuck! 
Someone will need to reproduce the bug and then point gdb at it and it should be obvious. :-) > * On ia64 and Win64 (IIRC), this fails: self.assertEqual(round(1e20), 1e20) > AssertionError: 0 != 1e+20 > > * On PPC64, all the dbm code seems to be crashing > > * File "Lib/test/test_nis.py", line 27, in test_maps > if nis.match(k, nismap) != v: > SystemError: can't use str as char buffer Someone fixed this by changing t# into s#. Do we still need t#? Can someone who understands it explain what it does? > * On Solaris, hashlib can't import _md5 which creates a bunch of problems. I've seen this on other platforms that have an old openssl version. I think we don't have _md5 any more, so the code that looks for it is broken -- but this means we're more dependent on openssl than I'm comfortable with, even though the old _md5 module had an RSA copyright. :-( > * On Win64, there's this assert: > SystemError: Objects\longobject.c:412: bad argument to internal function > I don't see how it's getting triggered based on the traceback though > > Win64 has a bunch of weird issues: > > http://python.org/dev/buildbot/3.0/amd64%20XP%203.0/builds/40/step-test/0 -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Aug 30 15:47:24 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Aug 2007 06:47:24 -0700 Subject: [Python-3000] refleak in test_io? In-Reply-To: <9e804ac0708300517y636fb92bsdee0c798458bea@mail.gmail.com> References: <9e804ac0708291656r13c6b39cg19545e4f5a17e12d@mail.gmail.com> <46D681F0.2050105@v.loewis.de> <9e804ac0708300517y636fb92bsdee0c798458bea@mail.gmail.com> Message-ID: On 8/30/07, Thomas Wouters wrote: > > > On 8/30/07, "Martin v. Löwis" wrote: > > > I tried recreating the leak with more controllable types, but I haven't > > > got very far.
It seems to be caused by some weird interaction between > > > io.FileIO, _fileio._FileIO and io.IOBase, specifically io.IOBase.__del_ > > > _() calling self.close(), and io.FileIO.close() calling > > > _fileio._FileIO.close() *and* io.RawIOBase.close(). The weird thing is > > > that the contents of RawIOBase.close() doesn't matter. The mere act of > > > calling RawBaseIO.close (self) causes the leak. Remove the call, or > > > change it into an attribute fetch, and the leak is gone. I'm stumped. > > > > I think the problem is that the class remains referenced in > > io.RawIOBase._abc_cache: > > > > py> io.RawIOBase._abc_cache > > set() > > py> class f(io.RawIOBase):pass > > ... > > py> isinstance(f(), io.RawIOBase) > > True > > py> io.RawIOBase._abc_cache > > {} > > py> del f > > py> io.RawIOBase._abc_cache > > {} > > > > Each time test_destructor is called, another class will be added to > > _abc_cache. > > Ahh, thanks, I missed that cache. After browsing the code a bit it seems to > me the _abc_cache and _abc_negative_cache need to be turned into weak sets. > (Since a class can appear in any number of caches, positive and negative, we > can't just check refcounts on the items in the caches.) Do we have a weak > set implementation anywhere yet? I think I have one lying around I wrote for > someone else a while back, it could be added to the weakref module. It should be made into weak refs indeed. Post 3.0a1 I suspect. I'll add an issue so we won't forget. 
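To illustrate why a weak set addresses the leak described above, here is a minimal sketch using weakref.WeakSet, which later Python versions provide (it did not exist when this message was written). The class Base and the helper cache_temporary_subclass are hypothetical stand-ins, not code from io or abc; the plain set plays the role of the leaking _abc_cache.

```python
import gc
import weakref


class Base:
    """Stand-in for an ABC such as io.RawIOBase (hypothetical example class)."""


def cache_temporary_subclass(cache):
    """Create a throwaway subclass, cache it, and drop our own reference."""
    class Temp(Base):
        pass
    cache.add(Temp)


strong_cache = set()            # keeps every cached class alive
weak_cache = weakref.WeakSet()  # holds its members only weakly

for _ in range(3):
    cache_temporary_subclass(strong_cache)
    cache_temporary_subclass(weak_cache)

# Classes sit in reference cycles, so force a collection before counting.
gc.collect()

print(len(strong_cache))  # classes cached in the plain set are still alive
print(len(weak_cache))    # weakly cached classes are gone once unreferenced
```

Because the weak set never owns its members, a class that appears in several positive and negative caches is released as soon as the last ordinary reference to it disappears.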
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From walter at livinglogic.de Thu Aug 30 17:48:52 2007 From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Thu, 30 Aug 2007 17:48:52 +0200 Subject: [Python-3000] current status In-Reply-To: References: Message-ID: <46D6E6E4.4030902@livinglogic.de> Guido van Rossum wrote: > [...[ > >> * On ia64, test_tarfile.PAXUnicodeTest.test_utf7_filename generates >> this exception: >> Objects/exceptions.c:1392: PyUnicodeDecodeError_Create: Assertion >> `start < 2147483647' failed. > > That's probably an uninitialized variable 'startinpos' in one of the > functions that calls unicode_decode_call_errorhandler(). It's the 7th > parameter. The header of that function is 150 characters wide. Yuck! Seems that a linefeed has gone missing there. > Someone will need to reproduce the bug and then point gdb at it and it > should be obvious. :-) I've added an initialization to the "illegal special character" branch of the code. However test_tarfile.py still segfaults for me in the py3k branch. The top of the stacktrace is: #0 0xb7eec07f in memcpy () from /lib/tls/libc.so.6 #1 0xb7a905bc in s_pack_internal (soself=0xb77dc97c, args=0xb77cddfc, offset=0, buf=0x8433c4c "") at /var/home/walter/checkouts/Python/py3k/Modules/_struct.c:1667 #2 0xb7a90a32 in s_pack (self=0xb77dc97c, args=0xb77cddfc) at /var/home/walter/checkouts/Python/py3k/Modules/_struct.c:1741 #3 0x08085f96 in PyCFunction_Call (func=0xb7a72a0c, arg=0xb77cddfc, kw=0x0) at Objects/methodobject.c:73 Servus, Walter From walter at livinglogic.de Thu Aug 30 17:53:42 2007 From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Thu, 30 Aug 2007 17:53:42 +0200 Subject: [Python-3000] current status In-Reply-To: <46D6E6E4.4030902@livinglogic.de> References: <46D6E6E4.4030902@livinglogic.de> Message-ID: <46D6E806.1080207@livinglogic.de> Walter D?rwald wrote: > [...] > However test_tarfile.py still segfaults for me in the py3k branch. 
The > top of the stacktrace is: > > #0 0xb7eec07f in memcpy () from /lib/tls/libc.so.6 > #1 0xb7a905bc in s_pack_internal (soself=0xb77dc97c, args=0xb77cddfc, > offset=0, buf=0x8433c4c "") > at /var/home/walter/checkouts/Python/py3k/Modules/_struct.c:1667 > #2 0xb7a90a32 in s_pack (self=0xb77dc97c, args=0xb77cddfc) at > /var/home/walter/checkouts/Python/py3k/Modules/_struct.c:1741 > #3 0x08085f96 in PyCFunction_Call (func=0xb7a72a0c, arg=0xb77cddfc, > kw=0x0) at Objects/methodobject.c:73 I forgot to mention that it fails in test_100_char_name (__main__.WriteTest) ... Servus, Walter From theller at ctypes.org Thu Aug 30 18:10:16 2007 From: theller at ctypes.org (Thomas Heller) Date: Thu, 30 Aug 2007 18:10:16 +0200 Subject: [Python-3000] current status In-Reply-To: References: Message-ID: Neal Norwitz schrieb: > * On Win64, there's this assert: > SystemError: Objects\longobject.c:412: bad argument to internal function > I don't see how it's getting triggered based on the traceback though Python/getargs.c, line 672: ival = PyInt_AsSsize_t(arg); calls Objects/longobject.c PyLong_AsSsize_t(), but that accepts only PyLong_Objects (in contrast to PyLong_AsLong which calls nb_int). Thomas From guido at python.org Thu Aug 30 19:02:26 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Aug 2007 10:02:26 -0700 Subject: [Python-3000] Release 3.0a1 Countdown Message-ID: Tomorrow (Friday August 31) I want to do the 3.0a1 release. I want to do it early in the day (US west coast time). It's going to be a lightweight release -- I plan to put out a source tarball only, if Martin wants to contribute an MSI installer that would be great. I plan to lock the tree Friday morning early (perhaps as early as 6am PDT, i.e. 15:00 in Germany). So get your stuff in and working (on as many platforms as possible) by then! I'll spend most of today writing up a what's new document and release notes. 
I'm still hoping that the following unit tests that currently fail everywhere can be fixed: test_mailbox test_old_mailbox test_smtplib (test_sundry now passes BTW, I don't see any tests being skipped unexpectedly.) I've set up a task list on the spreadsheet we used for the sprint. Here's the invitation: http://spreadsheets.google.com/ccc?key=pBLWM8elhFAmKbrhhh0ApQA&inv=guido at python.org&t=3328567089265242420&guest Use the tabs at the bottom to go to the "Countdown" sheet and you can watch me procrastinate in real time. :-) Don't hesitate to add items! Some other things I expect to land today: Thomas Wouters's patch for the ref leak in the ABC cache (issue 1061) Thomas Wouters's noslice feature issue 1753395 (Georg) possibly more work on PEP 3109 and 3134 (Collin) Also I'd appreciate it if people would check the buildbots and the bug tracker for possible issues. Thanks everyone for the large number of improvements that came in this week! -- --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at python.org Thu Aug 30 19:49:16 2007 From: barry at python.org (Barry Warsaw) Date: Thu, 30 Aug 2007 13:49:16 -0400 Subject: [Python-3000] current status In-Reply-To: References: Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 30, 2007, at 3:00 AM, Neal Norwitz wrote: > Some of the failures in test_mailbox and test_old_mailbox are the > same, but I think test_mailbox might have more problems. It does, and I won't be spending any more time before a1 looking at it. The problem is that MH.__setitem__() opens its file in binary mode, then passes a string to the base class's _dump_message() method. It then tries to write a string to a binary file and you get a TypeError. You can't just encode strings to bytes in _dump_message () though because sometimes the file you're passed is a text file and so you trade one failure for another. 
I don't think it's quite right to do the conversion in MH.__setitem__ () either though because _dump_message() isn't prepared to handle bytes. Maybe it should be, but the basic problem is that you can get passed either a text or binary file object and you need to be able to write either strings or bytes to either. -Barry From ntoronto at cs.byu.edu Thu Aug 30 21:30:33 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Thu, 30 Aug 2007 13:30:33 -0600 Subject: [Python-3000] Python-love (was Release 3.0a1 Countdown) In-Reply-To: References: Message-ID: <46D71AD9.9050905@cs.byu.edu> Guido van Rossum wrote: > Thanks everyone for the large number of improvements that came in this week! Can I echo this in general? I just lurk here, being fascinated by the distributed language development process, so I don't have much license to post and steal precious developer attention. But I'd like to thank everyone whose blood, sweat, and tears - volunteered - have produced the first programming language and set of libraries that I really fell in love with. What I see here and in the PEPs has got me seriously stoked for 3.0. I feel I can represent thousands of developers when I say: Here's to you, guys. Awesome work. Thank you so much. Developing in Python is the smoothest and most joyous development I've ever done. Sorry if this is too off-topic. If there were a "python-love" list I'd post it there.
:) Neil From martin at v.loewis.de Thu Aug 30 21:54:34 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Aug 2007 21:54:34 +0200 Subject: [Python-3000] Release 3.0a1 Countdown In-Reply-To: References: Message-ID: <46D7207A.8080108@v.loewis.de> > Tomorrow (Friday August 31) I want to do the 3.0a1 release. I want to > do it early in the day (US west coast time). It's going to be a > lightweight release -- I plan to put out a source tarball only, if > Martin wants to contribute an MSI installer that would be great. I see what I can do. An x86 installer would certainly be possible; for AMD64, I don't have a test machine right now (but I could produce one "blindly"). Regards, Martin From db3l.net at gmail.com Thu Aug 30 22:15:18 2007 From: db3l.net at gmail.com (David Bolen) Date: Thu, 30 Aug 2007 16:15:18 -0400 Subject: [Python-3000] buildbots References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> <46D66319.7030209@v.loewis.de> <46D66B49.8070600@v.loewis.de> <46D6741A.9040801@v.loewis.de> <46D6B904.1020505@v.loewis.de> Message-ID: "Martin v. L?wis" writes: > However, we can't use pywin32 on the buildbot slaves - it's not > installed. Agreed, thus my original suggestion of a standalone wrapper executable (or using ctypes). But for end users of Python on Windows, this is a direct Windows-specific API wrapping, for which using the pywin32 wrapper seems appropriate, or if needed, ctypes use is trivial (for a call taking and returning a single ulong). >> For this particular case of wanting to use it when developing Python >> itself, it actually feels a bit more appropriate to me to make the >> call external to the Python executable under test since it's really a >> behavior being imposed by the test environment. 
If a mechanism was >> implemented to have Python issue the call itself, I'd probably limit >> it to this specific use case. > > That covers the SetErrorMode case, but not the CRT assertions - their > messagebox settings don't get inherited through CreateProcess. Agreed - for Python in debug mode, the CRT stuff needs specific support (although Thomas' example using ctypes, albeit somewhat ugly, did manage to still keep out in the test case runner). I can see exporting access to them in debug builds could be helpful as they aren't otherwise wrapped. > Not sure why you want to limit it - I think it's a useful feature on > its own to allow Python to run without somebody clicking buttons. > (it would be a useful feature for windows as well to stop producing > these messages, and report them through some other mechanism). I just think that if someone needs the functionality they'll have an easy time with existing methods. And I'm not sure it's something to encourage average use of, if only because Python (and it's child, potentially unrelated, processes) will behave differently than other applications. But it's not like I'm vehemently opposed or anything. At this stage I'd think having anything that prevented the popups for the buildbots would be beneficial. Putting it up in the test code (such as regrtest), seems less intrusive and complicated, even if it involves slightly ugly code, than deciding how to incorporate it into the Python core, which could always be done subsequently. 
-- David From martin at v.loewis.de Thu Aug 30 22:40:47 2007 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 30 Aug 2007 22:40:47 +0200 Subject: [Python-3000] buildbots In-Reply-To: References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> <46D66319.7030209@v.loewis.de> <46D66B49.8070600@v.loewis.de> <46D6741A.9040801@v.loewis.de> <46D6B904.1020505@v.loewis.de> Message-ID: <46D72B4F.6040908@v.loewis.de> > Agreed, thus my original suggestion of a standalone wrapper executable > (or using ctypes). That doesn't work well, either - how do we get this wrapper onto the build slaves? It would work if such wrapper shipped with the operating system. > I just think that if someone needs the functionality they'll have an > easy time with existing methods. I don't think it's that easy. It took three people two days to find out how to do it correctly (and I'm still not convinced the code I committed covers all cases). > And I'm not sure it's something to > encourage average use of, if only because Python (and it's child, > potentially unrelated, processes) will behave differently than other > applications. I completely disagree. It's a gross annoyance of Windows that it performs user interaction in a library call. I suspect there are many cases where people really couldn't tolerate such user interaction, and where they appreciate builtin support for a window-less operation. > But it's not like I'm vehemently opposed or anything. At this stage > I'd think having anything that prevented the popups for the buildbots > would be beneficial. Ok, I committed PYTHONNOERRORWINDOW. > Putting it up in the test code (such as > regrtest), seems less intrusive and complicated, It might be less intrusive (although I don't see why this is a desirable property); it is certainly more complicated than calling C APIs using C code. 
Regards, Martin From eric+python-dev at trueblade.com Fri Aug 31 01:05:27 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Thu, 30 Aug 2007 19:05:27 -0400 Subject: [Python-3000] string.Formatter class In-Reply-To: <46D6A60D.2070503@ronadam.com> References: <46D40B88.4080202@trueblade.com> <46D6A60D.2070503@ronadam.com> Message-ID: <46D74D37.5040007@trueblade.com> Ron Adam wrote: >> get_field(field_name, args, kwargs, used_args) >> Given a field_name as returned by parse, convert it to an object to be >> formatted. The default version takes strings of the form defined in >> the PEP, such as "0[name]" or "label.title". It records which args >> have been used in used_args. args and kwargs are as passed in to >> vformat. > > Rather than pass the used_args set out and have it modified in a > different methods, I think it would be better to pass the arg_used back > along with the object. That keeps all the code that is involved in > checking used args is in one method. The arg_used value may be useful > in other ways as well. > > obj, arg_used = self.get_field(field_name, args, kwargs) > used_args.add(arg_used) I'm really not wild about either solution, but I suppose yours is less objectionable than mine. I'll check this change in tonight (before the deadline). I think you'd have to say: if args_used is not None: used_args.add(args_used) as it's possible that the field was not derived from the args or kwargs. > I wonder if this is splitting things up a bit too finely? If the format > function takes a conversion argument, it makes it possible to do > everything by overriding format_field. > > def format_field(self, value, format_spec, conversion): > return format(value, format_spec, conversion) > > > Adding this to Talins suggestion, the signature of format could be... > > format(value, format_spec="", conversion="") But this conflates conversions with formatting, which the PEP takes pains not to do. 
I'd rather leave them separate, but I'll let Talin make the call. Eric. From amauryfa at gmail.com Fri Aug 31 02:54:32 2007 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Fri, 31 Aug 2007 02:54:32 +0200 Subject: [Python-3000] New PyException_HEAD fails to compile on Windows Message-ID: Hello, the windows version of py3k suddenly stopped compiling because of an extra semicolon in PyException_HEAD: PyObject_HEAD definition already ends with a semicolon. gcc seems more tolerant though... -- Amaury Forgeot d'Arc From collinw at gmail.com Fri Aug 31 02:56:47 2007 From: collinw at gmail.com (Collin Winter) Date: Thu, 30 Aug 2007 17:56:47 -0700 Subject: [Python-3000] New PyException_HEAD fails to compile on Windows In-Reply-To: References: Message-ID: <43aa6ff70708301756i7f0b3fcg16fdb25f5089e51a@mail.gmail.com> On 8/30/07, Amaury Forgeot d'Arc wrote: > Hello, > > the windows version of py3k suddenly stopped compiling because of an > extra semicolon in PyException_HEAD: PyObject_HEAD definition already > ends with a semicolon. > > gcc seems more tolerant though... Sorry, fix on the way... Collin Winter From rrr at ronadam.com Fri Aug 31 03:11:47 2007 From: rrr at ronadam.com (Ron Adam) Date: Thu, 30 Aug 2007 20:11:47 -0500 Subject: [Python-3000] string.Formatter class In-Reply-To: <46D74D37.5040007@trueblade.com> References: <46D40B88.4080202@trueblade.com> <46D6A60D.2070503@ronadam.com> <46D74D37.5040007@trueblade.com> Message-ID: <46D76AD3.1040301@ronadam.com> Eric Smith wrote: > Ron Adam wrote: >>> get_field(field_name, args, kwargs, used_args) >>> Given a field_name as returned by parse, convert it to an object to >>> be formatted. The default version takes strings of the form defined >>> in the PEP, such as "0[name]" or "label.title". It records which >>> args have been used in used_args. args and kwargs are as passed in >>> to vformat. 
>>
>> Rather than pass the used_args set out and have it modified in a
>> different method, I think it would be better to pass the arg_used
>> back along with the object.  That keeps all the code that is involved
>> in checking used args in one method.  The arg_used value may be
>> useful in other ways as well.
>>
>>      obj, arg_used = self.get_field(field_name, args, kwargs)
>>      used_args.add(arg_used)
>
> I'm really not wild about either solution, but I suppose yours is less
> objectionable than mine.  I'll check this change in tonight (before the
> deadline).

Cool.  I looked at other possible ways, but this seemed to be the
easiest to live with.  The alternative is to use attributes to pass
and hold values, but since the Formatter class isn't a data class, that
doesn't seem appropriate.

> I think you'd have to say:
>
> if args_used is not None:
>     used_args.add(args_used)
>
> as it's possible that the field was not derived from the args or kwargs.

How?  From what I can see an exception would be raised in the get_value
method.

When would I ever want to get a None for args_used?

>> I wonder if this is splitting things up a bit too finely?  If the
>> format function takes a conversion argument, it makes it possible to
>> do everything by overriding format_field.
>>
>>      def format_field(self, value, format_spec, conversion):
>>          return format(value, format_spec, conversion)
>>
>>
>> Adding this to Talin's suggestion, the signature of format could be...
>>
>>      format(value, format_spec="", conversion="")
>
> But this conflates conversions with formatting, which the PEP takes
> pains not to do.  I'd rather leave them separate, but I'll let Talin
> make the call.

Yes, the PEP is pretty specific on the format() function signature.
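The separation between conversion and formatting discussed above can be sketched as follows. This is only a minimal illustration, assuming the two-argument builtin format(value, format_spec) with format_spec defaulting to ""; the helper name convert_then_format is hypothetical and is not part of the PEP or of string.Formatter.

```python
def convert_then_format(value, format_spec="", conversion=""):
    """Apply an optional 'r' or 's' conversion, then format the result.

    Hypothetical helper: the conversion step stays with the caller, so the
    builtin format() only ever sees a value and a format_spec.
    """
    if conversion == "r":
        value = repr(value)
    elif conversion == "s":
        value = str(value)
    elif conversion:
        raise ValueError("unknown conversion: " + conversion)
    return format(value, format_spec)


# repr() is applied first, then the result is right-aligned in 8 columns.
padded = convert_then_format("hi", ">8", "r")

# No conversion and an empty format_spec behaves like str().
plain = convert_then_format(42)
```

Keeping the conversion outside format() means the builtin needs no third argument, which is the point made in the thread.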
Cheers, Ron From eric+python-dev at trueblade.com Fri Aug 31 03:17:05 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Thu, 30 Aug 2007 21:17:05 -0400 Subject: [Python-3000] format_spec parameter to format() builtin defaults to "" [was: Re: string.Formatter class] In-Reply-To: <46D66BC2.3060708@acm.org> References: <46D40B88.4080202@trueblade.com> <46D4AD40.9070006@trueblade.com> <46D4E8F6.30508@trueblade.com> <46D66BC2.3060708@acm.org> Message-ID: <46D76C11.7050208@trueblade.com> Talin wrote: > Also I wanted to ask: How about making the built-in 'format' function > have a default value of "" for the second argument? So I can just say: > > format(x) > > as a synonym for: > > str(x) I implemented this in r57797. Eric. From eric+python-dev at trueblade.com Fri Aug 31 03:26:45 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Thu, 30 Aug 2007 21:26:45 -0400 Subject: [Python-3000] string.Formatter class In-Reply-To: <46D76AD3.1040301@ronadam.com> References: <46D40B88.4080202@trueblade.com> <46D6A60D.2070503@ronadam.com> <46D74D37.5040007@trueblade.com> <46D76AD3.1040301@ronadam.com> Message-ID: <46D76E55.6000709@trueblade.com> Ron Adam wrote: > > > Eric Smith wrote: >> Ron Adam wrote: >>>> get_field(field_name, args, kwargs, used_args) >>>> Given a field_name as returned by parse, convert it to an object to >>>> be formatted. The default version takes strings of the form defined >>>> in the PEP, such as "0[name]" or "label.title". It records which >>>> args have been used in used_args. args and kwargs are as passed in >>>> to vformat. >>> >>> Rather than pass the used_args set out and have it modified in a >>> different methods, I think it would be better to pass the arg_used >>> back along with the object. That keeps all the code that is involved >>> in checking used args is in one method. The arg_used value may be >>> useful in other ways as well. 
>>> >>> obj, arg_used = self.get_field(field_name, args, kwargs) >>> used_args.add(arg_used) >> >> I'm really not wild about either solution, but I suppose yours is less >> objectionable than mine. I'll check this change in tonight (before >> the deadline). > > Cool. I looked at other possible ways, but this seemed to be the > easiest to live with. The alternative is to use an attributes to pass > and hold values, but sense the Formatter class isn't a data class, that > doesn't seem appropriate. I agree that attributes seem like the wrong way to go about it. It also makes recursively calling the formatter impossible without saving state. I'm still planning on making this change tonight. >> I think you'd have to say: >> >> if args_used is not None: >> used_args.add(args_used) >> >> as it's possible that the field was not derived from the args or kwargs. > > How? From what I can see an exception would be raised in the get_value > method. > > When would I ever want to get a None for args_used? I meant arg_used. You're right. I was confusing get_field with get_value. (Surely we can pick better names!) Eric. From talin at acm.org Fri Aug 31 03:31:51 2007 From: talin at acm.org (Talin) Date: Thu, 30 Aug 2007 18:31:51 -0700 Subject: [Python-3000] string.Formatter class In-Reply-To: <46D74D37.5040007@trueblade.com> References: <46D40B88.4080202@trueblade.com> <46D6A60D.2070503@ronadam.com> <46D74D37.5040007@trueblade.com> Message-ID: <46D76F87.6050006@acm.org> Eric Smith wrote: > Ron Adam wrote: >> I wonder if this is splitting things up a bit too finely? If the format >> function takes a conversion argument, it makes it possible to do >> everything by overriding format_field. >> >> def format_field(self, value, format_spec, conversion): >> return format(value, format_spec, conversion) >> >> >> Adding this to Talins suggestion, the signature of format could be... 
>>
>>      format(value, format_spec="", conversion="")
>
> But this conflates conversions with formatting, which the PEP takes
> pains not to do.  I'd rather leave them separate, but I'll let Talin
> make the call.

Correct. There's no reason for 'format' to handle conversions, when it's
trivial for a caller to do it themselves:

    format(repr(value), format_spec)

-- Talin

From talin at acm.org Fri Aug 31 03:36:06 2007
From: talin at acm.org (Talin)
Date: Thu, 30 Aug 2007 18:36:06 -0700
Subject: [Python-3000] Need Decimal.__format__
Message-ID: <46D77086.3030207@acm.org>

I'm looking for a volunteer who understands the Decimal class well
enough to write a __format__ method for it. It should handle all of the
same format specifiers as float.__format__, but it should not use the
same implementation as float (so as to preserve accuracy.)

Also, I'm interested in suggestions as to any other standard types that
ought to have a __format__ method, other than the obvious Date/Time
classes. What kinds of things do people usually want to print?

-- Talin

From eric+python-dev at trueblade.com Fri Aug 31 04:03:03 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Thu, 30 Aug 2007 22:03:03 -0400
Subject: [Python-3000] Need Decimal.__format__
In-Reply-To: <46D77086.3030207@acm.org>
References: <46D77086.3030207@acm.org>
Message-ID: <46D776D7.7050907@trueblade.com>

Talin wrote:
> I'm looking for a volunteer who understands the Decimal class well
> enough to write a __format__ method for it. It should handle all of the
> same format specifiers as float.__format__, but it should not use the
> same implementation as float (so as to preserve accuracy.)

If no one else steps up, I can look at it.  But I doubt I can finish it
by a1.

> Also, I'm interested in suggestions as to any other standard types that
> ought to have a __format__ method, other than the obvious Date/Time
> classes. What kinds of things do people usually want to print?
I can do datetime.datetime and datetime.date, if no one else already has. I think they're just aliases for strftime. Is there any problem with re-using the C implemenation exactly? static PyMethodDef date_methods[] = { ... {"strftime", (PyCFunction)date_strftime, METH_VARARGS | METH_KEYWORDS, PyDoc_STR("format -> strftime() style string.")}, {"__format__", (PyCFunction)date_strftime, METH_VARARGS | METH_KEYWORDS, PyDoc_STR("Alias for strftime.")}, ... I just want to make sure there's no requirement that the function pointer be unique within the array, or anything like that. From guido at python.org Fri Aug 31 04:07:46 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Aug 2007 19:07:46 -0700 Subject: [Python-3000] Need Decimal.__format__ In-Reply-To: <46D776D7.7050907@trueblade.com> References: <46D77086.3030207@acm.org> <46D776D7.7050907@trueblade.com> Message-ID: On 8/30/07, Eric Smith wrote: > Talin wrote: > > I'm looking for a volunteer who understands the Decimal class well > > enough to write a __format__ method for it. It should handle all of the > > same format specifiers as float.__format__, but it should not use the > > same implementation as float (so as to preserve accuracy.) > > If no one else steps up, I can look at it. But I doubt I can finish it > by a1. No, that's not Talin's point: we're not expecting this in a1, but a2 would be good. > > Also, I'm interested in suggestions as to any other standard types that > > ought to have a __format__ method, other than the obvious Date/Time > > classes. What kinds of things do people usually want to print? > > I can do datetime.datetime and datetime.date, if no one else already > has. I think they're just aliases for strftime. Is there any problem > with re-using the C implemenation exactly? > > static PyMethodDef date_methods[] = { > ... 
> {"strftime", (PyCFunction)date_strftime, METH_VARARGS | METH_KEYWORDS, > PyDoc_STR("format -> strftime() style string.")}, > {"__format__", (PyCFunction)date_strftime, METH_VARARGS | METH_KEYWORDS, > PyDoc_STR("Alias for strftime.")}, > > ... > > I just want to make sure there's no requirement that the function > pointer be unique within the array, or anything like that. > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rrr at ronadam.com Fri Aug 31 04:21:18 2007 From: rrr at ronadam.com (Ron Adam) Date: Thu, 30 Aug 2007 21:21:18 -0500 Subject: [Python-3000] string.Formatter class In-Reply-To: <46D76E55.6000709@trueblade.com> References: <46D40B88.4080202@trueblade.com> <46D6A60D.2070503@ronadam.com> <46D74D37.5040007@trueblade.com> <46D76AD3.1040301@ronadam.com> <46D76E55.6000709@trueblade.com> Message-ID: <46D77B1E.6030206@ronadam.com> Eric Smith wrote: > Ron Adam wrote: >> >> >> Eric Smith wrote: >>> Ron Adam wrote: >>>>> get_field(field_name, args, kwargs, used_args) >>>>> Given a field_name as returned by parse, convert it to an object to >>>>> be formatted. The default version takes strings of the form >>>>> defined in the PEP, such as "0[name]" or "label.title". It records >>>>> which args have been used in used_args. args and kwargs are as >>>>> passed in to vformat. >>>> >>>> Rather than pass the used_args set out and have it modified in a >>>> different methods, I think it would be better to pass the arg_used >>>> back along with the object. That keeps all the code that is >>>> involved in checking used args is in one method. The arg_used value >>>> may be useful in other ways as well. 
>>>>
>>>>      obj, arg_used = self.get_field(field_name, args, kwargs)
>>>>      used_args.add(arg_used)
>>>
>>> I'm really not wild about either solution, but I suppose yours is
>>> less objectionable than mine.  I'll check this change in tonight
>>> (before the deadline).
>>
>> Cool.  I looked at other possible ways, but this seemed to be the
>> easiest to live with.  The alternative is to use attributes to pass
>> and hold values, but since the Formatter class isn't a data class,
>> that doesn't seem appropriate.
>
> I agree that attributes seem like the wrong way to go about it.  It also
> makes recursively calling the formatter impossible without saving state.
>
> I'm still planning on making this change tonight.
>
>>> I think you'd have to say:
>>>
>>> if args_used is not None:
>>>     used_args.add(args_used)
>>>
>>> as it's possible that the field was not derived from the args or kwargs.
>>
>> How?  From what I can see an exception would be raised in the
>> get_value method.
>>
>> When would I ever want to get a None for args_used?
>
> I meant arg_used.

I understood.

> You're right.  I was confusing get_field with get_value.  (Surely we can
> pick better names!)

Hmm... how about this?

             if field_name is not None:
                 name, sub_names = field_name._formatter_field_name_split()
                 obj = self.get_value(name, args, kwargs)
                 obj = self.get_sub_value(obj, sub_names)
                 obj = self.convert_field(obj, conversion)
                 used_args.add(name)

                 # format the object and append to the result
                 result.append(self.format_field(obj, format_spec))

This doesn't require passing an arg_used value as it's available in
vformat.  Get_sub_value replaces get_field.

     def get_sub_value(self, obj, sub_names):
         # Get sub value of an object.
         # (indices or attributes)
         for is_attr, i in sub_names:
             if is_attr:
                 obj = getattr(obj, i)
             else:
                 obj = obj[i]
         return obj

While it moves more into vformat, I think it's clearer what everything does.
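A self-contained sketch of the sub-value lookup described above, assuming sub_names is an iterable of (is_attr, key) pairs as the loop implies. It is written as a plain function rather than a Formatter method, and the Label class and sample data are illustrative only.

```python
def get_sub_value(obj, sub_names):
    """Follow a chain of attribute and index lookups starting from obj.

    Each entry in sub_names is an (is_attr, key) pair: True means an
    attribute lookup (obj.key), False means an index lookup (obj[key]).
    """
    for is_attr, key in sub_names:
        if is_attr:
            obj = getattr(obj, key)
        else:
            obj = obj[key]
    return obj


class Label:
    title = "Report"


# Equivalent of the field name "0[name]" once argument 0 has been fetched.
by_index = get_sub_value({"name": "spam"}, [(False, "name")])

# Equivalent of "label.title" once the 'label' keyword has been fetched.
by_attr = get_sub_value(Label(), [(True, "title")])

# Lookups can be chained: first an index, then an attribute.
chained = get_sub_value([Label()], [(False, 0), (True, "title")])
```

Because the first name is resolved separately, the caller already knows which argument was used and can record it without a second return value.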
Cheers, Ron From guido at python.org Fri Aug 31 06:52:08 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Aug 2007 21:52:08 -0700 Subject: [Python-3000] Release Countdown Message-ID: I'm pretty happy where we stand now -- I just squashed the last two failing tests (test_mailbox and test_oldmailbox). It is 9:30 pm here and I'm tired, so I'm going to try and get a good night's sleep and do the release as early as I can tomorrow. Remember, I'll freeze the branch (not a real lock, just a request to stop submitting) tomorrow (Friday) around 6 am my time -- that's 9 am US east coast, 15:00 in most of western Europe. I'd appreciate it if there were no broken unit tests then. :-) If there are urgent things that I need to look at, put them in the bug tracker, set priority to urgent, version to Python 3.0, and assign them to me. Please exercise restraint in making last-minute sweeping changes (except to the docs). Please do review README, RELNOTES (new!), Misc/NEWS, and especially Doc/whatsnew/3.0.rst, (and the rest of the docs) and add what you feel ought to be added. You can also preview the web page I plan to use for the release -- it's not linked from anywhere yet, but here it is anyway: http://www.python.org/download/releases/3.0/. Those of you lucky enough to be able to edit it, please go ahead; others, add suggestions to the bug tracker as above. Note that this page ends with a complete copy of the release notes; I expect to be adding more release notes after the release has been published, once we figure out what else isn't working. I expect 3.0a2 to follow within 2-4 weeks; the alpha release process is relatively light-weight now that I've figured out most of the details. PS. PEP 101 needs a serious rewrite. It still talks about CVS. 
--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From theller at ctypes.org Fri Aug 31 08:07:15 2007
From: theller at ctypes.org (Thomas Heller)
Date: Fri, 31 Aug 2007 08:07:15 +0200
Subject: [Python-3000] buildbots
In-Reply-To: <46D72B4F.6040908@v.loewis.de>
References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> <46D66319.7030209@v.loewis.de> <46D66B49.8070600@v.loewis.de> <46D6741A.9040801@v.loewis.de> <46D6B904.1020505@v.loewis.de> <46D72B4F.6040908@v.loewis.de>
Message-ID:

Martin v. Löwis wrote:
>> Agreed, thus my original suggestion of a standalone wrapper executable
>> (or using ctypes).
>
> That doesn't work well, either - how do we get this wrapper onto the
> build slaves? It would work if such wrapper shipped with the
> operating system.
>
>> I just think that if someone needs the functionality they'll have an
>> easy time with existing methods.
>
> I don't think it's that easy. It took three people two days to find out
> how to do it correctly (and I'm still not convinced the code I committed
> covers all cases).
>
>> And I'm not sure it's something to
>> encourage average use of, if only because Python (and its child,
>> potentially unrelated, processes) will behave differently than other
>> applications.
>
> I completely disagree. It's a gross annoyance of Windows that it
> performs user interaction in a library call. I suspect there are
> many cases where people really couldn't tolerate such user
> interaction, and where they appreciate builtin support for
> a window-less operation.

True, but we're talking about automatic testing on the buildbots in this case.

>> But it's not like I'm vehemently opposed or anything.  At this stage
>> I'd think having anything that prevented the popups for the buildbots
>> would be beneficial.
>
> Ok, I committed PYTHONNOERRORWINDOW.
It works, but does not have the desired effect (on the buildbots, again). PCBuild\rt.bat does run 'python_d -E ...', which ignores the environment variables. The debug assertion in Lib\test\test_os.py is fixed now, but the test hangs on Windows in the next debug assertion in _bsddb_d.pyd (IIRC). Any suggestions? >> Putting it up in the test code (such as >> regrtest), seems less intrusive and complicated, > > It might be less intrusive (although I don't see why this is a > desirable property); it is certainly more complicated than > calling C APIs using C code. > > Regards, > Martin Thomas From talin at acm.org Fri Aug 31 08:53:22 2007 From: talin at acm.org (Talin) Date: Thu, 30 Aug 2007 23:53:22 -0700 Subject: [Python-3000] PATCH: library reference docs for PEP 3101 Message-ID: <46D7BAE2.3050709@acm.org> I just posted on the tracker a patch which adds extensive documentation for PEP 3101 to the Python Library Reference. This includes: str.format() format() __format__ Formatter format string syntax format specification mini-language http://bugs.python.org/issue1068 (Eric, my description of the Formatter overloaded methods may not match your latest revisions. Feel free to point out any errors.) Oh, and thanks to Georg for making it possible for me to actually write library documentation :) -- Talin From g.brandl at gmx.net Fri Aug 31 11:24:37 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 31 Aug 2007 11:24:37 +0200 Subject: [Python-3000] PATCH: library reference docs for PEP 3101 In-Reply-To: <46D7BAE2.3050709@acm.org> References: <46D7BAE2.3050709@acm.org> Message-ID: Talin schrieb: > I just posted on the tracker a patch which adds extensive documentation > for PEP 3101 to the Python Library Reference. 
This includes: > > str.format() > format() > __format__ > Formatter > format string syntax > format specification mini-language > > http://bugs.python.org/issue1068 > > (Eric, my description of the Formatter overloaded methods may not match > your latest revisions. Feel free to point out any errors.) > > Oh, and thanks to Georg for making it possible for me to actually write > library documentation :) I hope it was a pleasant experience :) I've committed the patch together with more string fixes. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From eric+python-dev at trueblade.com Fri Aug 31 11:35:22 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Fri, 31 Aug 2007 05:35:22 -0400 Subject: [Python-3000] PATCH: library reference docs for PEP 3101 In-Reply-To: <46D7BAE2.3050709@acm.org> References: <46D7BAE2.3050709@acm.org> Message-ID: <46D7E0DA.6000503@trueblade.com> Talin wrote: > I just posted on the tracker a patch which adds extensive documentation > for PEP 3101 to the Python Library Reference. This includes: > > str.format() > format() > __format__ > Formatter > format string syntax > format specification mini-language > > http://bugs.python.org/issue1068 > > (Eric, my description of the Formatter overloaded methods may not match > your latest revisions. Feel free to point out any errors.) This is awesome! Thanks. The only 2 differences are: - in the implementation for float formatting, a type of '' is the same as 'g'. I think the PEP originally had the wording it does so that float(1.0, '') would match str(1.0). This case now matches, because of the change that says zero length format_spec's are the same as str(). 
However, if there's anything else in the format_spec (still with no
type), it doesn't match what str() would do.

 >>> str(1.0)
'1.0'
 >>> format(1.0)
'1.0'
 >>> format(1.0, "-")
'1'
 >>> format(1.0, "g")
'1'

Actually, str() doesn't add a decimal for exponential notation:

 >>> str(1e100)
'1e+100'

I'd like to see the docs just say that an empty type is the same as 'g',
but I'm not sure of the use case for what the documentation currently says.

- I changed Formatter.get_field to something like:

    .. method:: get_field(field_name, args, kwargs)

       Given *field_name* as returned by :meth:`parse` (see above),
       convert it to an object to be formatted.  Returns a tuple (obj,
       used_key).  The default version takes strings of the form
       defined in :pep:`3101`, such as "0[name]" or "label.title".
       *args* and *kwargs* are as passed in to :meth:`vformat`.  The
       return value *used_key* has the same meaning as the *key*
       parameter to :meth:`get_value`.

From g.brandl at gmx.net Fri Aug 31 11:41:38 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 31 Aug 2007 11:41:38 +0200
Subject: [Python-3000] str.decode; buffers
Message-ID:

Two short issues:

* Shouldn't str.decode() be removed? Every call to it says
  "TypeError: decoding str is not supported".

* Using e.g. b"abc".find("a") gives "SystemError: can't use str as char buffer".
  This should be a TypeError IMO.

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.
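For comparison, a small check of the two str/bytes cases raised above. This is a sketch that assumes a released Python 3 interpreter rather than the py3k branch as it stood at the time of this message; there, str has no decode method and passing a str to a bytes method raises TypeError. The variable names are illustrative only.

```python
# In released Python 3, str objects have no decode() method at all.
str_has_decode = hasattr("abc", "decode")

# Passing a str to a bytes method raises TypeError, not SystemError.
try:
    b"abc".find("a")
except TypeError as exc:
    find_error = type(exc).__name__
else:
    find_error = None

# Searching bytes with bytes works as expected.
position = b"abc".find(b"a")
```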
From g.brandl at gmx.net Fri Aug 31 12:20:13 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 31 Aug 2007 12:20:13 +0200 Subject: [Python-3000] PATCH: library reference docs for PEP 3101 In-Reply-To: <46D7E0DA.6000503@trueblade.com> References: <46D7BAE2.3050709@acm.org> <46D7E0DA.6000503@trueblade.com> Message-ID: Eric Smith schrieb: > Talin wrote: >> I just posted on the tracker a patch which adds extensive documentation >> for PEP 3101 to the Python Library Reference. This includes: >> >> str.format() >> format() >> __format__ >> Formatter >> format string syntax >> format specification mini-language >> >> http://bugs.python.org/issue1068 >> >> (Eric, my description of the Formatter overloaded methods may not match >> your latest revisions. Feel free to point out any errors.) > > This is awesome! Thanks. > > The only 2 differences are: > > - in the implementation for float formatting, a type of '' is the same > as 'g'. I think the PEP originally had the wording it does so that > float(1.0, '') would match str(1.0). This case now matches, because of > the change that says zero length format_spec's are the same as str(). > However, if there's anything else in the format_spec (still with no > type), it doesn't match what str() would do. > >>> str(1.0) > '1.0' > >>> format(1.0) > '1.0' > >>> format(1.0, "-") > '1' > >>> format(1.0, "g") > '1' > Actually, str() doesn't add a decimal for exponential notation: > >>> str(1e100) > '1e+100' > > I'd like to see the docs just say that an empty type is the same as 'g', > but I'm not sure of the use case for what the documentation currently says. Can you suggest a patch? > - I changed Formatter.get_field to something like: > > .. method:: get_field(field_name, args, kwargs) > > Given *field_name* as returned by :meth:`parse` (see above), > convert it to an object to be formatted. Returns a tuple (obj, > used_key). 
The default version takes strings of the form > defined in :pep:`3101`, such as "0[name]" or "label.title". > *args* and *kwargs* are as passed in to :meth:`vformat`. The > return value *used_key* has the same meaning as the *key* > parameter to :meth:`get_value`. I changed this. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From eric+python-dev at trueblade.com Fri Aug 31 13:22:29 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Fri, 31 Aug 2007 07:22:29 -0400 Subject: [Python-3000] PATCH: library reference docs for PEP 3101 In-Reply-To: References: <46D7BAE2.3050709@acm.org> <46D7E0DA.6000503@trueblade.com> Message-ID: <46D7F9F5.8090209@trueblade.com> Georg Brandl wrote: > Eric Smith schrieb: >> Talin wrote: >>> I just posted on the tracker a patch which adds extensive documentation >>> for PEP 3101 to the Python Library Reference. This includes: >>> >>> str.format() >>> format() >>> __format__ >>> Formatter >>> format string syntax >>> format specification mini-language >>> >>> http://bugs.python.org/issue1068 >>> >>> (Eric, my description of the Formatter overloaded methods may not match >>> your latest revisions. Feel free to point out any errors.) >> This is awesome! Thanks. >> >> The only 2 differences are: >> >> - in the implementation for float formatting, a type of '' is the same >> as 'g'. I think the PEP originally had the wording it does so that >> float(1.0, '') would match str(1.0). This case now matches, because of >> the change that says zero length format_spec's are the same as str(). >> However, if there's anything else in the format_spec (still with no >> type), it doesn't match what str() would do. 
>>  >>> str(1.0)
>> '1.0'
>>  >>> format(1.0)
>> '1.0'
>>  >>> format(1.0, "-")
>> '1'
>>  >>> format(1.0, "g")
>> '1'
>> Actually, str() doesn't add a decimal for exponential notation:
>>  >>> str(1e100)
>> '1e+100'
>>
>> I'd like to see the docs just say that an empty type is the same as 'g',
>> but I'm not sure of the use case for what the documentation currently says.
>
> Can you suggest a patch?

If we want the docs to match the code, instead of:

    None       similar to ``'g'``, except that it prints at least one
               digit after the decimal point.

it would be:

    None       the same as 'g'.

But before you do that, I want to see what Talin says.  I'm not sure
whether we should instead modify the code to match the docs.

(Sorry about not doing a real diff.  I'm short on time, and haven't
checked out the new docs yet.)

Eric.

From barry at python.org Fri Aug 31 13:21:19 2007
From: barry at python.org (Barry Warsaw)
Date: Fri, 31 Aug 2007 07:21:19 -0400
Subject: [Python-3000] Release Countdown
In-Reply-To:
References:
Message-ID:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 31, 2007, at 12:52 AM, Guido van Rossum wrote:

> I'm pretty happy where we stand now -- I just squashed the last two
> failing tests (test_mailbox and test_oldmailbox). It is 9:30 pm here
> and I'm tired, so I'm going to try and get a good night's sleep and do
> the release as early as I can tomorrow.

G'morning Guido!  I've re-enabled test_email because it now passes
completely, although I had to cheat a bit on the last couple of
failures.  I'll address those XXXs after a1.

For me on OS X, I'm still getting a failure in test_plistlib and an
unexpected skip in test_ssl.
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRtf5r3EjvBPtnXfVAQKn1gP9HsX7xYt3O7XGPV/TAXv1W25Coh7aMLwK IOS2TrUDbhgehuaGEcS3u2Q4HBGsDwhURCguLXpSQKch8b4At2qvUXlesOaIixh1 wpwZ5NuiFn43MG/a4MGc9L2VUuRgSyFnl0HsNw9NvklMt+o8p90cCYYaa1McKwaY vhyf00oBTeQ= =7zNb -----END PGP SIGNATURE----- From thomas at python.org Fri Aug 31 13:25:47 2007 From: thomas at python.org (Thomas Wouters) Date: Fri, 31 Aug 2007 13:25:47 +0200 Subject: [Python-3000] Release Countdown In-Reply-To: References: Message-ID: <9e804ac0708310425p42dd5461s13a1e4ddd6c943e9@mail.gmail.com> On 8/31/07, Barry Warsaw wrote: > For me on OS X, I'm still getting a failure in test_plistlib and an > unexpected skip in test_ssl. The skip is intentional; the ssl module is in a state of flux, having the latest changes from the trunk applied, but not adjusted to the new layout of the socket.socket class. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070831/f0b44a91/attachment.htm From eric+python-dev at trueblade.com Fri Aug 31 13:41:49 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Fri, 31 Aug 2007 07:41:49 -0400 Subject: [Python-3000] Release Countdown In-Reply-To: References: Message-ID: <46D7FE7D.5020909@trueblade.com> Barry Warsaw wrote: > For me on OS X, I'm still getting a failure in test_plistlib and an > unexpected skip in test_ssl. If it helps, the test_plistlib errors follow. $ ./python.exe Lib/test/test_plistlib.py -v test_appleformatting (__main__.TestPlistlib) ... ERROR test_appleformattingfromliteral (__main__.TestPlistlib) ... ERROR test_bytes (__main__.TestPlistlib) ... ERROR test_bytesio (__main__.TestPlistlib) ... ERROR test_controlcharacters (__main__.TestPlistlib) ... ok test_create (__main__.TestPlistlib) ... ok test_io (__main__.TestPlistlib) ... 
ERROR test_nondictroot (__main__.TestPlistlib) ... ok ====================================================================== ERROR: test_appleformatting (__main__.TestPlistlib) ---------------------------------------------------------------------- Traceback (most recent call last): File "Lib/test/test_plistlib.py", line 140, in test_appleformatting pl = plistlib.readPlistFromBytes(TESTDATA) File "/py3k/Lib/plat-mac/plistlib.py", line 102, in readPlistFromBytes return readPlist(BytesIO(data)) File "/py3k/Lib/plat-mac/plistlib.py", line 77, in readPlist rootObject = p.parse(pathOrFile) File "/py3k/Lib/plat-mac/plistlib.py", line 405, in parse parser.ParseFile(fileobj) File "/py3k/Lib/plat-mac/plistlib.py", line 417, in handleEndElement handler() File "/py3k/Lib/plat-mac/plistlib.py", line 467, in end_data self.addObject(Data.fromBase64(self.getData())) File "/py3k/Lib/plat-mac/plistlib.py", line 374, in fromBase64 return cls(binascii.a2b_base64(data)) SystemError: can't use str as char buffer ====================================================================== ERROR: test_appleformattingfromliteral (__main__.TestPlistlib) ---------------------------------------------------------------------- Traceback (most recent call last): File "Lib/test/test_plistlib.py", line 147, in test_appleformattingfromliteral pl2 = plistlib.readPlistFromBytes(TESTDATA) File "/py3k/Lib/plat-mac/plistlib.py", line 102, in readPlistFromBytes return readPlist(BytesIO(data)) File "/py3k/Lib/plat-mac/plistlib.py", line 77, in readPlist rootObject = p.parse(pathOrFile) File "/py3k/Lib/plat-mac/plistlib.py", line 405, in parse parser.ParseFile(fileobj) File "/py3k/Lib/plat-mac/plistlib.py", line 417, in handleEndElement handler() File "/py3k/Lib/plat-mac/plistlib.py", line 467, in end_data self.addObject(Data.fromBase64(self.getData())) File "/py3k/Lib/plat-mac/plistlib.py", line 374, in fromBase64 return cls(binascii.a2b_base64(data)) SystemError: can't use str as char buffer 
====================================================================== ERROR: test_bytes (__main__.TestPlistlib) ---------------------------------------------------------------------- Traceback (most recent call last): File "Lib/test/test_plistlib.py", line 133, in test_bytes data = plistlib.writePlistToBytes(pl) File "/py3k/Lib/plat-mac/plistlib.py", line 109, in writePlistToBytes writePlist(rootObject, f) File "/py3k/Lib/plat-mac/plistlib.py", line 93, in writePlist writer.writeValue(rootObject) File "/py3k/Lib/plat-mac/plistlib.py", line 250, in writeValue self.writeDict(value) File "/py3k/Lib/plat-mac/plistlib.py", line 278, in writeDict self.writeValue(value) File "/py3k/Lib/plat-mac/plistlib.py", line 256, in writeValue self.writeArray(value) File "/py3k/Lib/plat-mac/plistlib.py", line 284, in writeArray self.writeValue(value) File "/py3k/Lib/plat-mac/plistlib.py", line 252, in writeValue self.writeData(value) File "/py3k/Lib/plat-mac/plistlib.py", line 263, in writeData maxlinelength = 76 - len(self.indent.replace("\t", " " * 8) * TypeError: Type str doesn't support the buffer API ====================================================================== ERROR: test_bytesio (__main__.TestPlistlib) ---------------------------------------------------------------------- Traceback (most recent call last): File "Lib/test/test_plistlib.py", line 155, in test_bytesio plistlib.writePlist(pl, b) File "/py3k/Lib/plat-mac/plistlib.py", line 93, in writePlist writer.writeValue(rootObject) File "/py3k/Lib/plat-mac/plistlib.py", line 250, in writeValue self.writeDict(value) File "/py3k/Lib/plat-mac/plistlib.py", line 278, in writeDict self.writeValue(value) File "/py3k/Lib/plat-mac/plistlib.py", line 256, in writeValue self.writeArray(value) File "/py3k/Lib/plat-mac/plistlib.py", line 284, in writeArray self.writeValue(value) File "/py3k/Lib/plat-mac/plistlib.py", line 252, in writeValue self.writeData(value) File "/py3k/Lib/plat-mac/plistlib.py", line 263, in writeData 
maxlinelength = 76 - len(self.indent.replace("\t", " " * 8) * TypeError: Type str doesn't support the buffer API ====================================================================== ERROR: test_io (__main__.TestPlistlib) ---------------------------------------------------------------------- Traceback (most recent call last): File "Lib/test/test_plistlib.py", line 127, in test_io plistlib.writePlist(pl, test_support.TESTFN) File "/py3k/Lib/plat-mac/plistlib.py", line 93, in writePlist writer.writeValue(rootObject) File "/py3k/Lib/plat-mac/plistlib.py", line 250, in writeValue self.writeDict(value) File "/py3k/Lib/plat-mac/plistlib.py", line 278, in writeDict self.writeValue(value) File "/py3k/Lib/plat-mac/plistlib.py", line 256, in writeValue self.writeArray(value) File "/py3k/Lib/plat-mac/plistlib.py", line 284, in writeArray self.writeValue(value) File "/py3k/Lib/plat-mac/plistlib.py", line 252, in writeValue self.writeData(value) File "/py3k/Lib/plat-mac/plistlib.py", line 263, in writeData maxlinelength = 76 - len(self.indent.replace("\t", " " * 8) * TypeError: Type str doesn't support the buffer API ---------------------------------------------------------------------- Ran 8 tests in 0.060s FAILED (errors=5) Traceback (most recent call last): File "Lib/test/test_plistlib.py", line 185, in test_main() File "Lib/test/test_plistlib.py", line 181, in test_main test_support.run_unittest(TestPlistlib) File "/py3k/Lib/test/test_support.py", line 541, in run_unittest _run_suite(suite) File "/py3k/Lib/test/test_support.py", line 523, in _run_suite raise TestFailed(msg) test.test_support.TestFailed: errors occurred; run in verbose mode for details From nicko at nicko.org Fri Aug 31 14:01:35 2007 From: nicko at nicko.org (Nicko van Someren) Date: Fri, 31 Aug 2007 13:01:35 +0100 Subject: [Python-3000] Need Decimal.__format__ In-Reply-To: <46D77086.3030207@acm.org> References: <46D77086.3030207@acm.org> Message-ID: <76E400CE-5A66-4409-A3DC-A9A4045D6CEB@nicko.org> On 31 
Aug 2007, at 02:36, Talin wrote: ... > Also, I'm interested in suggestions as to any other standard types > that > ought to have a __format__ method, other than the obvious Date/Time > classes. What kinds of things do people usually want to print? For years I've thought that various collection types would benefit from better formatters. If space is limited (e.g. fixed width fields) then lists, tuples and sets are often better displayed truncated, with an ellipsis in lieu of the remainder of the contents. Being able to have a standard way to print your list as [2, 3, 5, 7, 11,...] when you only have 20 characters would frequently be useful. Having a formatter directive for collections to ask for the contents without the enclosing []/()/{} would also be useful. It's not clear to me that there's much to be gained from building in more complex formatters for dictionary-like objects, since it will be hard to describe the plethora of different ways it could be done, and while having multi-line formatting options for long lists/sets would be nice it may deviate too far from the standard usage of string formatting, but displaying simple collections is a sufficiently common task that I think it's worth looking at. Nicko From martin at v.loewis.de Fri Aug 31 14:05:19 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 31 Aug 2007 14:05:19 +0200 Subject: [Python-3000] Release Countdown In-Reply-To: <46D7FE7D.5020909@trueblade.com> References: <46D7FE7D.5020909@trueblade.com> Message-ID: <46D803FF.9000909@v.loewis.de> >> For me on OS X, I'm still getting a failure in test_plistlib and an >> unexpected skip in test_ssl. > > If it helps, the test_plistlib errors follow. In case it isn't clear: test_plistlib will fail *only* on OS X, because it isn't run elsewhere. So somebody with OS X needs to fix it. 
Regards, Martin From eric+python-dev at trueblade.com Fri Aug 31 14:17:28 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Fri, 31 Aug 2007 08:17:28 -0400 Subject: [Python-3000] Release Countdown In-Reply-To: <46D803FF.9000909@v.loewis.de> References: <46D7FE7D.5020909@trueblade.com> <46D803FF.9000909@v.loewis.de> Message-ID: <46D806D8.4070905@trueblade.com> Martin v. Löwis wrote: >>> For me on OS X, I'm still getting a failure in test_plistlib and an >>> unexpected skip in test_ssl. >> If it helps, the test_plistlib errors follow. > > In case it isn't clear: test_plistlib will fail *only* on OS X, because > it isn't run elsewhere. So somebody with OS X needs to fix it. Yes, thanks. I won't have time to fix it before a1; I thought maybe this might jog the memory of someone who is more familiar with the test to suggest a fix. If not, I can take a look at it post a1, and/or we could disable the test for a1. Eric. From eric+python-dev at trueblade.com Fri Aug 31 14:19:55 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Fri, 31 Aug 2007 08:19:55 -0400 Subject: [Python-3000] Need Decimal.__format__ In-Reply-To: <46D77086.3030207@acm.org> References: <46D77086.3030207@acm.org> Message-ID: <46D8076B.8070005@trueblade.com> Talin wrote: > I'm looking for a volunteer who understands the Decimal class well > enough to write a __format__ method for it. It should handle all of the > same format specifiers as float.__format__, but it should not use the > same implementation as float (so as to preserve accuracy.) > > Also, I'm interested in suggestions as to any other standard types that > ought to have a __format__ method, other than the obvious Date/Time > classes. What kinds of things do people usually want to print? I have a patch for adding __format__ to datetime, date, and time. For a zero-length format_spec, they return str(self), otherwise self.strftime(format_spec).
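The rule just described (empty format_spec falls back to str(self), anything else goes to strftime) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch itself: SketchDate is a hypothetical subclass introduced here so that the rule can be shown without touching the real datetime types.

```python
import datetime


class SketchDate(datetime.date):
    """Hypothetical subclass used only to illustrate the rule above."""

    def __format__(self, format_spec):
        # Zero-length spec: behave like str(self).
        if len(format_spec) == 0:
            return str(self)
        # Any other spec is handed to strftime unchanged.
        return self.strftime(format_spec)


d = SketchDate(2007, 8, 31)
print(format(d, ""))          # 2007-08-31
print(format(d, "%Y/%m/%d"))  # 2007/08/31
```

The same two-branch method would presumably apply unchanged to datetime and time, since all three types provide strftime.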
I can whip up some tests and check it in if you want this before a1, but if you want more discussion on what it should do then we can wait. Let me know. But since the deadline is in 40 minutes, I guess we can do it for a2. As for what other types, I can't think of any. I've scanned through my real work code, and int, float, string, and datetime pretty much cover it. Eric. From barry at python.org Fri Aug 31 14:40:19 2007 From: barry at python.org (Barry Warsaw) Date: Fri, 31 Aug 2007 08:40:19 -0400 Subject: [Python-3000] Release Countdown In-Reply-To: <9e804ac0708310425p42dd5461s13a1e4ddd6c943e9@mail.gmail.com> References: <9e804ac0708310425p42dd5461s13a1e4ddd6c943e9@mail.gmail.com> Message-ID: <1D9688DF-92D5-4602-A93B-0D3998FD8891@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 31, 2007, at 7:25 AM, Thomas Wouters wrote: > On 8/31/07, Barry Warsaw wrote: > For me on OS X, I'm still getting a failure in test_plistlib and an > unexpected skip in test_ssl. > > The skip is intentional; the ssl module is in a state of flux, > having the latest changes from the trunk applied, but not adjusted > to the new layout of the socket.socket class. Does that mean the skip is intentionally unexpected, or unexpectedly intentional? 
:) - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRtgMM3EjvBPtnXfVAQKzkwQApEm7j9iF1PCXZzYvNo6JH7Bu7BEv7TZ6 YFMQOHh4BXay4TmQxvx3jhjD4jnql01e6dBRCaNJ0xCNhsBXMOsAHc/EUkdYR7QF 5D8ozpw3uPEkhWh7AeQpynFuLdtObWmApKEXxjbDFmP/hq5LifAfHUwakx6z4F50 /iRWLNp7k6w= =aXRL -----END PGP SIGNATURE----- From barry at python.org Fri Aug 31 14:41:26 2007 From: barry at python.org (Barry Warsaw) Date: Fri, 31 Aug 2007 08:41:26 -0400 Subject: [Python-3000] Release Countdown In-Reply-To: <46D806D8.4070905@trueblade.com> References: <46D7FE7D.5020909@trueblade.com> <46D803FF.9000909@v.loewis.de> <46D806D8.4070905@trueblade.com> Message-ID: <797C63A8-888F-45CC-A780-CE9AD859BC1B@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Aug 31, 2007, at 8:17 AM, Eric Smith wrote: > Martin v. Löwis wrote: >>>> For me on OS X, I'm still getting a failure in test_plistlib and an >>>> unexpected skip in test_ssl. >>> If it helps, the test_plistlib errors follow. >> >> In case it isn't clear: test_plistlib will fail *only* on OS X, >> because >> it isn't run elsewhere. So somebody with OS X needs to fix it. > > Yes, thanks. I won't have time to fix it before a1, I thought maybe > this might jog the memory of someone who is more familiar with the > test > to suggest a fix. > > If not, I can take a look at it post a1, and/or we could disable the > test for a1. I took a 5 minute crack at it this morning but didn't get anywhere and won't have any more time to work on it before a1.
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRtgMdnEjvBPtnXfVAQKooAP5AfZCcbg682Ff/hig8Y2ZUoWdlvCpNvgL hFLac958MYT6VmqH6/HwXnwcW1CD7l7/7RkooFGAfecG1Rr88THQHvh0k6W09Hur lwSb65yflVRbGer0RsERgUcgZ5S1bZkzo/0NGCbmQB99RPhzTEDfSLWmFKqOyFa1 /GpuHVoIFWA= =8/zS -----END PGP SIGNATURE----- From martin at v.loewis.de Fri Aug 31 14:52:36 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 31 Aug 2007 14:52:36 +0200 Subject: [Python-3000] buildbots In-Reply-To: References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> <46D66319.7030209@v.loewis.de> <46D66B49.8070600@v.loewis.de> <46D6741A.9040801@v.loewis.de> <46D6B904.1020505@v.loewis.de> <46D72B4F.6040908@v.loewis.de> Message-ID: <46D80F14.5080009@v.loewis.de> > Any suggestions? I've now backed out my first patch, and implemented an extension to msvcrt, as well as a command line option for regrtest. Let's see how this works. Regards, Martin From theller at ctypes.org Fri Aug 31 15:11:47 2007 From: theller at ctypes.org (Thomas Heller) Date: Fri, 31 Aug 2007 15:11:47 +0200 Subject: [Python-3000] buildbots In-Reply-To: <46D80F14.5080009@v.loewis.de> References: <46D453E9.4020903@ctypes.org> <46D45CCA.3050206@v.loewis.de> <46D462EB.4070600@ctypes.org> <46D4721A.2040208@ctypes.org> <46D57D47.1090709@v.loewis.de> <46D5D22C.3010003@v.loewis.de> <46D66319.7030209@v.loewis.de> <46D66B49.8070600@v.loewis.de> <46D6741A.9040801@v.loewis.de> <46D6B904.1020505@v.loewis.de> <46D72B4F.6040908@v.loewis.de> <46D80F14.5080009@v.loewis.de> Message-ID: Martin v. Löwis wrote: >> Any suggestions? > > I've now backed out my first patch, and implemented an extension > to msvcrt, as well as a command line option for regrtest. Let's > see how this works.
> > Regards, > Martin At least the tests on the win32 buildbot now do not hang any longer if I do not click the abort button on the message box. See for example http://python.org/dev/buildbot/3.0/x86%20XP-3%203.0/builds/57/step-test/0 Thanks, Thomas From theller at ctypes.org Fri Aug 31 15:16:49 2007 From: theller at ctypes.org (Thomas Heller) Date: Fri, 31 Aug 2007 15:16:49 +0200 Subject: [Python-3000] Merging between trunk and py3k? Message-ID: Will commits still be merged between trunk and py3k in the future (after the 3.0a1 release), or must this now be done by the developers themselves? Or is it less work for the one who does the merge if applicable bug fixes are committed to both trunk and py3k branch? Thomas From guido at python.org Fri Aug 31 15:25:17 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Aug 2007 06:25:17 -0700 Subject: [Python-3000] Py3k Branch FROZEN Message-ID: Please don't submit anything to the py3k branch until I announce it's unfrozen or I specifically ask you to do something. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Aug 31 15:45:20 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Aug 2007 06:45:20 -0700 Subject: [Python-3000] str.decode; buffers In-Reply-To: References: Message-ID: Yes on both accounts. Checkin coming up. On 8/31/07, Georg Brandl wrote: > Two short issues: > > * Shouldn't str.decode() be removed? Every call to it says > "TypeError: decoding str is not supported". > > * Using e.g. b"abc".find("a") gives "SystemError: can't use str as char buffer". > This should be a TypeError IMO. > > Georg > > -- > Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. > Four shall be the number of spaces thou shalt indent, and the number of thy > indenting shall be four. Eight shalt thou not indent, nor either indent thou > two, excepting that thou then proceed to four. Tabs are right out.
> > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Aug 31 15:49:49 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Aug 2007 06:49:49 -0700 Subject: [Python-3000] str.decode; buffers In-Reply-To: References: Message-ID: FWIW I think "s".find(b"b") should also raise a TypeError, but I don't have the guts to tackle that today. On 8/31/07, Guido van Rossum wrote: > Yes on both accounts. Checkin coming up. > > On 8/31/07, Georg Brandl wrote: > > Two short issues: > > > > * Shouldn't str.decode() be removed? Every call to it says > > "TypeError: decoding str is not supported". > > > > * Using e.g. b"abc".find("a") gives "SystemError: can't use str as char buffer". > > This should be a TypeError IMO. > > > > Georg > > > > -- > > Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. > > Four shall be the number of spaces thou shalt indent, and the number of thy > > indenting shall be four. Eight shalt thou not indent, nor either indent thou > > two, excepting that thou then proceed to four. Tabs are right out. > > > > _______________________________________________ > > Python-3000 mailing list > > Python-3000 at python.org > > http://mail.python.org/mailman/listinfo/python-3000 > > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Aug 31 15:57:53 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Aug 2007 06:57:53 -0700 Subject: [Python-3000] Merging between trunk and py3k? 
In-Reply-To: References: Message-ID: I haven't heard yet that merging is impossible or useless; there's still a lot of similarity between the trunk and the branch. As long as that remains the case, I'd like to continue to do merges (except for those files that have been completely rewritten or removed, like README, bufferobject.* or intobject.*). Once we stop merging, I'd like to reformat all C code to conform to the new coding standard (4-space indents, no tabs, no trailing whitespace, 80-col line length strictly enforced). But I expect that'll be a long time in the future. --Guido On 8/31/07, Thomas Heller wrote: > Will commits still be merged between trunk and py3k in the future > (after the 3.0a1 release), or must this now be done by the developers > themselves? > > Or is it less work for the one who does the merge if applicable bug fixes are > committed to both trunk and py3k branch? > > Thomas > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Aug 31 18:24:47 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Aug 2007 09:24:47 -0700 Subject: [Python-3000] Python 3.0a1 released! Message-ID: The release is available from http://python.org/download/releases/3.0/ I'll send a longer announcement to python-list and python-announce-list. Please blog about this if you have a blog! Thanks to all who helped out! It's been a great ride. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Aug 31 18:25:39 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Aug 2007 09:25:39 -0700 Subject: [Python-3000] Py3k Branch UNFROZEN Message-ID: The branch is now unfrozen. I tagged the release as r30a1.
On 8/31/07, Guido van Rossum wrote: > Please don't submit anything to the py3k branch until I announce it's > unfrozen or I specifically ask you to do something. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Aug 31 18:36:08 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Aug 2007 09:36:08 -0700 Subject: [Python-3000] Release Countdown In-Reply-To: <797C63A8-888F-45CC-A780-CE9AD859BC1B@python.org> References: <46D7FE7D.5020909@trueblade.com> <46D803FF.9000909@v.loewis.de> <46D806D8.4070905@trueblade.com> <797C63A8-888F-45CC-A780-CE9AD859BC1B@python.org> Message-ID: On 8/31/07, Barry Warsaw wrote: > On Aug 31, 2007, at 8:17 AM, Eric Smith wrote: > > Martin v. Löwis wrote: > >>>> For me on OS X, I'm still getting a failure in test_plistlib and an > >>>> unexpected skip in test_ssl. > >>> If it helps, the test_plistlib errors follow. > >> > >> In case it isn't clear: test_plistlib will fail *only* on OS X, > >> because > >> it isn't run elsewhere. So somebody with OS X needs to fix it. > > > > Yes, thanks. I won't have time to fix it before a1, I thought maybe > > this might jog the memory of someone who is more familiar with the > > test > > to suggest a fix. > > > > If not, I can take a look at it post a1, and/or we could disable the > > test for a1. > > I took a 5 minute crack at it this morning but didn't get anywhere > and won't have any more time to work on it before a1. No worry, I cracked it, just in time before the release. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From lists at cheimes.de Fri Aug 31 18:46:41 2007 From: lists at cheimes.de (Christian Heimes) Date: Fri, 31 Aug 2007 18:46:41 +0200 Subject: [Python-3000] Compiling Python 3.0 with MS Visual Studio 2005 Message-ID: I tried to compile Python 3.0 with MS Visual Studio 2005 on Windows XP SP2 (German) and I ran into multiple problems with 3rd party modules.
The problem with time on German installations of Windows still exists. It renders Python 3.0a on Windows for Germans pretty useless. :/ * the import of _time still fails on my (German) Windows box because my time zone contains umlauts. "set TZ=GMT" fixes the problem. * bzip2: The readme claims that libbz2.lib is compiled automatically but it's not the case for pcbuild8. I had to compile it manually using the libbz2.dsp project file with the target "Release". * bsddb: The recipe for _bsddb isn't working because Berkeley_DB.sln was created with an older version of Visual Studio. But one can convert the file and build it with Visual Studio easily without using the shell. Only the db_static project is required. The bsddb project has another issue. The file _bsddb.pyd ends up in win32pgo and not in win32debug or win32release. * MSI: msi.lib is missing on x86. It must be installed with the Platform SDK. The blog entry http://blogs.msdn.com/heaths/archive/2005/12/15/504399.aspx explains the background and how to install msi.lib. * sqlite3: The sqlite3.dll isn't copied into the build directories win32debug and win32release. This breaks the imports and tests. * SSL: The _ssl project is not listed in the project map and the compilation of openssl fails with an error in crypto/des/enc_read.c:150 "The POSIX name for this item is deprecated. Instead, use the ISO C++ conformant name: _read." Christian From duda.piotr at gmail.com Fri Aug 31 19:26:08 2007 From: duda.piotr at gmail.com (Piotr Duda) Date: Fri, 31 Aug 2007 19:26:08 +0200 Subject: [Python-3000] os.stat() raises UnicodeDecodeError on Polish windows if file not exist Message-ID: <3df8f2650708311026k28cb1713vefde51d049750f44@mail.gmail.com> In 3.0a1 on Polish winxp os.stat() raises UnicodeDecodeError (utf-8 codec can't decode ...) if file not exist, it is probably caused by localized error messages returned by FormatMessage.
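A small script along these lines could be used to check whether a given build shows the behaviour reported above. It is a sketch, not taken from the thread: the temporary directory and the file name are hypothetical, and the expectation that a missing file yields an OSError (rather than a decoding error raised while the error message is being built) is the assumption being tested.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical path that is guaranteed not to exist.
    missing = os.path.join(tmp, "does-not-exist")
    try:
        os.stat(missing)
        outcome = "no error"
    except UnicodeDecodeError:
        # The failure mode described in the report above.
        outcome = "UnicodeDecodeError"
    except OSError:
        # The expected result for a missing file.
        outcome = "OSError"

print(outcome)
```

On an affected localized Windows build the script would presumably print UnicodeDecodeError; on a build without the problem it prints OSError.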
From lists at cheimes.de Fri Aug 31 20:14:26 2007 From: lists at cheimes.de (Christian Heimes) Date: Fri, 31 Aug 2007 20:14:26 +0200 Subject: [Python-3000] os.stat() raises UnicodeDecodeError on Polish windows if file not exist In-Reply-To: <3df8f2650708311026k28cb1713vefde51d049750f44@mail.gmail.com> References: <3df8f2650708311026k28cb1713vefde51d049750f44@mail.gmail.com> Message-ID: Piotr Duda wrote: > In 3.0a1 on Polish winxp os.stat() raises UnicodeDecodeError (utf-8 > codec can't decode ...) if file not exist, it is probably caused by > localized error messages returned by FormatMessage. On German Win XP, too Christian From amauryfa at gmail.com Fri Aug 31 22:31:10 2007 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Fri, 31 Aug 2007 22:31:10 +0200 Subject: [Python-3000] os.stat() raises UnicodeDecodeError on Polish windows if file not exist In-Reply-To: References: <3df8f2650708311026k28cb1713vefde51d049750f44@mail.gmail.com> Message-ID: Hello, Christian Heimes wrote: > Piotr Duda wrote: > > In 3.0a1 on Polish winxp os.stat() raises UnicodeDecodeError (utf-8 > > codec can't decode ...) if file not exist, it is probably caused by > > localized error messages returned by FormatMessage. > > On German Win XP, too Would you please test with the following patch? it seems to correct the problem on my French Windows XP. Maybe we can have it corrected before a complete European tour... -- Amaury Forgeot d'Arc -------------- next part -------------- A non-text attachment was scrubbed... 
Name: errors.diff Type: application/octet-stream Size: 4574 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070831/2a318b5d/attachment.obj From guido at python.org Fri Aug 31 22:59:42 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Aug 2007 13:59:42 -0700 Subject: [Python-3000] os.stat() raises UnicodeDecodeError on Polish windows if file not exist In-Reply-To: References: <3df8f2650708311026k28cb1713vefde51d049750f44@mail.gmail.com> Message-ID: Can you guys please put this in the bug tracker too? On 8/31/07, Amaury Forgeot d'Arc wrote: > Hello, > Christian Heimes wrote: > > Piotr Duda wrote: > > > In 3.0a1 on Polish winxp os.stat() raises UnicodeDecodeError (utf-8 > > > codec can't decode ...) if file not exist, it is probably caused by > > > localized error messages returned by FormatMessage. > > > > On German Win XP, too > > Would you please test with the following patch? it seems to correct > the problem on my French Windows XP. > Maybe we can have it corrected before a complete European tour... > > -- > Amaury Forgeot d'Arc > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jimjjewett at gmail.com Fri Aug 31 23:00:47 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 31 Aug 2007 17:00:47 -0400 Subject: [Python-3000] Release Countdown In-Reply-To: References: <46D7FE7D.5020909@trueblade.com> <46D803FF.9000909@v.loewis.de> <46D806D8.4070905@trueblade.com> <797C63A8-888F-45CC-A780-CE9AD859BC1B@python.org> Message-ID: On 8/31/07, Guido van Rossum wrote: > > >>>> For me on OS X, I'm still getting a failure in test_plistlib and an > No worry, I cracked it, just in time before the release. 
Seeing the recent changes to plistlib does make me think that bytes is more awkward than it should be. The changes I would suggest: (1) Allow bytes methods to take a literal string (which will obviously be in the source file's encoding). Needing to change for line in data.asBase64(maxlinelength).split("\n"): to for line in data.asBase64(maxlinelength).split(b"\n"): (even when I know the "integers" represent ASCII letters) is exactly the sort of type-checking that annoys me in Java. http://svn.python.org/view/python/branches/py3k/Lib/plat-mac/plistlib.py?rev=57844&r1=57744&r2=57844 (2) There really ought to be an immutable bytes type, and the literal (or at least a literal, if capitalization matters) ought to be the immutable. PLISTHEADER = b"""\ """ If the value of PLISTHEADER does change during the run, it will almost certainly be a bug. I could code defensively by only ever passing copies, but that seems wasteful, and it could hide other bugs. If something does try to modify (not replace, modify) it, then there was probably a typo or API misunderstanding; I *want* an exception. http://svn.python.org/view/python/branches/py3k/Lib/plat-mac/plistlib.py?rev=57563&r1=57305&r2=57563 -jJ From guido at python.org Fri Aug 31 23:03:46 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Aug 2007 14:03:46 -0700 Subject: [Python-3000] Release Countdown In-Reply-To: References: <46D7FE7D.5020909@trueblade.com> <46D803FF.9000909@v.loewis.de> <46D806D8.4070905@trueblade.com> <797C63A8-888F-45CC-A780-CE9AD859BC1B@python.org> Message-ID: On 8/31/07, Jim Jewett wrote: > On 8/31/07, Guido van Rossum wrote: > > > > >>>> For me on OS X, I'm still getting a failure in test_plistlib and an > > > No worry, I cracked it, just in time before the release. > > Seeing the recent changes to plistlib does make me think that bytes is > more awkward than it should be. 
The changes I would suggest: > > (1) Allow bytes methods to take a literal string (which will > obviously be in the source file's encoding). Yuck, yuck about the source file encoding part. Also, there is no way to tell that a particular argument was passed a literal. The very definition of "this was a literal" is iffy -- is x a literal when passed to f below? x = "abc" f(x) > Needing to change > > for line in data.asBase64(maxlinelength).split("\n"): > to > for line in data.asBase64(maxlinelength).split(b"\n"): > > (even when I know the "integers" represent ASCII letters) is exactly > the sort of type-checking that annoys me in Java. > > http://svn.python.org/view/python/branches/py3k/Lib/plat-mac/plistlib.py?rev=57844&r1=57744&r2=57844 > > > (2) There really ought to be an immutable bytes type, and the literal > (or at least a literal, if capitalization matters) ought to be the > immutable. > > PLISTHEADER = b"""\ > > PLIST 1.0//EN" "http://www.apple.com/DTDs/ > PropertyList-1.0.dtd"> > """ > > If the value of PLISTHEADER does change during the run, it will almost > certainly be a bug. I could code defensively by only ever passing > copies, but that seems wasteful, and it could hide other bugs. If > something does try to modify (not replace, modify) it, then there was > probably a typo or API misunderstanding; I *want* an exception. Sounds like you're worrying too much. Do you have any indication that this is going to be a common problem?
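For readers following the two points under discussion, here is a short sketch of the stricter behaviour. It assumes a later Python 3 release in which the bytes literal is immutable and bytearray is the mutable type, which is not how 3.0a1 behaved at the time of this thread; the sample data and variable names are made up for illustration.

```python
data = b"line one\nline two\n"

# Point (1): splitting bytes needs a bytes separator.
lines = data.split(b"\n")

# A str separator is rejected instead of being silently encoded.
try:
    data.split("\n")
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Point (2): with an immutable bytes type, in-place modification raises.
header = b"HEADER"
try:
    header[0] = 0x68
    mutated = True
except TypeError:
    mutated = False

# A mutable copy has to be requested explicitly.
scratch = bytearray(header)
scratch[0] = 0x68  # ord("h")

print(lines, mixed_ok, mutated, bytes(scratch))
```

Under that assumption the module-level constant cannot be changed by accident, and any code that really needs to edit the data has to make its own bytearray copy.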
> http://svn.python.org/view/python/branches/py3k/Lib/plat-mac/plistlib.py?rev=57563&r1=57305&r2=57563 > > -jJ > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Fri Aug 31 23:23:41 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 31 Aug 2007 23:23:41 +0200 Subject: [Python-3000] Compiling Python 3.0 with MS Visual Studio 2005 In-Reply-To: References: Message-ID: <46D886DD.2070601@v.loewis.de> Christian Heimes wrote: > I tried to compile Python 3.0 with MS Visual Studio 2005 on Windows XP > SP2 (German) and I ran into multiple problems with 3rd party modules. > The problem with time on German installations of Windows still exists. Not for me - it works fine here. Are you sure your source is up-to-date? I can't comment on PCbuild8 problems - this directory is largely unmaintained. Regards, Martin From pfdubois at gmail.com Thu Aug 30 19:11:03 2007 From: pfdubois at gmail.com (Paul Dubois) Date: Thu, 30 Aug 2007 10:11:03 -0700 Subject: [Python-3000] Patch for Doc/tutorial Message-ID: Attached is a patch for changes to the tutorial. I made it by doing: svn diff tutorial > tutorial.diff in the Doc directory. I hope this is what is wanted; if not let me know what to do. Unfortunately cygwin will not run Sphinx correctly even using 2.5, much less 3.0. And running docutils by hand gets a lot of errors because Sphinx has hidden a lot of the definitions used in the tutorial. So the bottom line is I have only an imperfect idea if I have screwed up any formatting. I would like to rewrite the classes.rst file in particular, and it is the one that I did not check to be sure the examples worked, but first I need to do something about getting me a real Linux so I don't have these problems. So unless someone is hot to trot I'd like to remain 'owner' of this issue on the spreadsheet. Whoever puts in these patches, I would appreciate being notified that it is done.
Paul -------------- next part -------------- A non-text attachment was scrubbed... Name: tutorial.diff Type: application/octet-stream Size: 54448 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070830/b366efa3/attachment-0001.obj