From brett at python.org Sat Aug 1 00:17:53 2009 From: brett at python.org (Brett Cannon) Date: Fri, 31 Jul 2009 15:17:53 -0700 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: References: Message-ID: On Fri, Jul 31, 2009 at 14:16, Jacob Rus wrote: > Hi all, > > In an attempt to figure out some twisted.web code, I was reading > through the Python Standard Library's mimetypes module today, and > was shocked at the poor quality of the code. I wonder how the > mimetypes code made it into the standard library, and whether anyone > has ever bothered to read it or update it: it is an embarrassment. > Much of the code is redundant, portions fail to execute, control > flow is routed through a horribly confusing mess of spaghetti, and > most of the complexity has no clear benefit as far as I can tell. I > probably should drop the subject and get back to work, but as a good > citizen, it's hard to just ignore this sort of thing. > I have not looked at the code nor ever used it (that I can remember) so I can't directly address the quality. But I can say the code was added in 1997, which puts it as an addition in Python 1.4. That was back before Python took off mainstream and we began to tighten up the quality control on the standard library. I also would like to say that I am not embarrassed by anything in Python. It's unfortunate if the mimetypes module's code is a mess, but I think calling it an embarrassment is taking it a little far and borderline insulting (which I don't think you meant to do). > > mimetypes.py stores its types in a pair of dictionaries, one for > "strict" use, and the other for "non-standard types". It creates the > strict dictionary by default out of apache's mime.types file, and > then overrides the entries it finds with a set of exceptions. Then > it creates the non-standard dictionary, which is set to match if the > strict parameter is set to False when guessing types. Just in this > basic design, and in the list of types in the file, there are > several problems: > > * Various apache mime types files are read, if found, but the > ordering of the files is such that older versions of apache are > sometimes read after newer ones, overriding updated mime types > with out-of-date versions if multiple versions of apache are > installed on the system. > > * The vast majority of types declared in mimetypes.py are > duplicates of types already declared by Apache. In a few cases > this is to change the apache default (make an exception, that > is), but in most cases the mime type and extension are > completely identical. This huge number of redundant types makes > the file substantially harder to follow. No comments are > provided to explain why various sets of exceptions are made to > Apache's default mime types, and in several cases mimetypes.py > seems to just be out of date as compared to recent versions of > Apache, for instance not knowing about the 'text/troff' type > which was registered in January 2006 in RFC 4263. > > * The 'non-standard' type dictionary is nearly useless, because > all of the types it declares are already in apache's mime.types > file, meaning that types are, as far as I can tell trying to > follow ugly program flow, *never* drawn from the non-strict > dictionary, except in the improbable situation where the > mimetypes module is initialized with a custom set of > apache-mime.types-like files, which does not include those > 'non-standard' types.
I personally cannot see a use case for > initializing the module with a custom set of mime types, but > then leaving the very few types included as non-strict to the > defaults: this seems like a fragile and pathological use case. > Given this, I don't see any benefit to dragging the 'strict' > parameter along all the way through the code, and would advise > getting rid of it altogether. Does anyone know of any code that > uses the mimetypes module with strict set to False, where the > non-strict code path ever *actually* is executed? > > But though these problems, which affect actual use of the code and > are therefore probably most important, are significant, they really > pale in comparison to the awful quality of implementation. I'll try > to briefly outline my understanding of how code flows in > mimetypes.py, and what the problems are. I haven't stepped through > the code in a debugger, this is just from reading it, so I apologize > in advance if I get something wrong. This is, however, some of the > worst code I've seen in the standard library or anywhere else. > > * It defines __all__: I didn't even realize __all__ could be used > for single-file modules (w/o submodules), but it definitely > shouldn't be here. __all__ is used to control what a module exports when used in an import *, nothing more. Thus its use in a module as opposed to a package is completely legitimate. > This specific __all__ oddly does not include > all of the documented variables and functions in the mimetypes > class. It's not clear why someone calling import * here wouldn't > want the bits not included. If something is documented but not listed in __all__, that is a bug. > > > * It creates a _default_mime_types() function which declares a > bunch of global variables, and then immediately calls > _default_mime_types() below the definition. There is literally > no difference in result between this and just putting those > variables at the top level of the file, so I have no idea why > this function exists, except to make the code more confusing. > It could potentially be used for testing, but that's a guess. > > * It allows command line usage: I don't think this is necessary > for a part of the standard library like this. There are better > tools for finding mime types from the command line which ship > with most operating systems. Yeah, various modules have command-line versions which are not truly necessary. This can probably stand to go. > > > * Its API is pretty poorly designed. It offers 6 functions when > about 3 are needed, and it takes a couple of read-throughs of the > code to figure out exactly what any of them are supposed to do. > > * The operation is crazy: It defines a MimeTypes class which > actually stores the type mappings, but this class is designed to > be a singleton. The way that such a design is enforced is > through the use of the module-global 'init' function, which > makes an instance of the class, and then maps all of the > functions in the module global namespace to instance methods. > But confusingly, all such functions are also defined > independently of the init function, with definitions such as: > > def guess_type(url, strict=True): > if not inited: > init() > return guess_type(url, strict) > > I'd be amazed if anyone could guess what that code was trying to > do. I did a double-take when I saw it. > Probably came from someone who is very OO happy. Not everyone comes to Python ready to embrace its procedural or slightly functional facets.
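For contrast, a more conventional lazy-initialization pattern would look something like the following. This is an illustrative sketch only, not the module's actual code: it assumes the existing module-level MimeTypes class and caches a single instance, rather than having init() rebind the module's function names.

    _db = None

    def guess_type(url, strict=True):
        # Build the shared MimeTypes instance once, then delegate to it.
        # Nothing rebinds module-level names, so 'from mimetypes import
        # guess_type' and 'mimetypes.guess_type' stay equivalent.
        global _db
        if _db is None:
            _db = MimeTypes()
        return _db.guess_type(url, strict)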
> > Of course, that return call is only ever reached the first time > this function is called, if init() has not happened yet. This > was all presumably done for lazy initialization, so that the > type information would only be loaded when needed. Needless to > say, there are more pythonic ways to accomplish such a goal. > > Oh, also, the other good one here is that it means that someone > who writes `from mimetypes import guess_types` gets something > different than someone who writes: > `import mimetypes; guess_types = mimetypes.guess_types`. In the > former case, this wrapper function is saved as guess_type, which > each time just calls the (changed after init()) > mimetypes.guess_types function. This caused a performance > nightmare before March of this year, when there was no check for > `if not inited` before running init() (amazing!?). > > * Because the type datastore is set up to be a singleton, any time > init() is called in one section of code, it resets any types > which have been added manually: this means that if init() is > called by different pieces of code in the same python program, > they will interfere with each-others? type databases, and break > each-other. This is extremely fragile and, in my opinion, crazy. > It is hard for me to imagine any use case that would benefit > from this ability to clobber custom type mappings, and I very > much doubt that any code calling the mimetypes module realizes > that the contract of the API is so flimsy by definition. In > practice, I would not advise consumers of this API to ever call > init() manually, or to ever add custom mime type mappings, > because they are setting themselves up for hard-to-track bugs > down the line. > > * The 'inited' flag is a documented part of the interface, in the > standard library documentation. I cannot imagine any reason to > set this flag manually: setting it to false when it was true > will have no effect, because the top-level functions have > already been replaced by instance methods of the 'db' MimeTypes > instance. Setting it to true when it was false will make the > code just break outright. > > * In python 3, this has been changed a bit. There?s still an > inited flag, and it still in the docs, but now awful code from > above has been changed slightly, to: > > def guess_type(url, strict=True): > if _db is None: > init() > return _db.guess_type(url, strict) > > Which is still embarrassingly confusing. On the upside, the > inited flag now does literally nothing, but remains defined, and > in the docs. > > * The 'types_map' and 'common_types' (for 'strict' and > 'common' types, respectively) dictionaries are also a documented > part of the interface. When init() is called, a new MimeTypes > instance makes a (different) types_map which is a tuple of two > dictionaries, for 'strict' and 'common' types. Then this > instance reads the apache mime.types files and adds the types to > its pair of self.types_map dictionaries, and then after that > looks at the global types_map and common_types dictionaries and > adds *those* types to its self.types_map. Then at the end it > replaces the global types_map with self.types_map[True] and > replaces common_types with self.types_map[False]. Unfortunately, > while changing these dictionaries will have an effect on the > operation of the library, it will not update the types_map_inv > mapping, so inverse lookups will not behave as the changer > expects. 
If these dictionaries are going to remain documented, > the documentation should be clear to describe them as read only > to avoid very confusing bugs. > > * Speaking of these dictionaries, .copy() is called on those two > and a few other inside MimeTypes.__init__(), which happens every > time the global init() function is called, but then init() puts > the copies back in the global namespace, meaning that the > original is discarded. Basically the only reason for the .copy() > is to make sure that the correct updates are applied to the > apache mimetype defaults, but the code will gladly re-read all > of the apache files even after its mapped types are already in > these dictionaries, essentially making re-initializing a (very > expensive) no-op. All we?re doing is a lot of unnecessary extra > disk reads and memory allocations and deallocations. The only > time this has any effect is when a non-singleton MimeTypes > instance is created, as in the read_mime_types function. > > * And that read_mime_types function is a doozy. It tries to open a > filename, spits back None if there?s an IOError (instead of > raising the exception as it should), and then creates a new > MimeTypes instance (remember, this is identical to the singleton > MimeTypes instance because it starts itself from that one?s > mappings), adds any new types it finds in the file with that > name, and then returns the 'strict' types_map from it. I?m not > sure whether any sane user of this API would expect it to return > the existing type mappings *plus* the extra ones in the provided > filename, but I really can?t imagine this function ever being > particularly useful: it requires you are reading mime types in > apache format, but not the apache mime type files you already > looked at, and then the only way to find out what new mappings > were defined is to take the difference of the default mappings > with the result of the function. > > * The code itself, on a line-by-line basis, is unpythonic and > unnecessarily verbose, confusing, and slow. The code should be > rewritten to use python 2.3?2.6 features: even leaving its > functionality identical it could be cut to about half the number > of lines, and made clearer. > > In case the above doesn?t make this clear: this code is extremely > confusing. Yeah, kind of picked up on that. =) > Trying to read it has caused all the people around me to > look up as I shout "what the fuck??!" at the screen every few > minutes, as each new revelation gives another surprise. I?m not > convinced that I completely understand what the code does, because > it has been quite effectively obfuscated, but I understand enough to > want to throw the whole thing out, and start essentially from > scratch. > > So the question is, what should be done about this? I?d like to hear > how people use the mimetypes module, and how they expect it to work, > to figure out the sanest possible mostly-backwards-compatible > replacement which could be dropped in (ideally this would just allow > the use of default mimetypes and rip out the ability to alter the > default datastore: or is there some easy way to change this away > from a singleton without breaking code which calls these methods?), > and then extend that replacement to support a somewhat saner model > for anyone who actually wants to extend the set of mappings. My > guess is that replacement code could actually fix subtle bugs in > existing uses of this module, by people who had a sane expectation > of how it was supposed to work. 
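To make the "how do people use it" question concrete, here is a rough sketch of the handful of module-level calls most consumers rely on (assuming the stock module and the singleton behaviour described above; the added type and extension are arbitrary examples):

    import mimetypes

    # Module-level helpers; the first call triggers the implicit init().
    mime_type, encoding = mimetypes.guess_type('archive.tar.gz')
    # e.g. ('application/x-tar', 'gzip')

    # Returns one of the extensions registered for the type, e.g. '.txt'.
    extension = mimetypes.guess_extension('text/plain')

    # add_type() mutates the shared singleton state, so a later call to
    # init() anywhere in the same process silently discards the addition.
    mimetypes.add_type('application/x-example', '.example')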
> > At the very least, the parts about figuring out exactly which > exceptions to Apache's set of default types are useful would be a > good idea, and I'd maybe even recommend including an up-to-date copy > of Apache's mime.types file in the Python Standard Library, and then > only overriding its definitions for future versions of Apache (and > then overriding the combination of both of those with further > exceptions deemed useful for python, with comments explaining why > each exception), so that we're not bothering to look up horribly > out-of-date types in multiple locations from Apache 1, 1.2, 1.3, > etc. I'd also recommend making the API for overriding definitions be > the same as the code used to declare the default overrides, because > as it is there are three ways to define types: a) in a mime.types > formatted file, b) in a python dictionary that gets initialized with > a confusing bit of code, and c) through the add_type function. > > Does anyone else have thoughts about this, or maybe some good (it > had better be *really* good) explanations why this code is the way > it is? I'd be happy to try to rewrite it, but I think I'd need a bit > of help figuring out how to make the rewrite backwards-compatible. So the problem with changing fundamentally how the code works, even for a cleanup, is that it will break someone's code out there because they depended on the module's crazy way of doing things. Now if they are cheating and looking at things that are meant to be hidden you might be able to clean things up, but if the semantics are exposed to the user, then there is not much we can do w/o breaking someone's code. Honestly, if the code is as bad as it seems -- including its API -- the best bet would be to come up with a new module for handling MIME types from scratch, put it up on the Cheeseshop/PyPI, and get the community behind it. If the community picks it up as the de-facto replacement for mimetypes and the code has settled, we can then talk about adding it to the standard library and begin deprecating mimetypes. And thanks for being willing to volunteer to fix this. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: From amcnabb at mcnabbs.org Sat Aug 1 00:01:10 2009 From: amcnabb at mcnabbs.org (Andrew McNabb) Date: Fri, 31 Jul 2009 16:01:10 -0600 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: References: Message-ID: <20090731220110.GB11156@mcnabbs.org> On Fri, Jul 31, 2009 at 09:16:02PM +0000, Jacob Rus wrote: > > * The operation is crazy: It defines a MimeTypes class which > actually stores the type mappings, but this class is designed to > be a singleton. The way that such a design is enforced is > through the use of the module-global 'init' function, which > makes an instance of the class, and then maps all of the > functions in the module global namespace to instance methods. > But confusingly, all such functions are also defined > independently of the init function, with definitions such as: > > def guess_type(url, strict=True): > if not inited: > init() > return guess_type(url, strict) I can't speak for any of your other complaints, but I know that this weird init stuff is fixed in trunk. For the other stuff, you seem to have some very good points. I'm sure a patch would be welcome. -- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868 From rdmurray at bitdance.com Sat Aug 1 00:33:20 2009 From: rdmurray at bitdance.com (R.
David Murray) Date: Fri, 31 Jul 2009 18:33:20 -0400 (EDT) Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: References: Message-ID: On Fri, 31 Jul 2009 at 15:17, Brett Cannon wrote: >> * It creates a _default_mime_types() function which declares a >> bunch of global variables, and then immediately calls >> _default_mime_types() below the definition. There is literally >> no difference in result between this and just putting those >> variables at the top level of the file, so I have no idea why >> this function exists, except to make the code more confusing. >> > > It could potentially be used for testing, but that's a guess. regrtest calls it from dash_R_cleanup as part of "clear[ing] assorted module caches". --David From brett at python.org Sat Aug 1 00:52:17 2009 From: brett at python.org (Brett Cannon) Date: Fri, 31 Jul 2009 15:52:17 -0700 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: References: Message-ID: On Fri, Jul 31, 2009 at 15:38, Jacob Rus wrote: > Brett Cannon wrote: > > Jacob Rus wrote: > >> * It defines __all__: I didn?t even realize __all__ could be used > >> for single-file modules (w/o submodules), but it definitely > >> shouldn?t be here. > > > > __all__ is used to control what a module exports when used in an import > *, > > nothing more. Thus it's use in a module compared to a package is > completely > > legitimate. > > > >> This specific __all__ oddly does not include > >> all of the documented variables and functions in the mimetypes > >> class. It?s not clear why someone calling import * here wouldn?t > >> want the bits not included. > > > > If something is documented by not listed in __all__ that is a bug. > > In this case, everything in the module is documented, including parts > that should be private, but only a small number are in __all__. My > recommendation would be to make those private parts be _ variables and > remove them from the docs (using them has no legitimate use cases I > can see), and rip out __all__. > Well, if the module had stuff that did not lead with an underscore then you can't remove it. You can deprecate it under the old name and rename it with an underscore, but backwards-compatibility says someone out there is using those functions so you can't just batch rename them w/o the proper warning. > > >> * It creates a _default_mime_types() function which declares a > >> bunch of global variables, and then immediately calls > >> _default_mime_types() below the definition. There is literally > >> no difference in result between this and just putting those > >> variables at the top level of the file, so I have no idea why > >> this function exists, except to make the code more confusing. > > > > It could potentially be used for testing, but that's a guess. > > Here's an abridged version of this function. I don?t think there?s any > reason for this that I can see. > > def _default_mime_types(): > global suffix_map > global encodings_map > global types_map > global common_types > > suffix_map = { > '.tgz': '.tar.gz', #... > } > > encodings_map = { > '.gz': 'gzip', #... > } > > types_map = { > '.a' : 'application/octet-stream', #... > } > > common_types = { > '.jpg' : 'image/jpg', #... > } > > _default_mime_types() > As R. David pointed out, it is being used by regrtest to clean up after running the test suite. > > > Probably came from someone who is very OO happy. Not everyone comes to > > Python ready to embrace its procedural or slightly functional facets. 
> > Yes, it seems so to me too. > > > So the problem of changing fundamentally how the code works, even for a > > cleanup, is that it will break someone's code out there because they > > depended on the module's crazy way of doing things. Now if they are > cheating > > and looking at things that are meant to be hidden you might be able to > clean > > things up, but if the semantics are exposed to the user, then there is > not > > much we can do w/o breaking someone's code. > > The problem is that the semantics as documented are really ambiguous, > and what I would consider the reasonable interpretation is different > from what the code actually does. So anyone using this code naively is > going to run into trouble, and anyone relying on how the code actually > works is going behind the back of the docs, but they sort of have to > in order to use much of the functionality of the module. I agree this > puts us in a tricky spot. > Well, perhaps the docs can be updated to match the code where cleanup would change the semantics. > > > Honestly, if the code is as bad as it seems -- including its API --, the > > best bet would be to come up with a new module for handling MIME types > from > > scratch, put it up on the Cheeseshop/PyPI, and get the community behind > it. > > If the community picks it up as the de-facto replacement for mimetypes > and > > the code has settled we can then talk about adding it to the standard > > library and begin deprecating mimetypes. > > And thanks for willing to volunteer to fix this. > > Okay. Well I'd still like to hear a bit about what people really need > before trying to make a new API. I'm not such an experienced API > designer, and I haven?t really plumbed the depths of mimetypes use > cases (though it seems to me like quite a simple module of not more > than 100 lines of code or so would suffice). I'm sure you can get help from the community with any of this. > At the very least, I > think some changes can be made to this code without altering its basic > function, which would clean up the actual mime types it returns, > comment the exceptions to Apache and explain why they're there, and > make the code flow understandable to someone reading the code. That all sounds reasonable. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: From jacobolus at gmail.com Sat Aug 1 01:07:34 2009 From: jacobolus at gmail.com (Jacob Rus) Date: Fri, 31 Jul 2009 16:07:34 -0700 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: References: Message-ID: Brett Cannon wrote: >>>> ?* It creates a _default_mime_types() function which declares a >>>> ? ?bunch of global variables, and then immediately calls >>>> ? ?_default_mime_types() below the definition. There is literally >>>> ? ?no difference in result between this and just putting those >>>> ? ?variables at the top level of the file, so I have no idea why >>>> ? ?this function exists, except to make the code more confusing. >>> >>> It could potentially be used for testing, but that's a guess. >> >> Here's an abridged version of this function. I don?t think there?s any >> reason for this that I can see. >> >> ? ?def _default_mime_types(): >> ? ? ? ?global suffix_map >> ? ? ? ?global encodings_map >> ? ? ? ?global types_map >> ? ? ? ?global common_types >> >> ? ? ? ?suffix_map = { >> ? ? ? ? ? ?'.tgz': '.tar.gz', #... >> ? ? ? ? ? ?} >> >> ? ? ? ?encodings_map = { >> ? ? ? ? ? ?'.gz': 'gzip', #... >> ? ? ? ? ? ?} >> >> ? ? ? ?types_map = { >> ? ? ? ? ? 
?'.a' ? ? ?: 'application/octet-stream', #... >> ? ? ? ? ? ?} >> >> ? ? ? ?common_types = { >> ? ? ? ? ? ?'.jpg' : 'image/jpg', #... >> ? ? ? ? ? ?} >> >> ? ?_default_mime_types() > > As R. David pointed out, it is being used by regrtest to clean up after > running the test suite. Yeah, basically the issue is that the default mime types should be separate objects from the final set after apache's files have been parsed and custom additions have been made. If these ones at the top level are renamed and not modified after creation, if new objects with all the updated stuff is put at these names, and if the test code is changed to instead reset the ones at these names based on the default objects, I think that will maybe fix things. I'll try to write some potential patches in the next day or two and submit them here for advice. >> The problem is that the semantics as documented are really ambiguous, >> and what I would consider the reasonable interpretation is different >> from what the code actually does. So anyone using this code naively is >> going to run into trouble, and anyone relying on how the code actually >> works is going behind the back of the docs, but they sort of have to >> in order to use much of the functionality of the module. I agree this >> puts us in a tricky spot. > > Well, perhaps the docs can be updated to match the code where cleanup would > change the semantics. I think that would make the docs extremely confusing, and I?m not even sure it would be possible. The current semantics are vaguely okay if an API consumer sticks to straight-forward use cases, such as any which don?t break when the current docs are followed (anything complicated is going to break unless the code is read a few times), and assuming such uses it would be possible to swap out most of the implementation for something relatively straight-forward. But if any of the edges are pushed, the semantics quickly turn insane, to the point I?m not sure they?re document-able. Anyone expecting the code to work that way is going to have a buggy program anyway, so I?m not sure it makes sense to bend over backwards leaving the particular set of bugs unchanged. Cheers, Jacob Rus From jacobolus at gmail.com Sat Aug 1 00:38:32 2009 From: jacobolus at gmail.com (Jacob Rus) Date: Fri, 31 Jul 2009 15:38:32 -0700 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: References: Message-ID: Brett Cannon wrote: > Jacob Rus wrote: >> ?* It defines __all__: I didn?t even realize __all__ could be used >> ? ?for single-file modules (w/o submodules), but it definitely >> ? ?shouldn?t be here. > > __all__ is used to control what a module exports when used in an import *, > nothing more. Thus it's use in a module compared to a package is completely > legitimate. > >> This specific __all__ oddly does not include >> ? ?all of the documented variables and functions in the mimetypes >> ? ?class. It?s not clear why someone calling import * here wouldn?t >> ? ?want the bits not included. > > If something is documented by not listed in __all__ that is a bug. In this case, everything in the module is documented, including parts that should be private, but only a small number are in __all__. My recommendation would be to make those private parts be _ variables and remove them from the docs (using them has no legitimate use cases I can see), and rip out __all__. >> ?* It creates a _default_mime_types() function which declares a >> ? ?bunch of global variables, and then immediately calls >> ? 
?_default_mime_types() below the definition. There is literally >> ? ?no difference in result between this and just putting those >> ? ?variables at the top level of the file, so I have no idea why >> ? ?this function exists, except to make the code more confusing. > > It could potentially be used for testing, but that's a guess. Here's an abridged version of this function. I don?t think there?s any reason for this that I can see. def _default_mime_types(): global suffix_map global encodings_map global types_map global common_types suffix_map = { '.tgz': '.tar.gz', #... } encodings_map = { '.gz': 'gzip', #... } types_map = { '.a' : 'application/octet-stream', #... } common_types = { '.jpg' : 'image/jpg', #... } _default_mime_types() > Probably came from someone who is very OO happy. Not everyone comes to > Python ready to embrace its procedural or slightly functional facets. Yes, it seems so to me too. > So the problem of changing fundamentally how the code works, even for a > cleanup, is that it will break someone's code out there because they > depended on the module's crazy way of doing things. Now if they are cheating > and looking at things that are meant to be hidden you might be able to clean > things up, but if the semantics are exposed to the user, then there is not > much we can do w/o breaking someone's code. The problem is that the semantics as documented are really ambiguous, and what I would consider the reasonable interpretation is different from what the code actually does. So anyone using this code naively is going to run into trouble, and anyone relying on how the code actually works is going behind the back of the docs, but they sort of have to in order to use much of the functionality of the module. I agree this puts us in a tricky spot. > Honestly, if the code is as bad as it seems -- including its API --, the > best bet would be to come up with a new module for handling MIME types from > scratch, put it up on the Cheeseshop/PyPI, and get the community behind it. > If the community picks it up as the de-facto replacement for mimetypes and > the code has settled we can then talk about adding it to the standard > library and begin deprecating mimetypes. > And thanks for willing to volunteer to fix this. Okay. Well I'd still like to hear a bit about what people really need before trying to make a new API. I'm not such an experienced API designer, and I haven?t really plumbed the depths of mimetypes use cases (though it seems to me like quite a simple module of not more than 100 lines of code or so would suffice). At the very least, I think some changes can be made to this code without altering its basic function, which would clean up the actual mime types it returns, comment the exceptions to Apache and explain why they're there, and make the code flow understandable to someone reading the code. Cheers, Jacob Rus From jacobolus at gmail.com Sat Aug 1 04:53:02 2009 From: jacobolus at gmail.com (Jacob Rus) Date: Sat, 1 Aug 2009 02:53:02 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?standard_library_mimetypes_module=09pathol?= =?utf-8?q?ogically_broken=3F?= References: <20090731220110.GB11156@mcnabbs.org> Message-ID: Andrew McNabb wrote: > Jacob Rus wrote: >> * The operation is crazy: It defines a MimeTypes class which >> actually stores the type mappings, but this class is designed to >> be a singleton. 
The way that such a design is enforced is >> through the use of the module-global 'init' function, which >> makes an instance of the class, and then maps all of the >> functions in the module global namespace to instance methods. >> But confusingly, all such functions are also defined >> independently of the init function, with definitions such as: >> >> def guess_type(url, strict=True): >> if not inited: >> init() >> return guess_type(url, strict) > > I can't speak for any of your other complaints, but I know that this > weird init stuff is fixed in trunk. Actually, this fix changes the semantics of the code quite substantially (not in any way that is incompatible with the extremely vague documentation, but in a way that might break any code that relies on the Python <=2.6 behavior). If such a change is okay, then we can do quite a bit of implementation change under these new semantics. Cheers, Jacob Rus From tjreedy at udel.edu Sat Aug 1 05:03:27 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 31 Jul 2009 23:03:27 -0400 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: References: Message-ID: Jacob Rus wrote: > Okay. Well I'd still like to hear a bit about what people really need > before trying to make a new API. Try asking some specific question on python-list. "How to you use the stdlib mimetypes module?" From jafo at tummy.com Sun Aug 2 02:00:51 2009 From: jafo at tummy.com (Sean Reifschneider) Date: Sat, 1 Aug 2009 18:00:51 -0600 Subject: [Python-Dev] REVIEW: PyArg_ParseTuple with "s" format and NUL: Bogus TypeError detail string. In-Reply-To: <20090724005734.GA20019@tummy.com> References: <20090724005734.GA20019@tummy.com> Message-ID: <20090802000051.GA16510@tummy.com> "make test" in both python and py3k trunks were happy with this change, so I've documented the issue in Issue #6624 and committed it in 74277 (2.x) and 74278 (3.x). Sean -- "The only thing more expensive than hiring a professional is hiring an amateur." -- Red Adair, Oil Well Fire-Fighter Sean Reifschneider, Member of Technical Staff tummy.com, ltd. - Linux Consulting since 1995: Ask me about High Availability From vincent.legoll at gmail.com Sun Aug 2 00:40:06 2009 From: vincent.legoll at gmail.com (Vincent Legoll) Date: Sun, 2 Aug 2009 00:40:06 +0200 Subject: [Python-Dev] pylinting the stdlib Message-ID: <4727185d0908011540m218434dao9f6c9e71d001007@mail.gmail.com> Hello, I've fed parts of the stdlib to pylint and after some filtering there appears to be some things that looks strange, I've filled a few bugs to the tracker for them. 6623 Lib/ftplib.py netrc class parsing problem 6622 [RFC] wrong variable used in Lib/poplib.py 6621 [RFC] Remove leftover use of Carbon module from Lib/binhex.py 6620 Variable may be used before first being assigned to in Lib/locale.py 6619 Remove duplicated function in Lib/inspect.py Is this useless and taking reviewer's time for nothing ? Please advise, if this is deemed useful, I'll continue further -- Vincent Legoll From jacobolus at gmail.com Sun Aug 2 06:03:12 2009 From: jacobolus at gmail.com (Jacob Rus) Date: Sat, 1 Aug 2009 21:03:12 -0700 Subject: [Python-Dev] standard library mimetypes module pathologically broken? 
In-Reply-To: References: Message-ID: Brett Cannon wrote: > Jacob Rus wrote: >> At the very least, I >> think some changes can be made to this code without altering its basic >> function, which would clean up the actual mime types it returns, >> comment the exceptions to Apache and explain why they're there, and >> make the code flow understandable to someone reading the code. > > That all sounds reasonable. Okay, as a start, I did a simple code cleanup that I think fixes some potential bugs (any code using its own instance of the MimeTypes class should now be insulated from other same-process users of the module), chops out 80 or 90 lines, removes some redundant code paths, clarifies some of the micro level behavior of some chunks of code, adds a bit more to the docstring at the top of the file, and makes the program flow somewhat clearer ? *without* changing the semantics of the module or its included list of MIME types. Here's a diff: http://pastie.textmate.org/568329 And here's the whole file: http://pastie.textmate.org/568333 This change does require any tests that previously called _default_mime_types() to instead call init(). Any thoughts? Jacob Rus From jacobolus at gmail.com Sun Aug 2 06:58:38 2009 From: jacobolus at gmail.com (Jacob Rus) Date: Sat, 1 Aug 2009 21:58:38 -0700 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: References: Message-ID: Jacob Rus wrote: > Here's a diff: > http://pastie.textmate.org/568329 > > And here's the whole file: > http://pastie.textmate.org/568333 Slightly better: http://pastie.textmate.org/568354 http://pastie.textmate.org/568355 From jacobolus at gmail.com Sun Aug 2 08:37:18 2009 From: jacobolus at gmail.com (Jacob Rus) Date: Sat, 1 Aug 2009 23:37:18 -0700 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: References: Message-ID: Jacob Rus wrote: > Brett Cannon wrote: >> Jacob Rus wrote: >>> At the very least, I >>> think some changes can be made to this code without altering its basic >>> function, which would clean up the actual mime types it returns, >>> comment the exceptions to Apache and explain why they're there, and >>> make the code flow understandable to someone reading the code. >> >> That all sounds reasonable. > > Okay, as a start, I did a simple code cleanup that I think fixes some > potential bugs (any code using its own instance of the MimeTypes class > should now be insulated from other same-process users of the module), > chops out 80 or 90 lines, removes some redundant code paths, clarifies > some of the micro level behavior of some chunks of code, adds a bit > more to the docstring at the top of the file, and makes the program > flow somewhat clearer ? *without* changing the semantics of the module > or its included list of MIME types. Here is a somewhat more substantively changed version. This one does away with the 'inited' flag and the 'init' function, which might be impossible given that their documented (though I would be extremely surprised if anyone calls them in third-party code), and makes the behavior of the code much clearer, I think, by making it very obvious how the singleton instance is actually working. Additionally, this version brings the lazy loading of Apache mime.types files to every MimeTypes instance, and makes the read_mime_types() function behave as expected (only getting the mapping from an apache mime.types file rather than including some extra types as the current code does). 
In this version, tests would want to call the _init_singleton() function to reset to defaults. http://pastie.textmate.org/568399 http://pastie.textmate.org/568400 To reiterate: this should still behave identically to the current module in all reasonable conditions. I still haven't made any changes to the set of MIME types included in the file, or the behavior of the module. Some such changes should be made as well, but the changes so far should be relatively uncontroversial, I hope. Cheers, Jacob Rus From fuzzyman at voidspace.org.uk Sun Aug 2 12:53:09 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sun, 02 Aug 2009 11:53:09 +0100 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: References: Message-ID: <4A757015.6010100@voidspace.org.uk> Jacob Rus wrote: > Jacob Rus wrote: > >> Brett Cannon wrote: >> >>> Jacob Rus wrote: >>> >>>> At the very least, I >>>> think some changes can be made to this code without altering its basic >>>> function, which would clean up the actual mime types it returns, >>>> comment the exceptions to Apache and explain why they're there, and >>>> make the code flow understandable to someone reading the code. >>>> >>> That all sounds reasonable. >>> >> Okay, as a start, I did a simple code cleanup that I think fixes some >> potential bugs (any code using its own instance of the MimeTypes class >> should now be insulated from other same-process users of the module), >> chops out 80 or 90 lines, removes some redundant code paths, clarifies >> some of the micro level behavior of some chunks of code, adds a bit >> more to the docstring at the top of the file, and makes the program >> flow somewhat clearer ? *without* changing the semantics of the module >> or its included list of MIME types. >> > > Here is a somewhat more substantively changed version. This one does > away with the 'inited' flag and the 'init' function, which might be > impossible given that their documented (though I would be extremely > surprised if anyone calls them in third-party code), and makes the > behavior of the code much clearer, I think, by making it very obvious > how the singleton instance is actually working. > > Additionally, this version brings the lazy loading of Apache > mime.types files to every MimeTypes instance, and makes the > read_mime_types() function behave as expected (only getting the > mapping from an apache mime.types file rather than including some > extra types as the current code does). > > In this version, tests would want to call the _init_singleton() > function to reset to defaults. > > http://pastie.textmate.org/568399 > http://pastie.textmate.org/568400 > > To reiterate: this should still behave identically to the current > module in all reasonable conditions. I still haven't made any changes > to the set of MIME types included in the file, or the behavior of the > module. Some such changes should be made as well, but the changes so > far should be relatively uncontroversial, I hope. 
> Please post the patches to the Python bug tracker: http://bugs.python.org Thanks Michael Foord > Cheers, > Jacob Rus > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From p.f.moore at gmail.com Sun Aug 2 13:45:49 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 2 Aug 2009 12:45:49 +0100 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: <4A757015.6010100@voidspace.org.uk> References: <4A757015.6010100@voidspace.org.uk> Message-ID: <79990c6b0908020445q5d4bcc82vf93d2af4493554a9@mail.gmail.com> 2009/8/2 Michael Foord : [...] >> In this version, tests would want to call the _init_singleton() >> function to reset to defaults. [...] > Please post the patches to the Python bug tracker: > > ? http://bugs.python.org > > Thanks The patch you post should also patch the test suite to use your replacement initialisation function where needed (if you didn't already do that). Paul. From stargaming at gmail.com Sun Aug 2 13:14:05 2009 From: stargaming at gmail.com (Robert Lehmann) Date: Sun, 2 Aug 2009 11:14:05 +0000 (UTC) Subject: [Python-Dev] standard library mimetypes module pathologically broken? References: Message-ID: On Sat, 01 Aug 2009 23:37:18 -0700, Jacob Rus wrote: > Here is a somewhat more substantively changed version. This one does > away with the 'inited' flag and the 'init' function, which might be > impossible given that their documented (though I would be extremely > surprised if anyone calls them in third-party code) [snip] There seem to be quite a bunch of high-profile third-party modules relying on this interface, eg. Zope, Plone, TurboGears, and CherryPy. See http://www.google.com/codesearch?q=mimetypes.init+lang%3Apython for a more thorough listing. Given that most of them aren't ported to Python 3 yet, I guess, changing the semantics in 3.x seems not-too-bad to me. HTH, -- Robert "Stargaming" Lehmann From python at mrabarnett.plus.com Sun Aug 2 17:54:22 2009 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 02 Aug 2009 16:54:22 +0100 Subject: [Python-Dev] [regex] memory leak In-Reply-To: <4A7512FD.6030505@lexicon.net> References: <4A7512FD.6030505@lexicon.net> Message-ID: <4A75B6AE.4070003@mrabarnett.plus.com> John Machin wrote: > Hi Matthew, > > Your post in c.l.py about your re rewrite didn't mention where to report > bugs etc so I dug this address out of Google Groups ... > > Environment: Python 2.6.2, Windows XP SP3, your latest (29 July) regex > from the Python bugtracker. > > Problem is repeated calls of e.g. compiled_pattern.search(some_text) -- > Task Manager performance panel shows increasing memory usage with regex > but not with re. It appears to be cumulative i.e. changing to another > pattern or text doesn't release memory. 
> > Example: > > 8<-- regex_timer.py > import sys > import time > if sys.platform == 'win32': > timer = time.clock > else: > timer = time.time > module = __import__(sys.argv[1]) > count = int(sys.argv[2]) > pattern = sys.argv[3] > expected = sys.argv[4] > text = 80 * '~' + 'qwerty' > rx = module.compile(pattern) > t0 = timer() > for i in xrange(count): > assert rx.search(text).group(0) == expected > t1 = timer() > print "%d iterations in %.6f seconds" % (count, t1 - t0) > 8<--- > > Here are the results of running this (plus observed difference between > peak memory usage and base memory usage): > > dos-prompt>\python26\python regex_timer.py regex 1000000 "~" "~" > 1000000 iterations in 3.811500 seconds [60 Mb] > > dos-prompt>\python26\python regex_timer.py regex 2000000 "~" "~" > 2000000 iterations in 7.581335 seconds [128 Mb] > > dos-prompt>\python26\python regex_timer.py re 2000000 "~" "~" > 2000000 iterations in 2.549738 seconds [3 Mb] > > This happens on a variety of patterns: "w", "wert", "[a-z]+", "[a-z]+t", > ... > Thanks for that, John. I've should've kept an eye on the Task Manager! :-) Now fixed. It's surprising how much time and effort is needed just to manage the memory! From dickinsm at gmail.com Sun Aug 2 18:20:36 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Sun, 2 Aug 2009 17:20:36 +0100 Subject: [Python-Dev] pylinting the stdlib In-Reply-To: <4727185d0908011540m218434dao9f6c9e71d001007@mail.gmail.com> References: <4727185d0908011540m218434dao9f6c9e71d001007@mail.gmail.com> Message-ID: <5c6f2a5d0908020920n647d97b0n5cc4d4a00ec17cca@mail.gmail.com> On Sat, Aug 1, 2009 at 11:40 PM, Vincent Legoll wrote: > Hello, > > I've fed parts of the stdlib to pylint and after some filtering > there appears to be some things that looks strange, I've > filled a few bugs to the tracker for them. > > > > Is this useless and taking reviewer's time for nothing ? > > Please advise, if this is deemed useful, I'll continue further I think this is valuable work---please do continue! Just out of interest, how many false positives did you have to filter out in finding the 5 cases above? Mark From jimjjewett at gmail.com Sun Aug 2 19:47:30 2009 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 2 Aug 2009 13:47:30 -0400 Subject: [Python-Dev] standard library mimetypes module pathologically broken? Message-ID: [It may be worth creating a patch; I think most of these comments would be better on the bug-tracker.] (1) In a few cases, it looked like you were changing parameter names between "files" and "filenames". This might break code that was calling it with keyword arguments -- as I typically would for this type of function. (1a) If you are going to change the .sig, you might as well do it right, and make the default be "knownfiles" rather than the empty tuple. (2) The comment about why inited was set true at the beginning of the function instead of the end should probably be kept, or at least reworded. (3) Default values: (3a) Why the list of known files going back to Apache 1.2, in that order? Is there any risk in using too *new* of a MimeTypes file? I would assume that the goal is to pick up whatever changes the user has made locally, but in that case, it still makes sense to have the newest file be the last one read, in case Apache has made bugfixes. (3b) Also, this would improve cross-platform consistency; if I read that correctly, the Apache files will override the python defaults on unix or a mac, but not on windows. 
That will change the results on the majority of items in _common_types. (application vs text, whether to put an x- in front of the word pict.) (3c) rtf is listed in non-standard, but http://www.iana.org/assignments/media-types/ does define it. (Though whether to guess application vs text is not defined, and python chooses differently from apache.) (3d) jpg is listed as non-standard. It turns out that this is just for the inverse mapping, where image/jpg is non-standard (for image/jpeg) but that is worth a comment. (see #5) (3e) In _types_map, the lines marked duplicates are duplicate keys, not duplicate values; it would be more clear to also comment out the (first) line itself, instead of just marking it a duplicate. (Or better yet, to mention that it is just being added for the inverse mapping, if that is the case.) (4) Why bother to lazyinit? Is there any sane usecase for a MimeTypes that hasn't been inited? I see value in not reading the default files, but none in not reading at least the files that were asked for. I could see value in only partial initialization if there were several long steps, but right now, initialization is all-or-nothing. If the thing is useless without an init, then it makes sense to just get done it immediately and skip the later checks; anyone who could have actually saved time should just remove the import. -jJ From jacobolus at gmail.com Sun Aug 2 20:56:29 2009 From: jacobolus at gmail.com (Jacob Rus) Date: Sun, 2 Aug 2009 11:56:29 -0700 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: References: Message-ID: Jim Jewett wrote: > [It may be worth creating a patch; I think most of these comments > would be better on the bug-tracker.] I'm going to do that shortly. > (1) ?In a few cases, it looked like you were changing parameter names > between "files" and "filenames". ?This might break code that was > calling it with keyword arguments -- as I typically would for this > type of function. Sorry, that was a mistake. > (1a) ?If you are going to change the .sig, you might as well do it > right, and make the default be "knownfiles" rather than the empty > tuple. Seems reasonable. > (2) ?The comment about why inited was set true at the beginning of the > function instead of the end should probably be kept, or at least > reworded. > > (3) Default values: > > (3a) Why the list of known files going back to Apache 1.2, in that > order? ?Is there any risk in using too *new* of a MimeTypes file? > > I would assume that the goal is to pick up whatever changes the user > has made locally, but in that case, it still makes sense to have the > newest file be the last one read, in case Apache has made bugfixes. I did not change this in my patch, but I completely agree. Indeed, I think it makes more sense to grab the newest Apache mime.types and just include them with the standard library, either as an in-code python object, or as a mime.types file to be parsed. > (3b) ?Also, this would improve cross-platform consistency; if I read > that correctly, the Apache files will override the python defaults on > unix or a mac, but not on windows. ?That will change the results on > the majority of items in _common_types. ?(application vs text, whether > to put an x- in front of the word pict.) Quite possibly true. It actually seems > (3c) ?rtf is listed in non-standard, but > http://www.iana.org/assignments/media-types/ does define it. ?(Though > whether to guess application vs text is not defined, and python > chooses differently from apache.) 
> > (3d) ?jpg is listed as non-standard. ?It turns out that this is just > for the inverse mapping, where image/jpg is non-standard (for > image/jpeg) but that is worth a comment. ?(see #5) > > (3e) ?In _types_map, the lines marked duplicates are duplicate keys, > not duplicate values; it would be more clear to also comment out the > (first) line itself, instead of just marking it a duplicate. ?(Or > better yet, to mention that it is just being added for the inverse > mapping, if that is the case.) I completely agree that this whole section should be considered carefully. Just any changes might have more impact on backwards compatibility than the code flow changes I made, so I thought they could be in a separate patch. > (4) ?Why bother to lazyinit? ? ?Is there any sane usecase for a > MimeTypes that hasn't been inited? Only because the original was written that way, back in 1997 or whatever. I don't think there's necessarily any need for it these days: reading the default files even should be blazingly fast, unless the disk is otherwise thrashing: each is about a a 37k file, and there are at most going to be 3 or 4 of them installed on one machine for different versions of Apache. Cheers, Jacob Rus From jacobolus at gmail.com Sun Aug 2 22:17:45 2009 From: jacobolus at gmail.com (Jacob Rus) Date: Sun, 2 Aug 2009 13:17:45 -0700 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: <79990c6b0908020445q5d4bcc82vf93d2af4493554a9@mail.gmail.com> References: <4A757015.6010100@voidspace.org.uk> <79990c6b0908020445q5d4bcc82vf93d2af4493554a9@mail.gmail.com> Message-ID: Robert Lehmann wrote: > Jacob Rus wrote: >> Here is a somewhat more substantively changed version. This one does >> away with the 'inited' flag and the 'init' function, which might be >> impossible given that their documented (though I would be extremely >> surprised if anyone calls them in third-party code) > [snip] > > There seem to be quite a bunch of high-profile third-party modules > relying on this interface, eg. Zope, Plone, TurboGears, and CherryPy. See > http://www.google.com/codesearch?q=mimetypes.init+lang%3Apython for a > more thorough listing. > > Given that most of them aren't ported to Python 3 yet, I guess, changing > the semantics in 3.x seems not-too-bad to me. Ooh, okay. Well I guess we can?t get rid of those then! Michael Foord wrote: > Please post the patches to the Python bug tracker: I made a new issue on the bug tracker, , and added a new patch which should hopefully be fairly reasonable. I still haven't addressed the issue of which MIME types should be included by default, and how precisely the logic should work for setting those up. But again, hopefully this at least makes it clear what the code is trying to do, so that it's relatively readable for someone trying to use the module. (For instance, so they'll be warned off of using init() and breaking each-other's code) Paul Moore wrote: > The patch you post should also patch the test suite to use your > replacement initialisation function where needed (if you didn't > already do that). Done. The tests still pass, though to be honest this test suite isn't really testing any edge cases. Cheers, Jacob Rus From glyph at twistedmatrix.com Mon Aug 3 00:36:27 2009 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Sun, 2 Aug 2009 18:36:27 -0400 Subject: [Python-Dev] standard library mimetypes module pathologically broken? 
In-Reply-To: References: <4A757015.6010100@voidspace.org.uk> <79990c6b0908020445q5d4bcc82vf93d2af4493554a9@mail.gmail.com> Message-ID: On Sun, Aug 2, 2009 at 4:17 PM, Jacob Rus wrote: > Robert Lehmann wrote: > > Jacob Rus wrote: > >> Here is a somewhat more substantively changed version. This one does > >> away with the 'inited' flag and the 'init' function, which might be > >> impossible given that their documented (though I would be extremely > >> surprised if anyone calls them in third-party code) > > [snip] > > > > There seem to be quite a bunch of high-profile third-party modules > > relying on this interface, eg. Zope, Plone, TurboGears, and CherryPy. See > > http://www.google.com/codesearch?q=mimetypes.init+lang%3Apython for a > > more thorough listing. > > > > Given that most of them aren't ported to Python 3 yet, I guess, changing > > the semantics in 3.x seems not-too-bad to me. > No, it's bad. If I may quote Guido: http://www.artima.com/weblogs/viewpost.jsp?thread=227041 So, once more for emphasis: *Don't change your APIs at the same time as > porting to Py3k!* > Please follow this policy as much as possible in the standard library; the language transition is going to be hard enough. Put a different way: please don't change the library unless you're *also*going to write a 2to3 fixer that somehow updates all calling code, too. Ooh, okay. Well I guess we can?t get rid of those then! > Indeed not. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dirkjan at ochtman.nl Mon Aug 3 13:53:06 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Mon, 3 Aug 2009 13:53:06 +0200 Subject: [Python-Dev] PEP 385: updating the PEP Message-ID: The diff below should reflect changes from the discussion we had last time. Please review. (Some comments may be more appropriate for the other threads I just kicked off.) Cheers, Dirkjan Index: pep-0385.txt =================================================================== --- pep-0385.txt (revision 74294) +++ pep-0385.txt (revision 74296) @@ -59,27 +59,25 @@ often has somewhat unintuitive results for people (though this has been getting better in recent versions of Mercurial). -I'm still a bit on the fence about whether Python should adopt cloned -branches and named branches. Since it usually makes more sense to tag releases -on the maintenance branch, for example, mainline history would not contain -release tags if we used cloned branches. Also, Mercurial 1.2 and 1.3 have the -necessary tools to make named branches less painful (because they can be -properly closed and closed heads are no longer considered in relevant cases). +The current proposal is to use named branches for release branches and adopt +cloned branches for feature branches, with one exception to this rule: the 3.x +branches will be kept in separate clones from the 2.x branches. I think this +provides an optimal hybrid approach for Python's uses of branching. -A disadvantage might be that the used clones will be a good bit larger (since -they essentially contain all other branches as well). This can me mitigated by -keeping non-release (feature) branches in separate clones. Also note that it's -still possible to clone a single named branch from a combined clone, by -specifying the branch as in hg clone http://hg.python.org/main/#2.6-maint. -Keeping the py3k history in a separate clone problably also makes sense. +Differences between named branches and cloned branches: -XXX To do: size comparison for selected separation scenarios. 
+* Tags in a different (maintenance) clone aren't available in the local clone +* Clones with named branches will be larger, since they contain more data +(The Mercurial book discourages the use of named branches, but it is, in this +respect, somewhat outdated. Named branches have gotten much easier to use +since that comment was written, due to improvements in hg.) + Converting branches ------------------- There are quite a lot of branches in SVN's branches directory. I propose to -clean this up a bit, by employing the following the strategy: +clean this up a bit, by following this basic strategy: * Keep all release (maintenance) branches * Discard branches that haven't been touched in 18 months, unless somone @@ -87,6 +85,21 @@ * Keep branches that have been touched in the last 18 months, unless someone indicates the branch can be deprecated +There's a `branch map`_ available that shows info about each branch: + +* keep-clone means we'll keep that branch in a separate clone +* keep-named means we'll keep that branch as a named branch in one of the clones +* strip means we won't keep that branch +* streamed-merge means that it got merged by committing several new revisions + to the other branch +* merged-r* means the branch got merged in the named revision +* merges? means I haven't checked/found out yet whether that branch was ever + merged +* ? means that your input would be even more helpful than for the other items +* some items have no action yet, feel free to treat that as just '?' + +.. _branch map: http://hg.python.org/pymigr/file/tip/all-branches.txt + Converting tags --------------- @@ -95,8 +108,8 @@ we should keep all release tags, and consider other tags for inclusion based on requests from the developer community. I'd like to consider unifying the release tag naming scheme to make some things more consistent, if people feel -that won't create too many problems. For example, Mercurial itself just uses -'1.2.1' as a tag, where CPython would currently use r121. +that won't create too many problems. The current proposal is to bring old +release tags in line with the current practice of release tag naming. Author map ---------- @@ -119,17 +132,19 @@ possible forms of pattern matching. The current Python repository already includes a rudimentary .hgignore file to help with using the hg mirrors. -It might be useful to have the .hgignore be generated automatically from -svn:ignore properties. This would make sure all historic revisions also have -useful ignore information (though one could argue ignoring isn't really -relevant to just checking out an old revision). +Since the current Python repository already includes a .hgignore file (for use +with hg mirrors), we'll just use that. Generating full history of the file +was debated but deemed impractical (because it's relatively hard with fairly +little gain, since ignoring is less important for older revisions). Revlog reordering ----------------- -As an optional optimization technique, we should consider trying a reordering -pass on the revlogs (internal Mercurial files) resulting from the conversion. -In some cases this results in dramatic decreases in on-disk repository size. +As an optional optimization technique, I have performed a reordering pass on +the revlogs (internal Mercurial files) resulting from the conversion. In some +cases this results in dramatic decreases in on-disk repository size. 
This +especially makes sense for the manifest (where it really helps out quite a lot) +and oft-edited files like NEWS.txt (with an admittedly smaller effect). Other repositories ------------------ @@ -138,7 +153,14 @@ converted. What other projects in the svn.python.org repository should be converted? Do we want to convert the peps repository? distutils? others? +There's now an initial stab at converting the Jython repository. The current +tip of hgsubversion unfortunately fails at some point. Pending investigation. +Other repositories that would like to converted to Mercurial can announce +themselves to me after the main Python migration is done, and I'll take care +of their needs. + + Infrastructure ============== @@ -165,18 +187,34 @@ lines. Open issue: do we check only the tip after each push, or do we check every commit in a changegroup? -* commit mails: we can leverage the notify extension for this +* commit mails: we can leverage the notify extension for this. Emails will + include diffs for each changeset committed against the repository. * buildbots: both the regular and the community build masters must be notified. Fortunately buildbot includes support for hg. I've also implemented this for Mercurial itself, so I don't expect problems here. * check contributors: in the current setup, all changesets bear the username of - committers, who must have signed the contributor agreement. In a DVCS, the - committers are not necessarily the same people who push, and so we can't - check if the committer is a contributor. We could use a hook to check if the - committer is a contributor if we keep a list of registered contributors. + committers, who must have signed the contributor agreement. We might want to + use a hook to check if the committer is a contributor if we keep a list of + registered contributors. Then, the hook might warn users that push a group + of revisions containing changesets from unknown contributors. +End-of-line conversions +----------------------- + +There has been some discussion about the lack of end-of-line conversion support +in Mercurial. While Mercurial comes with a win32text extension that provides +some basic support for converting end-of-line data on a file-name pattern +basis, the lack of exclusion (for specifying broad rules with exceptions) and +the use of hgrc files (which can't be versioned) make it less than ideal. + +I think the primary line of defense for prevention of inappropriate newlines +should be hooks on the server side which basically turn down any changegroup +or changeset introducing such data. The use of the win32text extension (which +can hopefully be improved/extended to support the usage scenarios mentioned +above) and/or a commit-time hook could be the first line of defense. + hgwebdir -------- @@ -185,7 +223,16 @@ build a quick extension to augment the URL rev parser so that it can also take r[0-9]+ args and come up with the matching hg revision. +roundup +------- +We'll come up with an auto-linking plugin for roundup, which can match a +changeset identifier (possibly with a branch prefix), and link it to the +appropriate revision in the hgwebdir instance. Second, the script above (in +the hgwebdir section) will make sure that old links to revision should continue +to work (by pointing to the hg changeset that reflects the svn revision). + + After migration =============== @@ -222,37 +269,32 @@ .. _wiki: http://www.selenic.com/mercurial/wiki/ .. 
_parts of the developer FAQ: http://www.python.org/dev/faq/#version-control -Think first, commit later? --------------------------- +Proposed workflow +----------------- -In recent history, old versions of Python have been maintained by a select -group of people backporting patches from trunk to release branches. While -this may not scale so well as the development pace grows, it also runs into -some problems with the current crop of distributed versioning tools. These -tools (I believe similar problems would exist for either git, bzr, or hg, -though some may cope better than others) are based on the idea of a Directed -Acyclic Graph (or DAG), meaning they keep track of relations of changesets. +I propose two workflows for the migration of patches between several branches. -Mercurial itself has a stable branch which is a ''strict'' subset of the -unstable branch. This means that generally all fixes for the stable branch -get committed against the tip of the stable branch, then they get merged into -the unstable branch (which already contains the parent of the new cset). This -provides a largely frictionless environment for moving changes from stable to -unstable branches. Mistakes, where a change that should go on stable goes on -unstable first, do happen, but they're usually easy to fix. That can be done by -copying the change over to the stable branch, then trivial-merging with -unstable -- meaning the merge in fact ignores the parent from the stable -branch). +For migration within 2.x or 3.x branches, I propose a patch always gets +committed to the oldest branch where it applies first. Then, the resulting +changeset can be merged using hg merge to all newer branches within that +series (2.x or 3.x). If it does not apply as-is to the newer branch, hg revert +can be used to easily revert to the new-branch-native head, patch in some +alternative version of the patch (or none, if it's not applicable), then commit +the merge. The premise here is that all changesets from an older branch within +the series are eventually merged to all newer branches within the series. -This strategy means a little more work for regular committers, because they -have to think about whether their change should go on stable or unstable; they -may even have to ask someone else (the RM) before committing. But it also -relieves a dedicated group of committers of regular backporting duty, in -addition to making it easier to work with the tool. +The upshot is that this provides for the most painless merging procedure. The +downside is that in the general case, people have to think about the oldest +branch to which the patch should be applied before actually applying it. -Now would be a good time to consider changing strategies in this regard, -although it would be relatively easy to switch to such a model later on. +For migration between 2.x and 3.x branches (which should all be in the same +direction, though I'm not sure what direction is most appropriate here), +changesets should be transplanted (not merged) in some other way. The +transplant extension, import/export and bundle/unbundle work equally well here. +Choosing this approach allows 3.x not to carry all of the 2.x history-since-it- +was-branched, meaning the clone is not as big and the merges not as complicated. 
+ The future of Subversion ------------------------ @@ -281,7 +323,9 @@ I propose that the revision identifier will be the short version of hg's revision hash, for example 'dd3ebf81af43', augmented with '+' (instead of 'M') if the working directory from which it was built was modified. This mirrors -the output of the hg id command, which is intended for this kind of usage. +the output of the hg id command, which is intended for this kind of usage. The +sys.subversion value will also be renamed to sys.mercurial to reflect the +change in VCS. For the tag/branch identifier, I propose that hg will check for tags on the currently checked out revision, use the tag if there is one ('tip' doesn't From dirkjan at ochtman.nl Mon Aug 3 12:41:31 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Mon, 3 Aug 2009 12:41:31 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue Message-ID: So, I've been not-working on this, which I feel bad about. Suffice it to say the day job has required more of my time then usual for the past few weeks. I want to get back into it, so let's start by re-raising this issue, which Mark Hammond conveniently summarized below. > On 4/07/2009 2:03 PM, Mark Hammond wrote: >> On 4/07/2009 12:30 PM, Nick Coghlan wrote: >>> And since Mercurial doesn't even allow us to say "this is a binary file" >>> the way CVS used to I'm currently not seeing any way for that to happen >>> except for win32text to be updated to correctly handle wild cards in >>> combination with negative filters. >> >> I agree with your conclusion. My ruminating on this over the last few >> months leaves me thinking this would involve: >> >> * my older 'accepted but then lost' hg patch to allow an explicit 'none' >> rule for a single file to override wildcards. This was and still is a good idea. It would be very nice if you could un-bitrot it and submit it for inclusion into crew-stable (so that it may land in the next release, which would hopefully be a somewhat near 1.3.2). >> * win32text be enhanced to use a normal versioned file in the root of >> the repo, much like hgingore, where a project can maintain project wide >> rules. I'm thinking that it should take stuff from .hgeols or whatever and apply rules from .hg/hgrc after that, so both may be used (and for backwards compatibility), but it sounds like a good idea in principle. >> * win32text be enhanced such that all python developers, regardless of >> platform, are willing to use this extension, even if the majority of >> files happen to use their native line ending (sauce for the goose is >> sauce for the gander, and all that...) I don't think that is necessary, I will elaborate below. >> * commit hooks be implemented to enforce this - but this should not be >> necessary if the above was implemented and socially enforced. You seem to advocate a two-step approach: enforce line endings through win32text, catch any errors that slipped through in a hook (commit hook is an optional first line of defense, changegroup hooks on the server to protect the rest of the world). I think inverting that approach would be better: have strict hooks on the server to prevent people from pushing inappropriate EOLs, and provide help on configuring win32text as an extra help for developers on Windows who use editors that work better with \r\n. That leaves people to pick their own weapon of choice against propagation of \r\n (e.g. better editor, commit hooks, whatever) while still making sure no inappropriate line endings land in the python.org repositories. 
It also seems to fit well with the whole consenting adults thing (but that might just be me). On Sun, Jul 19, 2009 at 15:27, Mark Hammond wrote: > Sorry Dirkjan - I just noticed I didn't CC you on this mail originally. I'm > wondering if you have any more thoughts on these EOL issues and if there is > anything I can do to help? Taking up the 'none' filter, first, and .hgeols, secondly, in the win32text extension would be wonderfully helpful, since I don't do much development on Windows and am therefore not that familiar with the extension in the first place. Cheers, Dirkjan From solipsis at pitrou.net Mon Aug 3 18:50:02 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 3 Aug 2009 16:50:02 +0000 (UTC) Subject: [Python-Dev] PEP 385: updating the PEP References: Message-ID: Hello Dirkjan, Dirkjan Ochtman ochtman.nl> writes: > > +As an optional optimization technique, I have performed a reordering pass on > +the revlogs (internal Mercurial files) resulting from the conversion. In some > +cases this results in dramatic decreases in on-disk repository size. Can you give size numbers for the two main repositories (2.x and 3.x)? Thanks for your work, again! I'm glad this is progressing. Regards Antoine. From casey at pandora.com Mon Aug 3 16:47:30 2009 From: casey at pandora.com (Casey Duncan) Date: Mon, 3 Aug 2009 08:47:30 -0600 Subject: [Python-Dev] In late this am Message-ID: Going to the Dr., will be in thereafter. -Casey From casey at pandora.com Mon Aug 3 16:56:11 2009 From: casey at pandora.com (Casey Duncan) Date: Mon, 3 Aug 2009 08:56:11 -0600 Subject: [Python-Dev] In late this am In-Reply-To: References: Message-ID: <44F70F54-134E-4E94-BBA0-7C53EDDA5E7C@pandora.com> Heh, wrong dev list 8^). Sorry for the noise. -Casey On Aug 3, 2009, at 8:47 AM, Casey Duncan wrote: > Going to the Dr., will be in thereafter. > > -Casey From vincent.legoll at gmail.com Mon Aug 3 09:20:25 2009 From: vincent.legoll at gmail.com (Vincent Legoll) Date: Mon, 3 Aug 2009 09:20:25 +0200 Subject: [Python-Dev] pylinting the stdlib In-Reply-To: <5c6f2a5d0908020920n647d97b0n5cc4d4a00ec17cca@mail.gmail.com> References: <4727185d0908011540m218434dao9f6c9e71d001007@mail.gmail.com> <5c6f2a5d0908020920n647d97b0n5cc4d4a00ec17cca@mail.gmail.com> Message-ID: <4727185d0908030020p71b426a0n59baef82afdbf0b5@mail.gmail.com> On Sun, Aug 2, 2009 at 6:20 PM, Mark Dickinson wrote: > On Sat, Aug 1, 2009 at 11:40 PM, Vincent Legoll wrote: >> >> I've fed parts of the stdlib to pylint and after some filtering >> there appears to be some things that looks strange, I've >> filled a few bugs to the tracker for them. >> >> Is this useless and taking reviewer's time for nothing ? >> >> Please advise, if this is deemed useful, I'll continue further > > I think this is valuable work---please do continue! Thanks, I will > Just out of interest, how many false positives did you have > to filter out in finding the 5 cases above? I can't really tell if there was false positives, I just started with the low hanging fruits, the ones I immediately saw as fishy, the remaining I skipped without too much consideration, I think it will take many iterations to do the whole thing. I used a pylint version which is not capable of understanding py3k syntax, so a lot of files were simply skipped. 
-- Vincent Legoll From eric.pruitt at gmail.com Mon Aug 3 20:42:03 2009 From: eric.pruitt at gmail.com (Eric Pruitt) Date: Mon, 3 Aug 2009 13:42:03 -0500 Subject: [Python-Dev] Functionality in subprocess.Popen.terminate() Message-ID: <171e8a410908031142t52d974f6tea40aaa7f5d8c059@mail.gmail.com> In my GSoC project, I have implemented asnychronous I/O in subprocess.Popen. Since the read/write operations are asynchronous, the program may have already exited by the time one calls the asyncread function I have implemented. While it returns the data just fine, I have come across an issue with the TerminateProcess function in Windows: if the program has already exited, when subprocess.Popen.Terminate calls the Windows built-in "TerminateProcess" function, an "access denied" error will occur. Should I just make it so that this exception is simply ignored or perform some kind of check to see if the process exists beforehand? If the latter, I have been unable to find a way to do so, to my liking at least. The solutions I saw would require code that seems a bit excessive to me. Eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Mon Aug 3 23:02:19 2009 From: brett at python.org (Brett Cannon) Date: Mon, 3 Aug 2009 14:02:19 -0700 Subject: [Python-Dev] PEP 385: updating the PEP In-Reply-To: References: Message-ID: On Mon, Aug 3, 2009 at 04:53, Dirkjan Ochtman wrote: > The diff below should reflect changes from the discussion we had last > time. Please review. (Some comments may be more appropriate for the > other threads I just kicked off.) > > Cheers, > > Dirkjan > > Index: pep-0385.txt > =================================================================== > --- pep-0385.txt (revision 74294) > +++ pep-0385.txt (revision 74296) > @@ -59,27 +59,25 @@ > often has somewhat unintuitive results for people (though this has been > getting better in recent versions of Mercurial). > > -I'm still a bit on the fence about whether Python should adopt cloned > -branches and named branches. Since it usually makes more sense to tag > releases > -on the maintenance branch, for example, mainline history would not contain > -release tags if we used cloned branches. Also, Mercurial 1.2 and 1.3 have > the > -necessary tools to make named branches less painful (because they can be > -properly closed and closed heads are no longer considered in relevant > cases). > +The current proposal is to use named branches for release branches and > adopt > +cloned branches for feature branches, with one exception to this rule: the > 3.x > +branches will be kept in separate clones from the 2.x branches. I think > this > +provides an optimal hybrid approach for Python's uses of branching. > Sounds good to me. > > -A disadvantage might be that the used clones will be a good bit larger > (since > -they essentially contain all other branches as well). This can me > mitigated by > -keeping non-release (feature) branches in separate clones. Also note that > it's > -still possible to clone a single named branch from a combined clone, by > -specifying the branch as in hg clone > http://hg.python.org/main/#2.6-maint. > -Keeping the py3k history in a separate clone problably also makes sense. > +Differences between named branches and cloned branches: > > -XXX To do: size comparison for selected separation scenarios. 
> +* Tags in a different (maintenance) clone aren't available in the local > clone > +* Clones with named branches will be larger, since they contain more data > > +(The Mercurial book discourages the use of named branches, but it is, in > this > +respect, somewhat outdated. Named branches have gotten much easier to use > +since that comment was written, due to improvements in hg.) > + > Converting branches > ------------------- > > There are quite a lot of branches in SVN's branches directory. I propose > to > -clean this up a bit, by employing the following the strategy: > +clean this up a bit, by following this basic strategy: > > * Keep all release (maintenance) branches > * Discard branches that haven't been touched in 18 months, unless somone > @@ -87,6 +85,21 @@ > * Keep branches that have been touched in the last 18 months, unless > someone > indicates the branch can be deprecated > > +There's a `branch map`_ available that shows info about each branch: > + > +* keep-clone means we'll keep that branch in a separate clone > +* keep-named means we'll keep that branch as a named branch in one of > the clones > +* strip means we won't keep that branch > +* streamed-merge means that it got merged by committing several new > revisions > + to the other branch > +* merged-r* means the branch got merged in the named revision > +* merges? means I haven't checked/found out yet whether that branch was > ever > + merged > +* ? means that your input would be even more helpful than for the other > items > +* some items have no action yet, feel free to treat that as just '?' > + > +.. _branch map: http://hg.python.org/pymigr/file/tip/all-branches.txt > + > Converting tags > --------------- > > @@ -95,8 +108,8 @@ > we should keep all release tags, and consider other tags for inclusion > based > on requests from the developer community. I'd like to consider unifying > the > release tag naming scheme to make some things more consistent, if people > feel > -that won't create too many problems. For example, Mercurial itself just > uses > -'1.2.1' as a tag, where CPython would currently use r121. > +that won't create too many problems. The current proposal is to bring old > +release tags in line with the current practice of release tag naming. > > Author map > ---------- > @@ -119,17 +132,19 @@ > possible forms of pattern matching. The current Python repository already > includes a rudimentary .hgignore file to help with using the hg mirrors. > > -It might be useful to have the .hgignore be generated automatically from > -svn:ignore properties. This would make sure all historic revisions also > have > -useful ignore information (though one could argue ignoring isn't really > -relevant to just checking out an old revision). > +Since the current Python repository already includes a .hgignore file (for > use > +with hg mirrors), we'll just use that. Generating full history of the file > +was debated but deemed impractical (because it's relatively hard with > fairly > +little gain, since ignoring is less important for older revisions). > > Revlog reordering > ----------------- > > -As an optional optimization technique, we should consider trying a > reordering > -pass on the revlogs (internal Mercurial files) resulting from the > conversion. > -In some cases this results in dramatic decreases in on-disk repository > size. > +As an optional optimization technique, I have performed a reordering pass > on > +the revlogs (internal Mercurial files) resulting from the conversion. 
In > some > +cases this results in dramatic decreases in on-disk repository size. This > +especially makes sense for the manifest (where it really helps out quite a > lot) > +and oft-edited files like NEWS.txt (with an admittedly smaller effect). > > Other repositories > ------------------ > @@ -138,7 +153,14 @@ > converted. What other projects in the svn.python.org repository should be > converted? Do we want to convert the peps repository? distutils? others? > > +There's now an initial stab at converting the Jython repository. The > current > +tip of hgsubversion unfortunately fails at some point. Pending > investigation. > > +Other repositories that would like to converted to Mercurial can announce > +themselves to me after the main Python migration is done, and I'll take > care > +of their needs. > + > + > Infrastructure > ============== > > @@ -165,18 +187,34 @@ > lines. Open issue: do we check only the tip after each push, or do we > check > every commit in a changegroup? > > -* commit mails: we can leverage the notify extension for this > +* commit mails: we can leverage the notify extension for this. Emails will > + include diffs for each changeset committed against the repository. > > * buildbots: both the regular and the community build masters must be > notified. > Fortunately buildbot includes support for hg. I've also implemented this > for > Mercurial itself, so I don't expect problems here. > > * check contributors: in the current setup, all changesets bear the > username of > - committers, who must have signed the contributor agreement. In a DVCS, > the > - committers are not necessarily the same people who push, and so we can't > - check if the committer is a contributor. We could use a hook to check if > the > - committer is a contributor if we keep a list of registered contributors. > + committers, who must have signed the contributor agreement. We might > want to > + use a hook to check if the committer is a contributor if we keep a list > of > + registered contributors. Then, the hook might warn users that push a > group > + of revisions containing changesets from unknown contributors. > Is this from people who submit patch sets to bugs.python.org that include the individual commits? Or is this for core developers? > > +End-of-line conversions > +----------------------- > + > +There has been some discussion about the lack of end-of-line conversion > support > +in Mercurial. While Mercurial comes with a win32text extension that > provides > +some basic support for converting end-of-line data on a file-name pattern > +basis, the lack of exclusion (for specifying broad rules with exceptions) > and > +the use of hgrc files (which can't be versioned) make it less than ideal. > + > +I think the primary line of defense for prevention of inappropriate > newlines > +should be hooks on the server side which basically turn down any > changegroup > +or changeset introducing such data. The use of the win32text extension > (which > +can hopefully be improved/extended to support the usage scenarios > mentioned > +above) and/or a commit-time hook could be the first line of defense. > + > hgwebdir > -------- > > @@ -185,7 +223,16 @@ > build a quick extension to augment the URL rev parser so that it can also > take > r[0-9]+ args and come up with the matching hg revision. 
> > +roundup > +------- > > +We'll come up with an auto-linking plugin for roundup, which can match a > +changeset identifier (possibly with a branch prefix), and link it to the > +appropriate revision in the hgwebdir instance. Second, the script above > (in > +the hgwebdir section) will make sure that old links to revision should > continue > +to work (by pointing to the hg changeset that reflects the svn revision). > + > + > After migration > =============== > > @@ -222,37 +269,32 @@ > .. _wiki: http://www.selenic.com/mercurial/wiki/ > .. _parts of the developer FAQ: > http://www.python.org/dev/faq/#version-control > > -Think first, commit later? > --------------------------- > +Proposed workflow > +----------------- > > -In recent history, old versions of Python have been maintained by a select > -group of people backporting patches from trunk to release branches. While > -this may not scale so well as the development pace grows, it also runs > into > -some problems with the current crop of distributed versioning tools. These > -tools (I believe similar problems would exist for either git, bzr, or hg, > -though some may cope better than others) are based on the idea of a > Directed > -Acyclic Graph (or DAG), meaning they keep track of relations of > changesets. > +I propose two workflows for the migration of patches between several > branches. > > -Mercurial itself has a stable branch which is a ''strict'' subset of the > -unstable branch. This means that generally all fixes for the stable branch > -get committed against the tip of the stable branch, then they get merged > into > -the unstable branch (which already contains the parent of the new cset). > This > -provides a largely frictionless environment for moving changes from stable > to > -unstable branches. Mistakes, where a change that should go on stable goes > on > -unstable first, do happen, but they're usually easy to fix. That can be > done by > -copying the change over to the stable branch, then trivial-merging with > -unstable -- meaning the merge in fact ignores the parent from the stable > -branch). > +For migration within 2.x or 3.x branches, I propose a patch always gets > +committed to the oldest branch where it applies first. Then, the resulting > +changeset can be merged using hg merge to all newer branches within that > +series (2.x or 3.x). If it does not apply as-is to the newer branch, hg > revert > +can be used to easily revert to the new-branch-native head, patch in some > +alternative version of the patch (or none, if it's not applicable), then > commit > +the merge. The premise here is that all changesets from an older branch > within > +the series are eventually merged to all newer branches within the series. > > -This strategy means a little more work for regular committers, because > they > -have to think about whether their change should go on stable or unstable; > they > -may even have to ask someone else (the RM) before committing. But it also > -relieves a dedicated group of committers of regular backporting duty, in > -addition to making it easier to work with the tool. > +The upshot is that this provides for the most painless merging procedure. > The > +downside is that in the general case, people have to think about the > oldest > +branch to which the patch should be applied before actually applying it. > People should be doing that anyway intra-major version, so it should be a very minor pain point. Plus named branches make this straightforward, right? 
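A minimal sketch of that oldest-branch-first workflow, with purely
illustrative branch names (release26-maint as the oldest applicable branch,
default as the newer one), might look like:

    hg update release26-maint     # apply the fix to the oldest branch first
    hg commit -m "Fix issue NNNN"
    hg update default             # then forward-port it
    hg merge release26-maint
    # If the change does not apply to the newer branch, keep that branch's
    # own version but still record the merge:
    #   hg revert --all --rev .
    hg commit -m "Merge fix for issue NNNN from release26-maint"
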
> -Now would be a good time to consider changing strategies in this regard, > -although it would be relatively easy to switch to such a model later on. > +For migration between 2.x and 3.x branches (which should all be in the > same > +direction, though I'm not sure what direction is most appropriate here), Patches go from 2.x to 3.x typically. > > +changesets should be transplanted (not merged) in some other way. The > +transplant extension, import/export and bundle/unbundle work equally well > here. > We will need to choose one and document how to use it. When you are ready to lock this down let me know and we can starting writing a new version of the dev FAQ. > > +Choosing this approach allows 3.x not to carry all of the 2.x > history-since-it- > +was-branched, meaning the clone is not as big and the merges not as > complicated. > + So just like it is now with svnmerge, right? That's fine as long as we continue to include the revision # in the commit so it is possible to reference the original commit on the other branch. > > The future of Subversion > ------------------------ > > @@ -281,7 +323,9 @@ > I propose that the revision identifier will be the short version of hg's > revision hash, for example 'dd3ebf81af43', augmented with '+' (instead of > 'M') > if the working directory from which it was built was modified. This > mirrors > -the output of the hg id command, which is intended for this kind of usage. > +the output of the hg id command, which is intended for this kind of usage. > The > +sys.subversion value will also be renamed to sys.mercurial to reflect the > +change in VCS. > > For the tag/branch identifier, I propose that hg will check for tags on > the > currently checked out revision, use the tag if there is one ('tip' doesn't > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dirkjan at ochtman.nl Mon Aug 3 12:51:36 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Mon, 3 Aug 2009 12:51:36 +0200 Subject: [Python-Dev] PEP 385: pruning/reorganizing branches Message-ID: So PEP 385 proposes to clean up the old branches we still have lying around in SVN. This list of branch: action items is what I've come up with to do this cleanup. Legend first: - keep-clone means we'll keep that branch in a separate clone - keep-named means we'll keep that branch as a named branch in one of the clones - strip means we won't keep that branch - streamed-merge means that it got merged by committing several new revisions to the other branch - merged-r* means the branch got merged in the named revision - merges? means I haven't checked/found out yet whether that branch was ever merged - ? means that your input would be even more helpful than for the other items - some items have no action yet, feel free to treat that as just '?' The actual List: py3k: keep-clone default: keep-clone tk_and_idle_maintenance: keep-clone release26-maint: keep-named release30-maint: keep-named pep-0383: keep-clone py3k-short-float-repr: strip streamed-merge multiprocessing-autoconf: keep-clone? release25-maint: keep-named io-c: keep-clone? py3k-issue1717: keep-clone tlee-ast-optimize: keep-clone release24-maint: keep-named empty: keep-clone? 
py3k-urllib: keep-clone tnelson-trunk-bsddb-47-upgrade: strip benjaminp-testing: strip py3k-importlib: keep-clone release23-maint: keep-named py3k-importhook: keep-clone ctypes-branch: strip decimal-branch: merged-r58143 bcannon-objcap: keep-clone? p3yk_no_args_on_exc: strip amk-mailbox: keep-clone? twouters-dictviews-backport: keep-clone? bcannon-sandboxing: keep-clone? release22-maint: keep-named theller_modulefinder: strip hoxworth-stdlib_logging-soc: strip partial tim-exc_sanity: merged-r46426 IDLE-syntax-branch: merged-r41480 strip-later ast-objects: strip ast-arena: merged-r41739 jim-doctest: strip ast-branch: merged-r39758 release23-branch: merges? jim-modulator: strip release21-maint: keep-named indexing-cleanup-branch: strip r23c1-branch: merged-r33637 r23b2-branch: merges? anthony-parser-branch: merged-r35460 r23b1-branch: merged-r32490 idlefork-merge-branch: strip getargs_mask_mods: strip cache-attr-branch: strip folding-reimpl-branch: strip streamed-merge r23a2-branch: merges? bsddb-bsddb3-schizo-branch: merged-r31008 r23a1-branch: merged-r30482 py-cvs-vendor-branch: strip DS_RPC_BRANCH: strip streamed-merge SourceForge: strip release22-branch: merged-r24921 r22rc1-branch: strip r22b2-branch: merges? merged-r24426 r22b1-branch: merges? r22a4-branch: merges? r22a3-branch: merges? r22a2-branch: merged-r22674 descr-branch: merged-r22139 release20-maint: keep-named gen-branch: merged-r21181 iter-branch: merged-r20492 r161-branch: merges? cnri-16-start: strip universal-33: merges? None: strip avendor: strip Distutils_0_1_3-branch: strip partial release152p1-patches: merges? string_methods: merged-r13927 PYIDE: strip OSAM: strip PYTHONSCRIPT: strip BBPY: strip jar: merges? alpha100: strip streamed-merge unlabeled-2.36.4: strip partial unlabeled-2.1.4: strip partial unlabeled-2.25.4: strip partial fix-test-ftplib: merged-r66673 trunk-math: py3k-grandrenaming: okkoto-sizeof py3k-ctypes-pep3118: merged-r62597 trunk-bytearray: merged-r61936 libffi3-branch: merged-r61234 alex-py3k: strip cpy_merge: strip py3k-pep3137: merged-r58888 ../ctypes-branch: strip pep302_phase2: strip py3k-buffer: merged-r57181 p3yk-noslice py3k-struni p3yk: rename int_unification unlabeled-1.1.1: strip unlabeled-1.5.4: strip unlabeled-1.1.2: strip unlabeled-2.9.2: strip unlabeled-2.9.4: strip unlabeled-1.5.2: strip unlabeled-2.1.2: strip unlabeled-2.36.2: strip unlabeled-2.108.2: strip unlabeled-2.10.2: strip unlabeled-2.54.2: strip unlabeled-1.3.2: strip unlabeled-1.23.4: strip unlabeled-2.25.2: strip unlabeled-1.2.2: strip unlabeled-1.98.2: strip unlabeled-2.16.2: strip unlabeled-2.3.2: strip unlabeled-1.9.2: strip unlabeled-1.8.2: strip aimacintyre-sf1454481: merged-r46919 tim-current_frames: merged-r50541 bippolito-newstruct: merges? 
runar-longslice-branch: strip steve-notracing: strip rjones-funccall: merged-r46096 sreifschneider-newnewexcept: merged-r46456 tim-doctest-branch: merged-r36839 blais-bytebuf: strip ../bippolito-newstruct: rename rjones-prealloc: strip sreifschneider-64ints: strip stdlib-cleanup: strip ssize_t: merged-r42382 sqlite-integration: merged-r43514 tim-obmalloc: merged-r43059 Further actions: - implement branch map support in hgsubversion to be able to do named/unnamed/no branch on a branch-by-branch basis - implement splice map support in hgsubversion to be able to convert given merges to hg-native merge data Cheers, Dirkjan From martin at v.loewis.de Tue Aug 4 08:06:04 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 04 Aug 2009 08:06:04 +0200 Subject: [Python-Dev] PEP 385: pruning/reorganizing branches In-Reply-To: References: Message-ID: <4A77CFCC.3050606@v.loewis.de> > empty: keep-clone? I use that as a branch to tell build slaves to clean out their current checkouts. So keep-clone sounds right, assuming it is possible to target buildslaves at either clones or branches (which, IIUC, would be necessary anyway, since we are using a mix of branches and clones). > amk-mailbox: keep-clone? > twouters-dictviews-backport: keep-clone? > bcannon-sandboxing: keep-clone? > bippolito-newstruct: merges? You'll probably need to explicitly ping the specific owners (Andrew Kuchling, Thomas Wouters, Brett Cannon, Bob Ippolito) to understand the fate of these branches. This also raises the question how developers should publish their "own" branches. For the bzr setup, there was apparently a proposal to use directories for that, i.e. giving each developer a directory on code.python.org to publish branches. Not doing that, but keeping owner information encoded in the clone name, would be fine as well. > release23-branch: merges? > r23b2-branch: merges? > r22rc1-branch: strip > r22b1-branch: merges? > r22a4-branch: merges? > r22a3-branch: merges? > r161-branch: merges? It seems we had been creating CVS branches for every release around that time; I don't remember the details. Each such branch should end up in a tag. For example, release23-branch should (and does) ultimately lead to tags/r23. cvs2svn wasn't able to recognize this correctly (as CVS branches apply to each file individually), so it created the r23 tag out of various copies that were current when the tag was made. I don't know what your plan is wrt. release tags, i.e. whether you want to keep them all. If you are stripping out some of the branches, but plan to keep the release tags, I wonder what the tags look like. > release22-branch: merged-r24921 Not really. Jack Jansen merged some changes that got first applied to the 2.2 > r22b2-branch: merges? merged-r24426 > r22b2-branch: merges? merged-r24426 > release20-maint: keep-named See above. So you do plan to keep all past releases? > release152p1-patches: merges? Probably merged. I don't recall whether 1.5.2p1 really happened; in r14966, Fred claims that he merged all changes from 1.5.2p2 (!). "Hopefully I got all this right!" I surely hope the same - I doubt anybody would go back and check whether anything is missing. 
Regards, Martin From dirkjan at ochtman.nl Tue Aug 4 08:33:49 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Tue, 4 Aug 2009 08:33:49 +0200 Subject: [Python-Dev] PEP 385: pruning/reorganizing branches In-Reply-To: <4A77CFCC.3050606@v.loewis.de> References: <4A77CFCC.3050606@v.loewis.de> Message-ID: On Tue, Aug 4, 2009 at 08:06, "Martin v. L?wis" wrote: > I use that as a branch to tell build slaves to clean out their > current checkouts. So keep-clone sounds right, assuming it is possible > to target buildslaves at either clones or branches (which, IIUC, would > be necessary anyway, since we are using a mix of branches and clones). Yes, that should be straightforward. >> amk-mailbox: keep-clone? >> twouters-dictviews-backport: keep-clone? >> bcannon-sandboxing: keep-clone? >> bippolito-newstruct: merges? > > You'll probably need to explicitly ping the specific owners > (Andrew Kuchling, Thomas Wouters, Brett Cannon, Bob Ippolito) > to understand the fate of these branches. Will do. > This also raises the question how developers should publish their > "own" branches. For the bzr setup, there was apparently a proposal > to use directories for that, i.e. giving each developer a directory > on code.python.org to publish branches. User repositories has apparently worked well for Mozilla, so yeah, it's worth discussing. > Not doing that, but keeping owner information encoded in the clone > name, would be fine as well. > >> release23-branch: merges? >> r23b2-branch: merges? >> r22rc1-branch: strip >> r22b1-branch: merges? >> r22a4-branch: merges? >> r22a3-branch: merges? >> r161-branch: merges? > > It seems we had been creating CVS branches for every release around > that time; I don't remember the details. Each such branch should end > up in a tag. For example, release23-branch should (and does) ultimately > lead to tags/r23. cvs2svn wasn't able to recognize this correctly (as > CVS branches apply to each file individually), so it created the r23 > tag out of various copies that were current when the tag was made. > > I don't know what your plan is wrt. release tags, i.e. whether you > want to keep them all. If you are stripping out some of the branches, > but plan to keep the release tags, I wonder what the tags look like. The plan was to keep all maintenance branches and all release tags but not all release branches (since they seem to contain few commits anyway). >> release22-branch: merged-r24921 > > Not really. Jack Jansen merged some changes that got first applied > to the 2.2 > >> r22b2-branch: merges? merged-r24426 >> r22b2-branch: merges? merged-r24426 > >> release20-maint: keep-named > > See above. So you do plan to keep all past releases? > >> release152p1-patches: merges? > > Probably merged. I don't recall whether 1.5.2p1 really happened; > in r14966, Fred claims that he merged all changes from 1.5.2p2 (!). > > "Hopefully I got all this right!" > > I surely hope the same - I doubt anybody would go back and check > whether anything is missing. Thanks for the thorough review, Dirkjan From dickinsm at gmail.com Tue Aug 4 11:20:09 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Tue, 4 Aug 2009 10:20:09 +0100 Subject: [Python-Dev] PEP 385: pruning/reorganizing branches In-Reply-To: References: Message-ID: <5c6f2a5d0908040220w24163e7n142842a39e9a09c5@mail.gmail.com> Comments on some of the branches I've had involvement with... On Mon, Aug 3, 2009 at 11:51 AM, Dirkjan Ochtman wrote: > py3k-short-float-repr: strip streamed-merge Sounds fine. 
> py3k-issue1717: keep-clone I don't think there's any need to keep this branch; its contents were all merged (in pieces) to py3k (various revisions with numbers in the range 69188--69225). So I think 'strip streamed-merge' is appropriate here, if I'm understanding your terminology. > trunk-math: I think this one can go down as 'strip', too; there's nothing there of interest that isn't already in trunk and py3k. It was merged to trunk in r62380. -- Mark From ncoghlan at gmail.com Tue Aug 4 11:20:13 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 04 Aug 2009 19:20:13 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: References: Message-ID: <4A77FD4D.1020502@gmail.com> Dirkjan Ochtman wrote: >>> * commit hooks be implemented to enforce this - but this should not be >>> necessary if the above was implemented and socially enforced. > > You seem to advocate a two-step approach: enforce line endings through > win32text, catch any errors that slipped through in a hook (commit > hook is an optional first line of defense, changegroup hooks on the > server to protect the rest of the world). > > I think inverting that approach would be better: have strict hooks on > the server to prevent people from pushing inappropriate EOLs, and > provide help on configuring win32text as an extra help for developers > on Windows who use editors that work better with \r\n. That leaves > people to pick their own weapon of choice against propagation of \r\n > (e.g. better editor, commit hooks, whatever) while still making sure > no inappropriate line endings land in the python.org repositories. It > also seems to fit well with the whole consenting adults thing (but > that might just be me). It's about not treating Windows developers as second class citizens. Their platform uses \r\n as its native line ending format, so they should be able to work in that format without any hassles by following some simple instructions (such as "ensure you have version X of the Windows hg client, enable the win32text extension and configure it in such-and-such a way"). Not "oh, yeah, that's an issue but if you search the Intarwebs there are a few different things you can do that kinda sorta work but are a bit fragile and klunky". The precise order the two issues (server side enforcement and client side assistance) are dealt with doesn't really matter because *both* issues need to be addressed before we migrate. win32text needs to be usable on non-Windows clients so that tarballs generated on a *nix machine get the line endings right in the Windows-only files. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Tue Aug 4 11:27:46 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 04 Aug 2009 19:27:46 +1000 Subject: [Python-Dev] Functionality in subprocess.Popen.terminate() In-Reply-To: <171e8a410908031142t52d974f6tea40aaa7f5d8c059@mail.gmail.com> References: <171e8a410908031142t52d974f6tea40aaa7f5d8c059@mail.gmail.com> Message-ID: <4A77FF12.3010401@gmail.com> Eric Pruitt wrote: > In my GSoC project, I have implemented asnychronous I/O in > subprocess.Popen. Since the read/write operations are asynchronous, the > program may have already exited by the time one calls the asyncread > function I have implemented. 
While it returns the data just fine, I have > come across an issue with the TerminateProcess function in Windows: if > the program has already exited, when subprocess.Popen.Terminate calls > the Windows built-in "TerminateProcess" function, an "access denied" > error will occur. Should I just make it so that this exception is simply > ignored or perform some kind of check to see if the process exists > beforehand? If the latter, I have been unable to find a way to do so, to > my liking at least. The solutions I saw would require code that seems a > bit excessive to me. I'm pretty sure we already ignore some spurious error messages in cases like calling flush() in file.close(). I would suggest checking what the io module does in such cases and see what kind of precedent it sets. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Tue Aug 4 11:35:33 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 04 Aug 2009 19:35:33 +1000 Subject: [Python-Dev] PEP 385: pruning/reorganizing branches In-Reply-To: References: <4A77CFCC.3050606@v.loewis.de> Message-ID: <4A7800E5.7040308@gmail.com> Dirkjan Ochtman wrote: > On Tue, Aug 4, 2009 at 08:06, "Martin v. L?wis" wrote: >> I don't know what your plan is wrt. release tags, i.e. whether you >> want to keep them all. If you are stripping out some of the branches, >> but plan to keep the release tags, I wonder what the tags look like. > > The plan was to keep all maintenance branches and all release tags but > not all release branches (since they seem to contain few commits > anyway). I think I share Martin's confusion here - how can you keep a release tag (e.g. 2.2.2) without also keeping the release branch where that tag was created? Yes, the maintenance branches contain a comparatively small number of commits, but they're still the sources of the maintenance release tags. Or is this a case where Mercurial's DAG allows you to handle those old branches as "abandoned" leaves of the DAG in the history of the affected files, with the tags picking out the relevant versions of the files? Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From amk at amk.ca Tue Aug 4 13:43:12 2009 From: amk at amk.ca (A.M. Kuchling) Date: Tue, 4 Aug 2009 07:43:12 -0400 Subject: [Python-Dev] PEP 385: pruning/reorganizing branches In-Reply-To: References: Message-ID: <20090804114312.GA6322@amk-desktop.matrixgroup.net> On Mon, Aug 03, 2009 at 12:51:36PM +0200, Dirkjan Ochtman wrote: > amk-mailbox: keep-clone? strip -- this branch was for working on a fix for http://bugs.python.org/issue1599254, but the actual work in the branch is available as the patches attached to that item. --amk From robert.schuppenies at gmail.com Tue Aug 4 15:41:33 2009 From: robert.schuppenies at gmail.com (Robert Schuppenies) Date: Tue, 04 Aug 2009 06:41:33 -0700 Subject: [Python-Dev] PEP 385: pruning/reorganizing branches In-Reply-To: References: Message-ID: <4A783A8D.5020503@gmail.com> Dirkjan Ochtman wrote: > > okkoto-sizeof strip - It's an 2008 Google Summer of Code project. The important changes have been applied in r63856. 
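Returning to the subprocess.Popen.terminate() question above: a minimal
sketch of "ignore the failure only if the child really has exited" could look
like this (safe_terminate is a hypothetical helper for illustration, not an
existing subprocess API):

    import subprocess

    def safe_terminate(proc):
        # Hypothetical helper: terminate a subprocess.Popen instance, but
        # swallow the WindowsError/OSError that TerminateProcess raises when
        # the process has already exited.
        if proc.poll() is not None:
            return                      # already gone, nothing to do
        try:
            proc.terminate()
        except OSError:
            # The process may have exited between poll() and terminate();
            # only re-raise if it still appears to be running.
            if proc.poll() is None:
                raise
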
From eric.pruitt at gmail.com Tue Aug 4 17:01:57 2009 From: eric.pruitt at gmail.com (Eric Pruitt) Date: Tue, 4 Aug 2009 10:01:57 -0500 Subject: [Python-Dev] Functionality in subprocess.Popen.terminate() In-Reply-To: <4A77FF12.3010401@gmail.com> References: <171e8a410908031142t52d974f6tea40aaa7f5d8c059@mail.gmail.com> <4A77FF12.3010401@gmail.com> Message-ID: <171e8a410908040801s3f112b9fh4b3c6dc32cc7f9ec@mail.gmail.com> On Tue, Aug 4, 2009 at 04:27, Nick Coghlan wrote: > Eric Pruitt wrote: >> In my GSoC project, I have implemented asnychronous I/O in >> subprocess.Popen. Since the read/write operations are asynchronous, the >> program may have already exited by the time one calls the asyncread >> function I have implemented. While it returns the data just fine, I have >> come across an issue with the TerminateProcess function in Windows: if >> the program has already exited, when subprocess.Popen.Terminate calls >> the Windows built-in "TerminateProcess" function, an "access denied" >> error will occur. Should I just make it so that this exception is simply >> ignored or perform some kind of check to see if the process exists >> beforehand? If the latter, I have been unable to find a way to do so, to >> my liking at least. The solutions I saw would require code that seems a >> bit excessive to me. > > I'm pretty sure we already ignore some spurious error messages in cases > like calling flush() in file.close(). I would suggest checking what the > io module does in such cases and see what kind of precedent it sets. > > Cheers, > Nick. > > -- > Nick Coghlan ? | ? ncoghlan at gmail.com ? | ? Brisbane, Australia > --------------------------------------------------------------- > Sounds good enough to me but I was wondering if it might be a good idea to add a function like "pidinuse" to subprocess as a whole that would determine if a process ID was being used and return a simple boolean value. I came across a number of people searching for a way to determine if a PID was running (Google "python check if pid exists") so it seems like the implemented functionality would be of use to the community as a whole, not just my wrapper class. Eric From janzert at janzert.com Tue Aug 4 17:24:23 2009 From: janzert at janzert.com (Janzert) Date: Tue, 04 Aug 2009 11:24:23 -0400 Subject: [Python-Dev] Functionality in subprocess.Popen.terminate() In-Reply-To: <171e8a410908040801s3f112b9fh4b3c6dc32cc7f9ec@mail.gmail.com> References: <171e8a410908031142t52d974f6tea40aaa7f5d8c059@mail.gmail.com> <4A77FF12.3010401@gmail.com> <171e8a410908040801s3f112b9fh4b3c6dc32cc7f9ec@mail.gmail.com> Message-ID: <4A7852A7.50407@janzert.com> Eric Pruitt wrote: > On Tue, Aug 4, 2009 at 04:27, Nick Coghlan wrote: >> Eric Pruitt wrote: >>> In my GSoC project, I have implemented asnychronous I/O in >>> subprocess.Popen. Since the read/write operations are asynchronous, the >>> program may have already exited by the time one calls the asyncread >>> function I have implemented. While it returns the data just fine, I have >>> come across an issue with the TerminateProcess function in Windows: if >>> the program has already exited, when subprocess.Popen.Terminate calls >>> the Windows built-in "TerminateProcess" function, an "access denied" >>> error will occur. Should I just make it so that this exception is simply >>> ignored or perform some kind of check to see if the process exists >>> beforehand? If the latter, I have been unable to find a way to do so, to >>> my liking at least. 
The solutions I saw would require code that seems a >>> bit excessive to me. >> I'm pretty sure we already ignore some spurious error messages in cases >> like calling flush() in file.close(). I would suggest checking what the >> io module does in such cases and see what kind of precedent it sets. >> >> Cheers, >> Nick. >> >> -- >> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia >> --------------------------------------------------------------- >> > > Sounds good enough to me but I was wondering if it might be a good > idea to add a function like "pidinuse" to subprocess as a whole that > would determine if a process ID was being used and return a simple > boolean value. I came across a number of people searching for a way to > determine if a PID was running (Google "python check if pid exists") > so it seems like the implemented functionality would be of use to the > community as a whole, not just my wrapper class. > > Eric > I'm not sure of the actual details but it seems from your description that even if you check first a race condition will still exist. Specifically the subprocess could terminate after the check and before the TerminateProcess call. So it seems better just to call TerminateProcess and then correctly handle any possible error. Janzert From brett at python.org Tue Aug 4 21:28:10 2009 From: brett at python.org (Brett Cannon) Date: Tue, 4 Aug 2009 12:28:10 -0700 Subject: [Python-Dev] PEP 385: pruning/reorganizing branches In-Reply-To: <4A77CFCC.3050606@v.loewis.de> References: <4A77CFCC.3050606@v.loewis.de> Message-ID: On Mon, Aug 3, 2009 at 23:06, "Martin v. L?wis" wrote: > > empty: keep-clone? > > I use that as a branch to tell build slaves to clean out their > current checkouts. So keep-clone sounds right, assuming it is possible > to target buildslaves at either clones or branches (which, IIUC, would > be necessary anyway, since we are using a mix of branches and clones). > > > amk-mailbox: keep-clone? > > twouters-dictviews-backport: keep-clone? > > bcannon-sandboxing: keep-clone? > > bippolito-newstruct: merges? > keep-clone bcannon-objcap, strip bcannon-sandboxing. > > You'll probably need to explicitly ping the specific owners > (Andrew Kuchling, Thomas Wouters, Brett Cannon, Bob Ippolito) > to understand the fate of these branches. > > This also raises the question how developers should publish their > "own" branches. For the bzr setup, there was apparently a proposal > to use directories for that, i.e. giving each developer a directory > on code.python.org to publish branches. > Yeah, I thought I brought this up and people liked the idea of keeping some user directory on hg.python.org. I am fine with code.python.org as well. But having some place would be really handy (although having bitbucket and Google Code makes this not quite as important). > > Not doing that, but keeping owner information encoded in the clone > name, would be fine as well. > > > release23-branch: merges? > > r23b2-branch: merges? > > r22rc1-branch: strip > > r22b1-branch: merges? > > r22a4-branch: merges? > > r22a3-branch: merges? > > r161-branch: merges? > > It seems we had been creating CVS branches for every release around > that time; I don't remember the details. Each such branch should end > up in a tag. For example, release23-branch should (and does) ultimately > lead to tags/r23. cvs2svn wasn't able to recognize this correctly (as > CVS branches apply to each file individually), so it created the r23 > tag out of various copies that were current when the tag was made. 
> > I don't know what your plan is wrt. release tags, i.e. whether you > want to keep them all. If you are stripping out some of the branches, > but plan to keep the release tags, I wonder what the tags look like. > > > release22-branch: merged-r24921 > > Not really. Jack Jansen merged some changes that got first applied > to the 2.2 > > > r22b2-branch: merges? merged-r24426 > > r22b2-branch: merges? merged-r24426 > > > release20-maint: keep-named > > See above. So you do plan to keep all past releases? > > > release152p1-patches: merges? > > Probably merged. I don't recall whether 1.5.2p1 really happened; > in r14966, Fred claims that he merged all changes from 1.5.2p2 (!). > > "Hopefully I got all this right!" > > I surely hope the same - I doubt anybody would go back and check > whether anything is missing. > > Regards, > Martin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mhammond at skippinet.com.au Wed Aug 5 01:43:15 2009 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 05 Aug 2009 09:43:15 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A77FD4D.1020502@gmail.com> References: <4A77FD4D.1020502@gmail.com> Message-ID: <4A78C793.9020409@skippinet.com.au> On 4/08/2009 7:20 PM, Nick Coghlan wrote: > Dirkjan Ochtman wrote: >>>> * commit hooks be implemented to enforce this - but this should not be >>>> necessary if the above was implemented and socially enforced. >> >> You seem to advocate a two-step approach: enforce line endings through >> win32text, catch any errors that slipped through in a hook (commit >> hook is an optional first line of defense, changegroup hooks on the >> server to protect the rest of the world). >> >> I think inverting that approach would be better: have strict hooks on >> the server to prevent people from pushing inappropriate EOLs, and >> provide help on configuring win32text as an extra help for developers >> on Windows who use editors that work better with \r\n. That leaves >> people to pick their own weapon of choice against propagation of \r\n >> (e.g. better editor, commit hooks, whatever) while still making sure >> no inappropriate line endings land in the python.org repositories. It >> also seems to fit well with the whole consenting adults thing (but >> that might just be me). > > It's about not treating Windows developers as second class citizens. > Their platform uses \r\n as its native line ending format, so they Thanks Nick; I didn't want to be the only one saying that. There is a fine line between asserting reasonable requirements for Windows users and being obstructionist and unhelpful, and I'm trying to stay on the former side :) > should be able to work in that format without any hassles by following > some simple instructions (such as "ensure you have version X of the > Windows hg client, enable the win32text extension and configure it in > such-and-such a way"). Not "oh, yeah, that's an issue but if you search > the Intarwebs there are a few different things you can do that kinda > sorta work but are a bit fragile and klunky". > > The precise order the two issues (server side enforcement and client > side assistance) are dealt with doesn't really matter because *both* > issues need to be addressed before we migrate. 
I'm not that happy with the server being the primary line of defense. Let's say I make a branch of the hg repo, myself and a few others work on it committing as we go, then attempt to merge back upstream. Let's say some of the early commits on that clone introduced "bad" line endings. I'm guessing I would be forced to make a number of whitespace-only checkins to normalize the line-endings before it could merge - and these checkins would then be in the history forever. Or I could attempt to recreate the clone by somehow "replaying" the commits with line endings corrected. Either way, the situation doesn't seem good. > win32text needs to be usable on non-Windows clients so that tarballs > generated on a *nix machine get the line endings right in the > Windows-only files. I agree. It isn't fair to make this windows users problem. It would be like me proposing the repo get imported with \r\n line endings, enforce that with server side hooks, and let non-Windows users worry about the ramifications of that - somehow I doubt that would fly - so neither should it fly for Windows users... I'm more than willing to help on this; I haven't resurrected my stale patch because I find win32text only 1/2 a solution that doesn't work in practice. Therefore that patch is as stale for me as it is anyone. However, if a plan is put in place which offers a full solution and the hg developers are committed to it, I promise I'll put my hand up to help with implementation in a fairly timely manner... Cheers, Mark From nyamatongwe at gmail.com Wed Aug 5 02:44:04 2009 From: nyamatongwe at gmail.com (Neil Hodgson) Date: Wed, 5 Aug 2009 10:44:04 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A78C793.9020409@skippinet.com.au> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> Message-ID: <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> Mark Hammond: > Thanks Nick; I didn't want to be the only one saying that. There is a fine > line between asserting reasonable requirements for Windows users and being > obstructionist and unhelpful, and I'm trying to stay on the former side :) I haven't commented on this issue before because I can't really be helpful. I just don't understand why hg is being considered before its Windows support is roughly equivalent to svn and cvs. There has been some similar experience with the main repository for the Cocoa port of Scintilla which is in bzr on launchpad. Several times in that repository, files were checked in with wrong line ends making every line appear changed when looking through history. There are several causes for this including user error but bzr (and hg) should default to more helpful behaviour on text files. Neil From ben+python at benfinney.id.au Wed Aug 5 07:56:16 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Wed, 05 Aug 2009 15:56:16 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> Message-ID: <87fxc6regv.fsf@benfinney.id.au> Mark Hammond writes: > Let's say I make a branch of the hg repo, myself and a few others work > on it committing as we go, then attempt to merge back upstream. Let's > say some of the early commits on that clone introduced "bad" line > endings. I'm guessing I would be forced to make a number of > whitespace-only checkins to normalize the line-endings before it could > merge - and these checkins would then be in the history forever. What is wrong with that?
I mean, if that is the actual sequence of events, why should the history not reflect that? > Either way, the situation doesn't seem good. I see this assertion made often, so I'm not saying you are necessarily wrong to make it. I just don't see a justification for making it (and, without justification, I would say it *is* wrong to make it). -- \ "Our products just aren't engineered for security." --Brian | `\ Valentine, senior vice-president of Microsoft Windows | _o__) development | Ben Finney From digitalxero at gmail.com Wed Aug 5 08:02:03 2009 From: digitalxero at gmail.com (Dj Gilcrease) Date: Wed, 5 Aug 2009 00:02:03 -0600 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A78C793.9020409@skippinet.com.au> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> Message-ID: On Tue, Aug 4, 2009 at 5:43 PM, Mark Hammond wrote: > I'm more than willing to help on this; I haven't resurrected my stale patch > because I find win32text only 1/2 a solution that doesn't work in practice. > Therefore that patch is as stale for me as it is anyone. However, if a plan > is put in place which offers a full solution and the hg developers are > committed to it, I promise I'll put my hand up to help with implementation > in a fairly timely manner... Not sure what your patch was as I cannot find it, but I did up a quick change to win32text that uses a versioned .win32text file to maintain encoders, decoders and an ignore list http://media.digitalxero.net/win32text.py http://media.digitalxero.net/.win32text and add to your hgrc file [hooks] precommit.eol_encode = python:hgext.win32text.versioned_encode it needs to be precommit since it needs to run before the change set has been created so it can modify the data. Honestly I think this solution is kind of a hack, a much better solution would be to modify the encode/decode hooks to accept a filename so you can at least do ignore pattern matching, but that still ignores versioned encodes / decodes From skippy.hammond at gmail.com Wed Aug 5 08:08:43 2009 From: skippy.hammond at gmail.com (Mark Hammond) Date: Wed, 05 Aug 2009 16:08:43 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <87fxc6regv.fsf@benfinney.id.au> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <87fxc6regv.fsf@benfinney.id.au> Message-ID: <4A7921EB.30007@gmail.com> On 5/08/2009 3:56 PM, Ben Finney wrote: > Mark Hammond writes: > >> Let's say I make a branch of the hg repo, myself and a few others work >> on it committing as we go, then attempt to merge back upstream. Let's >> say some of the early commits on that clone introduced "bad" line >> endings. I'm guessing I would be forced to make a number of >> whitespace-only checkins to normalize the line-endings before it could >> merge - and these checkins would then be in the history forever. > > What is wrong with that? I mean, if that is the actual sequence of > events, why should the history not reflect that? The problem is the sequence of events happened in the first place. An extra burden is placed on the developer that will quickly get tiresome. I wouldn't personally be happy if that workflow became the norm. >> Either way, the situation doesn't seem good. > > I see this assertion made often, so I'm not saying you are necessarily > wrong to make it. I just don't see a justification for making it (and, > without justification, I would say it *is* wrong to make it).
*shrug* - in my opinion, the fact the developer is faced with that hurdle in their workflow is justification enough to say that developer's situation "doesn't seem good" and should have been prevented from happening by the tool much earlier than proposed. Mark From ben+python at benfinney.id.au Wed Aug 5 08:50:05 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Wed, 05 Aug 2009 16:50:05 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <87fxc6regv.fsf@benfinney.id.au> <4A7921EB.30007@gmail.com> Message-ID: <877hxirbz6.fsf@benfinney.id.au> Mark Hammond writes: > On 5/08/2009 3:56 PM, Ben Finney wrote: > > Mark Hammond writes: > > > >> Let's say I make a branch of the hg repo, myself and a few others work > >> on it committing as we go, then attempt to merge back upstream. Let's > >> say some of the early commits on that clone introduced "bad" line > >> endings. [...] > > The problem is the sequence of events happened in the first place. An > extra burden is placed on the developer that will quickly get > tiresome. I wouldn't personally be happy if that workflow became the > norm. Ah, okay. In that case, the ultimate "problem" is that OS vendors entrenched their incompatible line-ending conventions instead of choosing a single standard. Any line-ending burden borne by developers is a result of that. If things were different, they'd be different. However, we live with the legacy of that stupid set of decisions and have no real option to resolve it permanently short of deprecating entire vistas of tools (or even entire operating systems). > *shrug* - in my opinion, the fact the developer is faced with that > hurdle in their workflow is justification enough to say that > developer's situation "doesn't seem good" and should have been > prevented from happening by the tool much earlier than proposed. AIUI, this is a combination of several things: * different OSen have incompatible, entrenched conventions for line-ending that is embodied in the default output of their text processing tools. * these differences matter in many concrete ways to the tools that process text, so the differences need to be preserved, or explicitly transformed. * distributed VCS has the job of preserving data as present on the filesystem, including whatever line-ending convention is present in a file. * distributed VCS has the job of managing data exchange between users, presenting differences in a way that allows easy inspection and merging. * humans want to pretend that these incompatibilities don't exist, and want "end of line" to be an automatically-handled abstraction. It's not a simple thing to solve, and many clever people have tried over the decades. The fact that a centralised VCS can put the problem aside by requiring an explicit, single decision in the repository, is no help when addressing the constraints of a distributed VCS. At some point, the decision about how to handle line endings in cross-platform data needs to be punted to a human for a context-sensitive assessment, since (as can be seen) the above list of requirements is internally inconsistent and can't be relegated to a one-size-fits-all algorithm. -- \ "All progress has resulted from people who took unpopular | `\ positions."
--Adlai Stevenson | _o__) | Ben Finney From martin at v.loewis.de Wed Aug 5 09:35:26 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 05 Aug 2009 09:35:26 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> Message-ID: <4A79363E.1040606@v.loewis.de> > I haven't commented on this issue before because I can't really be > helpful. I just don't understand why hg is being considered before > its Windows support is roughly equivalent to svn and cvs. Is it really that you don't *understand*? It's fairly easy: there was a PEP which offered a number of options, and there was BDFL pronouncement. This (BDFL pronouncement) is how Python has always worked, and, as a principle, it is a good and useful process. Now, the specific outcome of the process means that more work needs to be done. So we have a *second* PEP, and we have a lack of volunteers that help implementing it. The second PEP hasn't been approved yet (as it isn't complete, yet), so migration to hg is stalled. The primary volunteer (Dirkjan) has indicated that he can't help with that specific issue, so other volunteers need to step forward, or we cannot move to hg. Regards, Martin From skippy.hammond at gmail.com Wed Aug 5 09:31:53 2009 From: skippy.hammond at gmail.com (Mark Hammond) Date: Wed, 05 Aug 2009 17:31:53 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <877hxirbz6.fsf@benfinney.id.au> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <87fxc6regv.fsf@benfinney.id.au> <4A7921EB.30007@gmail.com> <877hxirbz6.fsf@benfinney.id.au> Message-ID: <4A793569.1000008@gmail.com> On 5/08/2009 4:50 PM, Ben Finney wrote: > Mark Hammond writes: > >> On 5/08/2009 3:56 PM, Ben Finney wrote: >>> Mark Hammond writes: >>> >>>> Let's say I make a branch of the hg repo, myself and a few others work >>>> on it committing as we go, then attempt to merge back upstream. Let's >>>> say some of the early commits on that clone introduced "bad" line >>>> endings. > [...] >> >> The problem is the sequence of events happened in the first place. An >> extra burden is placed on the developer that will quickly get >> tiresome. I wouldn't personally be happy if that workflow became the >> norm. > > Ah, okay. In that case, the ultimate "problem" is that OS vendors > entrenched their incompatible line-ending conventions instead of > choosing a single standard. Any line-ending burden borne by developers > is a result of that. Yeah - this happened around 1964 if wikipedia is any guide. > > If things were different, they'd be different. However, we live with the > legacy of that stupid set of decisions and have no real option to > resolve it permanently short of deprecating entire vistas of tools (or > even entire operating systems). Agreed - so let's not solve it permanently. ... > It's not a simple thing to solve, and many clever people have tried over > the decades. As already mentioned in this thread, a capability similar to what svn or cvs offers would be sufficient. While a DVCS does offer unique challenges, it seems to me that doing something at commit time without requiring magic hooks be configured would go a long way to addressing the problem. Magic hooks on the official repo would then be considered the final fallback defense, but should rarely be invoked.
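To make the commit-time idea concrete, here is a minimal sketch of the sort of check being discussed, written as a pretxncommit Python hook (the module and function names are made up for illustration; win32text's stock forbidcrlf hook does something very similar, and having an equivalent built in rather than hand-configured is the real goal):

    # eolcheck.py - refuse commits that introduce CRLF into text files.
    # Enabled in hgrc with:
    #   [hooks]
    #   pretxncommit.eol = python:eolcheck.check_eol
    def check_eol(ui, repo, node=None, **kwargs):
        ctx = repo[node]                  # the changeset being committed
        offenders = []
        for path in ctx.files():
            if path not in ctx:           # file was removed in this changeset
                continue
            data = ctx[path].data()
            if '\0' in data:              # crude binary sniff: leave binaries alone
                continue
            if '\r\n' in data:
                offenders.append(path)
        if offenders:
            ui.warn('CRLF line endings found in: %s\n' % ', '.join(offenders))
            return True                   # a true return fails the hook and rolls back
        return False

A conversion step, as opposed to a rejection, would hang off the [encode]/[decode] filters instead, which is the route win32text already takes.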
> At some point, the decision about how to handle line endings in > cross-platform data needs to be punted to a human for a > context-sensitive assessment, since (as can be seen) the above list of > requirements is internally inconsistent and can't be relegated to a > one-size-fits-all algorithm. I'm not sure what point you are trying to make, but I believe it *is* possible for a solution to be found here which will keep Windows users happy. I'm guessing you haven't had much practical experience with this problem, so probably don't see this is clearly as Windows users do. Cheers, Mark. From martin at v.loewis.de Wed Aug 5 09:45:18 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 05 Aug 2009 09:45:18 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <877hxirbz6.fsf@benfinney.id.au> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <87fxc6regv.fsf@benfinney.id.au> <4A7921EB.30007@gmail.com> <877hxirbz6.fsf@benfinney.id.au> Message-ID: <4A79388E.7050006@v.loewis.de> > If things were different, they'd be different. However, we live with the > legacy of that stupid set of decisions and have no real option to > resolve it permanently short of deprecating entire vistas of tools (or > even entire operating systems). I think you missed the solution to the problem that Mark proposed (IIUC): a local commit to a hg repository should already get the line endings right, by automatically converting the file-to-be-committed into the repository line endings. This is what CVS has supported for more than ten years, and what svn supports for close-to ten years. > * distributed VCS has the job of preserving data as present on the > filesystem, including whatever line-ending convention is present in a > file. No, that's not true. Distributed VCS has the job to help the developer. That may mean to preserve the file as-is, or it may mean to convert the file on checkout and checkin. Which of these would be needed depends on the file, of course. > It's not a simple thing to solve, and many clever people have tried over > the decades. The fact that a centralised VCS can put the problem aside > by requiring an explicit, single decision in the repository, is no help > when addressing the constraints of a distributed VCS. Why do you say that? It's not true. The approach that has worked for the central repository can work just as well for a distributed repository. > At some point, the decision about how to handle line endings in > cross-platform data needs to be punted to a human for a > context-sensitive assessment, since (as can be seen) the above list of > requirements is internally inconsistent and can't be relegated to a > one-size-fits-all algorithm. Right - there needs to be a way for the user to specify what line endings to use. That's why both CVS and subversion have supported such configuration, on a per file basis, for many years. I can't see why hg couldn't, in principle, support the same configuration. Being a DVCS, such configuration would have to be part of the clone, of course, being versioned, and all that. I think hg is well capable of keeping versioned configuration information in the clone, as demonstrated by the .hgignore files. 
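For concreteness, the per-file configuration being referred to is what subversion spells like this (the path is only an example):

    # mark a text file as "native" eol; the svn client converts it to the
    # platform convention on checkout and normalises it on commit
    svn propset svn:eol-style native Lib/os.py

    # or have the client apply it automatically to new files, via
    # ~/.subversion/config:
    #   [miscellany]
    #   enable-auto-props = yes
    #   [auto-props]
    #   *.py = svn:eol-style=native

Something equivalent, but versioned inside the clone rather than held centrally, is what is being asked of hg here.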
Regards, Martin From skippy.hammond at gmail.com Wed Aug 5 09:44:18 2009 From: skippy.hammond at gmail.com (Mark Hammond) Date: Wed, 05 Aug 2009 17:44:18 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A79363E.1040606@v.loewis.de> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> Message-ID: <4A793852.1070106@gmail.com> On 5/08/2009 5:35 PM, "Martin v. L?wis" wrote: > Now, the specific outcome of the process means that more work needs to > be done. So we have a *second* PEP, and we have a lack of volunteers > that help implementing it. The second PEP hasn't been approved yet > (as it isn't complete, yet), so migration to hg is stalled. > The primary volunteer (Dirkjan) has indicated that he can't help with > that specific issue, so other volunteers need to step forward, or we > cannot move to hg. I don't recall Dirkjan saying he can't help with that issue - was it a lack of time, or a lack of understanding the problem/lack of a Windows environment? The problem I see is a lack of agreement about exactly what the solution entails. I believe there is general agreement win32text needs to be enhanced to support versioned 'rules'. But even with that, the only option I see is a truly cross-platform extension to implement these rules which every Python committer, regardless of operating-system, is expected to use - but that doesn't seem the consensus. As mentioned, I'm willing to lend manpower for this once there is agreement on something workable... Cheers, Mark From martin at v.loewis.de Wed Aug 5 09:57:29 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 05 Aug 2009 09:57:29 +0200 Subject: [Python-Dev] Mercurial migration: help needed Message-ID: <4A793B69.3000800@v.loewis.de> This is a repost from a month ago. It didn't get much feedback last time. I have now two items. In this thread, I'd like to collect things that ought to be done but where Dirkjan has indicated that he would prefer if somebody else did it. Item 1 ------ The first item is build identification. If you want to work on this, please either provide a patch (for trunk and/or py3k), or (if you are a committer) create a subversion branch. It seems that Barry and I agree that for the maintenance branches, sys.subversion should be frozen, so we need actually two sets of patches: one that removes sys.subversion entirely, and the other that freezes the branch to the respective one, and freezes the subversion revision to None. Of course, it seems that the actual representation of branches hasn't been determined yet, so the build process integration may need to be changed if named branches aren't going to be used in the end. Anybody working on this should have good knowledge of the Python source code, Mercurial, and either autoconf or Visual Studio (preferably both). Item 2 ------ The second item is line conversion hooks. Dj Gilcrease has posted a solution which he considers a hack himself. Mark Hammond has also volunteered, but it seems some volunteer needs to be "in charge", keeping track of a proposed solution until everybody agrees that it is a good solution. It may be that two solutions are necessary: a short-term one, that operates as a hook and has limitations, and a long-term one, that improves the hook system of Mercurial to implement the proper functionality (which then might get shipped with Mercurial in a cross-platform manner). 
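For whoever picks this up: the short-term baseline that the stock win32text extension already provides is, if I remember its documentation correctly, roughly the following hgrc configuration; whether that is sufficient is exactly the open question:

    [extensions]
    win32text =

    [encode]
    # convert CRLF to LF on commit, skipping files that look binary
    ** = cleverencode:
    [decode]
    # convert LF back to CRLF in the working copy (Windows clients)
    ** = cleverdecode:

    [hooks]
    # refuse changesets that would introduce CRLF into the repository
    pretxncommit.crlf = python:hgext.win32text.forbidcrlf
    pretxnchangegroup.crlf = python:hgext.win32text.forbidcrlf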
Regards, Martin From ben+python at benfinney.id.au Wed Aug 5 10:00:57 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Wed, 05 Aug 2009 18:00:57 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <87fxc6regv.fsf@benfinney.id.au> <4A7921EB.30007@gmail.com> <877hxirbz6.fsf@benfinney.id.au> <4A793569.1000008@gmail.com> Message-ID: <873a86r8p2.fsf@benfinney.id.au> Mark Hammond writes: > As already mentioned in this thread, a capability similar to what svn > or cvs offers would be sufficient. That capability presented by centralised VCSen is entirely dependent on the fact that they *are* centralised. Using a distributed VCS means the same capability doesn't apply. > While a DVCS does offer unique challenges, it seems to me that doing > something at commit time without requiring magic hooks be configured > would go a long way to addressing the problem. The hand-waving "doing something" is exactly what needs to be solved. > Magic hooks on the official repo would then be considered the final > fallback defense, but should rarely be invoked. Right, so that's "capability similar to centralised VCS" out of consideration; I'm glad we agree in the end. > I'm not sure what point you are trying to make That I disagree with your position. You seem to think that the problem has an obvious solution, which is not true; and that choice of a distributed VCS should be delayed until the problem is solved, which I don't agree with. > but I believe it *is* possible for a solution to be found here which > will keep Windows users happy. I'm guessing you haven't had much > practical experience with this problem, so probably don't see this is > clearly as Windows users do. Your guess is incorrect; I've been bitten time and again by this problem in many different contexts, enough to know that it's not obvious what the "right" solution is. -- \ "Not to perambulate the corridors in the hours of repose in the | `\ boots of ascension." --ski hotel, Austria | _o__) | Ben Finney From skippy.hammond at gmail.com Wed Aug 5 10:09:24 2009 From: skippy.hammond at gmail.com (Mark Hammond) Date: Wed, 05 Aug 2009 18:09:24 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <873a86r8p2.fsf@benfinney.id.au> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <87fxc6regv.fsf@benfinney.id.au> <4A7921EB.30007@gmail.com> <877hxirbz6.fsf@benfinney.id.au> <4A793569.1000008@gmail.com> <873a86r8p2.fsf@benfinney.id.au> Message-ID: <4A793E34.8060508@gmail.com> On 5/08/2009 6:00 PM, Ben Finney wrote: > Mark Hammond writes: > >> As already mentioned in this thread, a capability similar to what svn >> or cvs offers would be sufficient. > > That capability presented by centralised VCSen is entirely dependent on > the fact that they *are* centralised. Using a distributed VCS means the > same capability doesn't apply. Why do you say that (without justification I might add) about this issue? >> While a DVCS does offer unique challenges, it seems to me that doing >> something at commit time without requiring magic hooks be configured >> would go a long way to addressing the problem. > > The hand-waving "doing something" is exactly what needs to be solved. I think you have been mis-reading this thread. It is quite clear what 'doing something' means in this context - it means implement the human-defined rules for the line-ending policy for the repository.
>> Magic hooks on the official repo would then be considered the final >> fallback defense, but should rarely be invoked. > > Right, so that's "capability similar to centralised VCS" out of > consideration; I'm glad we agree in the end. I'm afraid you have lost me again, as clearly we don't agree on what useful things can be done at local commit time. >> I'm not sure what point you are trying to make > > That I disagree with your position. You seem to think that the problem > has an obvious solution, which is not true; and that choice of a > distributed VCS should be delayed until the problem is solved, which I > don't agree with. Fair enough - but it seems clear to enough of us that we can make progress and meet the requirements of the people actually impacted. > >> but I believe it *is* possible for a solution to be found here which >> will keep Windows users happy. I'm guessing you haven't had much >> practical experience with this problem, so probably don't see this is >> clearly as Windows users do. > > Your guess is incorrect; I've been bitten time and again by this problem > in many different contexts, enough to know that it's not obvious what > the "right" solution is. Sorry about that - but that was the only way I could explain you not seeing how such a solution can work. Cheers, Mark From martin at v.loewis.de Wed Aug 5 10:09:47 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 05 Aug 2009 10:09:47 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A793852.1070106@gmail.com> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <4A793852.1070106@gmail.com> Message-ID: <4A793E4B.30101@v.loewis.de> >> Now, the specific outcome of the process means that more work needs to >> be done. So we have a *second* PEP, and we have a lack of volunteers >> that help implementing it. The second PEP hasn't been approved yet >> (as it isn't complete, yet), so migration to hg is stalled. >> The primary volunteer (Dirkjan) has indicated that he can't help with >> that specific issue, so other volunteers need to step forward, or we >> cannot move to hg. > > I don't recall Dirkjan saying he can't help with that issue - was it a > lack of time, or a lack of understanding the problem/lack of a Windows > environment? I think he said (at some point) that he is not a Windows user, and thus can't really help. Of course, he also indicated that, as a Mercurial contributor, he is willing to help as much as he can. > The problem I see is a lack of agreement about exactly what the solution > entails. I believe there is general agreement win32text needs to be > enhanced to support versioned 'rules'. But even with that, the only > option I see is a truly cross-platform extension to implement these > rules which every Python committer, regardless of operating-system, is > expected to use - but that doesn't seem the consensus. > > As mentioned, I'm willing to lend manpower for this once there is > agreement on something workable... I think it needs to work the other way 'round. Somebody (perhaps you) needs to propose a hook and configuration settings, and propose that this hook is used on every system, and that refusal to use these hooks could lead to changes not being integratable (is that a word?). There can't be consensus to use a solution that doesn't exist.
My personal favorite outcome would be this: - most files have svn's "native" eol style; they get stored in LF in the repository; the hook will convert them on Windows, and check on Unix. - some files have "windows" eol style; they get stored in CRLF. The hook will not convert, but only check. - not sure whether some files need to be declared as "unix" eol style. - some files are "binary"; they get stored as-is - the hook will do nothing. With such a setup, using the hook would be truly optional on Unix, as it only ever checks and never converts. So if you manage to mess up, and don't have the hook installed on Unix, you lose when trying to push. That will teach you to be more careful in the future, or to install the hook (which hopefully becomes built into Mercurial at some point). Whether it is actually possible to implement all that, I don't know. Regards, Martin From martin at v.loewis.de Wed Aug 5 10:12:38 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 05 Aug 2009 10:12:38 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <873a86r8p2.fsf@benfinney.id.au> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <87fxc6regv.fsf@benfinney.id.au> <4A7921EB.30007@gmail.com> <877hxirbz6.fsf@benfinney.id.au> <4A793569.1000008@gmail.com> <873a86r8p2.fsf@benfinney.id.au> Message-ID: <4A793EF6.7030707@v.loewis.de> >> As already mentioned in this thread, a capability similar to what svn >> or cvs offers would be sufficient. > > That capability presented by centralised VCSen is entirely dependent on > the fact that they *are* centralised. Using a distributed VCS means the > same capability doesn't apply. Why do you say that? People have demonstrated the contrary already. >> I'm not sure what point you are trying to make > > That I disagree with your position. You seem to think that the problem > has an obvious solution, which is not true; and that choice of a > distributed VCS should be delayed until the problem is solved, which I > don't agree with. But it *has* an obvious solution. See the implementation from Dj Gilcrease, or the spec that I just posted. > Your guess is incorrect; I've been bitten time and again by this problem > in many different contexts, enough to know that it's not obvious what > the "right" solution is. The configuration options of svn have served us well enough. Regards, Martin From nyamatongwe at gmail.com Wed Aug 5 10:25:08 2009 From: nyamatongwe at gmail.com (Neil Hodgson) Date: Wed, 5 Aug 2009 18:25:08 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A79363E.1040606@v.loewis.de> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> Message-ID: <50862ebd0908050125u53e19515j9c0542e4009fc2fd@mail.gmail.com> Martin v. Löwis: > Is it really that you don't *understand*? It's fairly easy: there was > a PEP ... The PEP process is straightforward. However, a PEP may produce an outcome that proves after more experience to be wrong. ISTM a prerequisite to choosing a DVCS is that it should support the full range of development platforms and thus the PEP was accepted prematurely. At some point the PEP should be reexamined and, if necessary, rescinded. What I don't understand is why the plan is still to move to hg despite, after several months, there not being a known good way to include Windows eol support.
Neil From dirkjan at ochtman.nl Wed Aug 5 10:25:19 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Wed, 5 Aug 2009 10:25:19 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A78C793.9020409@skippinet.com.au> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> Message-ID: On Wed, Aug 5, 2009 at 01:43, Mark Hammond wrote: > Thanks Nick; I didn't want to be the only one saying that. ?There is a fine > line between asserting reasonable requirements for Windows users and being > obstructionist and unhelpful, and I'm trying to stay on the former side :) I'm not trying to be obstructionist and unhelpful (I hope that should be obvious). On the other hand, I'm working from the point of view of hg, which has two assumptions: - we're a distributed system, there's fairly little we can assume about clients - we exchange checksummed byte streams (even if we have some tools that assume those streams are code) - because of the previous point, there's one native (and therefore better, in a sense) serialization of what you consider "structured" data The first point means, for example, there will always be some clients who don't have win32text enabled, no matter what, so you can't rely on it, which is why I want to make the server hooks the primary line of defense, and view the client-side tools as helper tools (to make it easy not to trigger the server-side hooks). That doesn't mean I think Windows users are second-rate, or anything like that! > I'm not that happy with the server being the primary line of defense. Let's > say I make a branch of the hg repo, myself and a few others work on it > committing as we go, then attempt to merge back upstream. ?Let's say some of > the early commits on that clone introduced "bad" line endings. ?I'm guessing > I would be forced to make a number of whitespace-only checkins to normalize > the line-endings before it could merge - and these checkins would then be in > the history forever. ?Or I could attempt to recreate the clone by somehow > "replaying" the commits with line endings corrected. ?Either way, the > situation doesn't seem good. I don't think either is bad. In the first case, you have one or maybe two extra changesets. As we like to advocate small changesets that fix one thing, a changeset fixing up whitespace is par for the course. ;) The other solution would be to employ mq, for example, to fix up the commits, which mq excels at (although admittedly it has a learning curve). > I agree. ?It isn't fair to make this windows users problem. ?It would be > like me proposing the repo get imported with \r\n line endings, enforce that > with server side hooks, and let non-Windows users worry about the > ramifications of that - somehow I doubt that would fly - so neither should > it fly for Windows users... > > I'm more than willing to help on this; I haven't resurrected my stale patch > because I find win32text only 1/2 a solution that doesn't work in practice. > ?Therefore that patch is as stale for me as it is anyone. However, if a plan > is put in place which offers a full solution and the hg developers are > committed to it, I promise I'll put my hand up to help with implementation > in a fairly timely manner... Well, I'd be happy to help convince the hg crew to accept whatever we come up with, but I'm not sure I'm the best person to come up with it. 
It sounds like a versioned .hgeols would help a bunch of issues, but I have the feeling you know that better than me, so I'm hoping you can come up with a concrete proposal on what should change in win32text to fix all the problems you see. Cheers, Dirkjan From martin at v.loewis.de Wed Aug 5 10:41:41 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 05 Aug 2009 10:41:41 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <50862ebd0908050125u53e19515j9c0542e4009fc2fd@mail.gmail.com> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <50862ebd0908050125u53e19515j9c0542e4009fc2fd@mail.gmail.com> Message-ID: <4A7945C5.5000303@v.loewis.de> > The PEP process is straightforward. However, a PEP may produce an > outcome that proves after more experience to be wrong. ISTM a > prerequisite to choosing a DVCS is that it should support the full > range of development platforms and thus the PEP was accepted > prematurely. To be as blunt as possible: the PEP was accepted because Guido really, Really, REALLY wanted to switch to Mercurial. So you would have to convince Guido to revert his decision. You may not like the decision (I did not like using a DVCS in the first place), but following such decisions has served us well, and will serve us well this time. > At some point the PEP should be reexamined and, if > necessary, rescinded. What I don't understand is why the plan is still > to move to hg despite, after several months, there not being a known > good way to include Windows eol support. You don't understand why it takes many months? That's also easy: because there is a single volunteer, and because there is a lot of work. I think it took me a year to migrate to subversion back then, and I wouldn't be surprised if the Mercurial migration takes even longer. Or don't you understand why that single unresolved item didn't manage to revert the decision? Well, there are many unresolved items in the Mercurial conversion, some much more stressful than the eol issue (e.g. the branching discussion). None of them is unsolvable (AFAICT); you can either contribute to the solution, and sit back and wait for solutions to emerge. Then you can vote on PEP 385 up or down still. Regards, Martin From martin at v.loewis.de Wed Aug 5 10:51:46 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 05 Aug 2009 10:51:46 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> Message-ID: <4A794822.80704@v.loewis.de> > - we're a distributed system, there's fairly little we can assume about clients Not as Mercurial, no. As Python, we can certainly expect that all of our contributors have read the developer FAQ, and set up their systems accordingly. If all else fails, we can revoke commit access (or is it "push access"?) if some committer doesn't get the configuration right. We would, of course, prefer if it was very easy to get the configuration right, so that problems don't occur in the first place. > The first point means, for example, there will always be some clients > who don't have win32text enabled, no matter what, so you can't rely on > it, which is why I want to make the server hooks the primary line of > defense I think it's a terminology issue only: don't say "primary", say "last". 
Can we agree that the "last" line of defense will be the server hooks, and the "primary" line of defense will be the client commits? "primary" would mean that this is were most errors are detected and fixed; Mark would really object to a flow where most errors are detected only at the server. > That doesn't mean I think > Windows users are second-rate, or anything like that! If the server hooks were the primary line of defense, it would effectively make Windows users second-rate: they will have to redo all their changes over-and-over again, whereas the Unix users can push the changes without any obstacles (just because they are less likely to make mistakes). If the client machines were the primary line of defense, Windows users were treated equally: they would make as few mistakes as Unix users, because the hooks do what they want correctly. > I don't think either is bad. In the first case, you have one or maybe > two extra changesets. As we like to advocate small changesets that fix > one thing, a changeset fixing up whitespace is par for the course. ;) Whitespace-only changes hurt the "annotate" feature, so we dislike them very much in Python. > Well, I'd be happy to help convince the hg crew to accept whatever we > come up with, but I'm not sure I'm the best person to come up with it. That is all very well. See my other message (asking for volunteers) as well. If you have more work you would prefer to delegate, please let us know. Regards, Martin From skippy.hammond at gmail.com Wed Aug 5 11:02:08 2009 From: skippy.hammond at gmail.com (Mark Hammond) Date: Wed, 05 Aug 2009 19:02:08 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> Message-ID: <4A794A90.8090502@gmail.com> On 5/08/2009 6:25 PM, Dirkjan Ochtman wrote: > On Wed, Aug 5, 2009 at 01:43, Mark Hammond wrote: >> Thanks Nick; I didn't want to be the only one saying that. There is a fine >> line between asserting reasonable requirements for Windows users and being >> obstructionist and unhelpful, and I'm trying to stay on the former side :) > > I'm not trying to be obstructionist and unhelpful (I hope that should > be obvious). It is, and I hope I didn't imply otherwise. > On the other hand, I'm working from the point of view of > hg, which has two assumptions: > > - we're a distributed system, there's fairly little we can assume about clients > - we exchange checksummed byte streams (even if we have some tools > that assume those streams are code) > - because of the previous point, there's one native (and therefore > better, in a sense) serialization of what you consider "structured" > data > > The first point means, for example, there will always be some clients > who don't have win32text enabled, no matter what, so you can't rely on > it, which is why I want to make the server hooks the primary line of > defense, and view the client-side tools as helper tools (to make it > easy not to trigger the server-side hooks). That doesn't mean I think > Windows users are second-rate, or anything like that! In general I agree - although I think we can enforce a "social contract" which puts requirements on people who commit to the Python repository - and therefore we can consider the server-side hooks a "secondary" defense. IOW, the system (including the social aspects of the system) are setup such that the server-side hooks are very rarely called upon. >> I'm not that happy with the server being the primary line of defense. 
Let's >> say I make a branch of the hg repo, myself and a few others work on it >> committing as we go, then attempt to merge back upstream. Let's say some of >> the early commits on that clone introduced "bad" line endings. I'm guessing >> I would be forced to make a number of whitespace-only checkins to normalize >> the line-endings before it could merge - and these checkins would then be in >> the history forever. Or I could attempt to recreate the clone by somehow >> "replaying" the commits with line endings corrected. Either way, the >> situation doesn't seem good. > > I don't think either is bad. With all due respect, I suspect that is because you don't expect to see the issue regularly. This proposal still leaves the problem squarely in the lap of Windows users and imposes a burden on them that would probably be considered unreasonable if the situation was reversed. I'm yet to work on a hg repository without mixed line endings. If I understand correctly, every such repository would have involved a developer checking in locally, than at some point in the future pushing these changes upstream. I really really don't want hg to tell me at this final step that I need to perform whitespace only fixes purely because I am running Windows. I understand we are discussing how win32text can offer that - but I must object to your assertion that the situation I described isn't bad when you hit it. > Well, I'd be happy to help convince the hg crew to accept whatever we > come up with, but I'm not sure I'm the best person to come up with it. > It sounds like a versioned .hgeols would help a bunch of issues, but I > have the feeling you know that better than me, so I'm hoping you can > come up with a concrete proposal on what should change in win32text to > fix all the problems you see. Actually, I think it is easy to make this problem much easier to understand; mandate every platform should use win32text, then start collating the issues people, including yourself, will no doubt face. I'm happy to get this ball rolling, but again, don't want this left purely in the domain of "it is a windows problem" - it isn't. Cheers, Mark From dirkjan at ochtman.nl Wed Aug 5 11:04:40 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Wed, 5 Aug 2009 11:04:40 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A794822.80704@v.loewis.de> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <4A794822.80704@v.loewis.de> Message-ID: On Wed, Aug 5, 2009 at 10:51, "Martin v. L?wis" wrote: > Not as Mercurial, no. As Python, we can certainly expect that all of our > contributors have read the developer FAQ, and set up their systems > accordingly. If all else fails, we can revoke commit access (or is > it "push access"?) if some committer doesn't get the configuration > right. We would, of course, prefer if it was very easy to get the > configuration right, so that problems don't occur in the first place. There will also be non-committers who forge changesets that you want to be able to push directly to the Python repositories. > If the client machines were the primary line of defense, Windows users > were treated equally: they would make as few mistakes as Unix users, > because the hooks do what they want correctly. 
Similarly, if Python kept its .py files in \r\n line endings by default instead of \n endings, Unix-like users would be the ones more prone to mistakes; by keeping the .py files in \n format, Python is in that sense making Windows users second-rate. To cope with that, hg needs to do extra work on the client side. Cheers, Dirkjan From nyamatongwe at gmail.com Wed Aug 5 11:09:10 2009 From: nyamatongwe at gmail.com (Neil Hodgson) Date: Wed, 5 Aug 2009 19:09:10 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A7945C5.5000303@v.loewis.de> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <50862ebd0908050125u53e19515j9c0542e4009fc2fd@mail.gmail.com> <4A7945C5.5000303@v.loewis.de> Message-ID: <50862ebd0908050209g7da4133fi2b859160f6fd17fb@mail.gmail.com> Martin v. Löwis: > Or don't you understand why that single unresolved item didn't manage > to revert the decision? Well, there are many unresolved items in > the Mercurial conversion, some much more stressful than the eol issue > (e.g. the branching discussion). Then these issues should have been included in the initial PEP for choosing a DVCS since the issues could have driven the choice. PEP 374 implies that win32text effectively solves the Windows eol issue which no longer appears to be correct. Neil From dirkjan at ochtman.nl Wed Aug 5 11:09:44 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Wed, 5 Aug 2009 11:09:44 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A794A90.8090502@gmail.com> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> Message-ID: On Wed, Aug 5, 2009 at 11:02, Mark Hammond wrote: > In general I agree - although I think we can enforce a "social contract" > which puts requirements on people who commit to the Python repository - and > therefore we can consider the server-side hooks a "secondary" defense. IOW, > the system (including the social aspects of the system) are setup such that > the server-side hooks are very rarely called upon. Agreed. > With all due respect, I suspect that is because you don't expect to see the > issue regularly. I suspect so, too! > I'm yet to work on a hg repository without mixed line endings. If I > understand correctly, every such repository would have involved a developer > checking in locally, than at some point in the future pushing these changes > upstream. I really really don't want hg to tell me at this final step that > I need to perform whitespace only fixes purely because I am running Windows. > > I understand we are discussing how win32text can offer that - but I must > object to your assertion that the situation I described isn't bad when you > hit it. I agree it is to be avoided, I'm just saying that I think it will be exceptional and therefore not a large burden, given other kinds of defenses we can put in place. > Actually, I think it is easy to make this problem much easier to understand; > mandate every platform should use win32text, then start collating the issues > people, including yourself, will no doubt face. I'm happy to get this ball > rolling, but again, don't want this left purely in the domain of "it is a > windows problem" - it isn't.
I'm not sure how win32text will provide anything other than performance degradation for non-Windows developers, but if there's functionality to be had, I'm happy to mandate its use on every platform. Cheers, Dirkjan From martin at v.loewis.de Wed Aug 5 11:12:58 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 05 Aug 2009 11:12:58 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <4A794822.80704@v.loewis.de> Message-ID: <4A794D1A.2060105@v.loewis.de> >> Not as Mercurial, no. As Python, we can certainly expect that all of our >> contributors have read the developer FAQ, and set up their systems >> accordingly. If all else fails, we can revoke commit access (or is >> it "push access"?) if some committer doesn't get the configuration >> right. We would, of course, prefer if it was very easy to get the >> configuration right, so that problems don't occur in the first place. > > There will also be non-committers who forge changesets that you want > to be able to push directly to the Python repositories. They will also have to follow the policies we set up. If they refuse to do that, we refuse to accept their changes. It's very simple, and contributors have learned very quickly what the policies were (after they were explained to them). Whether that means that they have to fix their changesets, or that they have to redo them, practice will show. >> If the client machines were the primary line of defense, Windows users >> were treated equally: they would make as few mistakes as Unix users, >> because the hooks do what they want correctly. > > Similarly, if Python kept its .py files in \r\n line endings by > default instead of \n endings, Unix-like users would be more prone to > mistake, so by keeping the .py files in \n-format, so Python is making > Windows users second-rate by keeping the line endings in \n format. To > cope with that, hg needs to do extra work on the client side. I think you still miss the point. *If* hg does the extra work, *then* Windows users are *not* second-class citizens anymore. They *only* consider themselves second-class if they have to do additional *manual* work (*). Regards, Martin (*) They may also consider themselves second-class if they have to install additional software, so hopefully, the necessary extra code for hg will become part of the regular Mercurial distribution at some point. From martin at v.loewis.de Wed Aug 5 11:16:37 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 05 Aug 2009 11:16:37 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> Message-ID: <4A794DF5.9090100@v.loewis.de> > I'm not sure how win32text will provide anything other than > performance degradation for non-Windows developers, but if there's > functionality to be had, I'm happy to mandate its use on every > platform. This is all fairly hypothetical - if hg grew a .hgeols file, it would be good if it supported that cross-platform. It then may make win32text obsolete (in particular if it provided some useful defaults). On Unix, the functionality might be as simple as checking conformance with the eol-style at pre-commit time. 
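Purely as an illustration of the idea - the file name, section name and syntax here are hypothetical, nothing like this exists in hg today - such a versioned eol map might look like:

    # .hgeols (hypothetical), versioned at the root of the clone
    [patterns]
    **.py     = native   # LF in the repository, platform convention in the working copy
    **.txt    = native
    **.bat    = CRLF     # always CRLF, even on Unix
    **.sh     = LF       # always LF, even on Windows
    **.png    = BIN      # binary, never converted or checked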
Regards, Martin From skippy.hammond at gmail.com Wed Aug 5 11:17:58 2009 From: skippy.hammond at gmail.com (Mark Hammond) Date: Wed, 05 Aug 2009 19:17:58 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> Message-ID: <4A794E46.2030700@gmail.com> On 5/08/2009 7:09 PM, Dirkjan Ochtman wrote: > I'm not sure how win32text will provide anything other than > performance degradation for non-Windows developers, but if there's > functionality to be had, I'm happy to mandate its use on every > platform. I see two practical outcomes of such a mandate: * line-ending rules are enforced for local checkins, even for linux users, even though such 'accidental' inappropriate line-ending checkins should be much rarer than for windows. * practical problems faced by Windows users, including any performance considerations, are shared by the community and therefore addressed as a community, thereby ensuring all platforms are considered as important as any other. Cheers, Mark From ben+python at benfinney.id.au Wed Aug 5 11:42:05 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Wed, 05 Aug 2009 19:42:05 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <87fxc6regv.fsf@benfinney.id.au> <4A7921EB.30007@gmail.com> <877hxirbz6.fsf@benfinney.id.au> <4A793569.1000008@gmail.com> <873a86r8p2.fsf@benfinney.id.au> <4A793E34.8060508@gmail.com> Message-ID: <87prbappg2.fsf@benfinney.id.au> "Martin v. Löwis" writes: > > You seem to think that the problem has an obvious solution, which is > > not true; > But it *has* an obvious solution. See the implementation from Dj > Gilcrease, or the spec that I just posted. Two different solutions are both obvious? There are other solutions proposed elsewhere too; are they also obvious? Mark Hammond writes: > I think you have been mis-reading this thread. Quite possibly; I'm not intending to impose my position on anyone. I'll go back to lurking on the thread for a while and see if it becomes any clearer. -- \ "First things first, but not necessarily in that order." --The | `\ Doctor, _Doctor Who_ | _o__) | Ben Finney From p.f.moore at gmail.com Wed Aug 5 12:04:42 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 5 Aug 2009 11:04:42 +0100 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A793E4B.30101@v.loewis.de> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <4A793852.1070106@gmail.com> <4A793E4B.30101@v.loewis.de> Message-ID: <79990c6b0908050304w7b42bbbay775bdd66f7cd9dac@mail.gmail.com> 2009/8/5 "Martin v. Löwis" : > My personal favorite outcome would be this: > - most files have svn's "native" eol style; they get stored in LF > in the repository; the hook will convert them on Windows, and check > on Unix. > - some files have "windows" eol style; they get stored in CRLF. > The hook will not convert, but only check. > - not sure whether some files need to be declared as "unix" eol style. > - some files are "binary"; they get stored as-is - the hook will > do nothing. > > With such a setup, using the hook would be truly optional on Unix, > as it only ever checks and never converts. So if you manage to mess > up, and don't have the hook installed on Unix, you lose when trying > to push.
That will teach you to be more careful in the future, or > to install the hook (which hopefully becomes built into Mercurial at > some point). Given that my preference is to use Unix-style EOL for "text" files on Windows, as every text editor I use (barring notepad!) understands LF format, it seems to me that this proposal also means that the hook would be optional for me. That suits me fine - I'd prefer to avoid having hooks that are required for Python checkouts, as that means I have to remember to configure them on each clone (IIUC). Of course, this implies that your proposal only requires any action by the user in the case of Windows users whose text editing tools insist on CRLF format text files (sources, etc). Is that really a large group of developers? (I honestly don't know). I suspect that there is something missing from your proposal, as if this were the case, then the problem appears to be limited to a very small group of developers. Maybe it's Visual Studio that insists on CRLF for source files? (I don't know, as I don't use the VS editor). If that's the case, then maybe a VS hook would be an alternative approach? (I can't imagine such a hook would be an *easier* approach, I only mention it because it makes it clearer where the issue lies). Paul. From dirkjan at ochtman.nl Wed Aug 5 12:14:24 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Wed, 5 Aug 2009 12:14:24 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <79990c6b0908050304w7b42bbbay775bdd66f7cd9dac@mail.gmail.com> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <4A793852.1070106@gmail.com> <4A793E4B.30101@v.loewis.de> <79990c6b0908050304w7b42bbbay775bdd66f7cd9dac@mail.gmail.com> Message-ID: On Wed, Aug 5, 2009 at 12:04, Paul Moore wrote: > Given that my preference is to use Unix-style EOL for "text" files on > Windows, as every text editor I use (barring notepad!) understands LF > format, it seems to me that this proposal also means that the hook > would be optional for me. That suits me fine - I'd prefer to avoid > having hooks that are required for Python checkouts, as that means I > have to remember to configure them on each clone (IIUC). Yeah, this may also be what's making it harder for me to understand the issues. I am actually a Windows user, although I do most of my development on Linux servers through PuTTY. I just always make sure I use editors that respect the file's line endings, and so for those things where I've used hg to version code on Windows (for example, when testing a Firefox extension) and when my colleague who does edit his code inside Windows, I've just used editors that deal with line endings. Typically, in my case, that was either Notepad2 (an awesomely light-weight Notepad replacement) or Komodo (Edit). That solved all of my issues, so I haven't had a need for win32text so far. 
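(For anyone who does want it, win32text is enabled per user rather than per clone; the commonly documented setup is roughly the following in Mercurial.ini or ~/.hgrc -- shown here only as an illustration, see the Mercurial wiki for the authoritative filter rules:)

    [extensions]
    win32text =

    [encode]
    ** = cleverencode:

    [decode]
    ** = cleverdecode: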
Cheers, Dirkjan From ncoghlan at gmail.com Wed Aug 5 12:43:12 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 05 Aug 2009 20:43:12 +1000 Subject: [Python-Dev] Functionality in subprocess.Popen.terminate() In-Reply-To: <4A7852A7.50407@janzert.com> References: <171e8a410908031142t52d974f6tea40aaa7f5d8c059@mail.gmail.com> <4A77FF12.3010401@gmail.com> <171e8a410908040801s3f112b9fh4b3c6dc32cc7f9ec@mail.gmail.com> <4A7852A7.50407@janzert.com> Message-ID: <4A796240.2000203@gmail.com> Janzert wrote: > Eric Pruitt wrote: >> Sounds good enough to me but I was wondering if it might be a good >> idea to add a function like "pidinuse" to subprocess as a whole that >> would determine if a process ID was being used and return a simple >> boolean value. I came across a number of people searching for a way to >> determine if a PID was running (Google "python check if pid exists") >> so it seems like the implemented functionality would be of use to the >> community as a whole, not just my wrapper class. >> >> Eric >> > > I'm not sure of the actual details but it seems from your description > that even if you check first a race condition will still exist. > Specifically the subprocess could terminate after the check and before > the TerminateProcess call. So it seems better just to call > TerminateProcess and then correctly handle any possible error. Janzert is correct here - this is a case where ruling out the error completely is impossible, so you're going to have to handle it regardless. A cross platform way of checking if a particular subprocess is still running might be an interesting feature in its own right, but I don't think it will prevent this exception. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From mhammond at skippinet.com.au Wed Aug 5 13:19:24 2009 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 05 Aug 2009 21:19:24 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <79990c6b0908050304w7b42bbbay775bdd66f7cd9dac@mail.gmail.com> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <4A793852.1070106@gmail.com> <4A793E4B.30101@v.loewis.de> <79990c6b0908050304w7b42bbbay775bdd66f7cd9dac@mail.gmail.com> Message-ID: <4A796ABC.80801@skippinet.com.au> On 5/08/2009 8:04 PM, Paul Moore wrote: > 2009/8/5 "Martin v. L?wis": >> With such a setup, using the hook would be truly optional on Unix, >> as it only ever checks and never converts. So if you manage to mess >> up, and don't have the hook installed on Unix, you lose when trying >> to push. That will teach you to be more careful in the future, or >> to install the hook (which hopefully becomes built into Mercurial at >> some point). > > Given that my preference is to use Unix-style EOL for "text" files on > Windows, as every text editor I use (barring notepad!) understands LF > format, Most tools that I use will tend to not mix EOL styles in a single file, but will tend to create \r\n line endings for new files I create. Most hg repos I come across don't have mixed line endings within individual files, so I can only guess these files were accidentally introduced in the same way (and indeed I have personally done this.) I'm hoping to be part of the solution instead of part of the problem :) > it seems to me that this proposal also means that the hook > would be optional for me. 
Technically it would be optional for everyone, of course. However, the solution should be such that everyone, regardless of personal preference, is willing to take the hit. For example, if the repo is converted using \r\n line endings natively, then Windows users would need to take no action either and puts the onus back on you (given your stated preferences) to configure the tool appropriately. I assume you would have no objection to that and would be happy to make that tool optional for me? That suits me fine - I'd prefer to avoid > having hooks that are required for Python checkouts, as that means I > have to remember to configure them on each clone (IIUC). Configuring on each clone would certainly be sub-optimal, so the proposal is this configuration be stored in a versioned file in the repo. > Of course, this implies that your proposal only requires any action by > the user in the case of Windows users whose text editing tools insist > on CRLF format text files (sources, etc). Is that really a large group > of developers? (I honestly don't know). It applies to all files that aren't "native" EOL style - there are just less of them regularly modified than those that are so marked. > I suspect that there is something missing from your proposal, as if > this were the case, then the problem appears to be limited to a very > small group of developers. Maybe it's Visual Studio that insists on > CRLF for source files? (I don't know, as I don't use the VS editor). > If that's the case, then maybe a VS hook would be an alternative > approach? (I can't imagine such a hook would be an *easier* approach, > I only mention it because it makes it clearer where the issue lies). I must concede that Windows developers are the minority here - but assuming we want a level playing field, I don't see how that changes the underlying issue... Cheers, Mark From mhammond at skippinet.com.au Wed Aug 5 13:22:02 2009 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 05 Aug 2009 21:22:02 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <4A793852.1070106@gmail.com> <4A793E4B.30101@v.loewis.de> <79990c6b0908050304w7b42bbbay775bdd66f7cd9dac@mail.gmail.com> Message-ID: <4A796B5A.9020402@skippinet.com.au> On 5/08/2009 8:14 PM, Dirkjan Ochtman wrote: > endings. Typically, in my case, that was either Notepad2 (an awesomely > light-weight Notepad replacement) or Komodo (Edit). That solved all of > my issues, so I haven't had a need for win32text so far. FWIW, I use komodo and scite as my primary editors, and as mentioned, am personally responsible for accidentally checking in \r\n files into what should be a \n repo. I am slowly and painfully learning to be more careful - IMO, I shouldn't need to... 
Cheers, Mark From dirkjan at ochtman.nl Wed Aug 5 13:28:49 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Wed, 5 Aug 2009 13:28:49 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A796ABC.80801@skippinet.com.au> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <4A793852.1070106@gmail.com> <4A793E4B.30101@v.loewis.de> <79990c6b0908050304w7b42bbbay775bdd66f7cd9dac@mail.gmail.com> <4A796ABC.80801@skippinet.com.au> Message-ID: On Wed, Aug 5, 2009 at 13:19, Mark Hammond wrote: > Configuring on each clone would certainly be sub-optimal, so the proposal is > this configuration be stored in a versioned file in the repo. Even if we do that, enabling hg extensions will still need to be done locally -- although it can be done per-user/box instead of per-clone. Cheers, Dirkjan From skippy.hammond at gmail.com Wed Aug 5 13:46:14 2009 From: skippy.hammond at gmail.com (Mark Hammond) Date: Wed, 05 Aug 2009 21:46:14 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <4A793852.1070106@gmail.com> <4A793E4B.30101@v.loewis.de> <79990c6b0908050304w7b42bbbay775bdd66f7cd9dac@mail.gmail.com> <4A796ABC.80801@skippinet.com.au> Message-ID: <4A797106.8040807@gmail.com> On 5/08/2009 9:28 PM, Dirkjan Ochtman wrote: > On Wed, Aug 5, 2009 at 13:19, Mark Hammond wrote: >> Configuring on each clone would certainly be sub-optimal, so the proposal is >> this configuration be stored in a versioned file in the repo. > > Even if we do that, enabling hg extensions will still need to be done > locally -- although it can be done per-user/box instead of per-clone. That is completely fine, and not unlike SVN where a per-user/box setting generally needs to be set once - but after that everything "just works". Windows developers don't mind taking a hit once ;) The dev guide can make it clear what the expectations are... Cheers, Mark From ncoghlan at gmail.com Wed Aug 5 14:50:39 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 05 Aug 2009 22:50:39 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A794E46.2030700@gmail.com> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> <4A794E46.2030700@gmail.com> Message-ID: <4A79801F.9080702@gmail.com> Mark Hammond wrote: > On 5/08/2009 7:09 PM, Dirkjan Ochtman wrote: >> I'm not sure how win32text will provide anything other than >> performance degradation for non-Windows developers, but if there's >> functionality to be had, I'm happy to mandate its use on every >> platform. > > I see two practical outcomes of such a mandate: > > * line-ending rules are enforced for local checkins, even for linux > users, even though such 'accidental' inappropriate line-ending checkins > should be much rarer than for windows. > > * practical problems faced by Windows users, including any performance > considerations, are shared by the community and therefore addressed as a > community, thereby ensuring all platforms are considered as important as > any other. The main error that enabling win32text everywhere can catch is the use of a *nix client to accidentally corrupt one of the files that is supposed to have \r\n line endings. 
It also simplifies the configuration rules in the Python hg FAQ - we would be able to just tell all developers wanting to contribute patches to Python to enable the win32text extension when working with the Python repositories (or clones thereof) without having to worry about what platform they were on. So it seems to me that the main client-side feature we want is a versioned .hgeols file in the repository that allows files to be explicitly nominated as one of: - eol=CRLF (i.e. have \r\n line endings in the repository and should be left that way on the local disk as well - equivalent to SVN eol-style:CRLF) - eol=LF (i.e. have \n line endings in the repository and should be left that way on the local disk as well - equivalent to SVN eol-style:LF) - eol=CR (i.e. have \n line endings in the repository and should be left that way on the local disk as well - equivalent to SVN eol-style:CR) - native text (i.e. always stored in the repository with \n line endings, but uses native line endings on the local disk - equivalent to SVN eol-style:native) - binary (i.e. always reproduced on disk exactly as they are in the repository - equivalent to SVN files without eol-style set at all) The .hgeols file should also allow the repository to define which of the above should be used as the default handling mechanism for text files that are not named in the file (native text, in the specific case of the Python repositories). Files which look like binary files (according to the existing win32text heuristics) would be left alone regardless of what the default handling was set to in .hgeols. win32text would then be enhanced to check for a .hgeols file before falling back to its existing configuration mechanisms. The above basically provides the SVN eol-style feature in a more hg-friendly way. Allowing wildcards in the .hgeols files might be nice, but I don't think it is actually required. We really don't have that many files that are affected by this problem (it's just the fact that it is a number greater than zero that is causing the problem). The server side pre-push hooks for the main Python repositories would be set to reject change sets which didn't meet the above rules. If a patch fails those checks, either the committer can fix it themselves and resubmit, or else send it back to the originator along with a pointer to the section in the dev FAQ that describes the expected client-side configuration. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From python at mrabarnett.plus.com Wed Aug 5 15:35:02 2009 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 05 Aug 2009 14:35:02 +0100 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A79801F.9080702@gmail.com> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> <4A794E46.2030700@gmail.com> <4A79801F.9080702@gmail.com> Message-ID: <4A798A86.8010702@mrabarnett.plus.com> Nick Coghlan wrote: > Mark Hammond wrote: >> On 5/08/2009 7:09 PM, Dirkjan Ochtman wrote: >>> I'm not sure how win32text will provide anything other than >>> performance degradation for non-Windows developers, but if there's >>> functionality to be had, I'm happy to mandate its use on every >>> platform. 
>> I see two practical outcomes of such a mandate: >> >> * line-ending rules are enforced for local checkins, even for linux >> users, even though such 'accidental' inappropriate line-ending checkins >> should be much rarer than for windows. >> >> * practical problems faced by Windows users, including any performance >> considerations, are shared by the community and therefore addressed as a >> community, thereby ensuring all platforms are considered as important as >> any other. > > The main error that enabling win32text everywhere can catch is the use > of a *nix client to accidentally corrupt one of the files that is > supposed to have \r\n line endings. > > It also simplifies the configuration rules in the Python hg FAQ - we > would be able to just tell all developers wanting to contribute patches > to Python to enable the win32text extension when working with the Python > repositories (or clones thereof) without having to worry about what > platform they were on. > > So it seems to me that the main client-side feature we want is a > versioned .hgeols file in the repository that allows files to be > explicitly nominated as one of: > - eol=CRLF (i.e. have \r\n line endings in the repository and should be > left that way on the local disk as well - equivalent to SVN eol-style:CRLF) > - eol=LF (i.e. have \n line endings in the repository and should be left > that way on the local disk as well - equivalent to SVN eol-style:LF) > - eol=CR (i.e. have \n line endings in the repository and should be left > that way on the local disk as well - equivalent to SVN eol-style:CR) > - native text (i.e. always stored in the repository with \n line > endings, but uses native line endings on the local disk - equivalent to > SVN eol-style:native) > - binary (i.e. always reproduced on disk exactly as they are in the > repository - equivalent to SVN files without eol-style set at all) > > The .hgeols file should also allow the repository to define which of the > above should be used as the default handling mechanism for text files > that are not named in the file (native text, in the specific case of the > Python repositories). > > Files which look like binary files (according to the existing win32text > heuristics) would be left alone regardless of what the default handling > was set to in .hgeols. > > win32text would then be enhanced to check for a .hgeols file before > falling back to its existing configuration mechanisms. > > The above basically provides the SVN eol-style feature in a more > hg-friendly way. Allowing wildcards in the .hgeols files might be nice, > but I don't think it is actually required. We really don't have that > many files that are affected by this problem (it's just the fact that it > is a number greater than zero that is causing the problem). > > The server side pre-push hooks for the main Python repositories would be > set to reject change sets which didn't meet the above rules. If a patch > fails those checks, either the committer can fix it themselves and > resubmit, or else send it back to the originator along with a pointer to > the section in the dev FAQ that describes the expected client-side > configuration. > Instead of just talking about line endings, could each file have a specific 'filetype'? 
This would define what kind of data it contains, how it's stored in the repository, and what actions to perform for fetching and committing, including any checks: c_header: C header file; LF in repository; native outside c_source: C source file; LF in repository; native outside text: plain text; LF in repository; native outside crlf_text: plain text; CRLF in repository; CRLF outside cr_text: plain text; CR in repository; CR outside lf_text: plain text; LF in repository; LF outside binary: arbitrary binary data; as-is in repository This could be expanded in the future to include filetypes for JPEG, etc. From dirkjan at ochtman.nl Wed Aug 5 15:37:57 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Wed, 5 Aug 2009 15:37:57 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A798A86.8010702@mrabarnett.plus.com> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> <4A794E46.2030700@gmail.com> <4A79801F.9080702@gmail.com> <4A798A86.8010702@mrabarnett.plus.com> Message-ID: On Wed, Aug 5, 2009 at 15:35, MRAB wrote: > Instead of just talking about line endings, could each file have a > specific 'filetype'? This would define what kind of data it contains, > how it's stored in the repository, and what actions to perform for > fetching and committing, including any checks: Sounds like YAGNI to me. The outline Nick provided seems to me to be quite close to the current win32text settings in syntax and purpose and staying close to that would help making adoption easier. Cheers, Dirkjan From phd at phd.pp.ru Wed Aug 5 15:50:03 2009 From: phd at phd.pp.ru (Oleg Broytmann) Date: Wed, 5 Aug 2009 17:50:03 +0400 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A798A86.8010702@mrabarnett.plus.com> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> <4A794E46.2030700@gmail.com> <4A79801F.9080702@gmail.com> <4A798A86.8010702@mrabarnett.plus.com> Message-ID: <20090805135003.GC28780@phd.pp.ru> On Wed, Aug 05, 2009 at 02:35:02PM +0100, MRAB wrote: > Instead of just talking about line endings, could each file have a > specific 'filetype'? EOL-conversion, MIME type and encoding (charset) are three different concepts. Yes, all of them must be supported, but not necessary in one configuration mechanism. Subversion handles these issues by providing svn:eol-style and svn:mime-type (handles both MIME type and charset) properties on a file-by-file basis. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From phd at phd.pp.ru Wed Aug 5 15:57:59 2009 From: phd at phd.pp.ru (Oleg Broytmann) Date: Wed, 5 Aug 2009 17:57:59 +0400 Subject: [Python-Dev] PEP 385: Mercurial issues In-Reply-To: <20090805135003.GC28780@phd.pp.ru> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> <4A794E46.2030700@gmail.com> <4A79801F.9080702@gmail.com> <4A798A86.8010702@mrabarnett.plus.com> <20090805135003.GC28780@phd.pp.ru> Message-ID: <20090805135759.GD28780@phd.pp.ru> On Wed, Aug 05, 2009 at 05:50:03PM +0400, Oleg Broytmann wrote: > Subversion handles these issues by providing ... > svn:mime-type (handles both MIME type and charset) > file-by-file basis. Dirkjan, how does Mercurial handles charsets? 
If I have three files in my repository - one in utf-8, another in koi8-r, and the third in cp1251 encoding - I certainly don't want to convert them back and force, but I want hg web interface to provide charset in the Content-Type header. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From dirkjan at ochtman.nl Wed Aug 5 16:04:24 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Wed, 5 Aug 2009 16:04:24 +0200 Subject: [Python-Dev] PEP 385: Mercurial issues In-Reply-To: <20090805135759.GD28780@phd.pp.ru> References: <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> <4A794E46.2030700@gmail.com> <4A79801F.9080702@gmail.com> <4A798A86.8010702@mrabarnett.plus.com> <20090805135003.GC28780@phd.pp.ru> <20090805135759.GD28780@phd.pp.ru> Message-ID: On Wed, Aug 5, 2009 at 15:57, Oleg Broytmann wrote: > ? Dirkjan, how does Mercurial handles charsets? If I have three files in > my repository - one in utf-8, another in koi8-r, and the third in cp1251 > encoding - I certainly don't want to convert them back and force, but I > want hg web interface to provide charset in the Content-Type header. It doesn't currently have any way to provide out-of-band charset info. Cheers, Dirkjan From ncoghlan at gmail.com Wed Aug 5 16:12:08 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 06 Aug 2009 00:12:08 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> <4A794E46.2030700@gmail.com> <4A79801F.9080702@gmail.com> <4A798A86.8010702@mrabarnett.plus.com> Message-ID: <4A799338.3050008@gmail.com> Dirkjan Ochtman wrote: > On Wed, Aug 5, 2009 at 15:35, MRAB wrote: >> Instead of just talking about line endings, could each file have a >> specific 'filetype'? This would define what kind of data it contains, >> how it's stored in the repository, and what actions to perform for >> fetching and committing, including any checks: > > Sounds like YAGNI to me. Yep - while SVN does support full mime_type specification for files, I don't think we have ever used it. The SVN eol-style property is all we're trying to replicate, since that has served us well in the few cases where it has mattered. > The outline Nick provided seems to me to be > quite close to the current win32text settings in syntax and purpose > and staying close to that would help making adoption easier. Yeah, win32text is already tantalising close to what we would like so I deliberately tried to stay close to its existing approach. We're just being a bit fussier than most about the repository being able to tell the clients which files should be given special treatment. That way individual users can just set it up once on their development machine and then no longer have to worry about it (if more files that need special treatment are added to the repository, then the same checkin that adds them should also update .hgeols). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From stephen at xemacs.org Wed Aug 5 16:28:42 2009 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Wed, 05 Aug 2009 23:28:42 +0900 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A793569.1000008@gmail.com> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <87fxc6regv.fsf@benfinney.id.au> <4A7921EB.30007@gmail.com> <877hxirbz6.fsf@benfinney.id.au> <4A793569.1000008@gmail.com> Message-ID: <87ws5ibahx.fsf@uwakimon.sk.tsukuba.ac.jp> Mark Hammond writes: > I'm not sure what point you are trying to make, but I believe it *is* > possible for a solution to be found here which will keep Windows users > happy. I'm guessing you haven't had much practical experience with this > problem, so probably don't see this is clearly as Windows users do. Mercurial is not only open source, it's written in Python. The problem is known to be hard in a practical sense, the existing solutions (written by non-Windows developers, of course) are judged to be insufficient by Windows users, and the non-Windows developers "probably don't see this is clearly as Windows users do." I think the implication is obvious. There will be no good solution until Windows users develop it. I don't see a good reason to wait for that. I do see good reason for non-Windows users to put up with some inconvenience during the "beta" phase of implementing that solution; it's important enough to be fast-tracked, and doesn't need to be perfect for everybody to be tried (though it should not be allowed to endanger repo content, which seems unlikely but needs care since it's a potential disaster). From phd at phd.pp.ru Wed Aug 5 16:35:33 2009 From: phd at phd.pp.ru (Oleg Broytmann) Date: Wed, 5 Aug 2009 18:35:33 +0400 Subject: [Python-Dev] PEP 385: Mercurial issues In-Reply-To: References: <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> <4A794E46.2030700@gmail.com> <4A79801F.9080702@gmail.com> <4A798A86.8010702@mrabarnett.plus.com> <20090805135003.GC28780@phd.pp.ru> <20090805135759.GD28780@phd.pp.ru> Message-ID: <20090805143533.GA31621@phd.pp.ru> On Wed, Aug 05, 2009 at 04:04:24PM +0200, Dirkjan Ochtman wrote: > On Wed, Aug 5, 2009 at 15:57, Oleg Broytmann wrote: > > ? Dirkjan, how does Mercurial handles charsets? If I have three files in > > my repository - one in utf-8, another in koi8-r, and the third in cp1251 > > encoding - I certainly don't want to convert them back and force, but I > > want hg web interface to provide charset in the Content-Type header. > > It doesn't currently have any way to provide out-of-band charset info. Perhaps that's not a big issue for Python, but it's certainly a big issue for me. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From dirkjan at ochtman.nl Wed Aug 5 16:40:31 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Wed, 5 Aug 2009 16:40:31 +0200 Subject: [Python-Dev] PEP 385: Mercurial issues In-Reply-To: <20090805143533.GA31621@phd.pp.ru> References: <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> <4A794E46.2030700@gmail.com> <4A79801F.9080702@gmail.com> <4A798A86.8010702@mrabarnett.plus.com> <20090805135003.GC28780@phd.pp.ru> <20090805135759.GD28780@phd.pp.ru> <20090805143533.GA31621@phd.pp.ru> Message-ID: On Wed, Aug 5, 2009 at 16:35, Oleg Broytmann wrote: > ? Perhaps that's not a big issue for Python, but it's certainly a big > issue for me. I think there are extensions that try to deal with it. 
Have a look: http://mercurial.selenic.com/wiki/UsingExtensions If not, it should be easy to come up with something and write an extension for it. Cheers, Dirkjan From phd at phd.pp.ru Wed Aug 5 16:58:57 2009 From: phd at phd.pp.ru (Oleg Broytmann) Date: Wed, 5 Aug 2009 18:58:57 +0400 Subject: [Python-Dev] PEP 385: the charset issue In-Reply-To: <4A799338.3050008@gmail.com> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> <4A794E46.2030700@gmail.com> <4A79801F.9080702@gmail.com> <4A798A86.8010702@mrabarnett.plus.com> <4A799338.3050008@gmail.com> Message-ID: <20090805145857.GB31621@phd.pp.ru> On Thu, Aug 06, 2009 at 12:12:08AM +1000, Nick Coghlan wrote: > Yep - while SVN does support full mime_type specification for files, I > don't think we have ever used it. These files are in 8859-1 encoding (names in comments, at least): http://svn.python.org/view/python/trunk/Lib/encodings/punycode.py http://svn.python.org/view/python/trunk/Lib/test/test_csv.py http://svn.python.org/view/python/trunk/Tools/i18n/msgfmt.py http://svn.python.org/view/python/trunk/Tools/i18n/pygettext.py If they are not marked as "text/plain; charset=iso-8859-1" I think it's a bug. Either they should be marked, or converted to ascii or utf-8; the coding pseudocomment (directive) should be changed accordingly. Probably there are other files. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From john.arbash.meinel at gmail.com Wed Aug 5 16:58:50 2009 From: john.arbash.meinel at gmail.com (John Arbash Meinel) Date: Wed, 05 Aug 2009 09:58:50 -0500 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A796B5A.9020402@skippinet.com.au> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <4A793852.1070106@gmail.com> <4A793E4B.30101@v.loewis.de> <79990c6b0908050304w7b42bbbay775bdd66f7cd9dac@mail.gmail.com> <4A796B5A.9020402@skippinet.com.au> Message-ID: <4A799E2A.1090703@gmail.com> Mark Hammond wrote: > On 5/08/2009 8:14 PM, Dirkjan Ochtman wrote: >> endings. Typically, in my case, that was either Notepad2 (an awesomely >> light-weight Notepad replacement) or Komodo (Edit). That solved all of >> my issues, so I haven't had a need for win32text so far. > > FWIW, I use komodo and scite as my primary editors, and as mentioned, am > personally responsible for accidentally checking in \r\n files into what > should be a \n repo. I am slowly and painfully learning to be more > careful - IMO, I shouldn't need to... > > Cheers, > > Mark IIRC one of the main problems in Copy & Paste. I believe both Scite and Visual Studio have had issues where they "preserve" the line endings of files, but if you paste from another source, it will continue to "preserve" the line endings of the pasted content. That said, you also have the "create a new file defaults to CRLF" that has similar problems. 
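A few lines of Python are enough to spot that kind of accidental mixing before it gets committed (illustrative only, not part of Mercurial or win32text):

    def eol_style(path):
        # Classify a file's line endings: 'LF', 'CRLF', 'mixed' or 'none'.
        data = open(path, 'rb').read()
        crlf = data.count('\r\n')
        lf = data.count('\n') - crlf   # LFs not preceded by CR
        if crlf and lf:
            return 'mixed'
        return 'CRLF' if crlf else ('LF' if lf else 'none')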
John =:-> From solipsis at pitrou.net Wed Aug 5 17:08:25 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 5 Aug 2009 15:08:25 +0000 (UTC) Subject: [Python-Dev] PEP 385: the charset issue References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> <4A794E46.2030700@gmail.com> <4A79801F.9080702@gmail.com> <4A798A86.8010702@mrabarnett.plus.com> <4A799338.3050008@gmail.com> <20090805145857.GB31621@phd.pp.ru> Message-ID: Oleg Broytmann phd.pp.ru> writes: > > These files are in 8859-1 encoding (names in comments, at least): > http://svn.python.org/view/python/trunk/Lib/encodings/punycode.py > http://svn.python.org/view/python/trunk/Lib/test/test_csv.py > http://svn.python.org/view/python/trunk/Tools/i18n/msgfmt.py > http://svn.python.org/view/python/trunk/Tools/i18n/pygettext.py > If they are not marked as "text/plain; charset=iso-8859-1" I think it's > a bug. Either they should be marked, or converted to ascii or utf-8; the > coding pseudocomment (directive) should be changed accordingly. It's certainly ok to convert them to utf-8 (and add the marker anyway). There's no point in having different charsets used throughout the code base, except for testing purposes (just as there's no point in having different indentation rules used for the same file type throughout the code base ;-)). Regards Antoine. From stephen at xemacs.org Wed Aug 5 17:34:39 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 06 Aug 2009 00:34:39 +0900 Subject: [Python-Dev] PEP 385: Mercurial issues In-Reply-To: <20090805135759.GD28780@phd.pp.ru> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> <4A794E46.2030700@gmail.com> <4A79801F.9080702@gmail.com> <4A798A86.8010702@mrabarnett.plus.com> <20090805135003.GC28780@phd.pp.ru> <20090805135759.GD28780@phd.pp.ru> Message-ID: <87r5vqb7g0.fsf@uwakimon.sk.tsukuba.ac.jp> Oleg Broytmann writes: > Dirkjan, how does Mercurial handles charsets? If I have three files in > my repository - one in utf-8, another in koi8-r, and the third in cp1251 > encoding - I certainly don't want to convert them back and force, but I > want hg web interface to provide charset in the Content-Type header. How is this relevant to PEP 385? I hope the answer is "not at all". I've been there, done that, and my answer is "never again". (I'm not telling you what to do with *your* repository, just that I don't see any good reason for having any encodings but UTF-8 in Python's.) From phd at phd.pp.ru Wed Aug 5 17:54:15 2009 From: phd at phd.pp.ru (Oleg Broytmann) Date: Wed, 5 Aug 2009 19:54:15 +0400 Subject: [Python-Dev] PEP 385: Mercurial issues In-Reply-To: <87r5vqb7g0.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> <4A794E46.2030700@gmail.com> <4A79801F.9080702@gmail.com> <4A798A86.8010702@mrabarnett.plus.com> <20090805135003.GC28780@phd.pp.ru> <20090805135759.GD28780@phd.pp.ru> <87r5vqb7g0.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20090805155415.GC31621@phd.pp.ru> On Thu, Aug 06, 2009 at 12:34:39AM +0900, Stephen J. Turnbull wrote: > Oleg Broytmann writes: > > Dirkjan, how does Mercurial handles charsets? If I have three files in > > my repository - one in utf-8, another in koi8-r, and the third in cp1251 > > encoding - I certainly don't want to convert them back and force, but I > > want hg web interface to provide charset in the Content-Type header. > > How is this relevant to PEP 385? I hope the answer is "not at all". 
There are non-utf8 non-ascii files in the Python source tree. Either there should be a way to handle them in Mercurial or they have to be converted to UTF-8 in a proper way (i.e., don't forget to rewrite charset directives). Other tan that - I am pondering a switch from SVN to hg in other projects using Python process as an example and asking questions that are slightly off-topic (but only slightly). > I've been there, done that, and my answer is "never again". (I'm not > telling you what to do with *your* repository, just that I don't see > any good reason for having any encodings but UTF-8 in Python's.) We have files in at least two different encodings - utf-8 and cp1251 for user-visible text-files on w32. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From steve at holdenweb.com Wed Aug 5 18:05:11 2009 From: steve at holdenweb.com (Steve Holden) Date: Wed, 05 Aug 2009 12:05:11 -0400 Subject: [Python-Dev] Microsoft MSDN Message-ID: <4A79ADB7.5080809@holdenweb.com> I sent fourteen requests for licenses in to Microsoft. I've asked them to let me know which they grant (since they may choose to limit the number) and will inform you all personally when I hear their decision. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ Watch PyCon on video now! http://pycon.blip.tv/ From p.f.moore at gmail.com Wed Aug 5 18:24:17 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 5 Aug 2009 17:24:17 +0100 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A796ABC.80801@skippinet.com.au> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <4A793852.1070106@gmail.com> <4A793E4B.30101@v.loewis.de> <79990c6b0908050304w7b42bbbay775bdd66f7cd9dac@mail.gmail.com> <4A796ABC.80801@skippinet.com.au> Message-ID: <79990c6b0908050924lf0f5c6ewda5a75779d24cfd0@mail.gmail.com> 2009/8/5 Mark Hammond : > Most tools that I use will tend to not mix EOL styles in a single file, but > will tend to create \r\n line endings for new files I create. ?Most hg repos > I come across don't have mixed line endings within individual files, so I > can only guess these files were accidentally introduced in the same way (and > indeed I have personally done this.) ?I'm hoping to be part of the solution > instead of part of the problem :) Interesting. I don't recall *ever* having generated CRLF line endings in a LF-delimited file (I use Vim) although I may have created CRLF in new files (and then not noticed, as Vim handles it transparently enough that I missed it). There are no significant projects where I'm a committer, though, so I interact via patches, which means I don't get the opportunity to break the repository :-) > Technically it would be optional for everyone, of course. ?However, the > solution should be such that everyone, regardless of personal preference, is > willing to take the hit. > > For example, if the repo is converted using \r\n line endings natively, then > Windows users would need to take no action either and puts the onus back on > you (given your stated preferences) to configure the tool appropriately. ?I > assume you would have no objection to that and would be happy to make that > tool optional for me? Absolutely. My issue is with 2 points: 1) I'm an infrequent contributor, so I don't keep a checkout around. 
I make a new clone "on demand", so I would be likely to forget to enable the hook on at least a proportion of my clones. The versioned .hgeols proposal seems to cover this. 2) This behaviour is something needed for Python only. I've no issue with enabling win32text globally, but I'd want to be clear that it is a no-op unless specifically requested (ie, something like **=cleverencode is *not* used in the absence of an explicit set of rules). That may well be the case, but I had the impression that win32text tried to be "automatic", so I'd like to verify it. > I must concede that Windows developers are the minority here - but assuming > we want a level playing field, I don't see how that changes the underlying > issue... Again, agreed entirely. As a Windows developer who doesn't (knowingly) encounter the issue, I'm not in a good position to help, but I'm happy to contribute comments and test things. I'll be offline for a couple of weeks, though, so you may well have solved it before I can do anything :-) Paul From v+python at g.nevcal.com Wed Aug 5 19:43:57 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 05 Aug 2009 10:43:57 -0700 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <4A793852.1070106@gmail.com> <4A793E4B.30101@v.loewis.de> <79990c6b0908050304w7b42bbbay775bdd66f7cd9dac@mail.gmail.com> <4A796ABC.80801@skippinet.com.au> Message-ID: <4A79C4DD.6000601@g.nevcal.com> On approximately 8/5/2009 4:28 AM, came the following characters from the keyboard of Dirkjan Ochtman: > On Wed, Aug 5, 2009 at 13:19, Mark Hammond wrote: >> Configuring on each clone would certainly be sub-optimal, so the proposal is >> this configuration be stored in a versioned file in the repo. > > Even if we do that, enabling hg extensions will still need to be done > locally -- although it can be done per-user/box instead of per-clone. On approximately 8/5/2009 9:24 AM, came the following characters from the keyboard of Paul Moore: > 2) This behaviour is something needed for Python only. I've no issue > with enabling win32text globally, but I'd want to be clear that it is > a no-op unless specifically requested (ie, something like > **=cleverencode is *not* used in the absence of an explicit set of > rules). That may well be the case, but I had the impression that > win32text tried to be "automatic", so I'd like to verify it. Depending on [Windows] users to configure their installation of Mercurial to work with the Python repository is lame; it will lead to new Windows contributors getting beat-up at check-in time, and make them less likely to want to contribute even the work they have already done (with wrong EOL), and much less to want to start future contributions, because some Unix Python hacker will be nasty about "Didn't you RTFM?" (Maybe not at first, but eventually). If the configuration settings have to be different per project for Windows developers using Mercurial for multiple projects, then that is also lame... Windows developers would have to keep changing their configurations, or (implied in above discussion) remember to recreate settings for each new clone or branch or whatever of the Python project. This is also error-prone, and leads to the above problem a different way. I have read this whole discussion, but want to step back and look at it from a theoretical viewpoint. 
A good solution would have the following characteristics: INSTALLATION) The developer should install the [D]VCS (for this discussion, Mercurial, present or future version), and attempt to access a repository (for this discussion, the Python repository, converted and configured for the chosen [D]VCS). The resultant environment should automatically be configured to work properly. If any [D]VCS extensions are required for the project, they should be automatically installed and configured, or the user given explicit instructions on how to do so, as a one-time installation step, that adversely affects no other projects for which the [D]VCS is used by that or other users of the present installation.. See below for what properly means. EOL CONFIGURATION) Each file, when added to the repository, should have a repository setting that indicates what the appropriate EOL type is for that file. The values I have heard are \n only, \r\n, platform-native, and binary. I haven't heard \r only in this discussion, but have heard it in other similar discussions, and it may be a useful setting for Mercurial to have, if the feature must be newly implemented there. I believe there are also systems that use RS to separate lines, and perhaps other things (and are there new Unicode control characters that could be used for line endings?), so it might be good to leave a few unassigned values in such a setting. I don't think any setting should be created to allow mixed line ending usage within a file, except binary. Per repository default for this setting should be available to avoid burdening the user when creating the typical type of file. ENCODING CONFIGURATION) Each file, when created, should have a repository settings that declares its character repertoire and encoding, and if it is a Unicode UTF encoding, whether or not it should have a leading BOM. In my opinion, all source code files should use a Unicode encoding, the exception being for test files that help test encoding support in internationalized environments. But the feature supports other people's opinions too. Per repository default for this setting should be available to avoid burdening the user when creating the typical type of file. CHECKOUT) Check-outs should be sensitive to the user's local environment (platform and locale settings), and non-binary files should be converted from the repository format to the local encoding and platform-specific line endings. Settings to override the line endings should be optionally available for users whose tools understand other line endings, and prefer them over the native line endings. If the characters used within a file cannot be converted losslessly to the encoding specified by the locale settings, then it should not be able to be checked out. A special override might be useful for using a lossy transformation for a read-only view of the file, at user request. CHECKIN) Check-ins, even local check-ins to local clones or branches, should automatically convert encodings and line endings from the platform and locale setting to the encoding and line ending specified by the repository for that file. If the characters in the modified file cannot be transformed losslessly to the repository repertoire and encoding, the check-in should be prevented. The CHECKIN should be a requirement of a useful [D]VCS, regardless of if any other capabilities are present. 
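To make the CHECKIN requirement concrete, a minimal sketch of the normalise-or-refuse step might look as follows (the repository-side settings shown are hypothetical, and a real tool would look them up per file):

    REPO_EOL = '\n'            # hypothetical per-file repository setting
    REPO_ENCODING = 'utf-8'    # hypothetical per-file repository setting

    def to_repository_form(raw_bytes, local_encoding):
        # Decode from the contributor's locale; refuse undecodable input.
        try:
            text = raw_bytes.decode(local_encoding)
        except UnicodeDecodeError:
            raise ValueError('not valid %s; check-in refused' % local_encoding)
        # Normalise every line ending to the repository convention.
        text = text.replace('\r\n', '\n').replace('\r', '\n')
        if REPO_EOL != '\n':
            text = text.replace('\n', REPO_EOL)
        # Re-encode; refuse anything the repository encoding cannot express.
        try:
            return text.encode(REPO_ENCODING)
        except UnicodeEncodeError:
            raise ValueError('not representable in %s; check-in refused'
                             % REPO_ENCODING)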
Even if none of the existing tools can reach the above flexibility, the problems that results from using tools that do not have such flexibility should be understood in terms of their specific deficiencies compared to the theoretical model. I can think of only one other solution that properly handles the problems (which is punting, really): to require the development environment to support the repertoire, encoding, and line endings of the repository. Doing this in a cross-platform manner is hard, because the tool sets (editors, compilers, databases, etc.) tend to support the platform-native convention better than the non-native conventions. It sounds like Mercurial's win32text extension is one form of this sort of requirement. CHECKIN should be a requirement even in this case, to validate the incoming data file. Basic software design requires validation of incoming data. I have no clue how many of these characteristics are implemented by Mercurial (or any other VCS or DVCS, I've been 7 years away from using SCCS, CVS, and Clearcase, but none of them had such features then, and I've not used the modern crop of VCSes much: git, svn, hg, bazaar, except a little in passing, but haven't read any documentation, nor attempted to set up a project myself in any of them). If none of the existing tools can reach the above flexibility, then there will be problems that result, and understanding what the problems are, and coming up with documented workarounds, processes, and auxiliary tools on each platform/envirenment to cure or prevent them, would seem to be necessary to support the use of such tools. Since Mercurial is the presently chosen DVCS for Python to migrate to, I'd be delighted to learn how close it comes to the theoretical model, and I'm sure someone out there knows. When I have some time, I'll attempt to figure that out by reading the Mercurial documentation... I have a personal (Python, cross-platform) project that is in need of a DVCS soon, and so I'm watching this discussion with much interest, to know whether I should also choose Mercurial, or should choose something that is closer to the theoretical solution outlined above (if there is something that is, or appears to be more likely to reach it sooner). -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From lanyjie at yahoo.com Wed Aug 5 20:10:29 2009 From: lanyjie at yahoo.com (Yingjie Lan) Date: Wed, 5 Aug 2009 11:10:29 -0700 (PDT) Subject: [Python-Dev] Reasons for using expy In-Reply-To: <4A797106.8040807@gmail.com> Message-ID: <363186.73070.qm@web54203.mail.re2.yahoo.com> Hi, The expy project provides an express way to extend Python. After some careful considerations, I came up with some reasons for expy (this is not an exhaustive list): (I). WYSIWYG. The expy project enables you to write your module in Python the way your extension would be (WYSIWYG), and meanwhile write your implementation in pure C. You specify your modules, functions, methods, classes, and even their documentations the usual way of writing your Python correspondences. Then your provide your implementation to the functions/methods by returning a multi-line string. By such an arrangement, everything falls in its right place, and your extension code becomes easy to read and maintain. Also, the generated code is very human-friendly. (II). 
You only provide minimal information to indicate your intension of how your module/class would function in Python. So your extension is largely independent from the Python extension API. As your interaction with the Python extension API is reduced to minimal (you only care about the functionality and logic), it is then possible that your module written in expy can be independent of changes in the extension API. (III). The building and setup of your project can be automatically done with the distutil tool. In the tutorial, there are ample examples on how easily this is achieved. (IV). Very light weight. The expy tool is surprisingly light weight dispite of its powerful ability, as it is written in pure Python. There is no parser or compiler for code generation, but rather the powerful reflexion mechanism of Python is exploited in a clever way to generate human-friendly codes. Currently, generating code in C is supported, however, the implementation is well modularized and code generation in other languages such as Java and C++ should be easy. While there are already a couple of other projects trying to simply this task with different strategies, such as Cython, Pyrex and modulator, this project is unique and charming in its own way. All you need is the WYSIWYG Python file for your module extension, then expy takes care of everything else. What follows in this documentation is on how to extend Python in C using expy-cxpy: the module expy helps define your module, while module cxpy helps generate C codes for your defined module. For more information about expy, please visit its homepage at: http://expy.sf.net/ Cheers, Yingjie From martin at v.loewis.de Wed Aug 5 20:22:27 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 05 Aug 2009 20:22:27 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <4A793852.1070106@gmail.com> <4A793E4B.30101@v.loewis.de> <79990c6b0908050304w7b42bbbay775bdd66f7cd9dac@mail.gmail.com> Message-ID: <4A79CDE3.9050500@v.loewis.de> >> Given that my preference is to use Unix-style EOL for "text" files on >> Windows, as every text editor I use (barring notepad!) understands LF >> format, it seems to me that this proposal also means that the hook >> would be optional for me. That suits me fine - I'd prefer to avoid >> having hooks that are required for Python checkouts, as that means I >> have to remember to configure them on each clone (IIUC). > > Yeah, this may also be what's making it harder for me to understand > the issues. Please trust that there are plenty of editors that get the line ending implementation wrong. I'm fairly certain that some Visual Studio versions are among them. They will recognize LF as a line ending, but add CRLF line breaks when the user presses enter. In addition, some editors (in particular notepad) choke when confronted with LF-only files. It is very annoying if you have to look at source code at somebody else's machine which doesn't have any programmer editor installed (except for Visual Studio). 
Regards, Martin From martin at v.loewis.de Wed Aug 5 20:32:09 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 05 Aug 2009 20:32:09 +0200 Subject: [Python-Dev] PEP 385: the charset issue In-Reply-To: <20090805145857.GB31621@phd.pp.ru> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> <4A794E46.2030700@gmail.com> <4A79801F.9080702@gmail.com> <4A798A86.8010702@mrabarnett.plus.com> <4A799338.3050008@gmail.com> <20090805145857.GB31621@phd.pp.ru> Message-ID: <4A79D029.9080904@v.loewis.de> > These files are in 8859-1 encoding (names in comments, at least): > http://svn.python.org/view/python/trunk/Lib/encodings/punycode.py > http://svn.python.org/view/python/trunk/Lib/test/test_csv.py > http://svn.python.org/view/python/trunk/Tools/i18n/msgfmt.py > http://svn.python.org/view/python/trunk/Tools/i18n/pygettext.py > If they are not marked as "text/plain; charset=iso-8859-1" I think it's > a bug. Either they should be marked, or converted to ascii or utf-8; the > coding pseudocomment (directive) should be changed accordingly. It's certainly a bug of the web page. I'm not so sure it's a bug in the files: I would claim that it's a bug in ViewCVS. Regards, Martin From martin at v.loewis.de Wed Aug 5 20:35:02 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 05 Aug 2009 20:35:02 +0200 Subject: [Python-Dev] PEP 385: the charset issue In-Reply-To: References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> <4A794E46.2030700@gmail.com> <4A79801F.9080702@gmail.com> <4A798A86.8010702@mrabarnett.plus.com> <4A799338.3050008@gmail.com> <20090805145857.GB31621@phd.pp.ru> Message-ID: <4A79D0D6.6040002@v.loewis.de> >> These files are in 8859-1 encoding (names in comments, at least): >> http://svn.python.org/view/python/trunk/Lib/encodings/punycode.py >> http://svn.python.org/view/python/trunk/Lib/test/test_csv.py >> http://svn.python.org/view/python/trunk/Tools/i18n/msgfmt.py >> http://svn.python.org/view/python/trunk/Tools/i18n/pygettext.py >> If they are not marked as "text/plain; charset=iso-8859-1" I think it's >> a bug. Either they should be marked, or converted to ascii or utf-8; the >> coding pseudocomment (directive) should be changed accordingly. > > It's certainly ok to convert them to utf-8 (and add the marker anyway). No, it's not. PEP 8 mandates that non-ASCII code in the Python source code is in Latin-1. Regards, Martin From martin at v.loewis.de Wed Aug 5 20:37:55 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 05 Aug 2009 20:37:55 +0200 Subject: [Python-Dev] PEP 385: Mercurial issues In-Reply-To: <87r5vqb7g0.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> <4A794E46.2030700@gmail.com> <4A79801F.9080702@gmail.com> <4A798A86.8010702@mrabarnett.plus.com> <20090805135003.GC28780@phd.pp.ru> <20090805135759.GD28780@phd.pp.ru> <87r5vqb7g0.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4A79D183.7090503@v.loewis.de> > > Dirkjan, how does Mercurial handles charsets? If I have three files in > > my repository - one in utf-8, another in koi8-r, and the third in cp1251 > > encoding - I certainly don't want to convert them back and force, but I > > want hg web interface to provide charset in the Content-Type header. > > How is this relevant to PEP 385? I hope the answer is "not at all". 
> I've been there, done that, and my answer is "never again". (I'm not > telling you what to do with *your* repository, just that I don't see > any good reason for having any encodings but UTF-8 in Python's.) Just in case my previous message gets overlooked: PEP 8 mandates Latin-1 for Python 2.x source code (except for files that test PEP 263). Regards, Martin From solipsis at pitrou.net Wed Aug 5 21:17:42 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 5 Aug 2009 19:17:42 +0000 (UTC) Subject: [Python-Dev] PEP 385: the charset issue References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> <4A794E46.2030700@gmail.com> <4A79801F.9080702@gmail.com> <4A798A86.8010702@mrabarnett.plus.com> <4A799338.3050008@gmail.com> <20090805145857.GB31621@phd.pp.ru> <4A79D0D6.6040002@v.loewis.de> Message-ID: Martin v. L?wis v.loewis.de> writes: > > No, it's not. PEP 8 mandates that non-ASCII code in the Python source > code is in Latin-1. Ok, point taken. Having several encodings (and several indentation rules) certainly makes things more annoying for contributors than they should, however. Regards Antoine. From g.brandl at gmx.net Wed Aug 5 21:43:08 2009 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 05 Aug 2009 21:43:08 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <50862ebd0908050209g7da4133fi2b859160f6fd17fb@mail.gmail.com> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <50862ebd0908050125u53e19515j9c0542e4009fc2fd@mail.gmail.com> <4A7945C5.5000303@v.loewis.de> <50862ebd0908050209g7da4133fi2b859160f6fd17fb@mail.gmail.com> Message-ID: Neil Hodgson schrieb: > Martin v. L?wis: > >> Or don't you understand why that single unresolved item didn't manage >> to revert the decision? Well, there are many unresolved items in >> the Mercurial conversion, some much more stressful than the eol issue >> (e.g. the branching discussion). > > Then these issues should have been included in the initial PEP for > choosing a DVCS since the issues could have driven the choice. PEP 374 > implies that win32text effectively solves the Windows eol issue which > no longer appears to be correct. Apparently, it was the author's understanding at that time that win32text would be sufficient. Also, PEP 374 has not been written in isolation; at any time during the process people could have notified Dirkjan that this is not the case. The branching issue *has* been included in PEP 374; it is not a blocker for migration, but rather a decision has to be made between two similar, but in other ways quite different styles for converting SVN branches. I'm not aware of any other unresolved items; they may exist, but the fact that they're not discussed on this list in detail means that they are largely unimportant. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. 
From g.brandl at gmx.net Wed Aug 5 21:56:15 2009 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 05 Aug 2009 21:56:15 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <87ws5ibahx.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <87fxc6regv.fsf@benfinney.id.au> <4A7921EB.30007@gmail.com> <877hxirbz6.fsf@benfinney.id.au> <4A793569.1000008@gmail.com> <87ws5ibahx.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Stephen J. Turnbull schrieb: > Mark Hammond writes: > > > I'm not sure what point you are trying to make, but I believe it *is* > > possible for a solution to be found here which will keep Windows users > > happy. I'm guessing you haven't had much practical experience with this > > problem, so probably don't see this is clearly as Windows users do. > > Mercurial is not only open source, it's written in Python. The > problem is known to be hard in a practical sense, the existing > solutions (written by non-Windows developers, of course) are judged to > be insufficient by Windows users, and the non-Windows developers > "probably don't see this is clearly as Windows users do." > > I think the implication is obvious. There will be no good solution > until Windows users develop it. I don't see a good reason to wait for > that. I do see good reason for non-Windows users to put up with some > inconvenience during the "beta" phase of implementing that solution; It's not that obvious -- we at least need the server-side check that doesn't allow "wrong" line endings as the "last" line of defense, and this check already needs a way to know which files are supposed to have which line endings -- deciding how to specify that is already half of the needed solution. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From mal at egenix.com Wed Aug 5 22:04:46 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 05 Aug 2009 22:04:46 +0200 Subject: [Python-Dev] PEP 385: the charset issue In-Reply-To: <4A79D0D6.6040002@v.loewis.de> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> <4A794E46.2030700@gmail.com> <4A79801F.9080702@gmail.com> <4A798A86.8010702@mrabarnett.plus.com> <4A799338.3050008@gmail.com> <20090805145857.GB31621@phd.pp.ru> <4A79D0D6.6040002@v.loewis.de> Message-ID: <4A79E5DE.5050301@egenix.com> "Martin v. L?wis" wrote: >>> These files are in 8859-1 encoding (names in comments, at least): >>> http://svn.python.org/view/python/trunk/Lib/encodings/punycode.py >>> http://svn.python.org/view/python/trunk/Lib/test/test_csv.py >>> http://svn.python.org/view/python/trunk/Tools/i18n/msgfmt.py >>> http://svn.python.org/view/python/trunk/Tools/i18n/pygettext.py >>> If they are not marked as "text/plain; charset=iso-8859-1" I think it's >>> a bug. Either they should be marked, or converted to ascii or utf-8; the >>> coding pseudocomment (directive) should be changed accordingly. >> >> It's certainly ok to convert them to utf-8 (and add the marker anyway). > > No, it's not. PEP 8 mandates that non-ASCII code in the Python source > code is in Latin-1. Then I guess it's time to change PEP 8 for Python 2.7 ... 
""" Code in the core Python distribution should aways use the ASCII or UTF-8 encoding together with a PEP 263 encoding comment header. """ Since UTF-8 is ASCII compatible, the whole source code will effectively be UTF-8 encoded. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 05 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From martin at v.loewis.de Wed Aug 5 22:13:07 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 05 Aug 2009 22:13:07 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <50862ebd0908050125u53e19515j9c0542e4009fc2fd@mail.gmail.com> <4A7945C5.5000303@v.loewis.de> <50862ebd0908050209g7da4133fi2b859160f6fd17fb@mail.gmail.com> Message-ID: <4A79E7D3.6000906@v.loewis.de> > I'm not aware of any other unresolved items; they may exist, but the fact > that they're not discussed on this list in detail means that they are > largely unimportant. There is a long list of things that still need to be done; each one potentially creating new problems. In particular: - the .hgeols plugin needs to be written - the hooks need to be written, or at least deployed, for code style checks, for email notification, and for buildbot triggering - the build identification patch needs to be written (I do expect many problems out of that one, some possibly small - I'm not a Mercurial user, so I can't estimate how difficult that will be) - buildbot configuration needs to be adjusted - the roundup regex needs to be configured to refer to hgweb links - access control needs to be setup - stackless needs to be converted - a decision on the location of the PEPs must be made and implemented - developer documentation needs to be written - a decision must be made what to do with the migrated parts of subversion, in the subversion repository I may have missed some things. I would like to see test period (say, two weeks) were we can find further issues. Regards, Martin From g.brandl at gmx.net Wed Aug 5 22:18:16 2009 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 05 Aug 2009 22:18:16 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A79E7D3.6000906@v.loewis.de> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <50862ebd0908050125u53e19515j9c0542e4009fc2fd@mail.gmail.com> <4A7945C5.5000303@v.loewis.de> <50862ebd0908050209g7da4133fi2b859160f6fd17fb@mail.gmail.com> <4A79E7D3.6000906@v.loewis.de> Message-ID: Martin v. L?wis schrieb: >> I'm not aware of any other unresolved items; they may exist, but the fact >> that they're not discussed on this list in detail means that they are >> largely unimportant. 
> > There is a long list of things that still need to be done; each one > potentially creating new problems. In particular: > - the .hgeols plugin needs to be written > - the hooks need to be written, or at least deployed, for code > style checks, for email notification, and for buildbot triggering > - the build identification patch needs to be written (I do expect > many problems out of that one, some possibly small - I'm not a > Mercurial user, so I can't estimate how difficult that will be) > - buildbot configuration needs to be adjusted > - the roundup regex needs to be configured to refer to hgweb links > - access control needs to be setup > - stackless needs to be converted > - a decision on the location of the PEPs must be made and implemented > - developer documentation needs to be written > - a decision must be made what to do with the migrated parts of > subversion, in the subversion repository > > I may have missed some things. I would like to see test period (say, > two weeks) were we can find further issues. Sure there are many things to do; I was speaking of issues where the way to go is not decided, and needs to be before the switch can happen. Maybe build identification is one of them; but I think everything has been said in the one thread we had about this. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From amauryfa at gmail.com Wed Aug 5 23:03:44 2009 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Wed, 5 Aug 2009 23:03:44 +0200 Subject: [Python-Dev] PEP 385: pruning/reorganizing branches In-Reply-To: References: Message-ID: 2009/8/3 Dirkjan Ochtman : > So PEP 385 proposes to clean up the old branches we still have lying > around in SVN. > > io-c: keep-clone? strip - it was merged into py3k some months ago. -- Amaury Forgeot d'Arc From ncoghlan at gmail.com Wed Aug 5 23:44:58 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 06 Aug 2009 07:44:58 +1000 Subject: [Python-Dev] Reasons for using expy In-Reply-To: <363186.73070.qm@web54203.mail.re2.yahoo.com> References: <363186.73070.qm@web54203.mail.re2.yahoo.com> Message-ID: <4A79FD5A.1090006@gmail.com> Yingjie Lan wrote: > Hi, > > The expy project provides an express way to extend Python. After some > careful considerations, I came up with some reasons for expy (this is > not an exhaustive list): This kind of advocacy for external projects belongs on python-list, not python-dev (or, if you're proposing something for use in the standard library, on python-ideas). Cheers, Nick. P.S. The message to capi-sig was probably on topic - certainly closer to being so than the inclusion of python-dev. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From nyamatongwe at gmail.com Thu Aug 6 00:22:14 2009 From: nyamatongwe at gmail.com (Neil Hodgson) Date: Thu, 6 Aug 2009 08:22:14 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A79C4DD.6000601@g.nevcal.com> References: <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <4A793852.1070106@gmail.com> <4A793E4B.30101@v.loewis.de> <79990c6b0908050304w7b42bbbay775bdd66f7cd9dac@mail.gmail.com> <4A796ABC.80801@skippinet.com.au> <4A79C4DD.6000601@g.nevcal.com> Message-ID: <50862ebd0908051522h7a2f3a21g398ebd1d8522b4dc@mail.gmail.com> Glenn Linderman: > and perhaps other things (and > are there new Unicode control characters that could be used for line > endings?), Unicode includes Line Separator U+2028 and Paragraph Separator U+2029 but they are rarely supported and very rarely used. They are a pain to work with since they are 3 byte sequences in UTF-8. Visual Studio does support them. Python does not currently support these line separators such as in this example which only reads 2 lines rather than 3: with open("x.txt", "wb") as f: f.write("a\nb\u2029c\n".encode('utf-8')) with open("x.txt", "r") as f: n = 1 for l in f.readlines(): print(n, repr(l)) n += 1 Neil From lanyjie at yahoo.com Thu Aug 6 00:55:28 2009 From: lanyjie at yahoo.com (Yingjie Lan) Date: Wed, 5 Aug 2009 15:55:28 -0700 (PDT) Subject: [Python-Dev] Reasons for using expy In-Reply-To: <4A79FD5A.1090006@gmail.com> Message-ID: <216507.60638.qm@web54201.mail.re2.yahoo.com> > From: Nick Coghlan > Subject: Re: [Python-Dev] Reasons for using expy > To: "Yingjie Lan" > Cc: python-dev at python.org > Date: Thursday, August 6, 2009, 1:44 AM > This kind of advocacy for external projects belongs on > python-list, not > python-dev (or, if you're proposing something for use in > the standard > library, on python-ideas). > Thanks Nick. Cheers, Yingjie From mhammond at skippinet.com.au Thu Aug 6 02:34:08 2009 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 06 Aug 2009 10:34:08 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <87ws5ibahx.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <87fxc6regv.fsf@benfinney.id.au> <4A7921EB.30007@gmail.com> <877hxirbz6.fsf@benfinney.id.au> <4A793569.1000008@gmail.com> <87ws5ibahx.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4A7A2500.2030104@skippinet.com.au> On 6/08/2009 12:28 AM, Stephen J. Turnbull wrote: > Mark Hammond writes: > > > I'm not sure what point you are trying to make, but I believe it *is* > > possible for a solution to be found here which will keep Windows users > > happy. I'm guessing you haven't had much practical experience with this > > problem, so probably don't see this is clearly as Windows users do. > > Mercurial is not only open source, it's written in Python. The > problem is known to be hard in a practical sense, the existing > solutions (written by non-Windows developers, of course) are judged to > be insufficient by Windows users, and the non-Windows developers > "probably don't see this is clearly as Windows users do." > > I think the implication is obvious. There will be no good solution > until Windows users develop it. I don't see a good reason to wait for > that. My conclusion is different. 
I'm not sure of the history of win32text, but it most certainly is now squarely in the hands of Windows users. Patches to win32text, or even general discussion is usually met with silence, and when prodded, the response is "sorry - we don't use that - it is a Windows problem." As a result, we end up in the position we are in now - win32text is great in theory but doesn't work in practice, attempts to make it work are met with indifference, and the "problem" stays squarely with Windows users. Non Windows users remain oblivious to the pain, Windows users stop bothering with the extension, and the repository post-commit hooks then cause different pain. Hence my conclusion that the answer is for any such support to be developed in conjunction with Windows users, but also in such a way that the solution works, almost identically, for non Windows users. By insisting all platforms eat the same dog-food, there is much more chance the glaringly obvious (to Windows users) issues are addressed. > I do see good reason for non-Windows users to put up with some > inconvenience during the "beta" phase of implementing that solution; > it's important enough to be fast-tracked, and doesn't need to be > perfect for everybody to be tried (though it should not be allowed to > endanger repo content, which seems unlikely but needs care since it's > a potential disaster). And on the flip-side, I accept we may migrate without the agreed solution fully implemented - I'm happy to accept commitments about what *will* be done even if it isn't a reality for a short while... Cheers, Mark From mcaninch at lanl.gov Thu Aug 6 00:22:30 2009 From: mcaninch at lanl.gov (Jeff McAninch) Date: Wed, 05 Aug 2009 16:22:30 -0600 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) Message-ID: <4A7A0626.5050303@lanl.gov> I'm new to this list, so please excuse me if this topic has been discussed, but I didn't see anything similar in the archives. I very often want something like a try-except conditional expression similar to the if-else conditional. An example of the proposed syntax might be: x = float(string) except float('nan') or possibly x = float(string) except ValueError float('nan') Here's a simple example: Converting a large list of strings to floats where there may be errors that I want returned as nan's. Currently I would write the function: def safe_float_function(string): try: result = float(string) except: result = float('nan') return result and get my list of floats using the list comprehension: xs = [ safe_float_function(string) for string in strings ] With a try-except conditional I would instead define the following lambda: safe_float_conditional = lambda string : float(string) except float('nan') leading to: xs = [ safe_float_conditional(string) for string in strings ] My understanding is that the second would be faster at run time, and, like if-else conditional expressions, possibly more easily read by the human. Again, please excuse me if this has been discussed previously. If so, I'd appreciate being pointed to the discussion. Please also excuse me if for there is some currently (pre-python 3.0) idiom that I could use to efficiently get this same behaviour. If so, I'd appreciate being educated. Thanks, Jeff McAninch -- ========================== Jeffrey E. McAninch, PhD Physicist, X-2-IFD Los Alamos National Laboratory Phone: 505-667-0374 Email: mcaninch at lanl.gov ========================== -------------- next part -------------- An HTML attachment was scrubbed... 
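For what it's worth, the behaviour proposed here can already be approximated with a small generic helper, at the cost of spelling the risky call as a function plus arguments. The sketch below is only an illustration of that idiom: the name call_catching and its signature are invented for the example, and, unlike the proposed expression form, the fallback value is evaluated eagerly whether or not an exception is raised.

    def call_catching(func, args=(), exceptions=Exception, default=None):
        # Call func(*args); if one of `exceptions` is raised, return `default` instead.
        try:
            return func(*args)
        except exceptions:
            return default

    strings = ['1.25', 'five', '3']   # sample input for illustration
    xs = [call_catching(float, (s,), ValueError, float('nan')) for s in strings]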
URL: From python at rcn.com Thu Aug 6 02:59:44 2009 From: python at rcn.com (Raymond Hettinger) Date: Wed, 5 Aug 2009 17:59:44 -0700 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) References: <4A7A0626.5050303@lanl.gov> Message-ID: <592033416A4F4C20A60ACB28E40F84DF@RaymondLaptop1> [Jeffrey E. McAninch, PhD] > I very often want something like a try-except conditional expression similar > to the if-else conditional. > > An example of the proposed syntax might be: > x = float(string) except float('nan') > or possibly > x = float(string) except ValueError float('nan') +1 I've long wanted something like this. One possible spelling is: x = float(string) except ValueError else float('nan') If accepted, this would also solve the feature requests for various functions to have default arguments. For example: x = min(seq) except ValueError else 0 # default to zero for empty sequences It would also be helpful in calculations that have algebraic restrictions: sample_std_deviation = sqrt(sum(x - mu for x in seq) / (len(seq)-1)) except ZeroDivisionError else float('Inf') Raymond From pje at telecommunity.com Thu Aug 6 03:20:54 2009 From: pje at telecommunity.com (P.J. Eby) Date: Wed, 05 Aug 2009 21:20:54 -0400 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: <592033416A4F4C20A60ACB28E40F84DF@RaymondLaptop1> References: <4A7A0626.5050303@lanl.gov> <592033416A4F4C20A60ACB28E40F84DF@RaymondLaptop1> Message-ID: <20090806012101.B42493A406B@sparrow.telecommunity.com> At 05:59 PM 8/5/2009 -0700, Raymond Hettinger wrote: >[Jeffrey E. McAninch, PhD] >>I very often want something like a try-except conditional expression similar >>to the if-else conditional. >> >>An example of the proposed syntax might be: >> x = float(string) except float('nan') >>or possibly >> x = float(string) except ValueError float('nan') > >+1 I've long wanted something like this. >One possible spelling is: > > x = float(string) except ValueError else float('nan') I think 'as' would be better than 'else', since 'else' has a different meaning in try/except statements, e.g.: x = float(string) except ValueError, TypeError as float('nan') Of course, this is a different meaning of 'as', too, but it's not "as" contradictory, IMO... ;-) From mcaninch at lanl.gov Thu Aug 6 04:11:28 2009 From: mcaninch at lanl.gov (Jeff McAninch) Date: Wed, 05 Aug 2009 20:11:28 -0600 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: <592033416A4F4C20A60ACB28E40F84DF@RaymondLaptop1> References: <4A7A0626.5050303@lanl.gov> <592033416A4F4C20A60ACB28E40F84DF@RaymondLaptop1> Message-ID: <4A7A3BD0.9000104@lanl.gov> Raymond Hettinger wrote: > If accepted, this would also solve the feature requests for various > functions to have default arguments. > For example: > > x = min(seq) except ValueError else 0 # default to zero for > empty sequences > > It would also be helpful in calculations that have algebraic > restrictions: > > sample_std_deviation = sqrt(sum(x - mu for x in seq) / (len(seq)-1)) > except ZeroDivisionError else float('Inf') > > > Raymond Yes, exactly the situations I keep coding around. -- ========================== Jeffrey E. McAninch, PhD Physicist, X-2-IFD Los Alamos National Laboratory Phone: 505-667-0374 Email: mcaninch at lanl.gov ========================== From stephen at xemacs.org Thu Aug 6 07:00:43 2009 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Thu, 06 Aug 2009 14:00:43 +0900 Subject: [Python-Dev] PEP 385: Mercurial issues In-Reply-To: <4A79D183.7090503@v.loewis.de> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> <4A794E46.2030700@gmail.com> <4A79801F.9080702@gmail.com> <4A798A86.8010702@mrabarnett.plus.com> <20090805135003.GC28780@phd.pp.ru> <20090805135759.GD28780@phd.pp.ru> <87r5vqb7g0.fsf@uwakimon.sk.tsukuba.ac.jp> <4A79D183.7090503@v.loewis.de> Message-ID: <87k51hbkp0.fsf@uwakimon.sk.tsukuba.ac.jp> "Martin v. L?wis" writes: > > I don't see any good reason for having any encodings but UTF-8 in > > Python's. > > Just in case my previous message gets overlooked: PEP 8 mandates Latin-1 > for Python 2.x source code (except for files that test PEP 263). You're right, sorry for the misinformation. An exception should be made for gettext message files, too? From martin at v.loewis.de Thu Aug 6 07:48:46 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 06 Aug 2009 07:48:46 +0200 Subject: [Python-Dev] PEP 385: Mercurial issues In-Reply-To: <87k51hbkp0.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <4A794A90.8090502@gmail.com> <4A794E46.2030700@gmail.com> <4A79801F.9080702@gmail.com> <4A798A86.8010702@mrabarnett.plus.com> <20090805135003.GC28780@phd.pp.ru> <20090805135759.GD28780@phd.pp.ru> <87r5vqb7g0.fsf@uwakimon.sk.tsukuba.ac.jp> <4A79D183.7090503@v.loewis.de> <87k51hbkp0.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4A7A6EBE.3080400@v.loewis.de> > > Just in case my previous message gets overlooked: PEP 8 mandates Latin-1 > > for Python 2.x source code (except for files that test PEP 263). > > You're right, sorry for the misinformation. > > An exception should be made for gettext message files, too? In principle, perhaps. However, Python doesn't have any .po files, AFAIK. Regards, Martin From stephen at xemacs.org Thu Aug 6 08:00:54 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 06 Aug 2009 15:00:54 +0900 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A7A2500.2030104@skippinet.com.au> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <87fxc6regv.fsf@benfinney.id.au> <4A7921EB.30007@gmail.com> <877hxirbz6.fsf@benfinney.id.au> <4A793569.1000008@gmail.com> <87ws5ibahx.fsf@uwakimon.sk.tsukuba.ac.jp> <4A7A2500.2030104@skippinet.com.au> Message-ID: <87iqh1bhwp.fsf@uwakimon.sk.tsukuba.ac.jp> Mark Hammond writes: > On 6/08/2009 12:28 AM, Stephen J. Turnbull wrote: > > I think the implication is obvious. There will be no good solution > > until Windows users develop it. I don't see a good reason to wait for > > that. > My conclusion is different. I'm not sure of the history of win32text, > but it most certainly is now squarely in the hands of Windows users. > Patches to win32text, or even general discussion is usually met with > silence, and when prodded, the response is "sorry - we don't use that - > it is a Windows problem." Well, yes, it is a Windows problem. And it will probably always be that way, because for practical purposes, Windows users cannot advocate their platform's infrastructure solutions for open source projects: those solutions are proprietary. On the flip side, in my experience at least Windows users do not contribute much to this kind of infrastructure initiative, undoubtedly due to the high cost of acquiring familiarity with the usable options[1], and so have less input into the process. 
But that's a matter of certain costs that are built in to the nature of a proprietary platform. Somebody has to pay them, and I think it should be the users of that platform. Why should the rest of the community subsidize that platform? > As a result, we end up in the position we are in now - win32text is > great in theory but doesn't work in practice, attempts to make it work > are met with indifference, and the "problem" stays squarely with Windows > users. This is simply false AFAICS. There was little participation on this particular issue during PEP 374 that I can recall. Now that it is clearly an issue after all, it's still early in the PEP 385 process. Martin has already picked up the ball on EOL support, and has carried informal design pretty much to the goal line already ... all that's left is the detailed design and the implementation, and there are several people involved who will help develop the patch, all very capable. (Of course it's going to be easier said than done and there are probably bumps in the road to a smooth workflow, but I do claim that the process is working as well as you could expect.) > Hence my conclusion that the answer is for any such support to be > developed in conjunction with Windows users, [...] Ahem. Why not "(primarily) by Windows users"? > And on the flip-side, I accept we may migrate without the agreed > solution fully implemented - I'm happy to accept commitments about > what *will* be done even if it isn't a reality for a short while... Make no mistake about it, EOL support is a tempest in a teapot compared to the benefits to a large number of core developers in their *personal* workspaces -- even if the project workflow doesn't change at all. That's what is driving this change. Unless Windows users do it themselves, they are dependent on the good will of the PEP 385 proponent and other volunteer contributors. I don't think "accepting commitments" is part of the game plan. Footnotes: [1] Eg, I was willing to participate in PEP 374 because I already have a great interest in version control and use git daily. Lots of Unix users don't, and they didn't participate any more than most Windows users did. From martin at v.loewis.de Thu Aug 6 08:40:35 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 06 Aug 2009 08:40:35 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <87iqh1bhwp.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <87fxc6regv.fsf@benfinney.id.au> <4A7921EB.30007@gmail.com> <877hxirbz6.fsf@benfinney.id.au> <4A793569.1000008@gmail.com> <87ws5ibahx.fsf@uwakimon.sk.tsukuba.ac.jp> <4A7A2500.2030104@skippinet.com.au> <87iqh1bhwp.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4A7A7AE3.2050509@v.loewis.de> > This is simply false AFAICS. There was little participation on this > particular issue during PEP 374 that I can recall. Now that it is > clearly an issue after all, it's still early in the PEP 385 process. > Martin has already picked up the ball on EOL support, and has carried > informal design pretty much to the goal line already ... all that's > left is the detailed design and the implementation, and there are > several people involved who will help develop the patch, all very > capable. I'm not so optimistic. To me, it looks like that either Dirkjan or Mark will implement a hg hook, or else it won't happen (for me, I certainly know that I will not write Mercurial hooks anytime soon). 
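Whoever ends up writing that hook, the content check at its heart is fairly small; the hard part is wiring it into Mercurial and deciding how the per-file EOL style gets declared. The following is only a rough sketch of the comparison such a check would have to perform, written against plain bytes and a made-up 'LF'/'CRLF'/'BIN' style label, deliberately independent of any particular hook API:

    def find_eol_problems(path, data, declared_style):
        # `data` is the file content as bytes; `declared_style` is 'LF', 'CRLF' or 'BIN'.
        if declared_style == 'BIN':
            return []    # binary files are never checked
        problems = []
        if declared_style == 'LF' and b'\r' in data:
            problems.append('%s: CR byte in a file declared as LF' % path)
        if declared_style == 'CRLF' and b'\n' in data.replace(b'\r\n', b''):
            problems.append('%s: bare LF in a file declared as CRLF' % path)
        return problems

A real server-side hook would also have to walk the files touched by the incoming changesets and reject the push whenever this list comes back non-empty; that plumbing is exactly the part that still needs to be written.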
Regards, Martin From stephen at xemacs.org Thu Aug 6 09:12:04 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 06 Aug 2009 16:12:04 +0900 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A7A7AE3.2050509@v.loewis.de> References: <4A77FD4D.1020502@gmail.com> <4A78C793.9020409@skippinet.com.au> <87fxc6regv.fsf@benfinney.id.au> <4A7921EB.30007@gmail.com> <877hxirbz6.fsf@benfinney.id.au> <4A793569.1000008@gmail.com> <87ws5ibahx.fsf@uwakimon.sk.tsukuba.ac.jp> <4A7A2500.2030104@skippinet.com.au> <87iqh1bhwp.fsf@uwakimon.sk.tsukuba.ac.jp> <4A7A7AE3.2050509@v.loewis.de> Message-ID: <87bpmtbem3.fsf@uwakimon.sk.tsukuba.ac.jp> "Martin v. L?wis" writes: > > This is simply false AFAICS. There was little participation on this > > particular issue during PEP 374 that I can recall. Now that it is > > clearly an issue after all, it's still early in the PEP 385 process. > > Martin has already picked up the ball on EOL support, and has carried > > informal design pretty much to the goal line already ... all that's > > left is the detailed design and the implementation, and there are > > several people involved who will help develop the patch, all very > > capable. > > I'm not so optimistic. To me, it looks like that either Dirkjan or Mark > will implement a hg hook, or else it won't happen (for me, I certainly > know that I will not write Mercurial hooks anytime soon). Ouch. Still, I think the informal discussion so far is pretty close to a usable solution at that level. From mal at egenix.com Thu Aug 6 10:31:04 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 06 Aug 2009 10:31:04 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <50862ebd0908051522h7a2f3a21g398ebd1d8522b4dc@mail.gmail.com> References: <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <4A793852.1070106@gmail.com> <4A793E4B.30101@v.loewis.de> <79990c6b0908050304w7b42bbbay775bdd66f7cd9dac@mail.gmail.com> <4A796ABC.80801@skippinet.com.au> <4A79C4DD.6000601@g.nevcal.com> <50862ebd0908051522h7a2f3a21g398ebd1d8522b4dc@mail.gmail.com> Message-ID: <4A7A94C8.90608@egenix.com> Neil Hodgson wrote: > Glenn Linderman: > >> and perhaps other things (and >> are there new Unicode control characters that could be used for line >> endings?), > > Unicode includes Line Separator U+2028 and Paragraph Separator > U+2029 but they are rarely supported and very rarely used. They are a > pain to work with since they are 3 byte sequences in UTF-8. Visual > Studio does support them. > > Python does not currently support these line separators such as in > this example which only reads 2 lines rather than 3: > > with open("x.txt", "wb") as f: > f.write("a\nb\u2029c\n".encode('utf-8')) > with open("x.txt", "r") as f: > n = 1 > for l in f.readlines(): > print(n, repr(l)) > n += 1 Please file a bug report for this. f.readlines() (or rather the io layer) should be using Py_UNICODE_ISLINEBREAK(ch) for detecting line break characters. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 06 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! 
:::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From solipsis at pitrou.net Thu Aug 6 10:51:29 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 6 Aug 2009 08:51:29 +0000 (UTC) Subject: [Python-Dev] PEP 385: the eol-type issue References: <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <4A793852.1070106@gmail.com> <4A793E4B.30101@v.loewis.de> <79990c6b0908050304w7b42bbbay775bdd66f7cd9dac@mail.gmail.com> <4A796ABC.80801@skippinet.com.au> <4A79C4DD.6000601@g.nevcal.com> <50862ebd0908051522h7a2f3a21g398ebd1d8522b4dc@mail.gmail.com> <4A7A94C8.90608@egenix.com> Message-ID: M.-A. Lemburg egenix.com> writes: > > Please file a bug report for this. f.readlines() (or rather > the io layer) should be using Py_UNICODE_ISLINEBREAK(ch) > for detecting line break characters. Actually, no. It has been designed from the start to only recognize the "standard" line break representations found in common formats/protocols (CR, LF and CR+LF). People wanting to split on arbitrary unicode line breaks should use str.splitlines(). Regards Antoine. From ncoghlan at gmail.com Thu Aug 6 12:19:38 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 06 Aug 2009 20:19:38 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: References: <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <4A793852.1070106@gmail.com> <4A793E4B.30101@v.loewis.de> <79990c6b0908050304w7b42bbbay775bdd66f7cd9dac@mail.gmail.com> <4A796ABC.80801@skippinet.com.au> <4A79C4DD.6000601@g.nevcal.com> <50862ebd0908051522h7a2f3a21g398ebd1d8522b4dc@mail.gmail.com> <4A7A94C8.90608@egenix.com> Message-ID: <4A7AAE3A.3050507@gmail.com> Antoine Pitrou wrote: > M.-A. Lemburg egenix.com> writes: >> Please file a bug report for this. f.readlines() (or rather >> the io layer) should be using Py_UNICODE_ISLINEBREAK(ch) >> for detecting line break characters. > > Actually, no. It has been designed from the start to only recognize the > "standard" line break representations found in common formats/protocols (CR, LF > and CR+LF). > People wanting to split on arbitrary unicode line breaks should use > str.splitlines(). The fairly long-standing RFE relating to an arbitrarily selectable newline separator seems relevant here: http://bugs.python.org/issue1152248 As with the discussion there, the problem with using str.splitlines is that it prevents pipelining approaches that avoid reading a whole file into memory. While removing the validity check from readlines() completely is questionable (the readrecords() approach mentioned in the tracker issue would still be better there), loosening the validity check to be based on Py_UNICODE_IS_LINEBREAK seems a bit more feasible. (I'd still call it a feature requests rather than a bug though). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From mal at egenix.com Thu Aug 6 12:40:09 2009 From: mal at egenix.com (M.-A. 
Lemburg) Date: Thu, 06 Aug 2009 12:40:09 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A7AAE3A.3050507@gmail.com> References: <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <4A793852.1070106@gmail.com> <4A793E4B.30101@v.loewis.de> <79990c6b0908050304w7b42bbbay775bdd66f7cd9dac@mail.gmail.com> <4A796ABC.80801@skippinet.com.au> <4A79C4DD.6000601@g.nevcal.com> <50862ebd0908051522h7a2f3a21g398ebd1d8522b4dc@mail.gmail.com> <4A7A94C8.90608@egenix.com> <4A7AAE3A.3050507@gmail.com> Message-ID: <4A7AB309.3080403@egenix.com> Nick Coghlan wrote: > Antoine Pitrou wrote: >> M.-A. Lemburg egenix.com> writes: >>> Please file a bug report for this. f.readlines() (or rather >>> the io layer) should be using Py_UNICODE_ISLINEBREAK(ch) >>> for detecting line break characters. >> >> Actually, no. It has been designed from the start to only recognize the >> "standard" line break representations found in common formats/protocols (CR, LF >> and CR+LF). >> People wanting to split on arbitrary unicode line breaks should use >> str.splitlines(). > > The fairly long-standing RFE relating to an arbitrarily selectable > newline separator seems relevant here: > http://bugs.python.org/issue1152248 > > As with the discussion there, the problem with using str.splitlines is > that it prevents pipelining approaches that avoid reading a whole file > into memory. > > While removing the validity check from readlines() completely is > questionable (the readrecords() approach mentioned in the tracker issue > would still be better there), loosening the validity check to be based > on Py_UNICODE_IS_LINEBREAK seems a bit more feasible. (I'd still call it > a feature requests rather than a bug though). I've had a look at the io implementation: this appears to be based on the universal newline support idea which addresses only a fixed set of "new line" character combinations and is not as straight forward to extend to support all Unicode line break characters as I thought. What I don't understand is why the io layer tries to reinvent the wheel here instead of just using the codec's .readline() method - which *does* use .splitlines() and has full support for all Unicode line break characters (including the CRLF combination). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 06 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Thu Aug 6 12:46:44 2009 From: mal at egenix.com (M.-A. 
Lemburg) Date: Thu, 06 Aug 2009 12:46:44 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A7AB309.3080403@egenix.com> References: <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <4A793852.1070106@gmail.com> <4A793E4B.30101@v.loewis.de> <79990c6b0908050304w7b42bbbay775bdd66f7cd9dac@mail.gmail.com> <4A796ABC.80801@skippinet.com.au> <4A79C4DD.6000601@g.nevcal.com> <50862ebd0908051522h7a2f3a21g398ebd1d8522b4dc@mail.gmail.com> <4A7A94C8.90608@egenix.com> <4A7AAE3A.3050507@gmail.com> <4A7AB309.3080403@egenix.com> Message-ID: <4A7AB494.1000505@egenix.com> M.-A. Lemburg wrote: > Nick Coghlan wrote: >> Antoine Pitrou wrote: >>> M.-A. Lemburg egenix.com> writes: >>>> Please file a bug report for this. f.readlines() (or rather >>>> the io layer) should be using Py_UNICODE_ISLINEBREAK(ch) >>>> for detecting line break characters. >>> >>> Actually, no. It has been designed from the start to only recognize the >>> "standard" line break representations found in common formats/protocols (CR, LF >>> and CR+LF). >>> People wanting to split on arbitrary unicode line breaks should use >>> str.splitlines(). >> >> The fairly long-standing RFE relating to an arbitrarily selectable >> newline separator seems relevant here: >> http://bugs.python.org/issue1152248 >> >> As with the discussion there, the problem with using str.splitlines is >> that it prevents pipelining approaches that avoid reading a whole file >> into memory. >> >> While removing the validity check from readlines() completely is >> questionable (the readrecords() approach mentioned in the tracker issue >> would still be better there), loosening the validity check to be based >> on Py_UNICODE_IS_LINEBREAK seems a bit more feasible. (I'd still call it >> a feature requests rather than a bug though). > > I've had a look at the io implementation: this appears to be > based on the universal newline support idea which addresses > only a fixed set of "new line" character combinations and is > not as straight forward to extend to support all Unicode > line break characters as I thought. > > What I don't understand is why the io layer tries to reinvent > the wheel here instead of just using the codec's .readline() > method - which *does* use .splitlines() and has full support > for all Unicode line break characters (including the CRLF > combination). ... and because of this, the feature is already available if you use codecs.open() instead of the built-in open(): import codecs with codecs.open("x.txt", "w", encoding='utf-8') as f: f.write("a\nb\u2029c\n") with codecs.open("x.txt", "r", encoding='utf-8') as f: n = 1 for l in f.readlines(): print(n, repr(l)) n += 1 This prints: 1 'a\n' 2 'b\u2029' 3 'c\n' -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 06 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ncoghlan at gmail.com Thu Aug 6 12:47:45 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 06 Aug 2009 20:47:45 +1000 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: <20090806012101.B42493A406B@sparrow.telecommunity.com> References: <4A7A0626.5050303@lanl.gov> <592033416A4F4C20A60ACB28E40F84DF@RaymondLaptop1> <20090806012101.B42493A406B@sparrow.telecommunity.com> Message-ID: <4A7AB4D1.6070501@gmail.com> P.J. Eby wrote: > At 05:59 PM 8/5/2009 -0700, Raymond Hettinger wrote: >> [Jeffrey E. McAninch, PhD] >>> I very often want something like a try-except conditional expression >>> similar >>> to the if-else conditional. >>> >>> An example of the proposed syntax might be: >>> x = float(string) except float('nan') >>> or possibly >>> x = float(string) except ValueError float('nan') >> >> +1 I've long wanted something like this. >> One possible spelling is: >> >> x = float(string) except ValueError else float('nan') > > I think 'as' would be better than 'else', since 'else' has a different > meaning in try/except statements, e.g.: > > x = float(string) except ValueError, TypeError as float('nan') > > Of course, this is a different meaning of 'as', too, but it's not "as" > contradictory, IMO... ;-) (We're probably well into python-ideas territory at this point, but I'll keep things where the thread started for now) The basic idea appears sound to me as well. I suspect finding an acceptable syntax is going to be the sticking point. Breaking the problem down, we have three things we want to separate: 1. The expression that may raise the exception 2. The expression defining the exceptions to be caught 3. The expression to be used if the exception actually is caught >From there it is possible to come up with all sorts of variants. Option 1: Change the relative order of the clauses by putting the exception definition last: x = float(string) except float('nan') if ValueError op(float(string) except float('nan') if ValueError) I actually like this one (that's why I listed it first). It gets the clauses out of order relative to the statement, but the meaning still seems pretty obvious to me. Option 2: Follow the lamba model and allow a colon inside this form of expression: x = float(string) except ValueError: float('nan') op(float(string) except ValueError: float('nan')) This has the virtue of closely matching the statement syntax, but embedding colons inside expressions is somewhat ugly. Yes, lambda already does it, but lambda can hardly be put forward as a paragon of beauty. 
Option 3a/3b: Raymond's except-else suggestion: x = float(string) except ValueError else float('nan') op(float(string) except ValueError else float('nan')) This has the problem of inverting the sense of the else clause relative to the statement form (where the else clause is executed only if no exception occurs) A couple of extra keywords would get the sense correct again, but I'm not sure the parser could cope with it and it is rather verbose (I much prefer option 1 to this idea): x = float(string) if not except ValueError else float('nan') op(float(string) if not except ValueError else float('nan')) Option 4: PJE's except-as suggestion: x = float(string) except ValueError as float('nan') op(float(string) except ValueError as float('nan')) Given that we now use "except ValueError as ex" in exception statements, the above strikes me a really confusing idea. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From solipsis at pitrou.net Thu Aug 6 13:01:42 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 6 Aug 2009 11:01:42 +0000 (UTC) Subject: [Python-Dev] PEP 385: the eol-type issue References: <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <4A793852.1070106@gmail.com> <4A793E4B.30101@v.loewis.de> <79990c6b0908050304w7b42bbbay775bdd66f7cd9dac@mail.gmail.com> <4A796ABC.80801@skippinet.com.au> <4A79C4DD.6000601@g.nevcal.com> <50862ebd0908051522h7a2f3a21g398ebd1d8522b4dc@mail.gmail.com> <4A7A94C8.90608@egenix.com> <4A7AAE3A.3050507@gmail.com> <4A7AB309.3080403@egenix.com> Message-ID: M.-A. Lemburg egenix.com> writes: > > What I don't understand is why the io layer tries to reinvent > the wheel here instead of just using the codec's .readline() > method - which *does* use .splitlines() and has full support > for all Unicode line break characters (including the CRLF > combination). As for the original Python implementation, the goal was probably to start from a clean sheet. Besides, the new API has seek() and tell() as well. But I'm not really qualified to say more -- I didn't participate in its design. As for the C implementation, it had to be written from scratch anyway -- codecs.open() is pure Python and too slow. Deferring to str.splitlines() would still have been possible but a bit wasteful since in C you can use buffers directly. (and, besides, when writing the C implementation we were concerned with exact compatibility with the Python version -- including line break semantics) Regards Antoine. From digitalxero at gmail.com Thu Aug 6 13:18:52 2009 From: digitalxero at gmail.com (Dj Gilcrease) Date: Thu, 6 Aug 2009 05:18:52 -0600 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: <4A7AB4D1.6070501@gmail.com> References: <4A7A0626.5050303@lanl.gov> <592033416A4F4C20A60ACB28E40F84DF@RaymondLaptop1> <20090806012101.B42493A406B@sparrow.telecommunity.com> <4A7AB4D1.6070501@gmail.com> Message-ID: On Thu, Aug 6, 2009 at 4:47 AM, Nick Coghlan wrote: > Option 2: > ?x = float(string) except ValueError: float('nan') > ?op(float(string) except ValueError: float('nan')) > > This has the virtue of closely matching the statement syntax, but > embedding colons inside expressions is somewhat ugly. Yes, lambda > already does it, but lambda can hardly be put forward as a paragon of > beauty. 
+1 on this option as it resembles the standard try/except block enough it would be a quick edit to convert it to one if later you realize you need to catch more exceptions* * I recommend NOT allowing multiple exceptions in this form eg x = float(string)/var except ValueError, ZeroDivisionError, ...: float('nan') as it will start to reduce readability quickly From solipsis at pitrou.net Thu Aug 6 13:32:16 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 6 Aug 2009 11:32:16 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?=28try-except=29_conditional_expression_si?= =?utf-8?q?milar_to=09=28if-else=29_conditional_=28PEP_308=29?= References: <4A7A0626.5050303@lanl.gov> <592033416A4F4C20A60ACB28E40F84DF@RaymondLaptop1> Message-ID: Raymond Hettinger rcn.com> writes: > > For example: > > x = min(seq) except ValueError else 0 # default to zero for empty sequences How about: x = min(seq) if seq else 0 Shorter and more readable ("except X else Y" isn't very logical). > sample_std_deviation = sqrt(sum(x - mu for x in seq) / (len(seq)-1)) except ZeroDivisionError else float('Inf') Same transformation here. I have to say that the original example: x = float(string) except ValueError else float('nan') looks artificial. I don't see how it's adequate behaviour to return a NaN when presented with a string which doesn't represent a float number. Besides, all this is python-ideas material (and has probably already been proposed before). Regards Antoine. From mal at egenix.com Thu Aug 6 13:34:24 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 06 Aug 2009 13:34:24 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: References: <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <4A793852.1070106@gmail.com> <4A793E4B.30101@v.loewis.de> <79990c6b0908050304w7b42bbbay775bdd66f7cd9dac@mail.gmail.com> <4A796ABC.80801@skippinet.com.au> <4A79C4DD.6000601@g.nevcal.com> <50862ebd0908051522h7a2f3a21g398ebd1d8522b4dc@mail.gmail.com> <4A7A94C8.90608@egenix.com> <4A7AAE3A.3050507@gmail.com> <4A7AB309.3080403@egenix.com> Message-ID: <4A7ABFC0.3090004@egenix.com> Antoine Pitrou wrote: > M.-A. Lemburg egenix.com> writes: >> >> What I don't understand is why the io layer tries to reinvent >> the wheel here instead of just using the codec's .readline() >> method - which *does* use .splitlines() and has full support >> for all Unicode line break characters (including the CRLF >> combination). > > As for the original Python implementation, the goal was probably to start from a > clean sheet. Besides, the new API has seek() and tell() as well. But I'm not > really qualified to say more -- I didn't participate in its design. > > As for the C implementation, it had to be written from scratch anyway -- > codecs.open() is pure Python and too slow. Deferring to str.splitlines() would > still have been possible but a bit wasteful since in C you can use buffers > directly. Sure, but the code for line splitting is not really all that complicated (see PyUnicode_Splitlines()), so could easily be adapted to work on buffers directly. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 06 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From catch-all at masklinn.net Thu Aug 6 12:25:25 2009 From: catch-all at masklinn.net (Xavier Morel) Date: Thu, 6 Aug 2009 12:25:25 +0200 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: <4A7A0626.5050303@lanl.gov> References: <4A7A0626.5050303@lanl.gov> Message-ID: On 6 Aug 2009, at 00:22 , Jeff McAninch wrote: > I'm new to this list, so please excuse me if this topic has been > discussed, but I didn't > see anything similar in the archives. > > I very often want something like a try-except conditional expression > similar > to the if-else conditional. I fear this idea is soon going to extend to all compound statements one by one. Wouldn't it be smarter to fix the issue once and for all by looking into making Python's compound statements (or even all statements without restrictions) expressions that can return values in the first place? Now I don't know if it's actually possible, but if it is the problem becomes solved not just for try:except: (and twice so for if:else:) but also for while:, for: (though that one's already served pretty well by comprehensions) and with:. From python at mrabarnett.plus.com Thu Aug 6 13:39:58 2009 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 06 Aug 2009 12:39:58 +0100 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: <4A7AB4D1.6070501@gmail.com> References: <4A7A0626.5050303@lanl.gov> <592033416A4F4C20A60ACB28E40F84DF@RaymondLaptop1> <20090806012101.B42493A406B@sparrow.telecommunity.com> <4A7AB4D1.6070501@gmail.com> Message-ID: <4A7AC10E.3090406@mrabarnett.plus.com> Nick Coghlan wrote: > P.J. Eby wrote: >> At 05:59 PM 8/5/2009 -0700, Raymond Hettinger wrote: >>> [Jeffrey E. McAninch, PhD] >>>> I very often want something like a try-except conditional expression >>>> similar >>>> to the if-else conditional. >>>> >>>> An example of the proposed syntax might be: >>>> x = float(string) except float('nan') >>>> or possibly >>>> x = float(string) except ValueError float('nan') >>> +1 I've long wanted something like this. >>> One possible spelling is: >>> >>> x = float(string) except ValueError else float('nan') >> I think 'as' would be better than 'else', since 'else' has a different >> meaning in try/except statements, e.g.: >> >> x = float(string) except ValueError, TypeError as float('nan') >> >> Of course, this is a different meaning of 'as', too, but it's not "as" >> contradictory, IMO... ;-) > > (We're probably well into python-ideas territory at this point, but I'll > keep things where the thread started for now) > > The basic idea appears sound to me as well. I suspect finding an > acceptable syntax is going to be the sticking point. > > Breaking the problem down, we have three things we want to separate: > > 1. The expression that may raise the exception > 2. The expression defining the exceptions to be caught > 3. The expression to be used if the exception actually is caught > >>From there it is possible to come up with all sorts of variants. 
> > Option 1: > > Change the relative order of the clauses by putting the exception > definition last: > > x = float(string) except float('nan') if ValueError > op(float(string) except float('nan') if ValueError) > > I actually like this one (that's why I listed it first). It gets the > clauses out of order relative to the statement, but the meaning still > seems pretty obvious to me. > A further extension (if we need it): result = foo(arg) except float('inf') if ZeroDivisionError else float('nan') The 'else' part handles any other exceptions (not necessarily a good idea!). or: result = foo(arg) except float('inf') if ZeroDivisionError else float('nan') if ValueError Handles a number of different exceptions. > Option 2: > > Follow the lamba model and allow a colon inside this form of expression: > > x = float(string) except ValueError: float('nan') > op(float(string) except ValueError: float('nan')) > > This has the virtue of closely matching the statement syntax, but > embedding colons inside expressions is somewhat ugly. Yes, lambda > already does it, but lambda can hardly be put forward as a paragon of > beauty. > A colon is also used in a dict literal. > Option 3a/3b: > > Raymond's except-else suggestion: > > x = float(string) except ValueError else float('nan') > op(float(string) except ValueError else float('nan')) > [snip] -1 From solipsis at pitrou.net Thu Aug 6 13:42:03 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 6 Aug 2009 11:42:03 +0000 (UTC) Subject: [Python-Dev] PEP 385: the eol-type issue References: <4A78C793.9020409@skippinet.com.au> <50862ebd0908041744n1b6e9376l2ea01054739fd5e2@mail.gmail.com> <4A79363E.1040606@v.loewis.de> <4A793852.1070106@gmail.com> <4A793E4B.30101@v.loewis.de> <79990c6b0908050304w7b42bbbay775bdd66f7cd9dac@mail.gmail.com> <4A796ABC.80801@skippinet.com.au> <4A79C4DD.6000601@g.nevcal.com> <50862ebd0908051522h7a2f3a21g398ebd1d8522b4dc@mail.gmail.com> <4A7A94C8.90608@egenix.com> <4A7AAE3A.3050507@gmail.com> <4A7AB309.3080403@egenix.com> <4A7ABFC0.3090004@egenix.com> Message-ID: M.-A. Lemburg egenix.com> writes: > > Sure, but the code for line splitting is not really all that > complicated (see PyUnicode_Splitlines()), so could easily > be adapted to work on buffers directly. Certainly indeed. It all comes down to compatibility with the original implementation. (PEP 3116 itself is vague on the subject, but it didn't come to me to question the validity of the Python implementation, I admit) Regards Antoine. From ilya.nikokoshev at gmail.com Thu Aug 6 14:03:12 2009 From: ilya.nikokoshev at gmail.com (ilya) Date: Thu, 6 Aug 2009 16:03:12 +0400 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) Message-ID: I took a look at the options 1 and 2: x = float(string) except float('nan') if ValueError y = float(string) except ValueError: float('nan') and I think this can be done just as easily with existing syntax: x = try_1(float, string, except_ = float('nan'), if_ = ValueError) y = try_2(float, string, { ValueError: float('nan') }) Here's the full example: ----- example starts ----- def try_1(func, *args, except_ = None, if_ = None): try: return func(*args) except if_ as e: return except_ def try_2(func, *args): 'The last argument is a dictionary {exception type: return value}.' 
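    # (Reading note for try_2: if the raised exception is an instance of one of
    #  the dictionary's keys, the value mapped to that key is returned; an
    #  exception that matches none of the keys is re-raised unchanged.)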
dic = args[-1] try: return func(*args[:-1]) except Exception as e: for k,v in dic.items(): if isinstance(e, k): return v raise for string in ['5', 'five']: # x = float(string) except float('nan') if ValueError x = try_1(float, string, except_ = float('nan'), if_ = ValueError) # y = float(string) except ValueError: float('nan') y = try_2(float, string, { ValueError: float('nan') }) print(x, y) ----- example ends ----- As a side note, if I just subscribed to python-dev, is it possible to quote an old email? Below is my manual cut-and-paste quote: ---------- my quote -------------- Nick Coghlan wrote: > P.J. Eby wrote: >> At 05:59 PM 8/5/2009 -0700, Raymond Hettinger wrote: >>> [Jeffrey E. McAninch, PhD] >>>> I very often want something like a try-except conditional expression >>>> similar >>>> to the if-else conditional. >>>> >>>> An example of the proposed syntax might be: >>>> x = float(string) except float('nan') >>>> or possibly >>>> x = float(string) except ValueError float('nan') >>> +1 I've long wanted something like this. >>> One possible spelling is: >>> >>> x = float(string) except ValueError else float('nan') >> I think 'as' would be better than 'else', since 'else' has a different >> meaning in try/except statements, e.g.: >> >> x = float(string) except ValueError, TypeError as float('nan') >> >> Of course, this is a different meaning of 'as', too, but it's not "as" >> contradictory, IMO... ;-) > > (We're probably well into python-ideas territory at this point, but I'll > keep things where the thread started for now) > > The basic idea appears sound to me as well. I suspect finding an > acceptable syntax is going to be the sticking point. > > Breaking the problem down, we have three things we want to separate: > > 1. The expression that may raise the exception > 2. The expression defining the exceptions to be caught > 3. The expression to be used if the exception actually is caught > >>From there it is possible to come up with all sorts of variants. > > Option 1: > > Change the relative order of the clauses by putting the exception > definition last: > > x = float(string) except float('nan') if ValueError > op(float(string) except float('nan') if ValueError) > > I actually like this one (that's why I listed it first). It gets the > clauses out of order relative to the statement, but the meaning still > seems pretty obvious to me. > A further extension (if we need it): result = foo(arg) except float('inf') if ZeroDivisionError else float('nan') The 'else' part handles any other exceptions (not necessarily a good idea!). or: result = foo(arg) except float('inf') if ZeroDivisionError else float('nan') if ValueError Handles a number of different exceptions. > Option 2: > > Follow the lamba model and allow a colon inside this form of expression: > > x = float(string) except ValueError: float('nan') > op(float(string) except ValueError: float('nan')) > > This has the virtue of closely matching the statement syntax, but > embedding colons inside expressions is somewhat ugly. Yes, lambda > already does it, but lambda can hardly be put forward as a paragon of > beauty. > A colon is also used in a dict literal. 
> Option 3a/3b: > > Raymond's except-else suggestion: > > x = float(string) except ValueError else float('nan') > op(float(string) except ValueError else float('nan')) > [snip] -1 From eric.pruitt at gmail.com Thu Aug 6 15:39:59 2009 From: eric.pruitt at gmail.com (Eric Pruitt) Date: Thu, 6 Aug 2009 08:39:59 -0500 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: References: Message-ID: <171e8a410908060639v76560890ye42c8a5e49316558@mail.gmail.com> What about catching specific error numbers? Maybe an option so that the dictionary elements can also be dictionaries with integers as the keys: filedata = try_3(open, randomfile, except = { IOError, {2: None} } ) If it isn't found in the dictionary, then we raise the error. On Thu, Aug 6, 2009 at 07:03, ilya wrote: > I took a look at the options 1 and 2: > > ? ?x = float(string) except float('nan') if ValueError > ? ?y = float(string) except ValueError: float('nan') > > and I think this can be done just as easily with existing syntax: > > ? ?x = try_1(float, string, except_ = float('nan'), if_ = ValueError) > ? ?y = try_2(float, string, { ValueError: float('nan') }) > > Here's the full example: > > ----- example starts ----- > > def try_1(func, *args, except_ = None, if_ = None): > ? ?try: > ? ? ? ?return func(*args) > ? ?except if_ as e: > ? ? ? ?return except_ > > def try_2(func, *args): > ? ?'The last argument is a dictionary {exception type: return value}.' > ? ?dic = args[-1] > ? ?try: > ? ? ? ?return func(*args[:-1]) > ? ?except Exception as e: > ? ? ? ?for k,v in dic.items(): > ? ? ? ? ? ?if isinstance(e, k): > ? ? ? ? ? ? ? ?return v > ? ? ? ?raise > > for string in ['5', 'five']: > ? ?# ? x = float(string) except float('nan') if ValueError > ? ?x = try_1(float, string, except_ = float('nan'), if_ = ValueError) > ? ?# ? y = float(string) except ValueError: float('nan') > ? ?y = try_2(float, string, { ValueError: float('nan') }) > ? ?print(x, y) > > ----- example ends ----- > > As a side note, if I just subscribed to python-dev, is it possible to > quote an old email? Below is my manual cut-and-paste quote: > > ---------- my quote -------------- > > Nick Coghlan wrote: >> P.J. Eby wrote: >>> At 05:59 PM 8/5/2009 -0700, Raymond Hettinger wrote: >>>> [Jeffrey E. McAninch, PhD] >>>>> I very often want something like a try-except conditional expression >>>>> similar >>>>> to the if-else conditional. >>>>> >>>>> An example of the proposed syntax might be: >>>>> ? ?x = float(string) except float('nan') >>>>> or possibly >>>>> ? ?x = float(string) except ValueError float('nan') >>>> +1 I've long wanted something like this. >>>> One possible spelling is: >>>> >>>> ? x = float(string) except ValueError else float('nan') >>> I think 'as' would be better than 'else', since 'else' has a different >>> meaning in try/except statements, e.g.: >>> >>> ? ?x = float(string) except ValueError, TypeError as float('nan') >>> >>> Of course, this is a different meaning of 'as', too, but it's not "as" >>> contradictory, IMO... ?;-) >> >> (We're probably well into python-ideas territory at this point, but I'll >> keep things where the thread started for now) >> >> The basic idea appears sound to me as well. I suspect finding an >> acceptable syntax is going to be the sticking point. >> >> Breaking the problem down, we have three things we want to separate: >> >> 1. The expression that may raise the exception >> 2. The expression defining the exceptions to be caught >> 3. 
The expression to be used if the exception actually is caught >> >>>From there it is possible to come up with all sorts of variants. >> >> Option 1: >> >> Change the relative order of the clauses by putting the exception >> definition last: >> >> ? x = float(string) except float('nan') if ValueError >> ? op(float(string) except float('nan') if ValueError) >> >> I actually like this one (that's why I listed it first). It gets the >> clauses out of order relative to the statement, but the meaning still >> seems pretty obvious to me. >> > A further extension (if we need it): > > ? ? result = foo(arg) except float('inf') if ZeroDivisionError else > float('nan') > > The 'else' part handles any other exceptions (not necessarily a good idea!). > > or: > > ? ? result = foo(arg) except float('inf') if ZeroDivisionError else > float('nan') if ValueError > > Handles a number of different exceptions. > >> Option 2: >> >> Follow the lamba model and allow a colon inside this form of expression: >> >> ? x = float(string) except ValueError: float('nan') >> ? op(float(string) except ValueError: float('nan')) >> >> This has the virtue of closely matching the statement syntax, but >> embedding colons inside expressions is somewhat ugly. Yes, lambda >> already does it, but lambda can hardly be put forward as a paragon of >> beauty. >> > A colon is also used in a dict literal. > >> Option 3a/3b: >> >> Raymond's except-else suggestion: >> >> ? x = float(string) except ValueError else float('nan') >> ? op(float(string) except ValueError else float('nan')) >> > [snip] > -1 > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/eric.pruitt%40gmail.com > From cool-rr at cool-rr.com Thu Aug 6 15:44:26 2009 From: cool-rr at cool-rr.com (cool-RR) Date: Thu, 6 Aug 2009 15:44:26 +0200 Subject: [Python-Dev] Tkinter has many files Message-ID: Hello python-dev! I'm a Python programmer, but this is the first time I'm posting on python-dev, and I am not familiar at all with how the Python implementation works -- so this post may be way off. I've recently released a Python application, PythonTurtle, which is packaged using py2exe and InnoSetup. Due to the fact that my program needs to give the user a full Python shell, I've made py2exe package the entire Python standard library with my application. What I've noticed when I did that is that Tkinter has *a lot* of files. This is a bit inconvenient for several reasons, the main one being that the installer for PythonTurtle takes a long time to copy all of those little files. (I think the reason for the slowness is not the weight of the files, but the fact that there are so many of them.) There are also other reasons why it's annoying: Ohloh thinks my project is "Mostly written in Tcl," and git-gui gave me trouble for trying to commit so many files. Do you think it will be a good thing to package all of these Tkinter files into one big file (or several big files)? Best Wishes, Ram Rachum. -------------- next part -------------- An HTML attachment was scrubbed... URL: From fuzzyman at voidspace.org.uk Thu Aug 6 15:59:01 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Thu, 06 Aug 2009 14:59:01 +0100 Subject: [Python-Dev] Tkinter has many files In-Reply-To: References: Message-ID: <4A7AE1A5.3080604@voidspace.org.uk> cool-RR wrote: > Hello python-dev! 
> > I'm a Python programmer, but this is the first time I'm posting on > python-dev, and I am not familiar at all with how the Python > implementation works -- so this post may be way off. > > I've recently released a Python application, PythonTurtle > , which is packaged using py2exe and > InnoSetup. Due to the fact that my program needs to give the user a > full Python shell, I've made py2exe package the entire Python standard > library with my application. What I've noticed when I did that is that > Tkinter has /a lot/ of files. This is a bit inconvenient for several > reasons, the main one being that the installer for PythonTurtle takes > a long time to copy all of those little files. (I think the reason for > the slowness is not the weight of the files, but the fact that there > are so many of them.) There are also other reasons why it's annoying: > Ohloh thinks my project is "Mostly written in Tcl," and git-gui gave > me trouble for trying to commit so many files. > Do you think it will be a good thing to package all of these Tkinter > files into one big file (or several big files)? Do you mean the .tcl files? Tkinter is a Python wrapper around Tcl - which is a separate project / programming environment that includes the Tk GUI. Python is not in a position to modify or repackage those files. Why do you need to keep the whole Python distribution under version control? Isn't all you need a script to *generate* the py2exe'd output from an *installed* Python? This is the approach I take with Movable Python which does something very similar. All the best, Michael Foord > > Best Wishes, > Ram Rachum. > ------------------------------------------------------------------------ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From cool-rr at cool-rr.com Thu Aug 6 16:10:48 2009 From: cool-rr at cool-rr.com (cool-RR) Date: Thu, 6 Aug 2009 16:10:48 +0200 Subject: [Python-Dev] Tkinter has many files In-Reply-To: <4A7AE1A5.3080604@voidspace.org.uk> References: <4A7AE1A5.3080604@voidspace.org.uk> Message-ID: > > Why do you need to keep the whole Python distribution under version > control? Isn't all you need a script to *generate* the py2exe'd output from > an *installed* Python? This is the approach I take with Movable Python which > does something very similar. > > Never mind the source control issue, it's minor. If it's not possible to minimize the number of files there, I guess I'll have to live with it. -- Sincerely, Ram Rachum -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mcaninch at lanl.gov Thu Aug 6 16:36:45 2009 From: mcaninch at lanl.gov (Jeff McAninch) Date: Thu, 06 Aug 2009 08:36:45 -0600 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: <4A7AB4D1.6070501@gmail.com> References: <4A7A0626.5050303@lanl.gov> <592033416A4F4C20A60ACB28E40F84DF@RaymondLaptop1> <20090806012101.B42493A406B@sparrow.telecommunity.com> <4A7AB4D1.6070501@gmail.com> Message-ID: <4A7AEA7D.6020301@lanl.gov> Nick Coghlan wrote: > Option 1: > > Change the relative order of the clauses by putting the exception > definition last: > > x = float(string) except float('nan') if ValueError > op(float(string) except float('nan') if ValueError) > > I actually like this one (that's why I listed it first). It gets the > clauses out of order relative to the statement, but the meaning still > seems pretty obvious to me. > > Since I don't know the parser coding, I won't comment on the relative implentability (implementableness?) of the syntax options that Nick, P.J. and Raymond suggested. But all seem readable and debugable. Nick's option 1 seems like it might be the most understandable to a Python novice. Would the full syntax include multiple Exceptions after the "if"? Jeff -- ========================== Jeffrey E. McAninch, PhD Physicist, X-2-IFD Los Alamos National Laboratory Phone: 505-667-0374 Email: mcaninch at lanl.gov ========================== From rowen at uw.edu Thu Aug 6 21:55:10 2009 From: rowen at uw.edu (Russell E. Owen) Date: Thu, 06 Aug 2009 12:55:10 -0700 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) References: <4A7A0626.5050303@lanl.gov> Message-ID: In article , Xavier Morel wrote: > On 6 Aug 2009, at 00:22 , Jeff McAninch wrote: > > I'm new to this list, so please excuse me if this topic has been > > discussed, but I didn't > > see anything similar in the archives. > > > > I very often want something like a try-except conditional expression > > similar > > to the if-else conditional. > I fear this idea is soon going to extend to all compound statements > one by one. > > Wouldn't it be smarter to fix the issue once and for all by looking > into making Python's compound statements (or even all statements > without restrictions) expressions that can return values in the first > place? Now I don't know if it's actually possible, but if it is the > problem becomes solved not just for try:except: (and twice so for > if:else:) but also for while:, for: (though that one's already served > pretty well by comprehensions) and with:. I like this idea a lot. -- Russell From dinov at microsoft.com Thu Aug 6 23:55:47 2009 From: dinov at microsoft.com (Dino Viehland) Date: Thu, 6 Aug 2009 21:55:47 +0000 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: <4A7AB4D1.6070501@gmail.com> References: <4A7A0626.5050303@lanl.gov> <592033416A4F4C20A60ACB28E40F84DF@RaymondLaptop1> <20090806012101.B42493A406B@sparrow.telecommunity.com> <4A7AB4D1.6070501@gmail.com> Message-ID: <1A472770E042064698CB5ADC83A12ACD0377CBCB@TK5EX14MBXC116.redmond.corp.microsoft.com> On option 1 is this legal then? x = float(string) except float('nan') if some_check() else float('inf') if ValueError -----Original Message----- From: python-dev-bounces+dinov=microsoft.com at python.org [mailto:python-dev-bounces+dinov=microsoft.com at python.org] On Behalf Of Nick Coghlan Sent: Thursday, August 06, 2009 3:48 AM To: P.J. 
Eby Cc: python-dev at python.org; Jeff McAninch Subject: Re: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) P.J. Eby wrote: > At 05:59 PM 8/5/2009 -0700, Raymond Hettinger wrote: >> [Jeffrey E. McAninch, PhD] >>> I very often want something like a try-except conditional expression >>> similar to the if-else conditional. >>> >>> An example of the proposed syntax might be: >>> x = float(string) except float('nan') or possibly >>> x = float(string) except ValueError float('nan') >> >> +1 I've long wanted something like this. >> One possible spelling is: >> >> x = float(string) except ValueError else float('nan') > > I think 'as' would be better than 'else', since 'else' has a different > meaning in try/except statements, e.g.: > > x = float(string) except ValueError, TypeError as float('nan') > > Of course, this is a different meaning of 'as', too, but it's not "as" > contradictory, IMO... ;-) (We're probably well into python-ideas territory at this point, but I'll keep things where the thread started for now) The basic idea appears sound to me as well. I suspect finding an acceptable syntax is going to be the sticking point. Breaking the problem down, we have three things we want to separate: 1. The expression that may raise the exception 2. The expression defining the exceptions to be caught 3. The expression to be used if the exception actually is caught >From there it is possible to come up with all sorts of variants. Option 1: Change the relative order of the clauses by putting the exception definition last: x = float(string) except float('nan') if ValueError op(float(string) except float('nan') if ValueError) I actually like this one (that's why I listed it first). It gets the clauses out of order relative to the statement, but the meaning still seems pretty obvious to me. Option 2: Follow the lamba model and allow a colon inside this form of expression: x = float(string) except ValueError: float('nan') op(float(string) except ValueError: float('nan')) This has the virtue of closely matching the statement syntax, but embedding colons inside expressions is somewhat ugly. Yes, lambda already does it, but lambda can hardly be put forward as a paragon of beauty. Option 3a/3b: Raymond's except-else suggestion: x = float(string) except ValueError else float('nan') op(float(string) except ValueError else float('nan')) This has the problem of inverting the sense of the else clause relative to the statement form (where the else clause is executed only if no exception occurs) A couple of extra keywords would get the sense correct again, but I'm not sure the parser could cope with it and it is rather verbose (I much prefer option 1 to this idea): x = float(string) if not except ValueError else float('nan') op(float(string) if not except ValueError else float('nan')) Option 4: PJE's except-as suggestion: x = float(string) except ValueError as float('nan') op(float(string) except ValueError as float('nan')) Given that we now use "except ValueError as ex" in exception statements, the above strikes me a really confusing idea. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- _______________________________________________ Python-Dev mailing list Python-Dev at python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/dinov%40microsoft.com From kristjan at ccpgames.com Thu Aug 6 22:56:11 2009 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Thu, 6 Aug 2009 20:56:11 +0000 Subject: [Python-Dev] issue 6654 Message-ID: <930F189C8A437347B80DF2C156F7EC7F098493C8FD@exchis.ccp.ad.local> I added http://bugs.python.org/issue6654 I also put a note to python-ideas but have had no response yet. Any comments? Here's the summary: I've created http://codereview.appspot.com/100046 on Rietveld: by passing the "path" component of the xmlrpc request to the dispatch method, it becomes possible to dispatch differently according to this. This patch provides that addition. Additionally, it provides a MultiPathXMLRPCDispatcher mixin class and a MultiPathXMLRPCServer that uses it, to have multiple dispatchers for different paths. This allows a single server port to serve different XMLRPC servers as differentiated by the HTTP path. A test is also provided. Kristján -------------- next part -------------- An HTML attachment was scrubbed... URL: From nyamatongwe at gmail.com Fri Aug 7 00:10:28 2009 From: nyamatongwe at gmail.com (Neil Hodgson) Date: Fri, 7 Aug 2009 08:10:28 +1000 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <4A7AB494.1000505@egenix.com> References: <4A796ABC.80801@skippinet.com.au> <4A79C4DD.6000601@g.nevcal.com> <50862ebd0908051522h7a2f3a21g398ebd1d8522b4dc@mail.gmail.com> <4A7A94C8.90608@egenix.com> <4A7AAE3A.3050507@gmail.com> <4A7AB309.3080403@egenix.com> <4A7AB494.1000505@egenix.com> Message-ID: <50862ebd0908061510j3c2b9e06t9b2a73b73eb2bb31@mail.gmail.com> M.-A. Lemburg: > ... and because of this, the feature is already available if > you use codecs.open() instead of the built-in open(): So should I not add an issue for the basic open because codecs.open should be used for this case? Neil From python at mrabarnett.plus.com Fri Aug 7 01:33:49 2009 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 07 Aug 2009 00:33:49 +0100 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: <1A472770E042064698CB5ADC83A12ACD0377CBCB@TK5EX14MBXC116.redmond.corp.microsoft.com> References: <4A7A0626.5050303@lanl.gov> <592033416A4F4C20A60ACB28E40F84DF@RaymondLaptop1> <20090806012101.B42493A406B@sparrow.telecommunity.com> <4A7AB4D1.6070501@gmail.com> <1A472770E042064698CB5ADC83A12ACD0377CBCB@TK5EX14MBXC116.redmond.corp.microsoft.com> Message-ID: <4A7B685D.1080408@mrabarnett.plus.com> Dino Viehland wrote: > On option 1 is this legal then? > > x = float(string) except float('nan') if some_check() else float('inf') if ValueError > Well, is this legal? try: x = float(string) except some_check(): x = float('nan') except ValueError: x = float('inf') In other words, some_check() returns an exception _class_.
>>> def get_exception(): return ValueError >>> try: x = float("") except get_exception(): print "not a float" not a float From dinov at microsoft.com Fri Aug 7 02:01:23 2009 From: dinov at microsoft.com (Dino Viehland) Date: Fri, 7 Aug 2009 00:01:23 +0000 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: <4A7B685D.1080408@mrabarnett.plus.com> References: <4A7A0626.5050303@lanl.gov> <592033416A4F4C20A60ACB28E40F84DF@RaymondLaptop1> <20090806012101.B42493A406B@sparrow.telecommunity.com> <4A7AB4D1.6070501@gmail.com> <1A472770E042064698CB5ADC83A12ACD0377CBCB@TK5EX14MBXC116.redmond.corp.microsoft.com> <4A7B685D.1080408@mrabarnett.plus.com> Message-ID: <1A472770E042064698CB5ADC83A12ACD0377D56B@TK5EX14MBXC116.redmond.corp.microsoft.com> MRAB wrote: > Dino Viehland wrote: > > On option 1 is this legal then? > > > > x = float(string) except float('nan') if some_check() else float('inf') if > ValueError > > > Well, is this is legal? > > try: > x = float(string) > except some_check(): > x = float('nan') > except ValueError: > x = float('inf') > I was thinking this was would be equal to: x = float(string) except (float('nan') if some_check() else float('inf')) if ValueError From python at mrabarnett.plus.com Fri Aug 7 02:22:00 2009 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 07 Aug 2009 01:22:00 +0100 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: <1A472770E042064698CB5ADC83A12ACD0377D56B@TK5EX14MBXC116.redmond.corp.microsoft.com> References: <4A7A0626.5050303@lanl.gov> <592033416A4F4C20A60ACB28E40F84DF@RaymondLaptop1> <20090806012101.B42493A406B@sparrow.telecommunity.com> <4A7AB4D1.6070501@gmail.com> <1A472770E042064698CB5ADC83A12ACD0377CBCB@TK5EX14MBXC116.redmond.corp.microsoft.com> <4A7B685D.1080408@mrabarnett.plus.com> <1A472770E042064698CB5ADC83A12ACD0377D56B@TK5EX14MBXC116.redmond.corp.microsoft.com> Message-ID: <4A7B73A8.3030802@mrabarnett.plus.com> Dino Viehland wrote: > MRAB wrote: >> Dino Viehland wrote: >>> On option 1 is this legal then? >>> >>> x = float(string) except float('nan') if some_check() else float('inf') if >> ValueError >> Well, is this is legal? >> >> try: >> x = float(string) >> except some_check(): >> x = float('nan') >> except ValueError: >> x = float('inf') >> > > I was thinking this was would be equal to: > > x = float(string) except (float('nan') if some_check() else float('inf')) if ValueError > I suppose it depends on the precedence of 'x except y if z' vs 'x if y else y'. From python at mrabarnett.plus.com Fri Aug 7 02:36:34 2009 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 07 Aug 2009 01:36:34 +0100 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: References: <4A7A0626.5050303@lanl.gov> Message-ID: <4A7B7712.50101@mrabarnett.plus.com> Russell E. Owen wrote: > In article , > Xavier Morel wrote: > >> On 6 Aug 2009, at 00:22 , Jeff McAninch wrote: >>> I'm new to this list, so please excuse me if this topic has been >>> discussed, but I didn't >>> see anything similar in the archives. >>> >>> I very often want something like a try-except conditional expression >>> similar >>> to the if-else conditional. >> I fear this idea is soon going to extend to all compound statements >> one by one. 
>> >> Wouldn't it be smarter to fix the issue once and for all by looking >> into making Python's compound statements (or even all statements >> without restrictions) expressions that can return values in the first >> place? Now I don't know if it's actually possible, but if it is the >> problem becomes solved not just for try:except: (and twice so for >> if:else:) but also for while:, for: (though that one's already served >> pretty well by comprehensions) and with:. > > I like this idea a lot. > For some reason this kind of reminds me of BCPL. A function definition looked like: LET func_name(arg1, arg2) = expression so, strictly speaking, no multiline functions. However, there was also the VALOF ... RESULTIS ... block. In Python, the 'return' statement provides the result of a function; in BCPL, the 'RESULTIS' statement provided the result of the VALOF block, which was call from within an expression, like: LET foo(...) = VALOF $( ... RESULTIS expression $) From tjreedy at udel.edu Fri Aug 7 06:28:52 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 07 Aug 2009 13:28:52 +0900 Subject: [Python-Dev] Tkinter has many files In-Reply-To: References: Message-ID: cool-RR wrote: > Hello python-dev! > > I'm a Python programmer, but this is the first time I'm posting on > python-dev, and I am not familiar at all with how the Python > implementation works -- so this post may be way off. > > I've recently released a Python application, PythonTurtle > , which is packaged using py2exe and InnoSetup. > Due to the fact that my program needs to give the user a full Python > shell, I've made py2exe package the entire Python standard library with > my application. I really think you you just make you app sit on top of a standard Python installation. The current Windows installers work well. Just decide which versions you are willing to support. The usually reasons for bundling, to control the versions of multiple 3rd-party libraries, do not seen to apply. From mal at egenix.com Fri Aug 7 10:31:01 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 07 Aug 2009 10:31:01 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: <50862ebd0908061510j3c2b9e06t9b2a73b73eb2bb31@mail.gmail.com> References: <4A796ABC.80801@skippinet.com.au> <4A79C4DD.6000601@g.nevcal.com> <50862ebd0908051522h7a2f3a21g398ebd1d8522b4dc@mail.gmail.com> <4A7A94C8.90608@egenix.com> <4A7AAE3A.3050507@gmail.com> <4A7AB309.3080403@egenix.com> <4A7AB494.1000505@egenix.com> <50862ebd0908061510j3c2b9e06t9b2a73b73eb2bb31@mail.gmail.com> Message-ID: <4A7BE645.5010605@egenix.com> Neil Hodgson wrote: > M.-A. Lemburg: > >> ... and because of this, the feature is already available if >> you use codecs.open() instead of the built-in open(): > > So should I not add an issue for the basic open because codecs.open > should be used for this case? Like Antoine mentioned: Using codecs.open() and .readline() is about 20-30 times slower than open(). This is mainly due to the fact that the codec's .readline() method is implemented in pure Python and does its own buffering. IMHO, it would be a lot better to add full Unicode support for line breaks to the io layer. Given that the code for the complicated handling of the CRLF combination is already there, it's not difficult to add support for the remaing line break characters. The implementation could reuse the Bloom filter approach used in unicodeobject.c to make this very fast. BTW: I'm not sure why the io layer records the line endings it has seen. 
This makes processing more complicated for no apparent reason. In the few cases where you might need this (I don't see any), you could just as well scan the lines in a quick loop using Python. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 07 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From fuzzyman at voidspace.org.uk Fri Aug 7 12:17:24 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Fri, 07 Aug 2009 11:17:24 +0100 Subject: [Python-Dev] Tkinter has many files In-Reply-To: References: Message-ID: <4A7BFF34.7080103@voidspace.org.uk> Terry Reedy wrote: > cool-RR wrote: >> Hello python-dev! >> >> I'm a Python programmer, but this is the first time I'm posting on >> python-dev, and I am not familiar at all with how the Python >> implementation works -- so this post may be way off. >> >> I've recently released a Python application, PythonTurtle >> , which is packaged using py2exe and >> InnoSetup. Due to the fact that my program needs to give the user a >> full Python shell, I've made py2exe package the entire Python >> standard library with my application. > > I really think you you just make you app sit on top of a standard > Python installation. The current Windows installers work well. Just > decide which versions you are willing to support. The usually reasons > for bundling, to control the versions of multiple 3rd-party libraries, > do not seen to apply. Actually on Windows a very common reason for bundling with py2exe is to not be dependent (or require) an installed version of Python. For a standalone teaching tool this seems reasonable. Michael > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ From mcaninch at lanl.gov Fri Aug 7 12:14:59 2009 From: mcaninch at lanl.gov (Jeff McAninch) Date: Fri, 07 Aug 2009 04:14:59 -0600 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: <1A472770E042064698CB5ADC83A12ACD0377CBCB@TK5EX14MBXC116.redmond.corp.microsoft.com> References: <4A7A0626.5050303@lanl.gov> <592033416A4F4C20A60ACB28E40F84DF@RaymondLaptop1> <20090806012101.B42493A406B@sparrow.telecommunity.com> <4A7AB4D1.6070501@gmail.com> <1A472770E042064698CB5ADC83A12ACD0377CBCB@TK5EX14MBXC116.redmond.corp.microsoft.com> Message-ID: <4A7BFEA3.7080600@lanl.gov> Should be legal, right?, since syntax would be except if Dino Viehland wrote: > On option 1 is this legal then? > > x = float(string) except float('nan') if some_check() else float('inf') if ValueError > > Thinking more about the syntax options: if P.J.'s "if" Option is used, it should also be optional. That is, I would want this to also be legal, except to trap any exception when robustness is more important than catching a specific exception. 
What would be the typical next step in trying to put this forward? A draft PEP? -- ========================== Jeffrey E. McAninch, PhD Physicist, X-2-IFD Los Alamos National Laboratory Phone: 505-667-0374 Email: mcaninch at lanl.gov ========================== From kristjan at ccpgames.com Fri Aug 7 12:22:14 2009 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Fri, 7 Aug 2009 10:22:14 +0000 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: References: <4A7A0626.5050303@lanl.gov> Message-ID: <930F189C8A437347B80DF2C156F7EC7F098493C953@exchis.ccp.ad.local> Unless I am very much mistaken, this is the approach Ruby takes. Everything is an expression. For example, the value of a block is the value of The last expression in the block. I've never understood the need to have a distinction betwen statements and expressions, not when expressions can have side effects. It's like that differentce between procedures and functions in pascal that only serves to confuse K > -----Original Message----- > From: python-dev-bounces+kristjan=ccpgames.com at python.org > [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf > Of Xavier Morel > Sent: 6. ?g?st 2009 10:25 > To: python-dev at python.org > Subject: Re: [Python-Dev] (try-except) conditional expression similar > to (if-else) conditional (PEP 308) > Wouldn't it be smarter to fix the issue once and for all by looking > into making Python's compound statements (or even all statements > without restrictions) expressions that can return values in the first > place? Now I don't know if it's actually possible, but if it is the > problem becomes solved not just for try:except: (and twice so for > if:else:) but also for while:, for: (though that one's already served > pretty well by comprehensions) and with:. > From python at mrabarnett.plus.com Fri Aug 7 13:03:16 2009 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 07 Aug 2009 12:03:16 +0100 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: <4A7BFEA3.7080600@lanl.gov> References: <4A7A0626.5050303@lanl.gov> <592033416A4F4C20A60ACB28E40F84DF@RaymondLaptop1> <20090806012101.B42493A406B@sparrow.telecommunity.com> <4A7AB4D1.6070501@gmail.com> <1A472770E042064698CB5ADC83A12ACD0377CBCB@TK5EX14MBXC116.redmond.corp.microsoft.com> <4A7BFEA3.7080600@lanl.gov> Message-ID: <4A7C09F4.6020602@mrabarnett.plus.com> Jeff McAninch wrote: > Should be legal, right?, since syntax would be > except if > > Dino Viehland wrote: >> On option 1 is this legal then? >> >> x = float(string) except float('nan') if some_check() else >> float('inf') if ValueError >> >> > Thinking more about the syntax options: if P.J.'s "if" Option is used, > it should also be optional. > That is, I would want this to also be legal, > except > to trap any exception when robustness is more important than catching a > specific exception. Catch all exceptions: except Catch specific exceptions, optionally catching all others: except ( if )+ [else ] Of course, a catch-all is a bare except, with all its dangers! > > What would be the typical next step in trying to put this forward? A > draft PEP? 
> From ilya.nikokoshev at gmail.com Fri Aug 7 13:06:11 2009 From: ilya.nikokoshev at gmail.com (ilya) Date: Fri, 7 Aug 2009 15:06:11 +0400 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F098493C953@exchis.ccp.ad.local> References: <4A7A0626.5050303@lanl.gov> <930F189C8A437347B80DF2C156F7EC7F098493C953@exchis.ccp.ad.local> Message-ID: I believe people now discuss this both on python-dev and python-ideas, though since I'm new to both lists, I can't really tell where this belongs. I played a little with this syntax, my try_ function and @catch decorator (which are at http://mit.edu/~unknot/www/try_cond.py): # x = float(string) except float('nan') if ValueError x = try_(float, string, except_ = float('nan'), if_ = ValueError) @catch(ValueError = float('nan')) def x1(): return float(string) # y = float(string) except ValueError: float('nan') y = try_(float, string, { ValueError: float('nan') }) @catch({ValueError: float('nan')}) def y1(): return float(string) # try: # z = open(string, 'r') # except IOError as e: # if e.errno == 2: # z = 'not_exist' # else: # raise # z = try_(open, string, 'r', iocatcher({2: 'no file!'})) @catch(iocatcher({2: 'nothing!'})) def z1(): return open(string, 'r') Here are my overall feelings: (1) it would be interesting to come up with syntax for except/if clause, but it's not obvious how to make one and this fact itself may kill the idea. (2) the more reasonable approach to things like this is by defining a separate block and then performing a "catch" operation with it. Unfortunately, this looks very clumsy as currently this can only be done by defining a separate function. I think code blocks are a good direction to explore. 2009/8/7 Kristj?n Valur J?nsson : > Unless I am very much mistaken, this is the approach Ruby takes. > Everything is an expression. ?For example, the value of a block is the value of > The last expression in the block. > > I've never understood the need to have a distinction betwen statements and expressions, not when expressions can have side effects. ?It's like that differentce between procedures and functions in pascal that only serves to confuse > > K >> -----Original Message----- >> From: python-dev-bounces+kristjan=ccpgames.com at python.org >> [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf >> Of Xavier Morel >> Sent: 6. ?g?st 2009 10:25 >> To: python-dev at python.org >> Subject: Re: [Python-Dev] (try-except) conditional expression similar >> to (if-else) conditional (PEP 308) > > >> Wouldn't it be smarter to fix the issue once and for all by looking >> into making Python's compound statements (or even all statements >> without restrictions) expressions that can return values in the first >> place? Now I don't know if it's actually possible, but if it is the >> problem becomes solved not just for try:except: (and twice so for >> if:else:) but also for while:, for: (though that one's already served >> pretty well by comprehensions) and with:. 
>> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ilya.nikokoshev%40gmail.com > From fuzzyman at voidspace.org.uk Fri Aug 7 13:22:57 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Fri, 7 Aug 2009 12:22:57 +0100 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: References: <4A7A0626.5050303@lanl.gov> <930F189C8A437347B80DF2C156F7EC7F098493C953@exchis.ccp.ad.local> Message-ID: <35D8C55F-2A92-4BAF-82C8-F11922893A0B@voidspace.org.uk> -- http://www.ironpythoninaction.com On 7 Aug 2009, at 12:06, ilya wrote: > I believe people now discuss this both on python-dev and python-ideas, > though since I'm new to both lists, I can't really tell where this > belongs. > It definitely belongs on the ideas list... Michael > I played a little with this syntax, my try_ function and @catch > decorator (which are at http://mit.edu/~unknot/www/try_cond.py): > > # x = float(string) except float('nan') if ValueError > x = try_(float, string, except_ = float('nan'), if_ = ValueError) > > @catch(ValueError = float('nan')) > def x1(): return float(string) > > # y = float(string) except ValueError: float('nan') > y = try_(float, string, { ValueError: float('nan') }) > > @catch({ValueError: float('nan')}) > def y1(): return float(string) > > # try: > # z = open(string, 'r') > # except IOError as e: > # if e.errno == 2: > # z = 'not_exist' > # else: > # raise > # > z = try_(open, string, 'r', iocatcher({2: 'no file!'})) > > @catch(iocatcher({2: 'nothing!'})) > def z1(): return open(string, 'r') > > Here are my overall feelings: > > (1) it would be interesting to come up with syntax for except/if > clause, but it's not obvious how to make one and this fact itself may > kill the idea. > (2) the more reasonable approach to things like this is by defining a > separate block and then performing a "catch" operation with it. > Unfortunately, this looks very clumsy as currently this can only be > done by defining a separate function. I think code blocks are a good > direction to explore. > > 2009/8/7 Kristj?n Valur J?nsson : >> Unless I am very much mistaken, this is the approach Ruby takes. >> Everything is an expression. For example, the value of a block is >> the value of >> The last expression in the block. >> >> I've never understood the need to have a distinction betwen >> statements and expressions, not when expressions can have side >> effects. It's like that differentce between procedures and >> functions in pascal that only serves to confuse >> >> K >>> -----Original Message----- >>> From: python-dev-bounces+kristjan=ccpgames.com at python.org >>> [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On >>> Behalf >>> Of Xavier Morel >>> Sent: 6. ?g?st 2009 10:25 >>> To: python-dev at python.org >>> Subject: Re: [Python-Dev] (try-except) conditional expression >>> similar >>> to (if-else) conditional (PEP 308) >> >> >>> Wouldn't it be smarter to fix the issue once and for all by looking >>> into making Python's compound statements (or even all statements >>> without restrictions) expressions that can return values in the >>> first >>> place? 
Now I don't know if it's actually possible, but if it is the >>> problem becomes solved not just for try:except: (and twice so for >>> if:else:) but also for while:, for: (though that one's already >>> served >>> pretty well by comprehensions) and with:. >>> >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: http://mail.python.org/mailman/options/python-dev/ilya.nikokoshev%40gmail.com >> > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk From solipsis at pitrou.net Fri Aug 7 14:12:15 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 7 Aug 2009 12:12:15 +0000 (UTC) Subject: [Python-Dev] PEP 385: the eol-type issue References: <4A796ABC.80801@skippinet.com.au> <4A79C4DD.6000601@g.nevcal.com> <50862ebd0908051522h7a2f3a21g398ebd1d8522b4dc@mail.gmail.com> <4A7A94C8.90608@egenix.com> <4A7AAE3A.3050507@gmail.com> <4A7AB309.3080403@egenix.com> <4A7AB494.1000505@egenix.com> <50862ebd0908061510j3c2b9e06t9b2a73b73eb2bb31@mail.gmail.com> <4A7BE645.5010605@egenix.com> Message-ID: M.-A. Lemburg egenix.com> writes: > > IMHO, it would be a lot better to add full Unicode support > for line breaks to the io layer. Given that the code for the > complicated handling of the CRLF combination is already there, > it's not difficult to add support for the remaing line break > characters. I'm not against anything in principle here, but I'd just like to point out two things: 1. Changing line break semantics would break compatibility with the current behaviour, and it would also diverge from what the `newline` parameter specifies; this may be annoying if, for example, the TextIOWrapper class is used to parse some network protocols with a rigorous line ending definition 2. It would be useful to have some input by the original designers of the IO library (the PEP lists Guido, Daniel Stutzbach and Mike Verdone, but I suppose other people were involved) Regards Antoine. From mal at egenix.com Fri Aug 7 14:48:39 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 07 Aug 2009 14:48:39 +0200 Subject: [Python-Dev] PEP 385: the eol-type issue In-Reply-To: References: <4A796ABC.80801@skippinet.com.au> <4A79C4DD.6000601@g.nevcal.com> <50862ebd0908051522h7a2f3a21g398ebd1d8522b4dc@mail.gmail.com> <4A7A94C8.90608@egenix.com> <4A7AAE3A.3050507@gmail.com> <4A7AB309.3080403@egenix.com> <4A7AB494.1000505@egenix.com> <50862ebd0908061510j3c2b9e06t9b2a73b73eb2bb31@mail.gmail.com> <4A7BE645.5010605@egenix.com> Message-ID: <4A7C22A7.5020007@egenix.com> Antoine Pitrou wrote: > M.-A. Lemburg egenix.com> writes: >> >> IMHO, it would be a lot better to add full Unicode support >> for line breaks to the io layer. Given that the code for the >> complicated handling of the CRLF combination is already there, >> it's not difficult to add support for the remaining line break >> characters. > > I'm not against anything in principle here, but I'd just like to point out two > things: > > 1. 
Changing line break semantics would break compatibility with the current > behaviour, and it would also diverge from what the `newline` parameter > specifies; this may be annoying if, for example, the TextIOWrapper class is used > to parse some network protocols with a rigorous line ending definition Sure, but that would still be possible using the newline parameter. We'd only have to find a way to tell the io layer "accept all Unicode line break characters". > 2. It would be useful to have some input by the original designers of the IO > library (the PEP lists Guido, Daniel Stutzbach and Mike Verdone, but I suppose > other people were involved) Fair enough. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 07 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From aleaxit at gmail.com Fri Aug 7 16:55:44 2009 From: aleaxit at gmail.com (Alex Martelli) Date: Fri, 7 Aug 2009 07:55:44 -0700 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F098493C953@exchis.ccp.ad.local> References: <4A7A0626.5050303@lanl.gov> <930F189C8A437347B80DF2C156F7EC7F098493C953@exchis.ccp.ad.local> Message-ID: 2009/8/7 Kristj?n Valur J?nsson : > Unless I am very much mistaken, this is the approach Ruby takes. > Everything is an expression. ?For example, the value of a block is the value of > The last expression in the block. > > I've never understood the need to have a distinction betwen statements and expressions, not when expressions can have side effects. ?It's like that differentce between procedures and functions in pascal that only serves to confuse If you're interested in understanding it better, research Query-Command Separation (QCS), e.g. starting at http://en.wikipedia.org/wiki/Command-query_separation and links therefrom. Alex From status at bugs.python.org Fri Aug 7 18:07:37 2009 From: status at bugs.python.org (Python tracker) Date: Fri, 7 Aug 2009 18:07:37 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20090807160737.2C53F78626@psf.upfronthosting.co.za> ACTIVITY SUMMARY (07/31/09 - 08/07/09) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 2315 open (+39) / 16175 closed (+16) / 18490 total (+55) Open issues with patches: 919 Average duration of open issues: 657 days. Median duration of open issues: 411 days. 
Open Issues Breakdown open 2283 (+39) pending 31 ( +0) Issues Created Or Reopened (55) _______________________________ heapq.nsmallest and nlargest should be smarter/more usable/more 07/31/09 CLOSED http://bugs.python.org/issue6614 created jab multiprocessing logging support test 08/01/09 http://bugs.python.org/issue6615 created OG7 PyList_APPEND (append without incref) 08/01/09 CLOSED http://bugs.python.org/issue6616 created ideasman42 patch During compiling python 3.1 getting error Undefined symbol libin 08/01/09 http://bugs.python.org/issue6617 created thoratsandip Typo in a listing in 5.2.9 of language reference 08/01/09 CLOSED http://bugs.python.org/issue6618 created gregorlingl Remove duplicated function in Lib/inspect.py 08/01/09 CLOSED http://bugs.python.org/issue6619 created vincele patch Variable may be used before first being assigned to in Lib/local 08/01/09 CLOSED http://bugs.python.org/issue6620 created vincele patch [RFC] Remove leftover use of Carbon module from Lib/binhex.py 08/01/09 CLOSED http://bugs.python.org/issue6621 created vincele patch [RFC] wrong variable used in Lib/poplib.py 08/01/09 CLOSED http://bugs.python.org/issue6622 created vincele patch Lib/ftplib.py netrc class parsing problem 08/01/09 http://bugs.python.org/issue6623 created vincele patch PyArg_ParseTuple with "s" format and NUL: Bogus TypeError detail 08/01/09 CLOSED http://bugs.python.org/issue6624 created jafo UnicodeEncodeError on pydoc's CLI 08/02/09 http://bugs.python.org/issue6625 created christoph patch show Python mimetypes module some love 08/02/09 http://bugs.python.org/issue6626 created jrus patch threading.local() does not work with C-created threads 08/02/09 http://bugs.python.org/issue6627 created Nikratio IDLE freezes after encountering a syntax error 08/02/09 http://bugs.python.org/issue6628 created brian89 seek doesn't properly handle file buffer, leads to silent data c 08/03/09 CLOSED http://bugs.python.org/issue6629 created lorentey patch string.Template custom pattern not working 08/03/09 http://bugs.python.org/issue6630 created jcollado urlparse.urlunsplit() can't handle relative files (for urllib*.o 08/03/09 http://bugs.python.org/issue6631 created albert Include more fullwidth chars in the decimal codec 08/03/09 http://bugs.python.org/issue6632 created ezio.melotti No handlers could be found for logger 08/03/09 CLOSED http://bugs.python.org/issue6633 created purpleidea sys.exit() called from threads other than the main one: undocume 08/03/09 http://bugs.python.org/issue6634 created jgehrcke Profiler doesn't print usage (indexError instead) 08/03/09 http://bugs.python.org/issue6635 created pr0gg3d patch Non-existant directory in sys.path prevents further imports 08/03/09 http://bugs.python.org/issue6636 created cemasoniv non-empty defaultdict .copy() fails returning empty dict 08/03/09 CLOSED http://bugs.python.org/issue6637 created tomcl optparse parse_args argument references wrong 08/04/09 http://bugs.python.org/issue6638 created kq1quick turtle: _tkinter.TclError: invalid command name ".10170160" 08/04/09 http://bugs.python.org/issue6639 created srid urlparse should parse mailto: URL headers as query parameters 08/04/09 http://bugs.python.org/issue6640 created mykmelez strptime doesn't support %z format ? 
08/04/09 http://bugs.python.org/issue6641 created eka returning after forking a child thread doesn't call Py_Finalize 08/04/09 http://bugs.python.org/issue6642 created rnk patch joining a child that forks can deadlock in the forked child proc 08/04/09 http://bugs.python.org/issue6643 created rnk patch cmathmodule.c: Extra comma in enum - fails on AIX 08/04/09 CLOSED http://bugs.python.org/issue6644 created srid multiprocessing build fails on AIX - /dev/urandom (or equivalent 08/04/09 http://bugs.python.org/issue6645 created srid test_pickle fails on AIX -- 6.9999999999999994e-308 != 6.9999999 08/05/09 http://bugs.python.org/issue6646 created srid warnings.catch_warnings is not thread-safe 08/05/09 http://bugs.python.org/issue6647 created gagenellina codecs documentation does not mention surrogateescape 08/05/09 CLOSED http://bugs.python.org/issue6648 created Nikratio idlelib/rpc.py missing exit status on exithook 08/05/09 http://bugs.python.org/issue6649 created gpolo patch sre_parse contains a confusing generic error message 08/05/09 http://bugs.python.org/issue6650 created torne patch Py3k's posixpath.relpath not compatible with ntpath.relpath 08/05/09 http://bugs.python.org/issue6651 created erickt missing cmath functions 08/05/09 CLOSED http://bugs.python.org/issue6652 created pfeldman at verizon.net Potential memory leak in multiprocessing 08/05/09 http://bugs.python.org/issue6653 created jnoller Add "path" to the xmrlpc dispatcher method 08/05/09 http://bugs.python.org/issue6654 created krisvale etree iterative find[text] 08/06/09 http://bugs.python.org/issue6655 created Digitalxero patch locale.format_string fails on escaped percentage 08/06/09 http://bugs.python.org/issue6656 created christoph patch Copy documentation section 08/06/09 http://bugs.python.org/issue6657 created Sheepherd typo in buffer api docs 08/06/09 CLOSED http://bugs.python.org/issue6658 created ash buffer c-api: memoryview object documentation 08/06/09 http://bugs.python.org/issue6659 created ash Desire python.org documentation link to user contribution wiki ( 08/06/09 CLOSED http://bugs.python.org/issue6660 created keenethery Transient test_multiprocessing failure 08/06/09 http://bugs.python.org/issue6661 created pitrou HTMLParser.HTMLParser doesn't handle malformed charrefs 08/07/09 http://bugs.python.org/issue6662 created dayveday re.findall does not always return a list of strings 08/07/09 http://bugs.python.org/issue6663 created pfeldman at verizon.net readlines should understand Line Separator and Paragraph Separat 08/07/09 http://bugs.python.org/issue6664 created nyamatongwe fnmatch fails on filenames containing \n character 08/07/09 http://bugs.python.org/issue6665 created rajcze List of dirs to ignore in trace.py is applied only for the first 08/07/09 http://bugs.python.org/issue6666 created bogdan.opanchuk patch logging config - using of FileHandler's delay argument? 
08/07/09 http://bugs.python.org/issue6667 created maro locale.py: can't parse sr_RS at latin locale 08/07/09 http://bugs.python.org/issue6668 created VPeric Issues Now Closed (32) ______________________ Backport set comprehensions 506 days http://bugs.python.org/issue2334 alexandre.vassalotti 26backport Extension module build fails for MinGW: missing vcvarsall.bat 465 days http://bugs.python.org/issue2698 tarek PyCF_DONT_IMPLY_DEDENT can be used to activate the with statemen 378 days http://bugs.python.org/issue3438 gpolo patch Include Tcl/Tk 8.5.4 in the windows binary for the upcoming beta 352 days http://bugs.python.org/issue3600 gpolo Arrows key do not browse in the IDLE 313 days http://bugs.python.org/issue3961 gpolo IDLE.app (Mac) File Menu MIssing Options 249 days http://bugs.python.org/issue4432 gpolo multiprocessing.JoinableQueue task_done() issue 234 days http://bugs.python.org/issue4660 jnoller bug fix to prevent io.BytesIO from accepting arbitrary keyword a 149 days http://bugs.python.org/issue5449 alexandre.vassalotti patch OverflowError in RLock.acquire() 12 days http://bugs.python.org/issue6562 davidar Make Decimal constructor accept all unicode decimal digits in in 4 days http://bugs.python.org/issue6595 marketdickinson patch urllib2 bug on CentOS 5 days http://bugs.python.org/issue6596 rpetrov test_distutils subtest test_get_exe_bytes fails depending on exe 3 days http://bugs.python.org/issue6604 tarek patch smtplib.SMTP.sendmail() rejected after quit(),connect() sequence 6 days http://bugs.python.org/issue6605 amaury.forgeotdarc HTMLParser cannot deal with mixture of arbitrary data and charac 1 days http://bugs.python.org/issue6611 liudongmiao at gmail.com heapq.nsmallest and nlargest should be smarter/more usable/more 0 days http://bugs.python.org/issue6614 rhettinger PyList_APPEND (append without incref) 1 days http://bugs.python.org/issue6616 rhettinger patch Typo in a listing in 5.2.9 of language reference 2 days http://bugs.python.org/issue6618 georg.brandl Remove duplicated function in Lib/inspect.py 1 days http://bugs.python.org/issue6619 marketdickinson patch Variable may be used before first being assigned to in Lib/local 3 days http://bugs.python.org/issue6620 marketdickinson patch [RFC] Remove leftover use of Carbon module from Lib/binhex.py 1 days http://bugs.python.org/issue6621 marketdickinson patch [RFC] wrong variable used in Lib/poplib.py 5 days http://bugs.python.org/issue6622 marketdickinson patch PyArg_ParseTuple with "s" format and NUL: Bogus TypeError detail 0 days http://bugs.python.org/issue6624 jafo seek doesn't properly handle file buffer, leads to silent data c 4 days http://bugs.python.org/issue6629 pitrou patch No handlers could be found for logger 2 days http://bugs.python.org/issue6633 purpleidea non-empty defaultdict .copy() fails returning empty dict 1 days http://bugs.python.org/issue6637 rhettinger cmathmodule.c: Extra comma in enum - fails on AIX 0 days http://bugs.python.org/issue6644 marketdickinson codecs documentation does not mention surrogateescape 1 days http://bugs.python.org/issue6648 georg.brandl missing cmath functions 0 days http://bugs.python.org/issue6652 georg.brandl typo in buffer api docs 0 days http://bugs.python.org/issue6658 georg.brandl Desire python.org documentation link to user contribution wiki ( 1 days http://bugs.python.org/issue6660 keenethery Mouse wheel crashes program 2102 days http://bugs.python.org/issue834351 gpolo add "reload" function to IDLE 1583 days http://bugs.python.org/issue1175686 gpolo Top 
Top Issues Most Discussed (10)
______________________________

 9  heapq.nsmallest and nlargest should be smarter/more usable/more  0 days  closed  http://bugs.python.org/issue6614
 7  Include more fullwidth chars in the decimal codec  4 days  open  http://bugs.python.org/issue6632
 6  [RFC] wrong variable used in Lib/poplib.py  5 days  closed  http://bugs.python.org/issue6622
 5  show Python mimetypes module some love  5 days  open  http://bugs.python.org/issue6626
 5  json C serializer performance tied to structure depth on some s  10 days  open  http://bugs.python.org/issue6594
 5  OverflowError in RLock.acquire()  12 days  closed  http://bugs.python.org/issue6562
 5  asyncore incorrect failure when connection is refused and using  16 days  open  http://bugs.python.org/issue6550
 5  error: (10035, 'The socket operation could not complete without  466 days  open  http://bugs.python.org/issue2710
 5  Backport set literals  508 days  open  http://bugs.python.org/issue2335
 4  Desire python.org documentation link to user contribution wiki  1 days  closed  http://bugs.python.org/issue6660

From greg at krypto.org Fri Aug 7 20:08:23 2009
From: greg at krypto.org (Gregory P. Smith)
Date: Fri, 7 Aug 2009 11:08:23 -0700
Subject: [Python-Dev] socket.makefile and EINTR handling
Message-ID: <52dc1c820908071108q77a7bb70t98cad51c9e126350@mail.gmail.com>

In particular this issue: http://bugs.python.org/issue1628205

I believe we should handle EINTR internally within the socket._fileobject
wrapper, as nobody using a file-like object should ever expect to get an
EINTR. EINTR only comes from using the lowest level system calls.

Anyone strongly disagree?

-gps

From amk at amk.ca Sat Aug 8 02:42:52 2009
From: amk at amk.ca (A.M. Kuchling)
Date: Fri, 7 Aug 2009 20:42:52 -0400
Subject: [Python-Dev] www, svn.python.org down
Message-ID: <20090808004252.GA4185@andrew-kuchlings-macbook.local>

Both www.python.org and svn.python.org are down. They're hosted on the
same machine, and it seems to have run into disk problems and hasn't
rebooted even after power-cycling.

Thomas Wouters will be visiting the machine physically tomorrow to try to
diagnose the problem. (The machine also hosts planet.python.org and
hg.python.org.)

--amk

From steve at pearwood.info Sat Aug 8 08:02:42 2009
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 8 Aug 2009 16:02:42 +1000
Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308)
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F098493C953@exchis.ccp.ad.local>
References: <4A7A0626.5050303@lanl.gov> <930F189C8A437347B80DF2C156F7EC7F098493C953@exchis.ccp.ad.local>
Message-ID: <200908081602.42203.steve@pearwood.info>

On Fri, 7 Aug 2009 08:22:14 pm Kristján Valur Jónsson wrote:
> Unless I am very much mistaken, this is the approach Ruby takes.
> Everything is an expression. For example, the value of a block is
> the value of The last expression in the block.

Copying what other languages do is not necessarily a bad thing, but that
would fail both "explicit is better than implicit" and "in the face of
ambiguity, avoid the temptation to guess".

It's not immediately obvious to me why the last expression should be given
that privileged rule. Why not the first expression?

> I've never understood the need to have a distinction betwen
> statements and expressions, not when expressions can have side
> effects. It's like that differentce between procedures and functions
> in pascal that only serves to confuse

It's been a while, but I don't think it ever confused me.
Being unable to return multiple values, *that* confused me, but the distinction between "procedures are for doing something, functions are for getting something back" was perfectly straight-forward. (And then Pascal went and made it slightly more confusing by adding var parameters, so you could get results back from a procedure and have side-effects in a function... oh well.) -- Steven D'Aprano From stefan_ml at behnel.de Sat Aug 8 14:55:57 2009 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 08 Aug 2009 14:55:57 +0200 Subject: [Python-Dev] expy: an expressway to extend Python In-Reply-To: <290588.53242.qm@web54205.mail.re2.yahoo.com> References: <290588.53242.qm@web54205.mail.re2.yahoo.com> Message-ID: Yingjie Lan wrote: > This is to announce the initial release of expy 0.1.0. > > More details at http://expy.sourceforge.net/ I'm clearly biased, but my main concern here is that expy requires C code to be written inside of strings. There isn't any good editor support for that, so I doubt that expy is good for anything but very thin wrappers (as in the examples you presented). That said, you might want to look at the argument unpacking code generated by Cython. It's highly optimised through specialisation and has been benchmarked quite a bit faster than the generic Python C-API functions for tuple/keyword extracting. Since argument conversion seems to be more or less all that expy really does, maybe you want to reuse that code. Stefan From stephen at xemacs.org Sat Aug 8 15:19:23 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 08 Aug 2009 22:19:23 +0900 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: <200908081602.42203.steve@pearwood.info> References: <4A7A0626.5050303@lanl.gov> <930F189C8A437347B80DF2C156F7EC7F098493C953@exchis.ccp.ad.local> <200908081602.42203.steve@pearwood.info> Message-ID: <87r5vma1es.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > It's not immediately obvious to me why the last expression should be > given that privileged rule. Why not the first expression? Or the second, for that matter. So find a large body of Lisp code and run "grep -r prog1 | wc", "grep -r prog2 | wc", and "grep -r progn | wc" on it. I think the pragmatic answer will be obvious. Personally, I like functional languages and style. But I admit the *need* for a progn construct (ie, "block") to express procedural style, and see no particular reason why expressing that by making a syntactic distinction between expressions and statements is worse (or better) than the progn construct. That should be kept distinct from the question of whether extended assignment operators or conditional operators are appropriate for a given language. From catch-all at masklinn.net Sat Aug 8 10:17:10 2009 From: catch-all at masklinn.net (Xavier Morel) Date: Sat, 8 Aug 2009 10:17:10 +0200 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: <200908081602.42203.steve@pearwood.info> References: <4A7A0626.5050303@lanl.gov> <930F189C8A437347B80DF2C156F7EC7F098493C953@exchis.ccp.ad.local> <200908081602.42203.steve@pearwood.info> Message-ID: On 8 Aug 2009, at 08:02 , Steven D'Aprano wrote: > On Fri, 7 Aug 2009 08:22:14 pm Kristj?n Valur J?nsson wrote: >> Unless I am very much mistaken, this is the approach Ruby takes. >> Everything is an expression. For example, the value of a block is >> the value of The last expression in the block. 
> > Copying what other languages do is not necessarily a bad thing, but > that > would fail both "explicit is better than implicit" and "in the face of > ambiguity, avoid the temptation to guess". The first objection one might be able to give, you maybe, but the second one? Where's the ambiguity in "compound statements return the result of the last evaluated expression"? > It's not immediately obvious to me why the last expression should be > given that privileged rule. Why not the first expression? > Because it wouldn't make any sense? When you're computing something, the value you want is the one at the end of the computation (usually called a result), not some random one somewhere else. From amk at amk.ca Sat Aug 8 22:22:21 2009 From: amk at amk.ca (A.M. Kuchling) Date: Sat, 8 Aug 2009 16:22:21 -0400 Subject: [Python-Dev] www/svn python.org status update Message-ID: <20090808202221.GA4911@andrew-kuchlings-macbook.local> The following sites are up again on a new machine, but cannot be updated through SVN hooks or whatever mechanism: www.python.org docs.python.org www.jython.org planet.python.org planet.jython.org svn.python.org was deliberately not brought up again. The backups were a few hours behind and missing the ~10 most recent commits. Not disastrous, but it could probably mess up people's SVN trees, so after some IRC discussion, the decision was to wait until the original disks are available again. That will probably not occur until Monday, maybe Tuesday. I've disabled donations to the PSF through credit cards, which pointed to a CGI script that doesn't currently work; PayPal donations still work. Do we want to make any edits to the 3.1 or 3.0 pages about the I/O bug? I can do that manually if someone will provide the text and/or a patch to put up. Unfortunately without SVN we probably can't cut a new 3.1 release, unless Benjamin or someone has a really up-to-date copy of the Mercurial tree and wants to work from that. --amk From dirkjan at ochtman.nl Sat Aug 8 22:25:52 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Sat, 8 Aug 2009 22:25:52 +0200 Subject: [Python-Dev] www/svn python.org status update In-Reply-To: <20090808202221.GA4911@andrew-kuchlings-macbook.local> References: <20090808202221.GA4911@andrew-kuchlings-macbook.local> Message-ID: On Sat, Aug 8, 2009 at 22:22, A.M. Kuchling wrote: > svn.python.org was deliberately not brought up again. ?The backups > were a few hours behind and missing the ~10 most recent commits. ?Not > disastrous, but it could probably mess up people's SVN trees, so after > some IRC discussion, the decision was to wait until the original disks > are available again. ?That will probably not occur until Monday, maybe > Tuesday. What's the last revision supposed to be? I keep a somewhat regularly updated full sync of the Python repo. Cheers, Dirkjan From martin at v.loewis.de Sat Aug 8 22:40:29 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sat, 08 Aug 2009 22:40:29 +0200 Subject: [Python-Dev] www/svn python.org status update In-Reply-To: References: <20090808202221.GA4911@andrew-kuchlings-macbook.local> Message-ID: <4A7DE2BD.6040509@v.loewis.de> > What's the last revision supposed to be? I keep a somewhat regularly > updated full sync of the Python repo. We don't know exactly; python-checkins has recorded r74352. If anybody has a more recent checkout (svn info .), please speak up. 
Regards, Martin From martin at v.loewis.de Sat Aug 8 22:47:41 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 08 Aug 2009 22:47:41 +0200 Subject: [Python-Dev] www/svn python.org status update In-Reply-To: <20090808202221.GA4911@andrew-kuchlings-macbook.local> References: <20090808202221.GA4911@andrew-kuchlings-macbook.local> Message-ID: <4A7DE46D.1090402@v.loewis.de> > The following sites are up again on a new machine I'd like to thank the people who have helped getting the temporary machine up: Thomas Wouters spent much of his day at XS4ALL, where he was helped by Gerben Schepers (who also provided the hardware). Sean Reifschneider provided the backups (from the system at tummy.com which keeps backups of all python.org machines). Andrew Kuchling put the backup back into the places to bring the system into its current state. The failure of the old system was caused by the RAID controller; the disks themselves should still be intact. Unfortunately, the RAID controller keeps its configuration on the controller (not on the disks), so it is unclear still whether the replacement will be able to recognize the RAID array. Regards, Martin From malathiramya at gmail.com Sun Aug 9 09:16:19 2009 From: malathiramya at gmail.com (malathi selvaraj) Date: Sun, 9 Aug 2009 12:46:19 +0530 Subject: [Python-Dev] hi everyone Message-ID: I am new one to this mailing list I would like to learn python.. how to join IRC for python,i try it like #python, but i dn't get can you tell me -- Regards, S.Malathi. -------------- next part -------------- An HTML attachment was scrubbed... URL: From asmodai at in-nomine.org Sun Aug 9 10:20:48 2009 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Sun, 9 Aug 2009 10:20:48 +0200 Subject: [Python-Dev] hi everyone In-Reply-To: References: Message-ID: <20090809082048.GA30241@nexus.in-nomine.org> -On [20090809 09:50], malathi selvaraj (malathiramya at gmail.com) wrote: >I am new one to this mailing list Welcome. >I would like to learn python.. There is sufficient information on the website of www.python.org Furthermore, this is not the mailinglist you want to email with questions, you need to mail the normal Python mailinglist for this. This is the mailinglist related to the actual development of the Python language itself. >how to join IRC for python,i try it like #python, but i dn't get can you Python has various channels on Undernet and Freenode at least. -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B The wisdom of the wise, and the experience of ages, may be preserved by quotations... From ncoghlan at gmail.com Sun Aug 9 11:26:09 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 09 Aug 2009 19:26:09 +1000 Subject: [Python-Dev] hi everyone In-Reply-To: <20090809082048.GA30241@nexus.in-nomine.org> References: <20090809082048.GA30241@nexus.in-nomine.org> Message-ID: <4A7E9631.8060507@gmail.com> Jeroen Ruigrok van der Werven wrote: > -On [20090809 09:50], malathi selvaraj (malathiramya at gmail.com) wrote: >> I am new one to this mailing list > > Welcome. > >> I would like to learn python.. > > There is sufficient information on the website of www.python.org > > Furthermore, this is not the mailinglist you want to email with questions, > you need to mail the normal Python mailinglist for this. Specifically, python-list at python.org (also available as the newsgroup comp.lang.python). Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From skip at pobox.com Sun Aug 9 13:27:09 2009 From: skip at pobox.com (skip at pobox.com) Date: Sun, 9 Aug 2009 06:27:09 -0500 Subject: [Python-Dev] hi everyone In-Reply-To: <4A7E9631.8060507@gmail.com> References: <20090809082048.GA30241@nexus.in-nomine.org> <4A7E9631.8060507@gmail.com> Message-ID: <19070.45709.736076.840699@montanaro.dyndns.org> Nick> Specifically, python-list at python.org (also available as the Nick> newsgroup comp.lang.python). Also, if you're a complete beginner, try subscribing to tutor at python.org: http://mail.python.org/mailman/listinfo/tutor and reading through that list's ten year's worth of archived postings. (Maybe someone create a BestOfTutor wiki page?) -- Skip Montanaro - skip at pobox.com - http://www.smontanaro.net/ Getting old sucks, but it beats dying young From billy.earney at gmail.com Sun Aug 9 18:16:14 2009 From: billy.earney at gmail.com (Billy Earney) Date: Sun, 9 Aug 2009 11:16:14 -0500 Subject: [Python-Dev] hi everyone In-Reply-To: <19070.45709.736076.840699@montanaro.dyndns.org> References: <20090809082048.GA30241@nexus.in-nomine.org> <4A7E9631.8060507@gmail.com> <19070.45709.736076.840699@montanaro.dyndns.org> Message-ID: <4a7ef651.1d1d640a.067e.5167@mx.google.com> There's also the diveintopython.org website that contains a free ebook about python.. That's what I used to get started :) -----Original Message----- From: python-dev-bounces+billy.earney=gmail.com at python.org [mailto:python-dev-bounces+billy.earney=gmail.com at python.org] On Behalf Of skip at pobox.com Sent: Sunday, August 09, 2009 6:27 AM To: Nick Coghlan Cc: malathi selvaraj; Jeroen Ruigrok van der Werven; python-dev at python.org Subject: Re: [Python-Dev] hi everyone Nick> Specifically, python-list at python.org (also available as the Nick> newsgroup comp.lang.python). Also, if you're a complete beginner, try subscribing to tutor at python.org: http://mail.python.org/mailman/listinfo/tutor and reading through that list's ten year's worth of archived postings. (Maybe someone create a BestOfTutor wiki page?) -- Skip Montanaro - skip at pobox.com - http://www.smontanaro.net/ Getting old sucks, but it beats dying young _______________________________________________ Python-Dev mailing list Python-Dev at python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/billy.earney%40gmail.com From jimjjewett at gmail.com Mon Aug 10 02:55:12 2009 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 9 Aug 2009 20:55:12 -0400 Subject: [Python-Dev] statement vs expression [was: (try-except) conditional expression similar to (if-else) conditional (PEP 308)] Message-ID: > Kristj?n Valur J?nsson wrote: >> I've never understood the need to have a distinction betwen statements >> and expressions, not when expressions can have side effects. Alex Martelli responded: > If you're interested in understanding it better, research > Query-Command Separation (QCS), e.g. starting at > http://en.wikipedia.org/wiki/Command-query_separation Either you missed Kristj?n's point, or your answer was so subtle that I missed yours. QCS makes it easy to determine which pieces of code (queries) are free of side-effects. I see value in that for both debugging and optimization. 
What I don't see is how that relates to expressions vs statements **when expressions can have side effects.** (Actually, in Python, I would say that statements are far *more* likely to be free of side-effects, as they are often there for flow control.) -jJ From jimjjewett at gmail.com Mon Aug 10 03:07:45 2009 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 9 Aug 2009 21:07:45 -0400 Subject: [Python-Dev] codecs.oen [was: PEP 385: the eol-type issue] Message-ID: > M.-A. Lemburg wrote: >> ... and because of this, the feature is already available if >> you use codecs.open() instead of the built-in open(): Neil Hodgson asked: > So should I not add an issue for the basic open because codecs.open > should be used for this case? In python 3, why does codecs.open even still exist? As best I can tell, codecs.open should be the same as regular open, but for a unicode file -- and all text files are treated as unicode in python 3.0 So at this point, are there any differences beyond: (a) The builtin open doesn't work on multi-byte line-endings other than the multi-character CRLF. (In other words, it goes by the traditional Operating System conventions developed when a char was a byte, but the Unicode standard allows for a few more possibilities, which are currently rare in practice.) (b) The codecs version is much slower, because it hasn't seen the optimization effort. -jJ From kristjan at ccpgames.com Mon Aug 10 11:18:24 2009 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Mon, 10 Aug 2009 09:18:24 +0000 Subject: [Python-Dev] issue 6654 In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F098493C8FD@exchis.ccp.ad.local> References: <930F189C8A437347B80DF2C156F7EC7F098493C8FD@exchis.ccp.ad.local> Message-ID: <930F189C8A437347B80DF2C156F7EC7F09850A03C0@exchis.ccp.ad.local> I've had no response to this yet. Is no one using xmlrpc? To clarify the feature: The xmlrpc server invokes a "dispatch" method on a dispatcher object (typcillay, just itself) to process xmlrpc requests. The "path" from the xmlrpc request is not provided. By providing this path, it becomes possible to provide different behaviour for different paths. The patch provided also includes a new dispatcher, MultipPathXMLRPCDispatcher, which will forward method to different dispatchers based on the path. This makes it possible to multiplex many xmlrpc "servers", each with their own request path, on a single connection. You may, for example, have installed a server at http://myserver/gamerpc but find that you want your server also to handle an entirely different application domain, and can do so now by having those requests sent to http://myserver/bookkeepingrpc K From: python-dev-bounces+kristjan=ccpgames.com at python.org [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Kristj?n Valur J?nsson Sent: 6. ?g?st 2009 20:56 To: python-dev at python.org Subject: [Python-Dev] issue 6654 I added http://bugs.python.org/issue6654 I also put a not to python-ideas but have had no response yet. Any comments? Here's the summary: I've created http://codereview.appspot.com/100046 on Rietveld: by passing the "path" component of the xmlrpc request to the dispatch method, itbecomes possible to dispatch differently according to this. This patch providesthat addition. Additionally, it provides an MultiPathXMLRPCDispatcher mixin class and a MultiPathXMLRPCServer that uses it, to have multiple dispatchers for different paths. 
This allows a single server port to serve different XMLRPC servers as differentiated by the HTTP path. A test is also preovided. Kristj?n -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Aug 10 12:12:49 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 10 Aug 2009 20:12:49 +1000 Subject: [Python-Dev] issue 6654 In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F09850A03C0@exchis.ccp.ad.local> References: <930F189C8A437347B80DF2C156F7EC7F098493C8FD@exchis.ccp.ad.local> <930F189C8A437347B80DF2C156F7EC7F09850A03C0@exchis.ccp.ad.local> Message-ID: <4A7FF2A1.8040302@gmail.com> Kristj?n Valur J?nsson wrote: > I?ve had no response to this yet. Is no one using xmlrpc? It sounds like a reasonable feature to me, but I'm one of those that doesn't actually use xmlrpc so my +0 or +1 probably isn't very meaningful to you... Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From solipsis at pitrou.net Mon Aug 10 15:45:32 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 10 Aug 2009 13:45:32 +0000 (UTC) Subject: [Python-Dev] codecs.oen [was: PEP 385: the eol-type issue] References: Message-ID: Jim Jewett gmail.com> writes: > > In python 3, why does codecs.open even still exist? I don't remember anyone proposing to deprecate it, so I suppose that's the (social) reason. > So at this point, are there any differences beyond: (c) The built-in open is probably a little more featureful, especially when it comes to seek() and tell(). > (b) The codecs version is much slower, because it hasn't seen the > optimization effort. By the way, the built-in open would also benefit from an optimization of codecs.py's IncrementalEncoder classes: they are just thin Python wrappers around C function calls, and the overhead of calling a Python method is very significant when doing a lot of small unicode writes with a non-optimized codec (a couple of dominant codecs have been optimized by means of internal shortcuts bypassing codecs.py: latin-1, utf-8, utf-16). Regards Antoine. > (a) The builtin open doesn't work on multi-byte line-endings other > than the multi-character CRLF. (In other words, it goes by the > traditional Operating System conventions developed when a char was a > byte, but the Unicode standard allows for a few more possibilities, > which are currently rare in practice.) > From digitalxero at gmail.com Mon Aug 10 16:29:32 2009 From: digitalxero at gmail.com (Dj Gilcrease) Date: Mon, 10 Aug 2009 08:29:32 -0600 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: <4A7BFEA3.7080600@lanl.gov> References: <4A7A0626.5050303@lanl.gov> <592033416A4F4C20A60ACB28E40F84DF@RaymondLaptop1> <20090806012101.B42493A406B@sparrow.telecommunity.com> <4A7AB4D1.6070501@gmail.com> <1A472770E042064698CB5ADC83A12ACD0377CBCB@TK5EX14MBXC116.redmond.corp.microsoft.com> <4A7BFEA3.7080600@lanl.gov> Message-ID: I figure I would write up the PEP draft, I have never tried writing a pep before, but i did read PEP 1 and tried to follow it's formating guides. 
If there are no additions to the idea, then it seems there just needs to be a consensus on the syntax before submitting it to the peps list I posted this to the python-ideas version of this thread already, but since more people seem to be posting to the python-dev list I will post it here as well PEP: Title: try-except conditional expressions Version: Last-Modified: Author: Jeff McAninch , Dj Gilcrease Discussions-To: python-ideas at python.org Status: Draft Type: Standards Track Content-Type: text/plain Created: 06-Aug-2009 Python-Version: 2.7/3.2 Post-History: Abstract: I very often want something like a try-except conditional expression similar to the if-else conditional instead of resorting to a multi-line try-except block. Design Goals: The new syntax should * Be simple to read * Be intuitive so people who may use it infrequently dont need to go lookup the format every time * Make it obvious what is happening Modivation: Often when doing calculations or string recasting (to int, float, etc) it is required to wrap the section in a simple try-except where the exception just assigns a default value. It would be more readable and consise if these type of try-excepts could be written on a single line. Issues: Unknown Specification: All 3 components would just be ordinary expressions. The exception definition would be allowed to resolve to a single exception or a tuple of exceptions, just as it is in a normal try/except statement. Syntax Ideas: Option 1: x = float(string) except float('nan') if ValueError op(float(string) except float('nan') if ValueError) Option 2: x = float(string) except ValueError: float('nan') op(float(string) except ValueError: float('nan')) Option 3: x = float(string) except ValueError else float('nan') op(float(string) except ValueError else float('nan')) From benjamin at python.org Mon Aug 10 16:39:10 2009 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 10 Aug 2009 09:39:10 -0500 Subject: [Python-Dev] 3.1.1 plan Message-ID: <1afaf6160908100739l4fb708efi93055c92d08aeaad@mail.gmail.com> Once Subversion is back up (today, tomorrow?), I will tag the 3.1 maintence branch as 3.1.1rc1. The tree will remain frozen until Saturday. If at that time, no one has found something wrong with the RC, I will retag it as the final bugfix release. -- Regards, Benjamin From solipsis at pitrou.net Mon Aug 10 16:49:04 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 10 Aug 2009 14:49:04 +0000 (UTC) Subject: [Python-Dev] 3.1.1 plan References: <1afaf6160908100739l4fb708efi93055c92d08aeaad@mail.gmail.com> Message-ID: Benjamin Peterson python.org> writes: > > Once Subversion is back up (today, tomorrow?), I will tag the 3.1 > maintence branch as 3.1.1rc1. The tree will remain frozen until > Saturday. If at that time, no one has found something wrong with the > RC, I will retag it as the final bugfix release. Do you intend to wait for the pdb fix? (http://bugs.python.org/issue6126) From benjamin at python.org Mon Aug 10 17:00:05 2009 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 10 Aug 2009 10:00:05 -0500 Subject: [Python-Dev] 3.1.1 plan In-Reply-To: References: <1afaf6160908100739l4fb708efi93055c92d08aeaad@mail.gmail.com> Message-ID: <1afaf6160908100800x23b4050fv3ea53f1b338840c5@mail.gmail.com> 2009/8/10 Antoine Pitrou : > Benjamin Peterson python.org> writes: >> >> Once Subversion is back up (today, tomorrow?), I will tag the 3.1 >> maintence branch as 3.1.1rc1. The tree will remain frozen until >> Saturday. 
If at that time, no one has found something wrong with the >> RC, I will retag it as the final bugfix release. > > Do you intend to wait for the pdb fix? > (http://bugs.python.org/issue6126) Georg says he'll commit it once svn is up, so yes. -- Regards, Benjamin From brett at python.org Mon Aug 10 18:07:28 2009 From: brett at python.org (Brett Cannon) Date: Mon, 10 Aug 2009 09:07:28 -0700 Subject: [Python-Dev] issue 6654 In-Reply-To: <4A7FF2A1.8040302@gmail.com> References: <930F189C8A437347B80DF2C156F7EC7F098493C8FD@exchis.ccp.ad.local> <930F189C8A437347B80DF2C156F7EC7F09850A03C0@exchis.ccp.ad.local> <4A7FF2A1.8040302@gmail.com> Message-ID: 2009/8/10 Nick Coghlan > Kristj?n Valur J?nsson wrote: > > I?ve had no response to this yet. Is no one using xmlrpc? > > It sounds like a reasonable feature to me, but I'm one of those that > doesn't actually use xmlrpc so my +0 or +1 probably isn't very > meaningful to you... Ditto. -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas at python.org Mon Aug 10 21:12:59 2009 From: thomas at python.org (Thomas Wouters) Date: Mon, 10 Aug 2009 21:12:59 +0200 Subject: [Python-Dev] www/svn python.org status update In-Reply-To: <20090808202221.GA4911@andrew-kuchlings-macbook.local> References: <20090808202221.GA4911@andrew-kuchlings-macbook.local> Message-ID: <9e804ac0908101212t490e39dejf4ece8cc67fae985@mail.gmail.com> On Sat, Aug 8, 2009 at 22:22, A.M. Kuchling wrote: > The following sites are up again on a new machine, but cannot be > updated through SVN hooks or whatever mechanism: > > www.python.org > docs.python.org > www.jython.org > planet.python.org > planet.jython.org > > svn.python.org was deliberately not brought up again. The backups > were a few hours behind and missing the ~10 most recent commits. Not > disastrous, but it could probably mess up people's SVN trees, so after > some IRC discussion, the decision was to wait until the original disks > are available again. That will probably not occur until Monday, maybe > Tuesday. I'm still waiting on a replacement controller, so it wasn't to be today. Hopefully tomorrow, if the hardware supplier has one in stock. Still no news on whether we have any chance at all on getting the old data back. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! -------------- next part -------------- An HTML attachment was scrubbed... URL: From v+python at g.nevcal.com Tue Aug 11 00:26:40 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 10 Aug 2009 15:26:40 -0700 Subject: [Python-Dev] www/svn python.org status update In-Reply-To: <9e804ac0908101212t490e39dejf4ece8cc67fae985@mail.gmail.com> References: <20090808202221.GA4911@andrew-kuchlings-macbook.local> <9e804ac0908101212t490e39dejf4ece8cc67fae985@mail.gmail.com> Message-ID: <4A809EA0.5080006@g.nevcal.com> On approximately 8/10/2009 12:12 PM, came the following characters from the keyboard of Thomas Wouters: > I'm still waiting on a replacement controller, so it wasn't to be today. > Hopefully tomorrow, if the hardware supplier has one in stock. Still no > news on whether we have any chance at all on getting the old data back. Sadly, redundant hardware controlled by non-redundant hardware, configured to be redundant without a backup of that configuration, isn't all that reliable :( It is hard to get redundancy correct and complete, so you can't just hear the word "RAID" and conclude that it is reliable, or fully redundant, without proper system management. 
That's why I still recommend RAID 0 with appropriate backup procedures, or RAID 1... but only if the RAID 1 is operable by removing the RAID controller, and attaching the disks to regular controllers, and having them be readable... sadly, many RAID 1 configurations do not permit that. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From steve at pearwood.info Tue Aug 11 01:45:58 2009 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 11 Aug 2009 09:45:58 +1000 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: References: <4A7A0626.5050303@lanl.gov> <4A7BFEA3.7080600@lanl.gov> Message-ID: <200908110945.59222.steve@pearwood.info> On Tue, 11 Aug 2009 12:29:32 am Dj Gilcrease wrote: > I figure I would write up the PEP draft, I have never tried writing a > pep before, but i did read PEP 1 and tried to follow it's formating > guides. If there are no additions to the idea, then it seems there > just needs to be a consensus on the syntax before submitting it to > the peps list Shouldn't there be consensus on whether or not this is a good idea first? [...] > Modivation Motivation. > Often when doing calculations or string recasting (to int, float, > etc) it is required to wrap the section in a simple try-except > where the exception just assigns a default value. It would be more > readable and consise if these type of try-excepts could be written > on a single line. Concise (note spelling) certainly, but I question that it would be more readable. Newlines are not a bad thing, but trying to squeeze too much into a single line is. When the `x if y else z` expression was first introduced, I was very excited because I thought it would be very useful. But I soon found that it actually wasn't that useful to me: it was rare that I wanted it, and when I did, it was usually more readable to use an `if` block instead. So I don't find this proposal the least bit compelling. It seems to me to be primarily useful for saving wear and tear on the Enter key. > Syntax Ideas: > Option 1: > x = float(string) except float('nan') if ValueError > op(float(string) except float('nan') if ValueError) Looks too confusingly like an if test. I find my eye drawn to the final clause, `if ValueError`, and expecting that to evaluate to true. -1 > Option 2: > x = float(string) except ValueError: float('nan') > op(float(string) except ValueError: float('nan')) [bike-shedding] At the risk of an extra keyword, I would prefer `unless` instead of `except`. [/bike-shedding] I find this the least worst of the alternatives. -0 > Option 3: > x = float(string) except ValueError else float('nan') > op(float(string) except ValueError else float('nan')) Also looks confusingly like an if test, but not as strongly as Option 1. -0.5 Should the PEP allow expressions like this? func(obj) except str(e) if ValueError as e # Option 1 func(obj) except ValueError as e: str(e) # Option 2 func(obj) except ValueError as e else str(e) # Option 3 Justify your choice please. 
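For reference, the effect that all three spellings aim at can be had today
with a small helper function; the sketch below is purely illustrative (the
helper name is invented for this example and is not part of any proposal):

    def call_or_default(func, default, exceptions):
        # Call func(); if one of the given exceptions is raised,
        # return default instead.
        try:
            return func()
        except exceptions:
            return default

    string = "not a number"
    # Roughly what "x = float(string) except ValueError: float('nan')"
    # would spell under Option 2:
    x = call_or_default(lambda: float(string), float('nan'), ValueError)

The lambda is exactly the verbosity the draft wants to remove, which is why
a dedicated expression form keeps coming up.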
-- Steven D'Aprano From ben+python at benfinney.id.au Tue Aug 11 03:42:53 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Tue, 11 Aug 2009 11:42:53 +1000 Subject: [Python-Dev] Python mail-to-news gateway status (was: www/svn python.org status update) References: <20090808202221.GA4911@andrew-kuchlings-macbook.local> Message-ID: <87ab27f7mq.fsf@benfinney.id.au> "A.M. Kuchling" writes: > The following sites are up again on a new machine, but cannot be > updated through SVN hooks or whatever mechanism: > > www.python.org > docs.python.org > www.jython.org > planet.python.org > planet.jython.org I don't see ?lists.python.org? there. Is it affected? I ask because I haven't seen any messages reach the ?comp.lang.python? Usenet group since this report. Is that related? -- \ ?The fact that a believer is happier than a skeptic is no more | `\ to the point than the fact that a drunken man is happier than a | _o__) sober one.? ?George Bernard Shaw | Ben Finney From barry at python.org Tue Aug 11 05:09:34 2009 From: barry at python.org (Barry Warsaw) Date: Mon, 10 Aug 2009 23:09:34 -0400 Subject: [Python-Dev] Python mail-to-news gateway status (was: www/svn python.org status update) In-Reply-To: <87ab27f7mq.fsf@benfinney.id.au> References: <20090808202221.GA4911@andrew-kuchlings-macbook.local> <87ab27f7mq.fsf@benfinney.id.au> Message-ID: On Aug 10, 2009, at 9:42 PM, Ben Finney wrote: > "A.M. Kuchling" writes: > >> The following sites are up again on a new machine, but cannot be >> updated through SVN hooks or whatever mechanism: >> >> www.python.org >> docs.python.org >> www.jython.org >> planet.python.org >> planet.jython.org > > I don't see ?lists.python.org? there. Is it affected? It shouldn't be. mail.python.org is a different machine. > I ask because I haven't seen any messages reach the ?comp.lang.python? > Usenet group since this report. Is that related? Hmm, if you're getting this message then mailing lists should be working. I don't know what if anything might be wrong with the gateway. Are both directions affected or only one way? -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 832 bytes Desc: This is a digitally signed message part URL: From guido at python.org Tue Aug 11 06:03:09 2009 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Aug 2009 21:03:09 -0700 Subject: [Python-Dev] Python mail-to-news gateway status (was: www/svn python.org status update) In-Reply-To: References: <20090808202221.GA4911@andrew-kuchlings-macbook.local> <87ab27f7mq.fsf@benfinney.id.au> Message-ID: Wasn't there a problem with the spam filter recently? On Mon, Aug 10, 2009 at 8:09 PM, Barry Warsaw wrote: > On Aug 10, 2009, at 9:42 PM, Ben Finney wrote: > >> "A.M. Kuchling" writes: >> >>> The following sites are up again on a new machine, but cannot be >>> updated through SVN hooks or whatever mechanism: >>> >>> www.python.org >>> docs.python.org >>> www.jython.org >>> planet.python.org >>> planet.jython.org >> >> I don't see ?lists.python.org? there. Is it affected? > > It shouldn't be. ?mail.python.org is a different machine. > >> I ask because I haven't seen any messages reach the ?comp.lang.python? >> Usenet group since this report. Is that related? > > Hmm, if you're getting this message then mailing lists should be working. ?I > don't know what if anything might be wrong with the gateway. ?Are both > directions affected or only one way? 
> > -Barry > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at python.org Tue Aug 11 16:26:28 2009 From: thomas at python.org (Thomas Wouters) Date: Tue, 11 Aug 2009 16:26:28 +0200 Subject: [Python-Dev] www/svn python.org status update In-Reply-To: <9e804ac0908101212t490e39dejf4ece8cc67fae985@mail.gmail.com> References: <20090808202221.GA4911@andrew-kuchlings-macbook.local> <9e804ac0908101212t490e39dejf4ece8cc67fae985@mail.gmail.com> Message-ID: <9e804ac0908110726l587bdb74p831ac3c03400130@mail.gmail.com> On Mon, Aug 10, 2009 at 21:12, Thomas Wouters wrote: > > > On Sat, Aug 8, 2009 at 22:22, A.M. Kuchling wrote: > >> The following sites are up again on a new machine, but cannot be >> updated through SVN hooks or whatever mechanism: >> >> www.python.org >> docs.python.org >> www.jython.org >> planet.python.org >> planet.jython.org >> >> svn.python.org was deliberately not brought up again. The backups >> were a few hours behind and missing the ~10 most recent commits. Not >> disastrous, but it could probably mess up people's SVN trees, so after >> some IRC discussion, the decision was to wait until the original disks >> are available again. That will probably not occur until Monday, maybe >> Tuesday. > > > I'm still waiting on a replacement controller, so it wasn't to be today. > Hopefully tomorrow, if the hardware supplier has one in stock. Still no > news on whether we have any chance at all on getting the old data back. > > The new card had to be ordered (and I couldn't find any other place that had them in stock) bit it should arrive tomorrow or thursday. On the plus side, Martin found out there should be no problem with just inserting the card and having it detect the RAID, so as long as the dying card didn't write garbage to the disks we should be back up and running quite fast. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! -------------- next part -------------- An HTML attachment was scrubbed... URL: From jacobolus at gmail.com Wed Aug 12 00:09:15 2009 From: jacobolus at gmail.com (Jacob Rus) Date: Tue, 11 Aug 2009 15:09:15 -0700 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: References: <4A757015.6010100@voidspace.org.uk> <79990c6b0908020445q5d4bcc82vf93d2af4493554a9@mail.gmail.com> Message-ID: Glyph Lefkowitz wrote: > Jacob Rus wrote: > No, [changing the semantics in 3.x] is bad.? If I may quote Guido: > http://www.artima.com/weblogs/viewpost.jsp?thread=227041 > >> So, once more for emphasis: Don't change your APIs at the same time as >> porting to Py3k! > > Please follow this policy as much as possible in the standard library; the > language transition is going to be hard enough. > >> Ooh, okay. ?Well I guess we can?t get rid of those then! > > Indeed not. Well, I've had some patches up at http://bugs.python.org/issue6626 for over a week now, and my updated version should have identical semantics to the current module, just with the module's *actual* behavior clear to anyone reading the code, some serious edge-case bugs fixed, and a general performance improvement. 
I'd like to make some further changes, particularly in which types and extensions the module knows about, to bring it up to date, and ideally even to remove the dependency on an Apache install, but I'd like some discussion and advice about it. I have some other questions: How does one deprecate part of a standard library API? How can we alert users to the deprecation? When can the deprecated parts be removed? I don't want to just give up on this, because I put more than a day of time into it, and I really do think the previous code was of poorer quality than should be in the standard library: I don't want new Python users reading it and thinking that's just how things are done around here. But if no one looks at my patches, I'm not sure what more I can do. Again: http://bugs.python.org/issue6626 Cheers, Jacob From ncoghlan at gmail.com Wed Aug 12 05:19:07 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 12 Aug 2009 13:19:07 +1000 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: References: <4A757015.6010100@voidspace.org.uk> <79990c6b0908020445q5d4bcc82vf93d2af4493554a9@mail.gmail.com> Message-ID: <4A8234AB.6000506@gmail.com> Jacob Rus wrote: > Well, I've had some patches up at http://bugs.python.org/issue6626 for > over a week now, and my updated version should have identical > semantics to the current module, just with the module's *actual* > behavior clear to anyone reading the code, some serious edge-case bugs > fixed, and a general performance improvement. One thing that would definitely help promote the patch is if you could figure out a way to test those edge cases in the mimetypes test suite. Then the usual technique of "add new tests to test suite -> see errors -> apply fixes to module -> errors go away" demonstrates clearly that the bugs used to exist and ensures that they won't be reintroduced in the future. > I'd like to make some further changes, particularly in which types and > extensions the module knows about, to bring it up to date, and ideally > even to remove the dependency on an Apache install, but I'd like some > discussion and advice about it. I'd want someone more familiar with using MIME than I am (Barry maybe?) to chime in before doing anything on that front. > I have some other questions: How does one deprecate part of a standard > library API? How can we alert users to the deprecation? When can the > deprecated parts be removed? warnings.warn and DeprecatingWarning is the way to go for that. The code stays in for at least one release with the warning (in this case, 2.7 and 3.2) and can then be removed in the subsequent release. > I don't want to just give up on this, because I put more than a day of > time into it, and I really do think the previous code was of poorer > quality than should be in the standard library: I don't want new > Python users reading it and thinking that's just how things are done > around here. But if no one looks at my patches, I'm not sure what more > I can do. > > Again: > http://bugs.python.org/issue6626 I added myself to the nosy list for your patch precisely so I could look at it, but the RAID array in the subversion server went down late last week and won't be fixed for another day or two. Otherwise I probably would have tried this out over the weekend :( Cheers, Nick. 
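To make the mechanics concrete, a deprecation of that sort is usually just a
thin shim around the replacement; here is a minimal sketch (the function
names are hypothetical, not actual mimetypes APIs):

    import warnings

    def new_helper(value):
        return value

    def old_helper(value):
        # Hypothetical deprecated API: warn the caller, then delegate to
        # the replacement. stacklevel=2 points the warning at the caller's
        # line rather than at this wrapper.
        warnings.warn("old_helper() is deprecated; use new_helper() instead",
                      DeprecationWarning, stacklevel=2)
        return new_helper(value)

Per the policy described above, the shim and its warning stay in place for at
least one release before the old name is removed.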
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From benjamin at python.org Wed Aug 12 05:22:02 2009 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 11 Aug 2009 22:22:02 -0500 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: References: <4A757015.6010100@voidspace.org.uk> <79990c6b0908020445q5d4bcc82vf93d2af4493554a9@mail.gmail.com> Message-ID: <1afaf6160908112022p3464d0e4u739d067922cf64b@mail.gmail.com> 2009/8/11 Jacob Rus : > > I have some other questions: How does one deprecate part of a standard > library API? How can we alert users to the deprecation? When can the > deprecated parts be removed? Basically, you add a DeprecationWarning to the API. Then remove it in the next major version. If python-dev was more interested, we would have a policy for this. *cough* > > I don't want to just give up on this, because I put more than a day of > time into it, and I really do think the previous code was of poorer > quality than should be in the standard library: I don't want new > Python users reading it and thinking that's just how things are done > around here. But if no one looks at my patches, I'm not sure what more > I can do. It looks like you need to add some tests for the bugs you fixed to test_mimetypes. While you're at it, you could improve that test generally, since it's not exactly extensive. Then, you might garner some more reviews by putting your patch up on Rietveld; it makes reviewing much painful. -- Regards, Benjamin From ben+python at benfinney.id.au Wed Aug 12 06:16:26 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Wed, 12 Aug 2009 14:16:26 +1000 Subject: [Python-Dev] Python mail-to-news gateway status References: <20090808202221.GA4911@andrew-kuchlings-macbook.local> <87ab27f7mq.fsf@benfinney.id.au> Message-ID: <87ws59ekf9.fsf@benfinney.id.au> Barry Warsaw writes: > On Aug 10, 2009, at 9:42 PM, Ben Finney wrote: > > I ask because I haven't seen any messages reach the > > ?comp.lang.python? Usenet group since this report. Is that related? > > Hmm, if you're getting this message then mailing lists should be > working. I don't know what if anything might be wrong with the > gateway. Are both directions affected or only one way? It seems to be a problem specific to my Usenet provider. Thanks for the ongoing work to restore services. -- \ ?A hundred times every day I remind myself that [?] I must | `\ exert myself in order to give in the same measure as I have | _o__) received and am still receiving? ?Albert Einstein | Ben Finney -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available URL: From eric at trueblade.com Wed Aug 12 12:32:33 2009 From: eric at trueblade.com (Eric Smith) Date: Wed, 12 Aug 2009 06:32:33 -0400 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: <1afaf6160908112022p3464d0e4u739d067922cf64b@mail.gmail.com> References: <4A757015.6010100@voidspace.org.uk> <79990c6b0908020445q5d4bcc82vf93d2af4493554a9@mail.gmail.com> <1afaf6160908112022p3464d0e4u739d067922cf64b@mail.gmail.com> Message-ID: <4A829A41.50506@trueblade.com> Benjamin Peterson wrote: > Then, you might garner some more reviews by putting your patch up on > Rietveld; it makes reviewing much painful. "... much _less_ painful", I hope! 
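Picking up Benjamin's suggestion above about extending test_mimetypes, a
minimal sketch of such an edge-case test might look like the following (the
particular cases are illustrative assumptions, not the tests from the actual
patch):

    import mimetypes
    import unittest

    class MimeTypesEdgeCaseTest(unittest.TestCase):

        def test_gzipped_tarball_reports_encoding(self):
            # .gz should be reported as an encoding, not folded into the type.
            mime_type, encoding = mimetypes.guess_type('archive.tar.gz')
            self.assertEqual(encoding, 'gzip')

        def test_unknown_extension_returns_none(self):
            mime_type, encoding = mimetypes.guess_type('data.no-such-ext')
            self.assertEqual(mime_type, None)
            self.assertEqual(encoding, None)

    if __name__ == '__main__':
        unittest.main()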
From gelhaus at uni-paderborn.de Wed Aug 12 11:53:31 2009 From: gelhaus at uni-paderborn.de (Martin Gelhaus) Date: Wed, 12 Aug 2009 11:53:31 +0200 Subject: [Python-Dev] =?utf-8?q?Study_on_communication_and_collaboration_i?= =?utf-8?q?n_software_development_teams?= Message-ID: Dear Python developer, within the scope of my diploma thesis at the University of Paderborn, Germany, with the title "Study about communication and collaboration in software development in teams" I am conducting a survey of members of software development teams. I would be very grateful if you help me in my studies and answer the survey at http://thales.cs.upb.de/limesurvey185/index.php?lang=en&sid=91192&token=kkzwtzjpy5yyhxz This is the official description of the survey: In the last years many means of communication and collaboration were introduced in software projects to assist the development teams with their daily work. With this study we want to identify requirements for a communication- and collaboration-supporting platform for software development. For this purpose we will evaluate the utilization and effectiveness of different means of communication and collaboration in solving software and managerial problems in software development teams. The survey will take about 10-15 minutes and contains 55 questions that cover various topics. Many thanks for your support of my research. If there are any further questions, don't hesitate to contact me. Best regards from Paderborn, Germany Martin Gelhaus (gelhaus at uni-paderborn.de) ---------------------------------------------- Click here to do the survey: http://thales.cs.upb.de/limesurvey185/index.php?lang=en&sid=91192&token=kkzwtzjpy5yyhxz ---- Martin Gelhaus Graduand at Didactics of Informatics chair at University of Paderborn F?rstenallee 11 Room F2.416 D-33102 Paderborn From chris at simplistix.co.uk Wed Aug 12 13:05:46 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Wed, 12 Aug 2009 12:05:46 +0100 Subject: [Python-Dev] how to debug httplib slowness Message-ID: <4A82A20A.1070704@simplistix.co.uk> Hi All, I'd like to work on this issue: http://bugs.python.org/issue2576 Specifically, in my case, while IE can download a 150Mb file from a local server in about 3 seconds, httplib takes over 20 minutes! However, I'm kinda stumped on where to start with debugging the difference. I've tried upping the buffer size as suggested in the issue, but it's had no effect... Any ideas? Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From solipsis at pitrou.net Wed Aug 12 13:37:12 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 12 Aug 2009 11:37:12 +0000 (UTC) Subject: [Python-Dev] how to debug httplib slowness References: <4A82A20A.1070704@simplistix.co.uk> Message-ID: Chris Withers simplistix.co.uk> writes: > > However, I'm kinda stumped on where to start with debugging the > difference. I've tried upping the buffer size as suggested in the issue, > but it's had no effect... Then perhaps it's not the same bug. Please take a look at CPU utilization during the download. If Python takes close to 100% CPU, it might be due to the lack of buffering or any other suboptimal situation in the implementation. If Python takes close to 0%, then it's just waiting on data to arrive from the network... 
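One way to act on that suggestion is to profile the download itself and see
where the time is actually spent; a rough sketch follows (assuming Python 2.x
httplib, with the host, port and path below as placeholders for the local
test server):

    import cProfile
    import httplib
    import pstats

    def fetch():
        conn = httplib.HTTPConnection('localhost', 8080)
        conn.request('GET', '/bigfile.bin')
        response = conn.getresponse()
        data = response.read()   # the slow part, per the report above
        conn.close()
        return len(data)

    profiler = cProfile.Profile()
    profiler.runcall(fetch)
    pstats.Stats(profiler).sort_stats('cumulative').print_stats(10)

If the time shows up in the Python-level read loop rather than in the
underlying recv() calls, that points back at the buffering problem from the
original bug report.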
From stefan_ml at behnel.de Wed Aug 12 14:20:31 2009 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 12 Aug 2009 14:20:31 +0200 Subject: [Python-Dev] [issue6673] Py3.1 hangs in coroutine and eats up all memory In-Reply-To: <1250027904.95.0.0238005941255.issue6673@psf.upfronthosting.co.za> References: <1250027904.95.0.0238005941255.issue6673@psf.upfronthosting.co.za> Message-ID: [moving this from the bug tracker] Alexandre Vassalotti wrote: > Alexandre Vassalotti added the comment: > > Not a bug. > > The list comprehension in your chunker: > > while True: > target.send([ (yield) for i in range(chunk_size) ]) > > is equivalent to the following generator in Python 3: > > while True: > def g(): > for i in range(chunk_size): > yield (yield) > target.send(list(g())) > > This clearly needs not what you want. Does this do anything meaningful, or would it make sense to output a compiler warning (or better: an error) here? Using yield in a comprehension (as opposed to a generator expression, which I intuitively expected not to work) doesn't look any dangerous at first glance, so it was quite surprising to see it fail that drastically. This is also an important issue for other Python implementations. Cython simply transforms comprehensions into the equivalent for-loop, so when we implement PEP 342 in Cython, we will have to find a way to emulate CPython's behaviour here (unless we decide to stick with Py2.x sematics, which would not be my preferred solution). Stefan > So, just rewrite your code using for-loop: > > while True: > result = [] > for i in range(chunk_size): > result.append((yield)) > target.send(result) > > ---------- > nosy: +alexandre.vassalotti > resolution: -> invalid > status: open -> closed From ncoghlan at gmail.com Wed Aug 12 14:22:34 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 12 Aug 2009 22:22:34 +1000 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: <1afaf6160908112022p3464d0e4u739d067922cf64b@mail.gmail.com> References: <4A757015.6010100@voidspace.org.uk> <79990c6b0908020445q5d4bcc82vf93d2af4493554a9@mail.gmail.com> <1afaf6160908112022p3464d0e4u739d067922cf64b@mail.gmail.com> Message-ID: <4A82B40A.9030608@gmail.com> Benjamin Peterson wrote: > > If python-dev was more interested, we would have a policy for this. *cough* > PEP 5 isn't enough? (I'll grant that PEP could probably do with mentioning the use of warnings.warn(DeprecationWarning) explicitly, but the policy itself seems fine) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Wed Aug 12 14:25:19 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 12 Aug 2009 22:25:19 +1000 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: <4A82B40A.9030608@gmail.com> References: <4A757015.6010100@voidspace.org.uk> <79990c6b0908020445q5d4bcc82vf93d2af4493554a9@mail.gmail.com> <1afaf6160908112022p3464d0e4u739d067922cf64b@mail.gmail.com> <4A82B40A.9030608@gmail.com> Message-ID: <4A82B4AF.2000607@gmail.com> Nick Coghlan wrote: > Benjamin Peterson wrote: >> >> If python-dev was more interested, we would have a policy for this. *cough* >> > > PEP 5 isn't enough? (I'll grant that PEP could probably do with > mentioning the use of warnings.warn(DeprecationWarning) explicitly, but > the policy itself seems fine) Oops, I get it now :) Cheers, Nick. P.S. 
For anyone else that is slow like me, take a close look at PEP 387... -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Wed Aug 12 15:00:45 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 12 Aug 2009 23:00:45 +1000 Subject: [Python-Dev] [issue6673] Py3.1 hangs in coroutine and eats up all memory In-Reply-To: References: <1250027904.95.0.0238005941255.issue6673@psf.upfronthosting.co.za> Message-ID: <4A82BCFD.3060205@gmail.com> Stefan Behnel wrote: > This is also an important issue for other Python implementations. Cython > simply transforms comprehensions into the equivalent for-loop, so when we > implement PEP 342 in Cython, we will have to find a way to emulate > CPython's behaviour here (unless we decide to stick with Py2.x sematics, > which would not be my preferred solution). How do you do that without leaking the iteration variable into the current namespace? Avoiding that leakage is where the semantic change between 2.x and 3.x came from here: 2.x just creates the for loop inline (thus leaking the iteration variable into the current scope), while 3.x creates an inner function that does the iteration so that the iteration variables exist in their own scope without polluting the namespace of the containing function. The translation of your example isn't quite as Alexandre describes it - we do at least avoid the overhead of creating a generator function in the list comprehension case. It's more like: while True: def f(): result = [] for i in range(chunk_size): result.append((yield)) return result target.send(f()) So what you end up with is a generator that has managed to bypass the syntactic restriction that disallows returning non-None values from generators. In CPython it appears that happens to end up being executed as if the return was just another yield expression (most likely due to a quirk in the implementation of RETURN_VALUE inside generators): while True: def f(): result = [] for i in range(chunk_size): result.append((yield)) yield result target.send(f()) It seems to me that CPython should be raising a SyntaxError for yield expressions inside comprehensions (in line with the "no returning values other than None from generator functions" rule), and probably for generator expressions as well. Cheers, Nick. P.S. Experimentation at a 3.x interpreter prompt: >>> def f(): ... return [(yield) for i in range(10)] ... >>> x = f() >>> next(x) >>> for i in range(8): ... x.send(i) ... >>> x.send(8) >>> next(x) [0, 1, 2, 3, 4, 5, 6, 7, 8, None] >>> x = f() >>> next(x) >>> for i in range(10): # A statement with a return value! ... x.send(i) ... 
[0, 1, 2, 3, 4, 5, 6, 7, 8, None] >>> dis(f) 2 0 LOAD_CONST 1 ( at 0xb7c53bf0, file "", line 2>) 3 MAKE_FUNCTION 0 6 LOAD_GLOBAL 0 (range) 9 LOAD_CONST 2 (10) 12 CALL_FUNCTION 1 15 GET_ITER 16 CALL_FUNCTION 1 19 RETURN_VALUE >>> dis(f.__code__.co_consts[1]) 2 0 BUILD_LIST 0 3 LOAD_FAST 0 (.0) >> 6 FOR_ITER 13 (to 22) 9 STORE_FAST 1 (i) 12 LOAD_CONST 0 (None) 15 YIELD_VALUE 16 LIST_APPEND 2 19 JUMP_ABSOLUTE 6 >> 22 RETURN_VALUE -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From chris at simplistix.co.uk Wed Aug 12 15:40:56 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Wed, 12 Aug 2009 14:40:56 +0100 Subject: [Python-Dev] how to debug httplib slowness In-Reply-To: References: <4A82A20A.1070704@simplistix.co.uk> Message-ID: <4A82C668.6000501@simplistix.co.uk> Antoine Pitrou wrote: > Chris Withers simplistix.co.uk> writes: >> However, I'm kinda stumped on where to start with debugging the >> difference. I've tried upping the buffer size as suggested in the issue, >> but it's had no effect... > > Then perhaps it's not the same bug. > Please take a look at CPU utilization during the download. If Python takes close > to 100% CPU, it might be due to the lack of buffering or any other suboptimal > situation in the implementation. Well, it's locked at 25% on a quad core box, so yeah, I'd say something is wrong ;-) I guess I could try profile it and finding out where most of the time is being spent? Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From stefan_ml at behnel.de Wed Aug 12 16:33:54 2009 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 12 Aug 2009 16:33:54 +0200 Subject: [Python-Dev] [issue6673] Py3.1 hangs in coroutine and eats up all memory In-Reply-To: <4A82BCFD.3060205@gmail.com> References: <1250027904.95.0.0238005941255.issue6673@psf.upfronthosting.co.za> <4A82BCFD.3060205@gmail.com> Message-ID: Nick Coghlan wrote: > Stefan Behnel wrote: >> This is also an important issue for other Python implementations. Cython >> simply transforms comprehensions into the equivalent for-loop, so when we >> implement PEP 342 in Cython, we will have to find a way to emulate >> CPython's behaviour here (unless we decide to stick with Py2.x sematics, >> which would not be my preferred solution). > > How do you do that without leaking the iteration variable into the > current namespace? We currently have 2.x sematics for comprehensions anyway, but the (long-standing) idea is to move comprehensions into their own scope (not a function, just a new type of scope), so that all names defined inside the expressions end up inside of the inner scope. This is completely orthogonal to the loop transformation itself, though, which would simply happen inside of the inner scope. However, having to emulate the other Py3 semantics for comprehensions that this thread is about, would pretty much kill such a simple solution. > The translation of your example isn't quite as Alexandre describes it - > we do at least avoid the overhead of creating a generator function in > the list comprehension case. It's more like: > > while True: > def f(): > result = [] > for i in range(chunk_size): > result.append((yield)) > return result > target.send(f()) So the problem is that f(), i.e. the function-wrapped comprehension itself, swallows the "(yield)" expression (which redundantly makes it a generator). 
That means that the outer function in my example, which was def chunker(chunk_size, target): while True: target.send([ (yield) for i in range(chunk_size) ]) doesn't become a generator itself, so the above simply ends up as an infinite loop. IMHO, that's pretty far from obvious when you look at the code. Also, the target receives a "generator object " instead of a list. That sounds weird. > It seems to me that CPython should be raising a SyntaxError for yield > expressions inside comprehensions (in line with the "no returning values > other than None from generator functions" rule), and probably for > generator expressions as well. Yes, that's what I was suggesting. Disallowing it in genexps is a more open question, though. I wouldn't mind being able to send() values into a generator expression, or to throw() exceptions during their execution. Anyway, I have no idea about a use case, so it might just as well be disallowed for symmetry reasons. Stefan From guido at python.org Wed Aug 12 17:07:32 2009 From: guido at python.org (Guido van Rossum) Date: Wed, 12 Aug 2009 08:07:32 -0700 Subject: [Python-Dev] how to debug httplib slowness In-Reply-To: <4A82A20A.1070704@simplistix.co.uk> References: <4A82A20A.1070704@simplistix.co.uk> Message-ID: Try instrumenting the actual calls to the lowest-level socket methods (recv() and send()) and log for each one the arguments, return time, and how long it took. You might see a pattern. Is this on Windows? It's embarrassing, we've had problems with socket speed on Windows since 1999 and they're still not gone... :-( On Wed, Aug 12, 2009 at 4:05 AM, Chris Withers wrote: > Hi All, > > I'd like to work on this issue: > > http://bugs.python.org/issue2576 > > Specifically, in my case, while IE can download a 150Mb file from a local > server in about 3 seconds, httplib takes over 20 minutes! > > However, I'm kinda stumped on where to start with debugging the difference. > I've tried upping the buffer size as suggested in the issue, but it's had no > effect... > > Any ideas? > > Chris > > -- > Simplistix - Content Management, Batch Processing & Python Consulting > ? ? ? ? ? - http://www.simplistix.co.uk > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Aug 12 17:07:51 2009 From: guido at python.org (Guido van Rossum) Date: Wed, 12 Aug 2009 08:07:51 -0700 Subject: [Python-Dev] how to debug httplib slowness In-Reply-To: References: <4A82A20A.1070704@simplistix.co.uk> Message-ID: s/return time/return size/ On Wed, Aug 12, 2009 at 8:07 AM, Guido van Rossum wrote: > Try instrumenting the actual calls to the lowest-level socket methods > (recv() and send()) and log for each one the arguments, return time, > and how long it took. You might see a pattern. Is this on Windows? > It's embarrassing, we've had problems with socket speed on Windows > since 1999 and they're still not gone... :-( > > On Wed, Aug 12, 2009 at 4:05 AM, Chris Withers wrote: >> Hi All, >> >> I'd like to work on this issue: >> >> http://bugs.python.org/issue2576 >> >> Specifically, in my case, while IE can download a 150Mb file from a local >> server in about 3 seconds, httplib takes over 20 minutes! >> >> However, I'm kinda stumped on where to start with debugging the difference. 
>> I've tried upping the buffer size as suggested in the issue, but it's had no >> effect... >> >> Any ideas? >> >> Chris >> >> -- >> Simplistix - Content Management, Batch Processing & Python Consulting >> ? ? ? ? ? - http://www.simplistix.co.uk >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> http://mail.python.org/mailman/options/python-dev/guido%40python.org >> > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From solipsis at pitrou.net Wed Aug 12 17:18:14 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 12 Aug 2009 15:18:14 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?=5Bissue6673=5D_Py3=2E1_hangs_in_coroutine?= =?utf-8?q?_and_eats_up=09all_memory?= References: <1250027904.95.0.0238005941255.issue6673@psf.upfronthosting.co.za> <4A82BCFD.3060205@gmail.com> Message-ID: Stefan Behnel behnel.de> writes: > > IMHO, that's pretty far from obvious when you look at the code. A "yield" wrapped in a list comprehension looks far from obvious IMO anyway, whether in 2.x or 3.x. It's this kind of "smart" writing tricks people find that only makes code more difficult to read for others (? la Perl). Regards Antoine. From chris at simplistix.co.uk Wed Aug 12 17:34:56 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Wed, 12 Aug 2009 16:34:56 +0100 Subject: [Python-Dev] how to debug httplib slowness In-Reply-To: References: <4A82A20A.1070704@simplistix.co.uk> Message-ID: <4A82E120.4030105@simplistix.co.uk> Guido van Rossum wrote: > Try instrumenting the actual calls to the lowest-level socket methods > (recv() and send()) and log for each one the arguments, return time, > and how long it took. Can I do that in python code? > You might see a pattern. Is this on Windows? Well, yes, but I'm not 100%. The problematic machine is a Windows box, but there are no non-windows boxes on that network and vpn'ing from one of my non-windows boxes slows things down enough that I'm not confident what I'd be seeing was indicative of the same problem... > It's embarrassing, we've had problems with socket speed on Windows > since 1999 and they're still not gone... :-( Oh dear :-( Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From guido at python.org Wed Aug 12 18:10:57 2009 From: guido at python.org (Guido van Rossum) Date: Wed, 12 Aug 2009 09:10:57 -0700 Subject: [Python-Dev] how to debug httplib slowness In-Reply-To: <4A82E120.4030105@simplistix.co.uk> References: <4A82A20A.1070704@simplistix.co.uk> <4A82E120.4030105@simplistix.co.uk> Message-ID: On Wed, Aug 12, 2009 at 8:34 AM, Chris Withers wrote: > Guido van Rossum wrote: >> >> Try instrumenting the actual calls to the lowest-level socket methods >> (recv() and send()) and log for each one the arguments, return time, >> and how long it took. > > Can I do that in python code? Probably if you hack on the socket.py file long enough. >> You might see a pattern. Is this on Windows? > > Well, yes, but I'm not 100%. The problematic machine is a Windows box, but > there are no non-windows boxes on that network and vpn'ing from one of my > non-windows boxes slows things down enough that I'm not confident what I'd > be seeing was indicative of the same problem... Time to set up a more conclusive test. Do you have something like curl or wget available on the same box? 
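Failing that, a rough timing probe along these lines gives the same numbers without touching socket.py (a sketch only, Python 2 syntax to match httplib; host and path are placeholders):

    import socket
    import time

    def timed_get(host, path='/', port=80, bufsize=8192):
        # Fetch a URL over a bare socket and log every recv() call, so the
        # read pattern can be compared against httplib on the same box.
        s = socket.create_connection((host, port))
        s.sendall('GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' % (path, host))
        total = 0
        while True:
            start = time.time()
            chunk = s.recv(bufsize)
            elapsed = time.time() - start
            if not chunk:
                break
            total += len(chunk)
            print('%6d bytes in %.4f seconds' % (len(chunk), elapsed))
        s.close()
        return total

If that loop is fast and httplib is slow on the same URL, the problem is above the socket layer; if both are slow, it points at the network, the server, or the proxy settings.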
>> It's embarrassing, we've had problems with socket speed on Windows >> since 1999 and they're still not gone... :-( > > Oh dear :-( Well it may be that it's really just your box. Or proxy settings. Look into proxy settings. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at python.org Wed Aug 12 18:33:28 2009 From: thomas at python.org (Thomas Wouters) Date: Wed, 12 Aug 2009 09:33:28 -0700 Subject: [Python-Dev] www/svn python.org status update In-Reply-To: <9e804ac0908110726l587bdb74p831ac3c03400130@mail.gmail.com> References: <20090808202221.GA4911@andrew-kuchlings-macbook.local> <9e804ac0908101212t490e39dejf4ece8cc67fae985@mail.gmail.com> <9e804ac0908110726l587bdb74p831ac3c03400130@mail.gmail.com> Message-ID: <9e804ac0908120933k1ef42fbeg952efa9c2d4fe1b2@mail.gmail.com> I replaced the RAID controller, the old data was still intact, so I brought the temporary machine down and the new machine up. Everything seems to work just fine, so happy svn-up'ing. (I will reboot mail.python.org for a few minutes, to check its serial console configuration, but that shouldn't affect anyone.) On Tue, Aug 11, 2009 at 07:26, Thomas Wouters wrote: > > > On Mon, Aug 10, 2009 at 21:12, Thomas Wouters wrote: > >> >> >> On Sat, Aug 8, 2009 at 22:22, A.M. Kuchling wrote: >> >>> The following sites are up again on a new machine, but cannot be >>> updated through SVN hooks or whatever mechanism: >>> >>> www.python.org >>> docs.python.org >>> www.jython.org >>> planet.python.org >>> planet.jython.org >>> >>> svn.python.org was deliberately not brought up again. The backups >>> were a few hours behind and missing the ~10 most recent commits. Not >>> disastrous, but it could probably mess up people's SVN trees, so after >>> some IRC discussion, the decision was to wait until the original disks >>> are available again. That will probably not occur until Monday, maybe >>> Tuesday. >> >> >> I'm still waiting on a replacement controller, so it wasn't to be today. >> Hopefully tomorrow, if the hardware supplier has one in stock. Still no >> news on whether we have any chance at all on getting the old data back. >> >> > > The new card had to be ordered (and I couldn't find any other place that > had them in stock) bit it should arrive tomorrow or thursday. On the plus > side, Martin found out there should be no problem with just inserting the > card and having it detect the RAID, so as long as the dying card didn't > write garbage to the disks we should be back up and running quite fast. > > -- > Thomas Wouters > > Hi! I'm a .signature virus! copy me into your .signature file to help me > spread! > -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at krypto.org Wed Aug 12 18:55:32 2009 From: greg at krypto.org (Gregory P. 
Smith) Date: Wed, 12 Aug 2009 09:55:32 -0700 Subject: [Python-Dev] www/svn python.org status update In-Reply-To: <9e804ac0908120933k1ef42fbeg952efa9c2d4fe1b2@mail.gmail.com> References: <20090808202221.GA4911@andrew-kuchlings-macbook.local> <9e804ac0908101212t490e39dejf4ece8cc67fae985@mail.gmail.com> <9e804ac0908110726l587bdb74p831ac3c03400130@mail.gmail.com> <9e804ac0908120933k1ef42fbeg952efa9c2d4fe1b2@mail.gmail.com> Message-ID: <52dc1c820908120955v15438747lb511cbd3cb3878da@mail.gmail.com> On Wed, Aug 12, 2009 at 9:33 AM, Thomas Wouters wrote: > > I replaced the RAID controller, the old data was still intact, so I brought > the temporary machine down and the new machine up. Everything seems to work > just fine, so happy svn-up'ing. > (I will reboot mail.python.org for a few minutes, to check its serial > console configuration, but that shouldn't affect anyone.) Yay! Thanks for your dedicated work! -gps > > On Tue, Aug 11, 2009 at 07:26, Thomas Wouters wrote: >> >> >> On Mon, Aug 10, 2009 at 21:12, Thomas Wouters wrote: >>> >>> >>> On Sat, Aug 8, 2009 at 22:22, A.M. Kuchling wrote: >>>> >>>> The following sites are up again on a new machine, but cannot be >>>> updated through SVN hooks or whatever mechanism: >>>> >>>> www.python.org >>>> docs.python.org >>>> www.jython.org >>>> planet.python.org >>>> planet.jython.org >>>> >>>> svn.python.org was deliberately not brought up again. ?The backups >>>> were a few hours behind and missing the ~10 most recent commits. ?Not >>>> disastrous, but it could probably mess up people's SVN trees, so after >>>> some IRC discussion, the decision was to wait until the original disks >>>> are available again. ?That will probably not occur until Monday, maybe >>>> Tuesday. >>> >>> I'm still waiting on a replacement controller, so it wasn't to be today. >>> Hopefully tomorrow, if the hardware supplier has one in stock. Still no >>> news on whether we have any chance at all on getting the old data back. >>> >> >> The new card had to be ordered (and I couldn't find any other place that >> had them in stock) bit it should arrive tomorrow or thursday. On the plus >> side, Martin found out there should be no problem with just inserting the >> card and having it detect the RAID, so as long as the dying card didn't >> write garbage to the disks we should be back up and running quite fast. >> -- >> Thomas Wouters >> >> Hi! I'm a .signature virus! copy me into your .signature file to help me >> spread! > > > > -- > Thomas Wouters > > Hi! I'm a .signature virus! copy me into your .signature file to help me > spread! > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/greg%40krypto.org > > From jacobolus at gmail.com Wed Aug 12 19:54:30 2009 From: jacobolus at gmail.com (Jacob Rus) Date: Wed, 12 Aug 2009 10:54:30 -0700 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: <1afaf6160908112022p3464d0e4u739d067922cf64b@mail.gmail.com> References: <4A757015.6010100@voidspace.org.uk> <79990c6b0908020445q5d4bcc82vf93d2af4493554a9@mail.gmail.com> <1afaf6160908112022p3464d0e4u739d067922cf64b@mail.gmail.com> Message-ID: Benjamin Peterson wrote: > It looks like you need to add some tests for the bugs you fixed to > test_mimetypes. While you're at it, you could improve that test > generally, since it's not exactly extensive. 
Okay, I'll try to do this sometime in the next few days, if I get the chance. > Then, you might garner some more reviews by putting your patch up on > Rietveld; it makes reviewing much painful. Okay, now that svn.python.org is back up, here's a Rietveld link: http://codereview.appspot.com/104091/show Cheers, Jacob Rus From solipsis at pitrou.net Wed Aug 12 21:04:26 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 12 Aug 2009 19:04:26 +0000 (UTC) Subject: [Python-Dev] how to debug httplib slowness References: <4A82A20A.1070704@simplistix.co.uk> <4A82C668.6000501@simplistix.co.uk> Message-ID: Chris Withers simplistix.co.uk> writes: > > Well, it's locked at 25% on a quad core box, so yeah, I'd say something > is wrong > > I guess I could try profile it and finding out where most of the time is > being spent? I guess you could indeed :-) Antoine. From stefan_ml at behnel.de Thu Aug 13 10:11:30 2009 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 13 Aug 2009 10:11:30 +0200 Subject: [Python-Dev] [issue6673] Py3.1 hangs in coroutine and eats up all memory In-Reply-To: References: <1250027904.95.0.0238005941255.issue6673@psf.upfronthosting.co.za> <4A82BCFD.3060205@gmail.com> Message-ID: Antoine Pitrou wrote: > Stefan Behnel behnel.de> writes: >> IMHO, that's pretty far from obvious when you look at the code. > > A "yield" wrapped in a list comprehension looks far from obvious IMO anyway, > whether in 2.x or 3.x. It's this kind of "smart" writing tricks people find that > only makes code more difficult to read for others (? la Perl). So, your vote is to make it a compiler error as well? Stefan From lists at cheimes.de Thu Aug 13 14:10:56 2009 From: lists at cheimes.de (Christian Heimes) Date: Thu, 13 Aug 2009 14:10:56 +0200 Subject: [Python-Dev] Microsoft MSDN In-Reply-To: <4A79ADB7.5080809@holdenweb.com> References: <4A79ADB7.5080809@holdenweb.com> Message-ID: <4A8402D0.2050906@cheimes.de> Steve Holden wrote: > I sent fourteen requests for licenses in to Microsoft. I've asked them > to let me know which they grant (since they may choose to limit the > number) and will inform you all personally when I hear their decision. I've received my MSDN subscription today. Everybody watch out for a message from MSDN! I almost confused the email with spam. Thanks for your work and please forward my gratitude to James Rice. Christian From g.brandl at gmx.net Thu Aug 13 15:26:27 2009 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 13 Aug 2009 15:26:27 +0200 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: <4A82B4AF.2000607@gmail.com> References: <4A757015.6010100@voidspace.org.uk> <79990c6b0908020445q5d4bcc82vf93d2af4493554a9@mail.gmail.com> <1afaf6160908112022p3464d0e4u739d067922cf64b@mail.gmail.com> <4A82B40A.9030608@gmail.com> <4A82B4AF.2000607@gmail.com> Message-ID: Nick Coghlan schrieb: > Nick Coghlan wrote: >> Benjamin Peterson wrote: >>> >>> If python-dev was more interested, we would have a policy for this. *cough* >>> >> >> PEP 5 isn't enough? (I'll grant that PEP could probably do with >> mentioning the use of warnings.warn(DeprecationWarning) explicitly, but >> the policy itself seems fine) > > Oops, I get it now :) > > Cheers, > Nick. > > P.S. For anyone else that is slow like me, take a close look at PEP 387... What should we see, other than that we have two PEPs on the same topic that should be merged? Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. 
Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From jan.matejek at novell.com Thu Aug 13 20:23:14 2009 From: jan.matejek at novell.com (Jan Matejek) Date: Thu, 13 Aug 2009 20:23:14 +0200 Subject: [Python-Dev] request for comments - standardization of python's purelib and platlib Message-ID: Hello, I'm cross-posting this to distributions at freedesktop and python-dev, because the topic is relevant to both groups and should be solved in cooperation. The issue: In Python's default configuration (on linux), both purelib (location for pure python modules) and platlib (location for platform-dependent binary extensions) point to $prefix/lib/pythonX.Y/site-packages. That is no good for two main reasons. One, python depends on the "lib" directory. (from distro's point of view, prefix is /usr, so let's talk /usr/lib) Due to this, it's impossible to install python under /usr/lib64 without heavy patching. Repeated attempts to bring python developers to acknowledge and rectify the situation have all failed (common argument here is "that would mean redesign of distutils and huge parts of whatnot"). Conversely, that also means that multiarch setup (/usr/lib or lib32 with 32bit python and /usr/lib64 with 64bit python) is not possible with stock python. Two, the default configuration makes purelib and platlib identical, which somehow defeats the purpose of the distinction in the first place. You either need to patch the default, or supply some alternate configuration to take advantage of this feature. And that's not the end of it - the next step is to make python aware of two different locations on sys.path, one for purelib and one for platlib, which is a different story altogether. As distributors, we like to take advantage of purelib/platlib separation to package pure python modules as platform-independent (noarch for rpm-speakers). And that's not easy to do properly. The proposal: Let's put our heads together and choose good default locations for purelib and platlib. Then add support to python for recognizing the locations by default, and possibly leave note in FHS that "this is the place". This is IMO a good first step to making python multiarch-aware, and it would also help a bit with LSB integration [1]. I've come up with three basic options for the configuration (substitute "/usr" with "$prefix" if you're not a distributor). This list is by no means comprehensive, it's just what looked reasonable at the time of writing. 1 - the traditional way purelib = /usr/lib/pythonX.Y/site-packages platlib = /usr/lib(64)/pythonX.Y/site-packages pros: + this is already the default for 32bit systems + major distributions (including Fedora, Mandriva and now finally openSUSE too) do this cons: - 32bit systems have no separation, poor they! - with multiarch setup, /usr/lib is "cluttered" by both platform-dependent files for 32bit and platform-independent files shared by the platforms. Also, 64bit python can pick up 32bit modules. That doesn't cause problems in practice, but doesn't fell like a clean design. 2 - the sharedir way purelib = /usr/share/python/X.Y platlib = /usr/lib(64)/pythonX.Y/site-packages pros: + clean separation of purelib - nice! + unheard of - a good place to start anew cons: - FHS states that /usr/share is for data. But OTOH, they don't say much about platform-independent bytecode. We could probably get an exception for this. 
- unheard of - everyone will be surprised 3 - the perl way purelib = /usr/lib/pythonX.Y platlib = /usr/lib/pythonX.Y/lib-dynload-(platform-identifier)/site-packages pros: + possibility of multiarch packages that would install pure python parts into purelib and extensions or accelerators for more platforms at once - and therefore, possibility to split large modules into platform-dependent and platform-independent parts and save space on installation media + "idea compatibility" with perl and ruby, one less install layout to learn cons: - completely different from what we have now - would require the most work from both python developers and distributions comments? regards jan matejek python packager for SUSE Linux [1] http://www.linuxfoundation.org/en/LsbPython From db3l.net at gmail.com Thu Aug 13 20:51:18 2009 From: db3l.net at gmail.com (David Bolen) Date: Thu, 13 Aug 2009 14:51:18 -0400 Subject: [Python-Dev] Microsoft MSDN References: <4A79ADB7.5080809@holdenweb.com> <4A8402D0.2050906@cheimes.de> Message-ID: Christian Heimes writes: > Steve Holden wrote: >> I sent fourteen requests for licenses in to Microsoft. I've asked them >> to let me know which they grant (since they may choose to limit the >> number) and will inform you all personally when I hear their decision. > > I've received my MSDN subscription today. Everybody watch out for a > message from MSDN! I almost confused the email with spam. > > Thanks for your work and please forward my gratitude to James Rice. Ditto from me (my subscription info arrived yesterday afternoon). Many thanks to all involved! -- David From brett at python.org Thu Aug 13 21:22:49 2009 From: brett at python.org (Brett Cannon) Date: Thu, 13 Aug 2009 12:22:49 -0700 Subject: [Python-Dev] request for comments - standardization of python's purelib and platlib In-Reply-To: References: Message-ID: On Thu, Aug 13, 2009 at 11:23, Jan Matejek wrote: > Hello, > > I'm cross-posting this to distributions at freedesktop and python-dev, > because the topic is relevant to both groups and should be solved in > cooperation. > > The issue: > > In Python's default configuration (on linux), both purelib (location for > pure python modules) and platlib (location for platform-dependent binary > extensions) point to $prefix/lib/pythonX.Y/site-packages. > That is no good for two main reasons. > > One, python depends on the "lib" directory. (from distro's point of > view, prefix is /usr, so let's talk /usr/lib) Due to this, it's > impossible to install python under /usr/lib64 without heavy patching. > Repeated attempts to bring python developers to acknowledge and rectify > the situation have all failed (common argument here is "that would mean > redesign of distutils and huge parts of whatnot"). > This is now Tarek's call, so this may or may not have changed in terms of what the (now) distutils maintainer thinks. > > Conversely, that also means that multiarch setup (/usr/lib or lib32 with > 32bit python and /usr/lib64 with 64bit python) is not possible with > stock python. > > Two, the default configuration makes purelib and platlib identical, > which somehow defeats the purpose of the distinction in the first place. > You either need to patch the default, or supply some alternate > configuration to take advantage of this feature. > And that's not the end of it - the next step is to make python aware of > two different locations on sys.path, one for purelib and one for > platlib, which is a different story altogether. 
> > As distributors, we like to take advantage of purelib/platlib separation > to package pure python modules as platform-independent (noarch for > rpm-speakers). And that's not easy to do properly. > > The proposal: > > Let's put our heads together and choose good default locations for > purelib and platlib. Then add support to python for recognizing the > locations by default, and possibly leave note in FHS that "this is the > place". > > This is IMO a good first step to making python multiarch-aware, and it > would also help a bit with LSB integration [1]. > > I've come up with three basic options for the configuration (substitute > "/usr" with "$prefix" if you're not a distributor). This list is by no > means comprehensive, it's just what looked reasonable at the time of > writing. > > 1 - the traditional way > purelib = /usr/lib/pythonX.Y/site-packages > platlib = /usr/lib(64)/pythonX.Y/site-packages > Why can't pure libraries go into lib64 as well? There is nothing saying that a pure Python package won't have a setup.py that installs different files based on whether it is for a 32-bit or 64-bit CPython install. > > pros: > + this is already the default for 32bit systems > + major distributions (including Fedora, Mandriva and now finally > openSUSE too) do this > cons: > - 32bit systems have no separation, poor they! > - with multiarch setup, /usr/lib is "cluttered" by both > platform-dependent files for 32bit and platform-independent files shared > by the platforms. Also, 64bit python can pick up 32bit modules. That > doesn't cause problems in practice, but doesn't fell like a clean design. > > 2 - the sharedir way > purelib = /usr/share/python/X.Y > platlib = /usr/lib(64)/pythonX.Y/site-packages Now are you proposing that packages that have both Python source and extensions be split based on the type of files, or that only pure Python packages go to /usr/share/python and any packages that are mixed go into lib(64)? If you are proposing the latter this is more reasonable as the former will require using .pth files to get import to search both locations for files in the same package and that just feels icky to me. > > pros: > + clean separation of purelib - nice! > + unheard of - a good place to start anew > cons: > - FHS states that /usr/share is for data. But OTOH, they don't say much > about platform-independent bytecode. We could probably get an exception > for this. > - unheard of - everyone will be surprised > 3 - the perl way > purelib = /usr/lib/pythonX.Y > platlib = > /usr/lib/pythonX.Y/lib-dynload-(platform-identifier)/site-packages > > pros: > + possibility of multiarch packages that would install pure python parts > into purelib and extensions or accelerators for more platforms at once - > and therefore, possibility to split large modules into > platform-dependent and platform-independent parts and save space on > installation media > + "idea compatibility" with perl and ruby, one less install layout to learn > cons: > - completely different from what we have now - would require the most > work from both python developers and distributions > I think that last con says what chances this approach has of winning. =) -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: From jnoller at gmail.com Thu Aug 13 22:16:33 2009 From: jnoller at gmail.com (Jesse Noller) Date: Thu, 13 Aug 2009 16:16:33 -0400 Subject: [Python-Dev] PyCon 2010: Call for Proposals Message-ID: <4222a8490908131316g4a5057cflf30753c4f74b5b7e@mail.gmail.com> Yup! 
It's that time again, I'm encouraging anyone involved in core development, or wanting to talk about core development - or python-core internals to submit talk proposals. Lots of people have expressed interest in such talks. -jesse Call for proposals -- PyCon 2010 -- =============================================================== Due date: October 1st, 2009 Want to showcase your skills as a Python Hacker? Want to have hundreds of people see your talk on the subject of your choice? Have some hot button issue you think the community needs to address, or have some package, code or project you simply love talking about? Want to launch your master plan to take over the world with python? PyCon is your platform for getting the word out and teaching something new to hundreds of people, face to face. Previous PyCon conferences have had a broad range of presentations, from reports on academic and commercial projects, tutorials on a broad range of subjects and case studies. All conference speakers are volunteers and come from a myriad of backgrounds. Some are new speakers, some are old speakers. Everyone is welcome so bring your passion and your code! We're looking to you to help us top the previous years of success PyCon has had. PyCon 2010 is looking for proposals to fill the formal presentation tracks. The PyCon conference days will be February 19-22, 2010 in Atlanta, Georgia, preceded by the tutorial days (February 17-18), and followed by four days of development sprints (February 22-25). Online proposal submission is open now! Proposals will be accepted through October 1st, with acceptance notifications coming out on November 15th. For the detailed call for proposals, please see: For videos of talks from previous years - check out: We look forward to seeing you in Atlanta! From benjamin at python.org Thu Aug 13 23:12:50 2009 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 13 Aug 2009 16:12:50 -0500 Subject: [Python-Dev] [RELEASED] Python 3.1.1 Release Candidate Message-ID: <1afaf6160908131412i3b6f6636wc3fc166db0b9d5ce@mail.gmail.com> On behalf of the Python development team, I'm pleased to announce the first release candidate of Python 3.1.1. This bug fix release fixes many normal bugs and several critical ones including potential data corruption in the io library. The final version should be out within the next week. Python 3.1 focuses on the stabilization and optimization of the features and changes that Python 3.0 introduced. For example, the new I/O system has been rewritten in C for speed. File system APIs that use unicode strings now handle paths with undecodable bytes in them. Other features include an ordered dictionary implementation, a condensed syntax for nested with statements, and support for ttk Tile in Tkinter. For a more extensive list of changes in 3.1, see http://doc.python.org/3.1/whatsnew/3.1.html or Misc/NEWS in the Python distribution. To download Python 3.1.1 visit: http://www.python.org/download/releases/3.1/ The 3.1 documentation can be found at: http://docs.python.org/3.1 Bugs can always be reported to: http://bugs.python.org Enjoy! 
-- Benjamin Peterson Release Manager benjamin at python.org (on behalf of the entire python-dev team and 3.1.1's contributors) From david.lyon at preisshare.net Fri Aug 14 05:02:42 2009 From: david.lyon at preisshare.net (David Lyon) Date: Thu, 13 Aug 2009 23:02:42 -0400 Subject: [Python-Dev] request for comments - standardization of python's purelib and platlib In-Reply-To: References: Message-ID: <4ee28f14895f3bfae2a0a1f314748107@preisshare.net> Hi Jan, It's not impossible, but you have some dependencies. If you can patch distutils within Suse, then it mightn't be so difficult. Distutils is not much more than a file copier. Inside distutils, a lot of the paths that you are talking about are hardcoded. > One, python depends on the "lib" directory. (from distro's point of > view, prefix is /usr, so let's talk /usr/lib) Due to this, it's > impossible to install python under /usr/lib64 without heavy patching. correction - light patching. > Repeated attempts to bring python developers to acknowledge and rectify > the situation have all failed (common argument here is "that would mean > redesign of distutils and huge parts of whatnot"). Make it a zope/plone issue... and something might get done about it.... haha If it's a windows or linux issue... pour petrol on it and light a match.. seriously... it's not major refactoring.. it's just changing a few conditional constants.. within distutils.. > Let's put our heads together and choose good default locations for > purelib and platlib. Then add support to python for recognizing the > locations by default, and possibly leave note in FHS that "this is the > place". Sure - discuss away. But you might end up having to patch your own distribution. > 2 - the sharedir way > purelib = /usr/share/python/X.Y > platlib = /usr/lib(64)/pythonX.Y/site-packages > > pros: > + clean separation of purelib - nice! > + unheard of - a good place to start anew > cons: > - FHS states that /usr/share is for data. But OTOH, they don't say much > about platform-independent bytecode. We could probably get an exception > for this. > - unheard of - everyone will be surprised +1 Go try... David http://sourceforge.net/projects/pythonpkgmgr/ From ncoghlan at gmail.com Fri Aug 14 09:34:56 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 14 Aug 2009 17:34:56 +1000 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: References: <4A757015.6010100@voidspace.org.uk> <79990c6b0908020445q5d4bcc82vf93d2af4493554a9@mail.gmail.com> <1afaf6160908112022p3464d0e4u739d067922cf64b@mail.gmail.com> <4A82B40A.9030608@gmail.com> <4A82B4AF.2000607@gmail.com> Message-ID: <4A8513A0.9010200@gmail.com> Georg Brandl wrote: > Nick Coghlan schrieb: >> P.S. For anyone else that is slow like me, take a close look at PEP 387... > > What should we see, other than that we have two PEPs on the same topic that > should be merged? Benjamin wrote the second one, so he obviously knows there's a written deprecation policy in place, and hence his mini-rant probably wasn't meant to be taken literally - a point I completely missed on first reading. I agree the two PEPs should probably be consolidated into one, but absent a volunteer for that task, leaving them as is doesn't really hurt anything. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ziade.tarek at gmail.com Fri Aug 14 10:02:03 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Fri, 14 Aug 2009 10:02:03 +0200 Subject: [Python-Dev] request for comments - standardization of python's purelib and platlib In-Reply-To: References: Message-ID: <94bdd2610908140102y5ef04116i52fb7071bb83aab9@mail.gmail.com> On Thu, Aug 13, 2009 at 9:22 PM, Brett Cannon wrote: > > > On Thu, Aug 13, 2009 at 11:23, Jan Matejek wrote: >> >> Hello, >> >> I'm cross-posting this to distributions at freedesktop and python-dev, >> because the topic is relevant to both groups and should be solved in >> cooperation. >> >> The issue: >> >> In Python's default configuration (on linux), both purelib (location for >> pure python modules) and platlib (location for platform-dependent binary >> extensions) point to $prefix/lib/pythonX.Y/site-packages. >> That is no good for two main reasons. >> >> One, python depends on the "lib" directory. (from distro's point of >> view, prefix is /usr, so let's talk /usr/lib) Due to this, it's >> impossible to install python under /usr/lib64 without heavy patching. >> Repeated attempts to bring python developers to acknowledge and rectify >> the situation have all failed (common argument here is "that would mean >> redesign of distutils and huge parts of whatnot"). > > This is now Tarek's call, so this may or may not have changed in terms of > what the (now) distutils maintainer thinks. > I don't recall those repeated attempts, but I've been around for less than two years. You are very welcome to come to the Distutils-SIG ML to discuss these matters. I'm moving the discussion there. Among the proposals you have detailed, the sharedir way seems like the most simple/interesting one (depending on your answer to Brett's question). Regards Tarek -- Tarek Ziadé | http://ziade.org
From paolo.fragu at libero.it Fri Aug 14 13:21:29 2009 From: paolo.fragu at libero.it (paolo.fragu at libero.it) Date: Fri, 14 Aug 2009 13:21:29 +0200 Subject: [Python-Dev] Tkinter: modify xview of entry widget Message-ID: Hi, I'm Paolo from Italy and I'm a python user. I wish to propose a useful and neat modification to a method in the Tkinter library: Until now, to scroll this widget we had to write an external function (calling xview_moveto and xview_scroll). With my change this operation becomes the same as for all other scrollable widgets (you just have to call xview). ---------------------------------------------------------- Modify Proposal: ---------------------------------------------------------- Change the xview method of Entry so that it works like all the other scrollable widgets, while staying compatible with the 'old' xview. So to scroll an entry widget:

    entry_widget['xscrollcommand'] = scroll_widget.set
    scroll_widget['command'] = entry_widget.xview

The change in module Tkinter is:

    def xview(self, *args):
        """Query and change horizontal position of the view."""
        # modified: calling with no arguments queries the current view
        if not args:
            return self._getdoubles(self.tk.call(self._w, 'xview'))
        # old behaviour (scroll to an index), but forward every argument
        # so that a Scrollbar's 'moveto'/'scroll' calls work as well
        self.tk.call(self._w, 'xview', *args)

---------------------------------------------------------- I wish that this implementation could be integrated into Tkinter, and I remain at your disposal for any question or further information.
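With the change, hooking an Entry to a horizontal Scrollbar and querying the view works the same way as for Listbox or Text. A small sketch of the intended usage (hypothetical widget names, assuming a Tk root window and Python 2's Tkinter):

    import Tkinter

    root = Tkinter.Tk()
    entry = Tkinter.Entry(root, width=20)
    scroll = Tkinter.Scrollbar(root, orient='horizontal')
    entry['xscrollcommand'] = scroll.set
    scroll['command'] = entry.xview    # no wrapper function needed
    entry.grid(row=0, column=0, sticky='ew')
    scroll.grid(row=1, column=0, sticky='ew')
    entry.insert(0, 'some text that is much wider than the widget itself')
    print(entry.xview())               # query form returns the (first, last) fractions
    root.mainloop()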
Waiting for your response, Best regards Paolo Fraguglia From fuzzyman at voidspace.org.uk Fri Aug 14 13:25:05 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Fri, 14 Aug 2009 12:25:05 +0100 Subject: [Python-Dev] Tkinter: modify xview of entry widget In-Reply-To: References: Message-ID: <4A854991.6060900@voidspace.org.uk> paolo.fragu at libero.it wrote: > Hi, > I'm Paolo from Italy and I'm a python user. > I wish to propose a useful and smart method modify in Tkinter Library: > Hi Paolo, Can you create an issue on the bug tracker - with the patch attached. Your suggestion stands a much better chance if this patch includes tests and documentation. All the best, Michael Foord > Previously to scroll this widget we had to write an external function (recalling xview_moveto and xview_scroll). > > With my method this operation is cleared and the same as all other widgets (just have to call xview). > > ---------------------------------------------------------- > Modify Proposal: > ---------------------------------------------------------- > Change the method xview of entry so it works as all widget scrollable, and it's compatible with 'old' xview. > > So to scroll entry widget: > > entry_widget['xscrollcommand']=scroll_widget.set > scroll_widget['command']=entry_widget.xview > > The change in module Tkinter is: > > def xview(self,*args): > """Query and change horizontal position of the view.""" > #modify > if not args: > return self._getdoubles(self.tk.call(self._w, 'xview')) > #old code > index=args[0] > self.tk.call(self._w, 'xview', index) > > ---------------------------------------------------------- > > I wish that this implementation could be integrated in Tkinter, and I remain at disposal for any question or further information. > > Waiting for your response, > Best regards > Paolo Fraguglia > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From ggpolo at gmail.com Fri Aug 14 14:30:17 2009 From: ggpolo at gmail.com (Guilherme Polo) Date: Fri, 14 Aug 2009 09:30:17 -0300 Subject: [Python-Dev] Tkinter: modify xview of entry widget In-Reply-To: References: Message-ID: 2009/8/14 paolo.fragu at libero.it : > Hi, > I'm Paolo from Italy and I'm a python user. > I wish to propose a useful and smart method modify in Tkinter Library: > > Previously to scroll this widget we had to write an external function ?(recalling xview_moveto and xview_scroll). > > With my method this operation is cleared and the same as all other widgets ?(just have to call xview). > > I wish that this implementation could be integrated in Tkinter, and I remain at disposal for any question or further information. > > Waiting for your response, I believe you are trying to mention the fact that the Entry.xview method doesn't allow being called without passing an index, even if this index is None. Is that the case ? Take a look on http://bugs.python.org/issue1135 and http://bugs.python.org/issue6180, they already address this fix. > Best regards > Paolo Fraguglia Regards, -- -- Guilherme H. 
Polo Goncalves From fwierzbicki at gmail.com Fri Aug 14 17:46:46 2009 From: fwierzbicki at gmail.com (Frank Wierzbicki) Date: Fri, 14 Aug 2009 11:46:46 -0400 Subject: [Python-Dev] Tweaking AST lineno and col_offset Message-ID: <4dab5f760908140846y6c39508ew9c2f74065f8f940@mail.gmail.com> Hi all, Off and on I have been directly comparing Jython's AST with Python's AST and generally working towards making them as close to identical as possible. There are a couple of places where I haven't "fixed" Jython because it looks to me like Jython has slightly better offsets. One example: for a,b in c: pass The Tuple node "a,b" ends up with a col_offset of 0 (the position of the "for") where Jython has the col_offset as 4 (the position of "a"). Jython's result is more consistent with other Tuple node col_offset results. I have a local patch that changes the CPython col_offset to match Jython's, but before I submit a patch I thought I'd ask here if there is support for this sort of change and if I should continue to find col_offset and lineno results that look fishy to me, or should I just change Jython's results to match (one way or another, things will be much easier for me to test if they match). Also, would this be a change that would be considered a backwards incompatibility? In other words, would patches like this be allowed in 2.6/3.1 or only in 2.7/3.2. Regards, -Frank From status at bugs.python.org Fri Aug 14 18:07:06 2009 From: status at bugs.python.org (Python tracker) Date: Fri, 14 Aug 2009 18:07:06 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20090814160706.8E3D378628@psf.upfronthosting.co.za> ACTIVITY SUMMARY (08/07/09 - 08/14/09) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 2329 open (+26) / 16195 closed ( +8) / 18524 total (+34) Open issues with patches: 925 Average duration of open issues: 659 days. Median duration of open issues: 414 days. 
Open Issues Breakdown open 2297 (+26) pending 31 ( +0) Issues Created Or Reopened (34) _______________________________ TarFile.getmembers fails at struct.unpack: unpack requires a str 08/07/09 http://bugs.python.org/issue6669 created srid Printing the 'The Python Tutorial' 08/08/09 http://bugs.python.org/issue6670 created brimac webbrowser.py doesn't respect xfce default browser 08/09/09 http://bugs.python.org/issue6671 created ava1ar patch Add Mingw recognition to pyport.h to allow building extensions 08/09/09 http://bugs.python.org/issue6672 created f0k Py3.1 hangs in coroutine and eats up all memory 08/09/09 CLOSED http://bugs.python.org/issue6673 created scoder Fatal error: deallocating None 08/10/09 CLOSED http://bugs.python.org/issue6674 created shashi inf == inf (wrong IEEE 754 behaviour) 08/10/09 CLOSED http://bugs.python.org/issue6675 created Cyborg16 expat parser throws Memory Error when parsing multiple files 08/10/09 http://bugs.python.org/issue6676 created realpolitik Place the term "delete" within the documentation for os.remove() 08/10/09 http://bugs.python.org/issue6677 created mcow inspect.currentframe documentation omits optional depth paramete 08/10/09 http://bugs.python.org/issue6678 created llimllib obsolete paragraph in re doc for re.sub 08/10/09 CLOSED http://bugs.python.org/issue6679 created MLModel Python 3.1 fails to build when db.h contains non-UTF-8 character 08/10/09 CLOSED http://bugs.python.org/issue6680 created Arfrever patch email.parser clips trailing \n of multipart/mixed part if part e 08/10/09 http://bugs.python.org/issue6681 created gvanrossum Default traceback does not handle PEP302 loaded modules 08/11/09 http://bugs.python.org/issue6682 created anders.blomdell at control.lth.se smtplib authentication - try all mechanisms 08/11/09 http://bugs.python.org/issue6683 created shubes "x / 1" and "x * 1" should return x 08/11/09 CLOSED http://bugs.python.org/issue6684 created mrjbq7 CGI module documentation references method 'toupper'; should be 08/11/09 http://bugs.python.org/issue6685 created troy xml.sax.xmlreader.XMLReader.getProperty (xml.sax.handler.propert 08/11/09 http://bugs.python.org/issue6686 created cms103 patch Move the special-case for integer objects out of PyBytes_FromObj 08/11/09 http://bugs.python.org/issue6687 created alexandre.vassalotti patch, 26backport Optimize PyBytes_FromObject. 
08/11/09 http://bugs.python.org/issue6688 created alexandre.vassalotti patch subprocess doesn't pass arguments correctly on Linux when shell= 08/12/09 http://bugs.python.org/issue6689 created davidfraser patch BUILD_SET followed by COMPARE_OP (in) can be optimized if all it 08/12/09 http://bugs.python.org/issue6690 created alex Support for nested classes and function for pyclbr 08/12/09 http://bugs.python.org/issue6691 created gpolo patch asyncore kqueue support 08/13/09 http://bugs.python.org/issue6692 created Ikinoki New functions in site.py to get user/global site packages paths 08/13/09 http://bugs.python.org/issue6693 created tarek patch itertools documentation still contains references to ifilterfals 08/13/09 CLOSED http://bugs.python.org/issue6694 created alex.morega PyXXX_ClearFreeList for dict, set, and list 08/13/09 http://bugs.python.org/issue6695 created matthiastroffaes patch Profile objects should be documented 08/13/09 http://bugs.python.org/issue6696 created honeyman Python 3.1 segfaults when invalid UTF-8 characters are passed fr 08/13/09 http://bugs.python.org/issue6697 created Arfrever IDLE no longer opens only an edit window when configured to do s 08/13/09 http://bugs.python.org/issue6698 created gpolo IDLE: Warn user about overwriting a file that has a newer versio 08/13/09 http://bugs.python.org/issue6699 created gpolo patch inspect.getsource() returns incorrect source lines 08/14/09 http://bugs.python.org/issue6700 created gagenellina patch Make custom xmlrpc extension easier 08/14/09 http://bugs.python.org/issue6701 created bogdan.opanchuk patch Tkinter: modify xview of entry widget 08/14/09 CLOSED http://bugs.python.org/issue6702 created paolo Issues Now Closed (22) ______________________ xview/yview of Tix.Grid is broken 705 days http://bugs.python.org/issue1135 gpolo patch IDLE - use enumerate instead of zip(count(), ...) 399 days http://bugs.python.org/issue3344 gpolo patch subprocess fails in select when descriptors are large 392 days http://bugs.python.org/issue3392 benjamin.peterson patch Idle doesn't obey the new improved warnings arguements 327 days http://bugs.python.org/issue3926 gpolo patch Idle hangs when given a nonexistent filename. 
207 days http://bugs.python.org/issue4985 gpolo patch IDLE to support reindent.py 190 days http://bugs.python.org/issue5150 gpolo Python 3 pdb: shows internal code, breakpoints don't work 78 days http://bugs.python.org/issue6126 georg.brandl patch Missing Shell menu in Linux IDLE 67 days http://bugs.python.org/issue6168 gpolo Tkinter.Entry: fix for xview and some doc clarifications 73 days http://bugs.python.org/issue6180 gpolo patch test_ttk_guionly buildbot test crash: Tcl_FinalizeNotifier: noti 18 days http://bugs.python.org/issue6527 gpolo 2to3 fails to fix test.test_support 13 days http://bugs.python.org/issue6583 benjamin.peterson 2to3 test_print_function_option fails on Windows 14 days http://bugs.python.org/issue6599 benjamin.peterson patch Desire python.org documentation link to user contribution wiki ( 5 days http://bugs.python.org/issue6660 keenethery re.findall does not always return a list of strings 2 days http://bugs.python.org/issue6663 pitrou Py3.1 hangs in coroutine and eats up all memory 2 days http://bugs.python.org/issue6673 alexandre.vassalotti Fatal error: deallocating None 0 days http://bugs.python.org/issue6674 benjamin.peterson inf == inf (wrong IEEE 754 behaviour) 0 days http://bugs.python.org/issue6675 marketdickinson obsolete paragraph in re doc for re.sub 2 days http://bugs.python.org/issue6679 georg.brandl Python 3.1 fails to build when db.h contains non-UTF-8 character 2 days http://bugs.python.org/issue6680 benjamin.peterson patch "x / 1" and "x * 1" should return x 0 days http://bugs.python.org/issue6684 rhettinger itertools documentation still contains references to ifilterfals 0 days http://bugs.python.org/issue6694 georg.brandl Tkinter: modify xview of entry widget 0 days http://bugs.python.org/issue6702 gpolo Top Issues Most Discussed (10) ______________________________ 15 Regexp 2.7 (modifications to current re 2.2.2) 486 days open http://bugs.python.org/issue2636 5 Python 3 pdb: shows internal code, breakpoints don't work 78 days closed http://bugs.python.org/issue6126 5 urllib.quote() escapes characters unnecessarily and contrary to 486 days open http://bugs.python.org/issue2637 4 inf == inf (wrong IEEE 754 behaviour) 0 days closed http://bugs.python.org/issue6675 4 Py3.1 hangs in coroutine and eats up all memory 2 days closed http://bugs.python.org/issue6673 4 subprocess fails in select when descriptors are large 392 days closed http://bugs.python.org/issue3392 3 PyXXX_ClearFreeList for dict, set, and list 1 days open http://bugs.python.org/issue6695 3 subprocess doesn't pass arguments correctly on Linux when shell 2 days open http://bugs.python.org/issue6689 3 xml.sax.xmlreader.XMLReader.getProperty (xml.sax.handler.proper 3 days open http://bugs.python.org/issue6686 3 email.parser clips trailing \n of multipart/mixed part if part 4 days open http://bugs.python.org/issue6681 From jan.matejek at novell.com Fri Aug 14 18:11:07 2009 From: jan.matejek at novell.com (Jan Matejek) Date: Fri, 14 Aug 2009 18:11:07 +0200 Subject: [Python-Dev] request for comments - standardization of python's purelib and platlib In-Reply-To: References: Message-ID: <4A858C9B.70900@novell.com> Dne 13.8.2009 21:22, Brett Cannon napsal(a): > On Thu, Aug 13, 2009 at 11:23, Jan Matejek wrote: >> 1 - the traditional way >> purelib = /usr/lib/pythonX.Y/site-packages >> platlib = /usr/lib(64)/pythonX.Y/site-packages >> > > Why can't pure libraries go into lib64 as well? 
There is nothing saying that > a pure Python package won't have a setup.py that installs different files > based on whether it is for a 32-bit or 64-bit CPython install. What I'd like to accomplish is to have a pure "noarch" package that can be installed unchanged into a 32bit or 64bit (or 256bit) system, and the respective python would still find the files. Or, to put it another way, a package that can be installed into a multiarch system and be recognized by pythons of all architectures (assuming they are the same version, of course). If the distutils package installs different pure files for 32bit and 64bit python, then it can't be "noarch", so it doesn't matter if it goes into lib64. Also, such a package would break this particular scheme - in the situation where the user installs only the 32bit version of such a package and tries to run it with 64bit python, it will probably break in some weird way. Last but not least, I'd argue that if a python-only package installs different files for different platforms, it is platform-dependent and therefore not pure ;) >> 2 - the sharedir way >> purelib = /usr/share/python/X.Y >> platlib = /usr/lib(64)/pythonX.Y/site-packages > > > Now are you proposing that packages that have both Python source and > extensions be split based on the type of files, or that only pure Python > packages go to /usr/share/python and any packages that are mixed go into > lib(64)? If you are proposing the latter this is more reasonable as the > former will require using .pth files to get import to search both locations > for files in the same package and that just feels icky to me. The latter. Assume no change to the "normal" distutils mechanism, only setting the default paths. (for now anyway) regards m.
From benjamin at python.org Fri Aug 14 18:16:47 2009 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 14 Aug 2009 11:16:47 -0500 Subject: [Python-Dev] Tweaking AST lineno and col_offset In-Reply-To: <4dab5f760908140846y6c39508ew9c2f74065f8f940@mail.gmail.com> References: <4dab5f760908140846y6c39508ew9c2f74065f8f940@mail.gmail.com> Message-ID: <1afaf6160908140916v7db6f68dq56fd6f8641dda1b4@mail.gmail.com> 2009/8/14 Frank Wierzbicki : > Hi all, > > Off and on I have been directly comparing Jython's AST with Python's > AST and generally working towards making them as close to identical as > possible. There are a couple of places where I haven't "fixed" Jython > because it looks to me like Jython has slightly better offsets. One > example:
>
>     for a,b in c:
>         pass
>
> The Tuple node "a,b" ends up with a col_offset of 0 (the position of > the "for") where Jython has the col_offset as 4 (the position of "a"). > Jython's result is more consistent with other Tuple node col_offset > results. > > I have a local patch that changes the CPython col_offset to match > Jython's, but before I submit a patch I thought I'd ask here if there > is support for this sort of change and if I should continue to find > col_offset and lineno results that look fishy to me, or should I just > change Jython's results to match (one way or another, things will be > much easier for me to test if they match). Yes, please submit it. > > Also, would this be a change that would be considered a backwards > incompatibility? In other words, would patches like this be allowed > in 2.6/3.1 or only in 2.7/3.2. While I don't see a problem in backporting it to maintenance branches, I would personally only apply it to the current development branches. It doesn't seem to fix a "bug", just make a nice improvement.
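For anyone who wants to see the offset in question, it is easy to poke at from the ast module. A small sketch (only an illustration; the numbers are whatever the running interpreter reports):

    import ast

    tree = ast.parse("for a, b in c:\n    pass")
    target = tree.body[0].target   # the Tuple node for "a, b"
    print("%s: lineno=%d col_offset=%d"
          % (type(target).__name__, target.lineno, target.col_offset))
    # Unpatched CPython reports col_offset 0 (the position of "for");
    # Jython reports 4 (the position of "a").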
-- Regards, Benjamin From lanyjie at yahoo.com Fri Aug 14 19:00:49 2009 From: lanyjie at yahoo.com (Yingjie Lan) Date: Fri, 14 Aug 2009 10:00:49 -0700 (PDT) Subject: [Python-Dev] expy: an expressway to extend Python In-Reply-To: Message-ID: <637900.988.qm@web54204.mail.re2.yahoo.com> --- On Sat, 8/8/09, Stefan Behnel wrote: > From: Stefan Behnel > Subject: Re: [Python-Dev] expy: an expressway to extend Python > To: python-dev at python.org > Date: Saturday, August 8, 2009, 4:55 PM > > More details at http://expy.sourceforge.net/ > > I'm clearly biased, but my main concern here is that expy > requires C code > to be written inside of strings. There isn't any good > editor support for > that, so I doubt that expy is good for anything but very > thin wrappers (as > in the examples you presented). Thanks a lot for the input -- I sort of recaptured the advantages of expy and listed four points in the new introduction at http://expy.sf.net/ homepage. Lacking of editor highlight support is quite a problem, but it is possible to create a support. For example, you can use this to indicate the start of embedded code highlight: return """ and then the end mark is of course the enclosing """ > > That said, you might want to look at the argument unpacking > code generated > by Cython. It's highly optimised through specialisation and > has been > benchmarked quite a bit faster than the generic Python > C-API functions for > tuple/keyword extracting. Since argument conversion seems > to be more or > less all that expy really does, maybe you want to reuse > that code. > > Stefan Oh sure, that's nice if that part can be adopted by expy-cxpy. Any help out on this would be very welcomed. Yingjie From jaraco at jaraco.com Fri Aug 14 20:39:03 2009 From: jaraco at jaraco.com (Jason R. Coombs) Date: Fri, 14 Aug 2009 14:39:03 -0400 Subject: [Python-Dev] functools.compose to chain functions together Message-ID: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> I'd like to express additional interest in python patch 1660179, discussed here: http://mail.python.org/pipermail/patches/2007-February/021687.html On several occasions, I've had the desire for something like this. I've made due with lambda functions, but as was mentioned, the lambda is clumsy and harder to read than functools.compose would be. A potentially common use-case is when a library has a multi-decorator use case in which they want to compose a meta decorator out of one or more individual decorators. Consider the hypothetical library. # we have three decorators we use commonly def dec_register_function_for_x(func): # do something with func return func def dec_alter_docstring(func): # do something to func.__doc__ return func def inject_some_data(data): def dec_inject_data(func): func.data = data # this may not be legal, but assume it does something useful return func return dec_inject_data # we could use these decorators explicitly throughout our project @dec_register_function_for_x @dec_alter_docstring @dec_inject_some_data('foo data 1') def our_func_1(params): pass @dec_register_function_for_x @dec_alter_docstring @dec_inject_some_data('foo data 2') def our_func_2(params): pass For two functions, that's not too onerous, but if it's used throughout the application, it would be nice to abstract the collection of decorators. One could do this with lambdas. 
def meta_decorator(data): return lambda func: dec_register_function_for_x(dec_alter_docstring(dec_inject_some_data(data)(f unc))) But to me, a compose function is much easier to read and much more consistent with the decorator usage syntax itself. def meta_decorator(data): return compose(dec_register_function_for_x, dec_alter_docstring, dec_inject_some_data(data)) The latter implementation seems much more readable and elegant. One doesn't even need to know the decorator signature to effectively compose meta_decorators. I've heard it said that Python is not a functional language, but if that were really the case, then functools would not exist. In addition to the example described above, I've had multiple occasions where having a general purpose function composition function would have simplified the implementation by providing a basic functional construct. While Python isn't primarily a functional language, it does have some functional constructs, and this is one of the features that makes Python so versatile; one can program functionally, procedurally, or in an object-oriented way, all within the same language. I admit, I may be a bit biased; my first formal programming course was taught in Scheme. Nevertheless, I believe functools is the ideal location for a very basic and general capability such as composition. I realize this patch was rejected, but I'd like to propose reviving the patch and incorporating it into functools. Regards, Jason -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 6998 bytes Desc: not available URL: From fwierzbicki at gmail.com Fri Aug 14 20:57:04 2009 From: fwierzbicki at gmail.com (Frank Wierzbicki) Date: Fri, 14 Aug 2009 14:57:04 -0400 Subject: [Python-Dev] Tweaking AST lineno and col_offset In-Reply-To: <1afaf6160908140916v7db6f68dq56fd6f8641dda1b4@mail.gmail.com> References: <4dab5f760908140846y6c39508ew9c2f74065f8f940@mail.gmail.com> <1afaf6160908140916v7db6f68dq56fd6f8641dda1b4@mail.gmail.com> Message-ID: <4dab5f760908141157p1ee7c2abi49206cef6ecb3dae@mail.gmail.com> On Fri, Aug 14, 2009 at 12:16 PM, Benjamin Peterson wrote: >> I have a local patch that changes the CPython col_offset to match >> Jython's, but before I submit a patch I thought I'd ask here if there >> is support for this sort of change and if I should continue to find >> col_offset and lineno results that look fishy to me, or should I just >> change Jython's results to match (one way or another, things will be >> much easier for me to test if they match). > > Yes, please submit it. Great, the patch is here: http://bugs.python.org/issue6704 BTW - I would have added a test to test_ast.py, but above the test data it notes: #### EVERYTHING BELOW IS GENERATED ##### and I couldn't find the tool used for the generation. Does anyone know where that is? -Frank From catch-all at masklinn.net Fri Aug 14 20:58:22 2009 From: catch-all at masklinn.net (Xavier Morel) Date: Fri, 14 Aug 2009 20:58:22 +0200 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> Message-ID: <9CC5BBF8-7DF3-4915-BAAA-284E1A6C063E@masklinn.net> On 14 Aug 2009, at 20:39 , Jason R. 
Coombs wrote: > I've heard it said that Python is not a functional language, but if > that > were really the case, then functools would not exist. In addition to > the > example described above, I've had multiple occasions where having a > general > purpose function composition function would have simplified the > implementation by providing a basic functional construct. It's not like a basic variable-arity composition function is hard to implement though, it's basically: def compose(*funcs): return reduce(lambda f1, f2: lambda v: f1(f2(v)), funcs) it'll turn compose(a, b, c, d)(value) into a(b(c(d(value)))) From benjamin at python.org Fri Aug 14 21:11:19 2009 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 14 Aug 2009 14:11:19 -0500 Subject: [Python-Dev] Tweaking AST lineno and col_offset In-Reply-To: <4dab5f760908141157p1ee7c2abi49206cef6ecb3dae@mail.gmail.com> References: <4dab5f760908140846y6c39508ew9c2f74065f8f940@mail.gmail.com> <1afaf6160908140916v7db6f68dq56fd6f8641dda1b4@mail.gmail.com> <4dab5f760908141157p1ee7c2abi49206cef6ecb3dae@mail.gmail.com> Message-ID: <1afaf6160908141211n33ad65bdp710d1e3f2c6ed0c6@mail.gmail.com> 2009/8/14 Frank Wierzbicki : > On Fri, Aug 14, 2009 at 12:16 PM, Benjamin Peterson wrote: > >>> I have a local patch that changes the CPython col_offset to match >>> Jython's, but before I submit a patch I thought I'd ask here if there >>> is support for this sort of change and if I should continue to find >>> col_offset and lineno results that look fishy to me, or should I just >>> change Jython's results to match (one way or another, things will be >>> much easier for me to test if they match). >> >> Yes, please submit it. > Great, the patch is here: http://bugs.python.org/issue6704 I'll take a look. > > ?BTW - I would have added a test to test_ast.py, but above the test > data it notes: #### EVERYTHING BELOW IS GENERATED ##### and I couldn't > find the tool used for the generation. ?Does anyone know where that > is? It's at the bottom of the test file. :) You can add a handwritten test above that, though. -- Regards, Benjamin From fwierzbicki at gmail.com Fri Aug 14 21:41:41 2009 From: fwierzbicki at gmail.com (Frank Wierzbicki) Date: Fri, 14 Aug 2009 15:41:41 -0400 Subject: [Python-Dev] Tweaking AST lineno and col_offset In-Reply-To: <1afaf6160908141211n33ad65bdp710d1e3f2c6ed0c6@mail.gmail.com> References: <4dab5f760908140846y6c39508ew9c2f74065f8f940@mail.gmail.com> <1afaf6160908140916v7db6f68dq56fd6f8641dda1b4@mail.gmail.com> <4dab5f760908141157p1ee7c2abi49206cef6ecb3dae@mail.gmail.com> <1afaf6160908141211n33ad65bdp710d1e3f2c6ed0c6@mail.gmail.com> Message-ID: <4dab5f760908141241j6b194129hab735df27793b9e5@mail.gmail.com> On Fri, Aug 14, 2009 at 3:11 PM, Benjamin Peterson wrote: > 2009/8/14 Frank Wierzbicki : >> On Fri, Aug 14, 2009 at 12:16 PM, Benjamin Peterson wrote: >> >>>> I have a local patch that changes the CPython col_offset to match >>>> Jython's, but before I submit a patch I thought I'd ask here if there >>>> is support for this sort of change and if I should continue to find >>>> col_offset and lineno results that look fishy to me, or should I just >>>> change Jython's results to match (one way or another, things will be >>>> much easier for me to test if they match). >>> >>> Yes, please submit it. >> Great, the patch is here: http://bugs.python.org/issue6704 > > I'll take a look. 
> >> >> ?BTW - I would have added a test to test_ast.py, but above the test >> data it notes: #### EVERYTHING BELOW IS GENERATED ##### and I couldn't >> find the tool used for the generation. ?Does anyone know where that >> is? > > It's at the bottom of the test file. :) You can add a handwritten test > above that, though. Heh -- how did I miss that :) ? -- I'll resubmit the patch with tests. -Frank From fwierzbicki at gmail.com Fri Aug 14 22:01:50 2009 From: fwierzbicki at gmail.com (Frank Wierzbicki) Date: Fri, 14 Aug 2009 16:01:50 -0400 Subject: [Python-Dev] Tweaking AST lineno and col_offset In-Reply-To: <4dab5f760908141241j6b194129hab735df27793b9e5@mail.gmail.com> References: <4dab5f760908140846y6c39508ew9c2f74065f8f940@mail.gmail.com> <1afaf6160908140916v7db6f68dq56fd6f8641dda1b4@mail.gmail.com> <4dab5f760908141157p1ee7c2abi49206cef6ecb3dae@mail.gmail.com> <1afaf6160908141211n33ad65bdp710d1e3f2c6ed0c6@mail.gmail.com> <4dab5f760908141241j6b194129hab735df27793b9e5@mail.gmail.com> Message-ID: <4dab5f760908141301w324fe891p67819736d0530345@mail.gmail.com> On Fri, Aug 14, 2009 at 3:41 PM, Frank Wierzbicki wrote: >> It's at the bottom of the test file. :) You can add a handwritten test >> above that, though. > Heh -- how did I miss that :) ? -- I'll resubmit the patch with tests. Resubmitted http://bugs.python.org/issue6704 with tests. -Frank From brett at python.org Fri Aug 14 23:54:30 2009 From: brett at python.org (Brett Cannon) Date: Fri, 14 Aug 2009 14:54:30 -0700 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> Message-ID: It would be best to discuss this on comp.lang.python or python-ideas to get general support for the idea before trying to bring this to python-dev in hopes of changing people's minds. On Fri, Aug 14, 2009 at 11:39, Jason R. Coombs wrote: > I?d like to express additional interest in python patch 1660179, > discussed here: > > > > http://mail.python.org/pipermail/patches/2007-February/021687.html > > > > On several occasions, I?ve had the desire for something like this. I?ve > made due with lambda functions, but as was mentioned, the lambda is clumsy > and harder to read than functools.compose would be. > > > > A potentially common use-case is when a library has a multi-decorator use > case in which they want to compose a meta decorator out of one or more > individual decorators. > > > > Consider the hypothetical library. > > > > # we have three decorators we use commonly > > def dec_register_function_for_x(func): > > # do something with func > > return func > > > > def dec_alter_docstring(func): > > # do something to func.__doc__ > > return func > > > > def inject_some_data(data): > > def dec_inject_data(func): > > func.data = data # this may not be legal, > but assume it does something useful > > return func > > return dec_inject_data > > > > # we could use these decorators explicitly throughout our project > > @dec_register_function_for_x > > @dec_alter_docstring > > @dec_inject_some_data(?foo data 1?) > > def our_func_1(params): > > pass > > > > @dec_register_function_for_x > > @dec_alter_docstring > > @dec_inject_some_data(?foo data 2?) > > def our_func_2(params): > > pass > > > > For two functions, that?s not too onerous, but if it?s used throughout the > application, it would be nice to abstract the collection of decorators. One > could do this with lambdas. 
> > > > def meta_decorator(data): > > return lambda func: > dec_register_function_for_x(dec_alter_docstring(dec_inject_some_data(data)(func))) > > > > But to me, a compose function is much easier to read and much more > consistent with the decorator usage syntax itself. > > > > def meta_decorator(data): > > return compose(dec_register_function_for_x, dec_alter_docstring, > dec_inject_some_data(data)) > > > > The latter implementation seems much more readable and elegant. One > doesn?t even need to know the decorator signature to effectively compose > meta_decorators. > > > > I?ve heard it said that Python is not a functional language, but if that > were really the case, then functools would not exist. In addition to the > example described above, I?ve had multiple occasions where having a general > purpose function composition function would have simplified the > implementation by providing a basic functional construct. While Python isn?t > primarily a functional language, it does have some functional constructs, > and this is one of the features that makes Python so versatile; one can > program functionally, procedurally, or in an object-oriented way, all within > the same language. > > > > I admit, I may be a bit biased; my first formal programming course was > taught in Scheme. > > > > Nevertheless, I believe functools is the ideal location for a very basic > and general capability such as composition. > > > > I realize this patch was rejected, but I?d like to propose reviving the > patch and incorporating it into functools. > > > > Regards, > > Jason > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/brett%40python.org > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Fri Aug 14 23:57:02 2009 From: brett at python.org (Brett Cannon) Date: Fri, 14 Aug 2009 14:57:02 -0700 Subject: [Python-Dev] Tweaking AST lineno and col_offset In-Reply-To: <1afaf6160908140916v7db6f68dq56fd6f8641dda1b4@mail.gmail.com> References: <4dab5f760908140846y6c39508ew9c2f74065f8f940@mail.gmail.com> <1afaf6160908140916v7db6f68dq56fd6f8641dda1b4@mail.gmail.com> Message-ID: On Fri, Aug 14, 2009 at 09:16, Benjamin Peterson wrote: > 2009/8/14 Frank Wierzbicki : > > Hi all, > > > > Off and on I have been directly comparing Jython's AST with Python's > > AST and generally working towards making them as close to identical as > > possible. There are a couple of places where I haven't "fixed" Jython > > because it looks to me like Jython has slightly better offsets. One > > example: > > > > for a,b in c: > > pass > > > > The Tuple node "a,b" ends up with a col_offset of 0 (the position of > > the "for") where Jython has the col_offset as 4 (the position of "a"). > > Jython's result is more consistent with other Tuple node col_offset > > results. > > > > I have a local patch that changes the CPython col_offset to match > > Jython's, but before I submit a patch I thought I'd ask here if there > > is support for this sort of change and if I should continue to find > > col_offset and lineno results that look fishy to me, or should I just > > change Jython's results to match (one way or another, things will be > > much easier for me to test if they match). > > Yes, please submit it. > > > > > Also, would this be a change that would be considered a backwards > > incompatibility? 
In other words, would patches like this be allowed > > in 2.6/3.1 or only in 2.7/3.2. > > While I don't see a problem in backporting it to maintence branches, I > would personally only apply it to the current development branches. It > doesn't seem to fix a "bug", just make a nice improvement. I like the improvement, but I disagree it should be considered for backporting as it changes semantics for something that could be considered a bug, but that feels like a stretch. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: From fwierzbicki at gmail.com Sat Aug 15 00:51:32 2009 From: fwierzbicki at gmail.com (Frank Wierzbicki) Date: Fri, 14 Aug 2009 18:51:32 -0400 Subject: [Python-Dev] Tweaking AST lineno and col_offset In-Reply-To: References: <4dab5f760908140846y6c39508ew9c2f74065f8f940@mail.gmail.com> <1afaf6160908140916v7db6f68dq56fd6f8641dda1b4@mail.gmail.com> Message-ID: <4dab5f760908141551p77c4bfd2j1f15e1ecd92742c5@mail.gmail.com> On Fri, Aug 14, 2009 at 5:57 PM, Brett Cannon wrote: > I like the improvement, but I disagree it should be considered for > backporting as it changes semantics for something that could be considered a > bug, but that feels like a stretch. Just thought I'd ask -- I'm perfectly ok with the change being 2.7/3.2 only. -Frank From alexander.kozlovsky at gmail.com Sat Aug 15 00:41:30 2009 From: alexander.kozlovsky at gmail.com (Alexander Kozlovsky) Date: Sat, 15 Aug 2009 02:41:30 +0400 Subject: [Python-Dev] (try-except) conditional expression similar to (if-else) conditional (PEP 308) In-Reply-To: <4A7A0626.5050303@lanl.gov> References: <4A7A0626.5050303@lanl.gov> Message-ID: <124571317.20090815024130@gmail.com> Jeff McAninch wrote: > I very often want something like a try-except conditional expression similar > to the if-else conditional. I think it may be done currently with the help of next function: def guard(func, *args): try: return func() except Exception, e: for exc_type, exc_func in args: if isinstance(e, exc_type): return exc_func() raise Example usage: a, b, c = 10, 20, 0 result = a + b/c # raise ZeroDivisionError result = a + guard(lambda: b/c, (TypeError, lambda: 10), (ZeroDivisionError, lambda: b/2)) May be not very concise, but it works... -- Best regards, Alexander mailto:alexander.kozlovsky at gmail.com From david.lyon at preisshare.net Sat Aug 15 02:59:02 2009 From: david.lyon at preisshare.net (David Lyon) Date: Fri, 14 Aug 2009 20:59:02 -0400 Subject: [Python-Dev] request for comments - standardization of python's purelib and platlib In-Reply-To: <94bdd2610908140102y5ef04116i52fb7071bb83aab9@mail.gmail.com> References: <94bdd2610908140102y5ef04116i52fb7071bb83aab9@mail.gmail.com> Message-ID: <6a4baa9f0299a6bb347536d01ddae126@preisshare.net> Hi Tarek, What is needed is to remove/refactor the hardcoding of paths that currently exists within distutils and replace it with the ability to override the defaults via configuration files. (distutils.cfg?) If there's one thing that's certain for the future, its that python will go onto more platforms. Using different paths. When people are complaining about paths being hard-coded into distutils and it causing angst, I think that their complaints are valid. I can find posts going back to 2004 for windows users complaining about exactly the same thing. So it isn't a new issue. The problem applies to both linux and windows. Anyway.. do you know the code that we're talking about? David On Fri, 14 Aug 2009 10:02:03 +0200, Tarek Ziad? 
wrote: > On Thu, Aug 13, 2009 at 9:22 PM, Brett Cannon wrote: >> >> >> On Thu, Aug 13, 2009 at 11:23, Jan Matejek >> wrote: >>> >>> Hello, >>> >>> I'm cross-posting this to distributions at freedesktop and python-dev, >>> because the topic is relevant to both groups and should be solved in >>> cooperation. >>> >>> The issue: >>> >>> In Python's default configuration (on linux), both purelib (location for >>> pure python modules) and platlib (location for platform-dependent binary >>> extensions) point to $prefix/lib/pythonX.Y/site-packages. >>> That is no good for two main reasons. >>> >>> One, python depends on the "lib" directory. (from distro's point of >>> view, prefix is /usr, so let's talk /usr/lib) Due to this, it's >>> impossible to install python under /usr/lib64 without heavy patching. >>> Repeated attempts to bring python developers to acknowledge and rectify >>> the situation have all failed (common argument here is "that would mean >>> redesign of distutils and huge parts of whatnot"). >> >> This is now Tarek's call, so this may or may not have changed in terms of >> what the (now) distutils maintainer thinks. >> > > I don't recall those repeated attempts , but I've been around for less > than two years. > > You are very welcome to come in the Distutils-SIG ML to discuss these > matters. > I'm moving the discussion there. > > Among the proposals you have detailed, the sharedir way seems like the > most simple/interesting > one (depending on you answer to Brett's question ) > > > Regards > Tarek From jacobolus at gmail.com Sat Aug 15 04:45:23 2009 From: jacobolus at gmail.com (Jacob Rus) Date: Fri, 14 Aug 2009 19:45:23 -0700 Subject: [Python-Dev] standard library mimetypes module pathologically broken? In-Reply-To: <1afaf6160908112022p3464d0e4u739d067922cf64b@mail.gmail.com> References: <4A757015.6010100@voidspace.org.uk> <79990c6b0908020445q5d4bcc82vf93d2af4493554a9@mail.gmail.com> <1afaf6160908112022p3464d0e4u739d067922cf64b@mail.gmail.com> Message-ID: 11 Aug 2009, Benjamin Peterson wrote: > 2009/8/11 Jacob Rus: >> I have some other questions: How does one deprecate part of a standard >> library API? How can we alert users to the deprecation? When can the >> deprecated parts be removed? > > Basically, you add a DeprecationWarning to the API. Then remove it in > the next major version. Okay, I made another patch, http://bugs.python.org/issue6626 That adds some deprecation warnings to many of the functions/methods in the module. (I think the 'strict' parameters should also be deprecated. But I'm considering actually making a new class, MimeTypesRegistry, or something, and then just making its API stay mostly compatible with MimeTypes, but extended to behave the way I think it should, and deprecating the MimeTypes class altogether, making it a subclass in the interim.) Is there any way to explicitly (i.e. in code rather than docs) deprecate string flags or dicts/lists from the module global namespace? I don't think users should be mucking with the module's singleton at all, and should be forced to make a new registry instance to customize the behavior, so they don't break each-other's code. > Then, you might garner some more reviews by putting your patch up on > Rietveld; it makes reviewing much painful. 
Okay, my last Rietveld link didn't get any eyeballs, but here's another try: http://codereview.appspot.com/107042 Cheers, Jacob Rus From solipsis at pitrou.net Sat Aug 15 17:00:33 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 15 Aug 2009 15:00:33 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?standard_library_mimetypes_module_patholog?= =?utf-8?q?ically=09broken=3F?= References: <4A757015.6010100@voidspace.org.uk> <79990c6b0908020445q5d4bcc82vf93d2af4493554a9@mail.gmail.com> <1afaf6160908112022p3464d0e4u739d067922cf64b@mail.gmail.com> Message-ID: Hello, Jacob Rus gmail.com> writes: > > Okay, I made another patch, > > http://bugs.python.org/issue6626 > > That adds some deprecation warnings to many of the functions/methods > in the module. After a fair amount of discussion on Rietveld, I think you should post another patch without the deprecations. (since the discussion was fairly long, I won't repeat here the reasons I gave unless someone asks me to :-)) Besides, it would be nice to have the additional tests you were talking about. Thanks for doing this anyway. > (I think the 'strict' parameters should also be deprecated. But I'm > considering actually making a new class, MimeTypesRegistry, or > something, and then just making its API stay mostly compatible with > MimeTypes, but extended to behave the way I think it should, and > deprecating the MimeTypes class altogether, making it a subclass in > the interim.) This sounds very pie-in-the-sky compared to the original intent of the patch (that is, fix the mimetypes module's implementation oddities). Let's remain focused. The more a patch tries to cater for different issues, the less easy it if to review and discuss (and, consequently, the less likely it is to go to the end of the approval process). Regards Antoine. From Scott.Daniels at Acm.Org Sat Aug 15 21:54:41 2009 From: Scott.Daniels at Acm.Org (Scott David Daniels) Date: Sat, 15 Aug 2009 12:54:41 -0700 Subject: [Python-Dev] random number generator state Message-ID: I find I have a need in randomized testing for a shorter version of getstate, even if it _is_ slower to restore. When running exhaustive tests, a failure report should show the start state of the generator. Unfortunately, our current state includes a 625-element array. I want a state that can be read off a report and typed in to reproduce the state. Something a bit like the initial seed, a count of cycle calls, and a few other things. So, in addition to .getstate() and .setstate(...), I'd at least need to have .get_slow_state() and possibly expand what .setstate(...) takes. However, a call to .setstate should reset the counter or all is for naught. That means I need to change the results of .getstate, thus giving me three kinds of input to .setstate: old, new-short, and new-long. In trying to get this to work, I found what might be a bug: code says mt[0] = 0x80000000UL; /* MSB is 1; assuring non-zero initial array */ but probably should be: mt[0] |= 0x80000000UL; /* MSB is 1; assuring non-zero initial array */ In checking into that issue, I went to the original Mersenne-Twister code, and I see the original authors are pursuing a newer generator, dSFMT. I now have a dilemma. Should I continue the work on the original M-T code (which is now seeming problematic for compatibility) or simply make a new generator with similar calls using dSFMT and put the new feature in that where there is no compatibility problem. Which would be more useful for the Python community? 
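As a rough illustration of the wrap-the-generator route, here is one way to get a short,
reportable state without touching the Mersenne Twister internals at all. The class and its
short_state()/restore() API are invented for this sketch (nothing like it exists in the random
module), and the bookkeeping is only correct if every draw goes through random(); getrandbits()
and the C-level paths used by some methods bypass the counter.

    import random

    class ShortStateRandom(random.Random):
        # Illustrative wrapper: remembers its seed and counts calls to
        # random(), so a failing test can be reported as (seed, ncalls)
        # instead of the full 625-word Mersenne Twister state.
        def seed(self, a=None):
            self.initial_seed = a
            self.ncalls = 0
            random.Random.seed(self, a)

        def random(self):
            self.ncalls += 1
            return random.Random.random(self)

        def short_state(self):
            return (self.initial_seed, self.ncalls)

        def restore(self, state):
            initial_seed, ncalls = state
            self.seed(initial_seed)
            # fast-forward past the already-consumed values
            for _ in range(ncalls):
                random.Random.random(self)
            self.ncalls = ncalls

    r = ShortStateRandom(12345)
    values = [r.random() for _ in range(10)]
    print(r.short_state())           # (12345, 10) -- fits on a failure report

Restoring is linear in the call count rather than constant time, which is the trade-off Scott
mentions ("even if it _is_ slower to restore").
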
--Scott David Daniels Scott.Daniels at Acm.Org From brett at python.org Sat Aug 15 22:24:42 2009 From: brett at python.org (Brett Cannon) Date: Sat, 15 Aug 2009 13:24:42 -0700 Subject: [Python-Dev] request for comments - standardization of python's purelib and platlib In-Reply-To: <6a4baa9f0299a6bb347536d01ddae126@preisshare.net> References: <94bdd2610908140102y5ef04116i52fb7071bb83aab9@mail.gmail.com> <6a4baa9f0299a6bb347536d01ddae126@preisshare.net> Message-ID: Please do not cross-post to python-dev. This discussion has been taken to the distutils SIG. On Fri, Aug 14, 2009 at 17:59, David Lyon wrote: > > Hi Tarek, > > What is needed is to remove/refactor the hardcoding of paths that > currently exists within distutils and replace it with the ability to > override the defaults via configuration files. (distutils.cfg?) > > If there's one thing that's certain for the future, its that > python will go onto more platforms. Using different paths. > > When people are complaining about paths being hard-coded into > distutils and it causing angst, I think that their complaints are > valid. > > I can find posts going back to 2004 for windows users complaining > about exactly the same thing. So it isn't a new issue. The problem > applies to both linux and windows. > > Anyway.. do you know the code that we're talking about? > > David > > > On Fri, 14 Aug 2009 10:02:03 +0200, Tarek Ziad? > wrote: >> On Thu, Aug 13, 2009 at 9:22 PM, Brett Cannon wrote: >>> >>> >>> On Thu, Aug 13, 2009 at 11:23, Jan Matejek >>> wrote: >>>> >>>> Hello, >>>> >>>> I'm cross-posting this to distributions at freedesktop and python-dev, >>>> because the topic is relevant to both groups and should be solved in >>>> cooperation. >>>> >>>> The issue: >>>> >>>> In Python's default configuration (on linux), both purelib (location > for >>>> pure python modules) and platlib (location for platform-dependent > binary >>>> extensions) point to $prefix/lib/pythonX.Y/site-packages. >>>> That is no good for two main reasons. >>>> >>>> One, python depends on the "lib" directory. (from distro's point of >>>> view, prefix is /usr, so let's talk /usr/lib) Due to this, it's >>>> impossible to install python under /usr/lib64 without heavy patching. >>>> Repeated attempts to bring python developers to acknowledge and rectify >>>> the situation have all failed (common argument here is "that would mean >>>> redesign of distutils and huge parts of whatnot"). >>> >>> This is now Tarek's call, so this may or may not have changed in terms > of >>> what the (now) distutils maintainer thinks. >>> >> >> I don't recall those repeated attempts , but I've been around for less >> than two years. >> >> You are very welcome to come in the Distutils-SIG ML to discuss these >> matters. >> I'm moving the discussion there. 
>> >> Among the proposals you have detailed, the sharedir way seems like the >> most simple/interesting >> one (depending on you answer to Brett's question ) >> >> >> Regards >> Tarek > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/brett%40python.org > From python at rcn.com Sat Aug 15 23:58:10 2009 From: python at rcn.com (Raymond Hettinger) Date: Sat, 15 Aug 2009 14:58:10 -0700 Subject: [Python-Dev] random number generator state References: Message-ID: <622D9CF0914D47A39D797968E9A9BB1F@RaymondLaptop1> [Scott David Daniels] >I find I have a need in randomized testing for a shorter version > of getstate, even if it _is_ slower to restore. When running > exhaustive tests, a failure report should show the start state > of the generator. Unfortunately, our current state includes a > 625-element array. I want a state that can be read off a report > and typed in to reproduce the state. Something a bit like the > initial seed, a count of cycle calls, and a few other things. Sounds like you could easily wrap the generator to get this. It would slow you down but would give the information you want. I think it would be a mistake to complexify the API to accomodate short states -- I'm not even sure than they are generally useful (recording my initial seed and how many cycles I've run through is only helpful for sequences short enough that I'm willing to rerun them). I'm curious what your use case is. Why not just record the the sequence as generated -- I don't see any analytic value to just knowing the initial seed and cycle count. Ability to print out a short state implies that you are using only a small subset of possible states (i.e. the ones you can get to with a short seed). A short state print out isn't even possible if you actually have a random initial state (every state having an equal chance of being the starting point). > In trying to > get this to work, I found what might be a bug: > code says > mt[0] = 0x80000000UL; /* MSB is 1; assuring non-zero initial array */ > but probably should be: > mt[0] |= 0x80000000UL; /* MSB is 1; assuring non-zero initial array */ Please file a bug report for this and assign to me. I put in the existing MT code and took it directly from the author's published (and widely tested code). Also, our tests for MT exactly reproduce their published test sequence. But, if there is an error, I would be happy to fix it. > In checking into that issue, I went to the original Mersenne-Twister > code, and I see the original authors are pursuing a newer generator, > dSFMT. The MT itself has the advantage of having been widely exercised and tested. The newer generator may have more states but has not been as extensively tested. > I now have a dilemma. Should I continue the work on the original M-T > code (which is now seeming problematic for compatibility) or simply make > a new generator with similar calls using dSFMT and put the new feature > in that where there is no compatibility problem. Which would be more > useful for the Python community? It's not hard to subclass Random and add different generators. Why not publish some code on ASPN and see how it gets received. I've put a recipe there for a long period generator, http://code.activestate.com/recipes/576707/ , but there doesn't seem to have been any real interest in generators with longer periods than MT. 
Raymond From dickinsm at gmail.com Sun Aug 16 02:00:42 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Sun, 16 Aug 2009 01:00:42 +0100 Subject: [Python-Dev] random number generator state In-Reply-To: References: Message-ID: <5c6f2a5d0908151700q1f3ed52eod592e03414fec753@mail.gmail.com> On Sat, Aug 15, 2009 at 8:54 PM, Scott David Daniels wrote: > [...] input to .setstate: old, new-short, and new-long. ?In trying to > get this to work, I found what might be a bug: > code says > ?mt[0] = 0x80000000UL; /* MSB is 1; assuring non-zero initial array */ > but probably should be: > ?mt[0] |= 0x80000000UL; /* MSB is 1; assuring non-zero initial array */ I'm 92.3% sure that this isn't a bug. For one thing, that line comes directly from the authors' code[1], so if it's a bug then it's a bug in the original code, dating from 2002; this seems unlikely, given how widely used and (presumably) well-scrutinized MT is. For a more technical justification, the Mersenne Twister is based on a linear transformation of a 19937-dimensional vector space over F2, so its state naturally consists of 19937 bits of information, which is 623 words plus one additional bit. In this implementation, that extra bit is the top bit of the first word; the other 31 bits of that first word shouldn't really be regarded as part of the state proper. If you examine the genrand_int32 function in _randommodule.c, you'll see that the low 31 bits of mt[0] play no role in updating the state; i.e., their value doesn't affect the new state. So using mt[0] |= 0x80000000UL instead of mt[0] = 0x80000000UL during initialization should make no difference to the resulting stream of random numbers (with the possible exception of the first random number generated). [1] http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.c Mark From greg.ewing at canterbury.ac.nz Sun Aug 16 04:38:41 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 16 Aug 2009 14:38:41 +1200 Subject: [Python-Dev] random number generator state In-Reply-To: References: Message-ID: <4A877131.8050509@canterbury.ac.nz> Scott David Daniels wrote: > I find I have a need in randomized testing for a shorter version > of getstate, even if it _is_ slower to restore. When running > exhaustive tests, a failure report should show the start state > of the generator. Unfortunately, our current state includes a > 625-element array. Do you need to use the Mersenne Twister in particular for this? There are other kinds of generator with very long cycles and good statistical properties, that can easily be restored to any state in constant time given an initial state and a count. Let me know if you're interested and I can give you further details. -- Greg From jacobolus at gmail.com Sun Aug 16 09:21:36 2009 From: jacobolus at gmail.com (Jacob Rus) Date: Sun, 16 Aug 2009 07:21:36 +0000 (UTC) Subject: [Python-Dev] standard library mimetypes module pathologically broken? References: <4A757015.6010100@voidspace.org.uk> <79990c6b0908020445q5d4bcc82vf93d2af4493554a9@mail.gmail.com> <1afaf6160908112022p3464d0e4u739d067922cf64b@mail.gmail.com> Message-ID: Antoine Pitrou: > After a fair amount of discussion on Rietveld, I think you should post another > patch without the deprecations. > (since the discussion was fairly long, I won't repeat here the reasons I gave > unless someone asks me to ) > Besides, it would be nice to have the additional tests you were talking about. 
I'd guess that if I make another patch, no one else will actually look at the discussion on the first one (though maybe no one will look at it either way)... I'd rather get another couple of opinions about it before burying that conversation. > This sounds very pie-in-the-sky compared to the original intent of the patch > (that is, fix the mimetypes module's implementation oddities). Okay. At least for me, the goals are twofold, because not only is the implementation odd, but I consider the semantics broken as well. But even just fixing the obvious implementation problems would be a big improvement. Cheers, Jacob Rus From steve at pearwood.info Sun Aug 16 14:14:56 2009 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 16 Aug 2009 22:14:56 +1000 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> Message-ID: <200908162214.56684.steve@pearwood.info> On Sat, 15 Aug 2009 04:39:03 am Jason R. Coombs wrote: > I'd like to express additional interest in python patch 1660179, > discussed here: > > http://mail.python.org/pipermail/patches/2007-February/021687.html [...] > But to me, a compose function is much easier to read and much more > consistent with the decorator usage syntax itself. > > def meta_decorator(data): > return compose(dec_register_function_for_x, dec_alter_docstring, > dec_inject_some_data(data)) Surely that's better written as: meta_decorator = compose(dec_register_function_for_x, dec_alter_docstring, dec_inject_some_data) > I admit, I may be a bit biased; my first formal programming course > was taught in Scheme. Mine wasn't -- I've never even used Scheme, or Lisp, or any other functional language. But I've come to appreciate Python's functional tools, and would like to give a +0.5 to compose(). +1 if anyone can come up with additional use-cases. -- Steven D'Aprano From fwierzbicki at gmail.com Sun Aug 16 17:36:56 2009 From: fwierzbicki at gmail.com (Frank Wierzbicki) Date: Sun, 16 Aug 2009 11:36:56 -0400 Subject: [Python-Dev] Updating tests in branches Message-ID: <4dab5f760908160836t51e9df7ft6d662e6f4722e84e@mail.gmail.com> I plan on updating the Python unit tests with tests from Jython that turn out to be generic Python tests. Should I be putting these tests into trunk and 3k or should I also put them into the 2.6 and 3.1 maintenance branches as well? Regards, -Frank From benjamin at python.org Sun Aug 16 17:45:59 2009 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 16 Aug 2009 10:45:59 -0500 Subject: [Python-Dev] Updating tests in branches In-Reply-To: <4dab5f760908160836t51e9df7ft6d662e6f4722e84e@mail.gmail.com> References: <4dab5f760908160836t51e9df7ft6d662e6f4722e84e@mail.gmail.com> Message-ID: <1afaf6160908160845v5bd163e5y5b2ba317dc80bc9b@mail.gmail.com> 2009/8/16 Frank Wierzbicki : > I plan on updating the Python unit tests with tests from Jython that > turn out to be generic Python tests. ?Should I be putting these tests > into trunk and 3k or should I also put them into the 2.6 and 3.1 > maintenance branches as well? Great! Usually, unless the test is for a bug we are backporting, new tests only go in the trunk and py3k. 
-- Regards, Benjamin From fwierzbicki at gmail.com Sun Aug 16 17:55:05 2009 From: fwierzbicki at gmail.com (Frank Wierzbicki) Date: Sun, 16 Aug 2009 11:55:05 -0400 Subject: [Python-Dev] Updating tests in branches In-Reply-To: <1afaf6160908160845v5bd163e5y5b2ba317dc80bc9b@mail.gmail.com> References: <4dab5f760908160836t51e9df7ft6d662e6f4722e84e@mail.gmail.com> <1afaf6160908160845v5bd163e5y5b2ba317dc80bc9b@mail.gmail.com> Message-ID: <4dab5f760908160855q4891c52epc52df8e063cae88b@mail.gmail.com> On Sun, Aug 16, 2009 at 11:45 AM, Benjamin Peterson wrote: > 2009/8/16 Frank Wierzbicki : > Usually, unless the test is for a bug we are backporting, new tests > only go in the trunk and py3k. Thanks! I'll do that from now on. -Frank From jaraco at jaraco.com Sun Aug 16 18:22:55 2009 From: jaraco at jaraco.com (Jason R. Coombs) Date: Sun, 16 Aug 2009 12:22:55 -0400 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: <200908162214.56684.steve@pearwood.info> References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> <200908162214.56684.steve@pearwood.info> Message-ID: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECA@hornigold> Steven D'Aprano wrote: > Sent: Sunday, 16 August, 2009 08:15 > > On Sat, 15 Aug 2009 04:39:03 am Jason R. Coombs wrote: > > > > > def meta_decorator(data): > > return compose(dec_register_function_for_x, dec_alter_docstring, > > dec_inject_some_data(data)) > > Surely that's better written as: > > meta_decorator = compose(dec_register_function_for_x, > dec_alter_docstring, dec_inject_some_data) I agree. The former looks unnecessarily complicated. I purposely chose a non-trivial use case, one which involves a decorator that requires a parameter and thus must be called first before the actual decorator is returned. I think for this reason, the former syntax must be used so that the meta_decorator also takes the data parameter and constructs the proper "inject" decorator. Put another way, both dec_inject_some_data and meta_decorator are more like decorator factories. I suspect a simpler, and more common use-case would be like the one you described, where either data is global or the "inject" decorator is not used: meta_decorator = compose(dec_register_function_for_x, dec_alter_docstring) > > Mine wasn't -- I've never even used Scheme, or Lisp, or any other > functional language. But I've come to appreciate Python's functional > tools, and would like to give a +0.5 to compose(). +1 if anyone can > come up with additional use-cases. Thanks for the interest. I decided to search through some of my active code for lambdas and see if there are areas where I would prefer to be using a compose function instead of an explicit lambda/reduce combination. I only found one such application; I attribute this limited finding to the fact that I probably elected for a procedural implementation when the functional implementation might have proven difficult to read, esp. with lambda. 1) Multiple string substitutions. You have a list of functions that operate on a string, but you want to collect them into a single operator that can be applied to a list of strings. sub_year = lambda s: s.replace("%Y", "2009") fix_strings_with_substituted_year = compose(str.strip, textwrap.dedent, sub_year) map(fix_strings_with_substituted_year, target_strings) Moreover, it would be great to be able to accept any number of substitutions. substitutions = [sub_year, sub_month, ...] 
fix_strings_with_substitutions = compose(str.strip, textwrap.dedent, *substitutions) I did conceive of another possibly interesting use case: vector translation. Consider an application that performs mathematical translations on n-dimensional vectors. While it would be optimal to use optimized matrix operations to perform these translations, for the sake of this example, all we have are basic Python programming constructs. At run-time, the user can compose an experiment to be conducted on his series of vectors. To do this, he selects from a list of provided translations and can provide his own. These translations can be tagged as named translations and thereafter used as translations themselves. The code might look something like: translations = selected_translations + custom_translations meta_translation = compose(*translations) save_translation(meta_translation, "My New Translation") def run_experiment(translation, vectors): result = map(translation, vectors) # do something with result Then, run_experiment can take a single translation or a meta-translation such as the one created above. This use-case highlights that a composed functions must take and return exactly one value, but that the value need not be a primitive scalar. I'm certain there are other, more obscure examples, but I feel these two use cases demonstrate some fairly common potential use cases for something like a composition function. Jason From solipsis at pitrou.net Sun Aug 16 18:29:38 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 16 Aug 2009 16:29:38 +0000 (UTC) Subject: [Python-Dev] functools.compose to chain functions together References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> <200908162214.56684.steve@pearwood.info> <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECA@hornigold> Message-ID: Jason R. Coombs jaraco.com> writes: > > I'm certain there are other, more obscure examples, but I feel these two use cases demonstrate some fairly > common potential use cases for something like a composition function. I also think it would be a nice addition. (but someone has to propose a patch :-)) Regards Antoine. From python at rcn.com Sun Aug 16 18:41:54 2009 From: python at rcn.com (Raymond Hettinger) Date: Sun, 16 Aug 2009 09:41:54 -0700 Subject: [Python-Dev] functools.compose to chain functions together References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold><200908162214.56684.steve@pearwood.info><8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECA@hornigold> Message-ID: [Antoine Pitrou] > I also think it would be a nice addition. > (but someone has to propose a patch :-)) I agree with Martin's reasons for rejecting the feature request (see the bug report for his full explanation). IIRC, the compose() idea had come-up and been rejected in previous discussions as well. At best, it will be a little syntactic sugar (though somewhat odd because the traditional mathematical ordering of a composition operator is the opposite of what intuition would suggest). At worst, it will be slower and less flexible than our normal ways of linking functions together. IMO, its only virtue is that people coming from functional languages are used to having compose. Otherwise, it's a YAGNI. Raymond From jaraco at jaraco.com Sun Aug 16 19:15:21 2009 From: jaraco at jaraco.com (Jason R. 
Coombs) Date: Sun, 16 Aug 2009 13:15:21 -0400 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold><200908162214.56684.steve@pearwood.info><8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECA@hornigold> Message-ID: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECB@hornigold> > Raymond Hettinger wrote: > Sent: Sunday, 16 August, 2009 12:42 > > [Antoine Pitrou] > > I also think it would be a nice addition. > > (but someone has to propose a patch :-)) The patch was proposed and rejected here: http://bugs.python.org/issue1660179; my reason for mentioning it here is because the functionality isn't YAGNI for me; It seems like a fundamental capability when employing a functional programming paradigm. > I agree with Martin's reasons for rejecting the feature request > (see the bug report for his full explanation). IIRC, the compose() > idea had come-up and been rejected in previous discussions as well. > > At best, it will be a little syntactic sugar (though somewhat odd > because > the traditional mathematical ordering of a composition operator is the > opposite of what intuition would suggest). At worst, it will be slower > and less flexible than our normal ways of linking functions together. > > IMO, its only virtue is that people coming from functional languages > are used to having compose. Otherwise, it's a YAGNI. Right. I have great respect for your and Martin's original conclusion. The reason I came across the old patch was because I was searching for something that did exactly what compose does. That is to say, I had a use case that was compelling enough that I thought there should be something in functools to do what I wanted. I've encountered this pattern often enough that it might be in the stdlib. As it turns out, it isn't. For this reason, I wanted to voice my opinion that contradicts the conclusion of the previous patch discussion. Specifically, YAGNI doesn't apply to my experiences, and it does seem to have broad, fundamental application, especially with respect to functional programming. I'm not arguing that just because Jason needs it, it should be in the standard library. Rather, I just wanted to express that, like Chris AtLee, I would find this function quite useful. As Steven pointed out, this functionality is desirable even for those without a functional programming background. I'd like to mention also that even though I learned to program in Scheme in 1994, I haven't used it since, and I've been using Python since 1996, so my affinity for this function is based almost entirely from experiences programming in Python and not in a primarily functional language. If the Python community still concurs that 'compose' is YAGNI or otherwise undesirable, I understand. I just wanted to share my experiences and motivations as they pertain to the discussion. If it turns out that it's included in the stdlib later, all the better. Respectfully, Jason From Scott.Daniels at Acm.Org Sun Aug 16 19:40:00 2009 From: Scott.Daniels at Acm.Org (Scott David Daniels) Date: Sun, 16 Aug 2009 10:40:00 -0700 Subject: [Python-Dev] random number generator state In-Reply-To: <622D9CF0914D47A39D797968E9A9BB1F@RaymondLaptop1> References: <622D9CF0914D47A39D797968E9A9BB1F@RaymondLaptop1> Message-ID: Raymond Hettinger wrote: > [Scott David Daniels] >> I find I have a need in randomized testing for a shorter version >> of getstate, even if it _is_ slower to restore. 
[blah about big state] > > Sounds like you could easily wrap the generator to get this. > It would slow you down but would give the information you want. Well, I was thinking that this might be generally useful for randomized testing. > I think it would be a mistake to complexify the API to accomodate > short states -- I'm not even sure than they are generally useful > (recording my initial seed and how many cycles I've run through > is only helpful for sequences short enough that I'm willing to rerun > them). Right, that was what I was asking about. The complexity of the change grew on me; I hadn't realized at the outset it would be more than adding a counter internally. Consider me officially dissuaded. > I'm curious what your use case is. Why not just record the the sequence > as generated -- I don't see any analytic value to > just knowing the initial seed and cycle count. I'm building data structures controlled by an rng, and then performing sequences of (again randomly controlled) operations on those data structures, check all invariants at each step. I then lather, rinse, repeat recording the start of each failing experiment. In the morning I come in and look for commonality in the cases I see. Having the short state means I means I can easily rebuild the data structure and command list to see what is going on. I prune commands, simplify the tree, and thus isolate the problem I found. I did enough experimenting to see that if I simply provide access to run N cycles of the block, I can actually do 2**32 cycles in feasible time, so I have a pair of counters, and the code should take long enough for eternity to show up before the wrap. My short state is: seed, block_index, cycles_low, cycles_high, floating (block_index + 625 * (cycles_low + (cycles_high << 32)) is the position, and could be done as such; the pieces reflect the least-expensive cost in performance to the rng. floating is simply the same final floating piece that the state keeps now. > Ability to print out a short state implies that you are using only a > small subset of possible states (i.e. the ones you can get to with > a short seed). Well, as you see above, I do capture the seed. I realize that the time- constructed seeds are distinct from identically provided values as small ints, and I also mark when the rng gets called by set_state to indicate that I then know nothing about the seed. >> mt[0] = 0x80000000UL; /* MSB is 1; assuring non-zero initial array */ >> but probably should be: >> mt[0] |= 0x80000000UL; /* MSB is 1; assuring non-zero initial array*/ > Please file a bug report for this and assign to me.... Also, our > tests for MT exactly reproduce their published test sequence. > I've been assured it is not a bug, and I filed no report since I had just arrived at the point of suspicion. To summarize, I am officially dissuaded, and will post a recipe if I get something nice working. --Scott David Daniels Scott.Daniels at Acm.Org From solipsis at pitrou.net Sun Aug 16 19:39:19 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 16 Aug 2009 19:39:19 +0200 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> <200908162214.56684.steve@pearwood.info> <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECA@hornigold> Message-ID: <1250444359.5558.15.camel@localhost> Raymond Hettinger rcn.com> writes: > > IMO, its only virtue is that people coming from functional languages > are used to having compose. Otherwise, it's a YAGNI. 
Then I wonder how partial() ended up in the stdlib. It seems hardly more useful than compose(). Either we decide it is useful to have a set of basic "functional" tools in the stdlib, and both partial() and compose() have their place there, or we decide functools has no place in the stdlib at all. Providing a half-assed module is probably frustrating to its potential users. (not being particularly attached to functional tools, I still think compose() has its value, and Jason did a good job of presenting potential use cases) Regards Antoine. From martin at v.loewis.de Sun Aug 16 23:54:48 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 16 Aug 2009 23:54:48 +0200 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECB@hornigold> References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold><200908162214.56684.steve@pearwood.info><8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECA@hornigold> <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECB@hornigold> Message-ID: <4A888028.60106@v.loewis.de> > The reason I came across the old patch was because I was searching > for something that did exactly what compose does. That is to say, I > had a use case that was compelling enough that I thought there should > be something in functools to do what I wanted. I've encountered this > pattern often enough that it might be in the stdlib. Can you kindly give one or two examples of where compose would have been useful? Regards, Martin From martin at v.loewis.de Mon Aug 17 00:10:16 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 17 Aug 2009 00:10:16 +0200 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: <1250444359.5558.15.camel@localhost> References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> <200908162214.56684.steve@pearwood.info> <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECA@hornigold> <1250444359.5558.15.camel@localhost> Message-ID: <4A8883C8.4000109@v.loewis.de> > Then I wonder how partial() ended up in the stdlib. PEP 309 was written, discussed, approved, and implemented - that's how partial ended up in the stdlib. The feature itself might be debatable, that's what we have the PEP process for. > Either we decide it is useful to have a set of basic "functional" tools > in the stdlib, and both partial() and compose() have their place there, > or we decide functools has no place in the stdlib at all. Providing a > half-assed module is probably frustrating to its potential users. So write a PEP and propose to enhance the standard library. > (not being particularly attached to functional tools, I still think > compose() has its value, and Jason did a good job of presenting > potential use cases) I don't think he did. Comparing it to the one obvious solution (use a lambda expression), his only reasoning was "it is much easier to read". I truly cannot believe that a compose function would be easier to read to the average Python programmer: if you have def foo(data): return compose(a, b(data), c) what would you expect that to mean? Please rewrite it as a regular Python expression, preferably without looking at the patch that has been proposed first. I bet there is a 50% chance that you get it wrong (because there are two possible interpretations). 
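For concreteness, the two readings look roughly like this; both functions are illustrative
sketches with made-up names, neither exists in functools:

    def compose_outside_in(*funcs):
        # mathematical convention: compose(f, g, h)(x) == f(g(h(x)))
        def composed(*args, **kwargs):
            result = funcs[-1](*args, **kwargs)
            for f in reversed(funcs[:-1]):
                result = f(result)
            return result
        return composed

    def compose_pipeline(*funcs):
        # left-to-right convention: compose(f, g, h)(x) == h(g(f(x)))
        def composed(*args, **kwargs):
            result = funcs[0](*args, **kwargs)
            for f in funcs[1:]:
                result = f(result)
            return result
        return composed

For plain decorators the two orderings often give the same result, but in general they do not,
which is the ambiguity being pointed out here.
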
Regards, Martin From joshua at reverberate.org Mon Aug 17 00:08:55 2009 From: joshua at reverberate.org (Joshua Haberman) Date: Sun, 16 Aug 2009 15:08:55 -0700 Subject: [Python-Dev] another Py_TPFLAGS_HEAPTYPE question Message-ID: <21ee037f0908161508s71884902mb9126b5d338beb56@mail.gmail.com> I wrote to this list a few weeks ago asking about Py_TPFLAGS_HEAPTYPE (http://thread.gmane.org/gmane.comp.python.devel/105648). It occurred to me today that I could probably make object instances INCREF and DECREF my type appropriately, without setting Py_TPFLAGS_HEAPTYPE, by writing my own tp_alloc and tp_dealloc functions. My tp_alloc function could be: PyObject *my_tp_alloc(PyTypeObject *type, Py_ssize_t nitems) { PyObject *obj = PyType_GenericAlloc(type, nitems); if(obj) Py_INCREF(type); return obj; } This seems right since it is PyType_GenericAlloc that contains this excerpt: if (type->tp_flags & Py_TPFLAGS_HEAPTYPE) Py_INCREF(type); I don't want to set Py_TPFLAGS_HEAPTYPE, but I want to get that Py_INCREF(), so far so good. But I couldn't find the corresponding Py_DECREF() in typeobject.c to the above Py_INCREF(). Notably, object_dealloc() does not call Py_DECREF(self->ob_type) if self->ob_type has the Py_TPFLAGS_HEAPTYPE flag set. So where does the Py_DECREF() for the above Py_INCREF() live? I expected to find this code snippet somewhere, but couldn't: if (type->tp_flags & Py_TPFLAGS_HEAPTYPE) Py_DECREF(type); Josh From martin at v.loewis.de Mon Aug 17 00:13:43 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 17 Aug 2009 00:13:43 +0200 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: <4A888028.60106@v.loewis.de> References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold><200908162214.56684.steve@pearwood.info><8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECA@hornigold> <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECB@hornigold> <4A888028.60106@v.loewis.de> Message-ID: <4A888497.50702@v.loewis.de> Martin v. L?wis wrote: >> The reason I came across the old patch was because I was searching >> for something that did exactly what compose does. That is to say, I >> had a use case that was compelling enough that I thought there should >> be something in functools to do what I wanted. I've encountered this >> pattern often enough that it might be in the stdlib. > > Can you kindly give one or two examples of where compose would have been > useful? I went back in the archives and found your example. What I now don't understand is why you say that a compose function would be easier to read than a lambda expression. Can you please elaborate on that? I deeply believe that it is *harder* to read than a lambda expression, because the lambda expression makes the evaluation order clear, whereas the compose function doesn't (of course, function decorators ought to be commutative, so in this case, lack of clear evaluation order might be less important). Regards, Martin From martin at v.loewis.de Mon Aug 17 00:37:48 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 17 Aug 2009 00:37:48 +0200 Subject: [Python-Dev] another Py_TPFLAGS_HEAPTYPE question In-Reply-To: <21ee037f0908161508s71884902mb9126b5d338beb56@mail.gmail.com> References: <21ee037f0908161508s71884902mb9126b5d338beb56@mail.gmail.com> Message-ID: <4A888A3C.8060604@v.loewis.de> > So where does the Py_DECREF() for the above Py_INCREF() live? 
I > expected to find this code snippet somewhere, but couldn't: > > if (type->tp_flags & Py_TPFLAGS_HEAPTYPE) > Py_DECREF(type); For a regular heaptype, it's in subtype_dealloc: /* Can't reference self beyond this point */ Py_DECREF(type); HTH, Martin From joshua at reverberate.org Mon Aug 17 01:01:58 2009 From: joshua at reverberate.org (Joshua Haberman) Date: Sun, 16 Aug 2009 16:01:58 -0700 Subject: [Python-Dev] another Py_TPFLAGS_HEAPTYPE question In-Reply-To: <4A888A3C.8060604@v.loewis.de> References: <21ee037f0908161508s71884902mb9126b5d338beb56@mail.gmail.com> <4A888A3C.8060604@v.loewis.de> Message-ID: <21ee037f0908161601u266787d2qc6d7b781e313aac1@mail.gmail.com> On Sun, Aug 16, 2009 at 3:37 PM, "Martin v. L?wis" wrote: >> So where does the Py_DECREF() for the above Py_INCREF() live? ?I >> expected to find this code snippet somewhere, but couldn't: >> >> ? if (type->tp_flags & Py_TPFLAGS_HEAPTYPE) >> ? ? Py_DECREF(type); > > For a regular heaptype, it's in subtype_dealloc: > > ? ? ? ? ? ? ? ?/* Can't reference self beyond this point */ > ? ? ? ? ? ? ? ?Py_DECREF(type); Thanks for the pointer. I noticed that subtype_dealloc is only called for types that are allocated using type_new(). Does this mean that it is not safe to create types in C using just PyType_Ready() and set Py_TPFLAGS_HEAPTYPE on them? The documentation is not clear on this point. Here is what I would like to do when I create my types dynamically: - implement tp_alloc and tp_dealloc() to INCREF and DECREF the type. - not set Py_TPFLAGS_HEAPTYPE. - set Py_TPFLAGS_HAVE_GC (because instances of my obj can create cycles) Does this seem safe? I notice that subtype_dealloc() does some funky GC/trashcan stuff. Is it safe for me not to call subtype_dealloc? Can I safely implement my tp_dealloc function like this? void my_tp_dealloc(PyObject *obj) { obj->ob_type->tp_free(obj); Py_DECREF(obj->ob_type); } Thanks, Josh From solipsis at pitrou.net Mon Aug 17 02:07:50 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 17 Aug 2009 02:07:50 +0200 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: <4A8883C8.4000109@v.loewis.de> References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> <200908162214.56684.steve@pearwood.info> <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECA@hornigold> <1250444359.5558.15.camel@localhost> <4A8883C8.4000109@v.loewis.de> Message-ID: <1250467670.5528.8.camel@localhost> > PEP 309 was written, discussed, approved, and implemented - that's how > partial ended up in the stdlib. Ok, I'm surprised that a single addition to a module needed a PEP in order to be approved. Interestingly, here's what the summary section in PEP 309 says: ? A standard library module functional should contain an implementation of partial, /and any other higher-order functions the community want/. ? (emphasis mine) > I truly cannot believe that a compose function would be easier > to read to the average Python programmer: if you have > > def foo(data): > return compose(a, b(data), c) > > what would you expect that to mean? Please rewrite it as a regular > Python expression, preferably without looking at the patch that > has been proposed first. Ok, here's my attempt without looking at the patch: def foo(data): def bar(*args, **kwargs): return a(b(data)(c(*args, **kwargs))) return bar Whether or not it is easier to read to the "average Python programmer" is not that important I think. 
We have lots of things that certainly aren't, and yet still exist (all of the functions in the operator module, for example; or `partial` itself for that matter). They are there for advanced programmers. Regards Antoine. From greg.ewing at canterbury.ac.nz Mon Aug 17 02:51:27 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 17 Aug 2009 12:51:27 +1200 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECB@hornigold> References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> <200908162214.56684.steve@pearwood.info> <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECA@hornigold> <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECB@hornigold> Message-ID: <4A88A98F.8080206@canterbury.ac.nz> Jason R. Coombs wrote: > > I had a use case that was compelling enough that I thought there > should be something in functools to do what I wanted. I think this is one of those things that a small minority of people would use frequently, but everyone else would use very rarely or never. The decision on whether to include something in the stdlib needs to be based on the wider picture. In this case, it's trivial to write your own if you want it. As they say, "not every one-line function needs to be in the stdlib". -- Greg From benjamin at python.org Mon Aug 17 03:19:37 2009 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 16 Aug 2009 20:19:37 -0500 Subject: [Python-Dev] another Py_TPFLAGS_HEAPTYPE question In-Reply-To: <21ee037f0908161601u266787d2qc6d7b781e313aac1@mail.gmail.com> References: <21ee037f0908161508s71884902mb9126b5d338beb56@mail.gmail.com> <4A888A3C.8060604@v.loewis.de> <21ee037f0908161601u266787d2qc6d7b781e313aac1@mail.gmail.com> Message-ID: <1afaf6160908161819j249b3b28h82add0fe2696171f@mail.gmail.com> 2009/8/16 Joshua Haberman : > On Sun, Aug 16, 2009 at 3:37 PM, "Martin v. L?wis" wrote: >>> So where does the Py_DECREF() for the above Py_INCREF() live? ?I >>> expected to find this code snippet somewhere, but couldn't: >>> >>> ? if (type->tp_flags & Py_TPFLAGS_HEAPTYPE) >>> ? ? Py_DECREF(type); >> >> For a regular heaptype, it's in subtype_dealloc: >> >> ? ? ? ? ? ? ? ?/* Can't reference self beyond this point */ >> ? ? ? ? ? ? ? ?Py_DECREF(type); > > Thanks for the pointer. ?I noticed that subtype_dealloc is only called for types > that are allocated using type_new(). ?Does this mean that it is not > safe to create > types in C using just PyType_Ready() and set Py_TPFLAGS_HEAPTYPE on > them? ?The documentation is not clear on this point. > > Here is what I would like to do when I create my types dynamically: > > - implement tp_alloc and tp_dealloc() to INCREF and DECREF the type. > - not set Py_TPFLAGS_HEAPTYPE. > - set Py_TPFLAGS_HAVE_GC (because instances of my obj can create cycles) [Note that this is really starting to get off topic for python-dev.] Why do you need to set Py_TPFLAGS_HEAPTYPE on your C type? Is a normal static type not sufficient? The easiest way to create heaptypes is to simply call PyType_Type. 
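For what it's worth, a minimal sketch of what "simply call PyType_Type" can look like from C (Python 2.x-era API; error handling omitted; the type name and bases here are made up for illustration):

    #include <Python.h>

    /* Sketch: build a heap type at runtime by calling type(name, bases, dict),
     * instead of hand-rolling PyType_Ready() plus manual flag twiddling. */
    static PyObject *
    make_dynamic_type(void)
    {
        PyObject *name  = PyString_FromString("MyDynamicType");
        PyObject *bases = PyTuple_Pack(1, (PyObject *)&PyBaseObject_Type);
        PyObject *dict  = PyDict_New();
        PyObject *args  = PyTuple_Pack(3, name, bases, dict);
        PyObject *tp    = PyObject_CallObject((PyObject *)&PyType_Type, args);

        Py_XDECREF(name);
        Py_XDECREF(bases);
        Py_XDECREF(dict);
        Py_XDECREF(args);
        return tp;  /* a proper heap type (Py_TPFLAGS_HEAPTYPE set), or NULL on error */
    }

The resulting type behaves like a class defined in Python, so the INCREF/DECREF of the type by its instances is handled for you.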
-- Regards, Benjamin From greg.ewing at canterbury.ac.nz Mon Aug 17 03:39:36 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 17 Aug 2009 13:39:36 +1200 Subject: [Python-Dev] another Py_TPFLAGS_HEAPTYPE question In-Reply-To: <1afaf6160908161819j249b3b28h82add0fe2696171f@mail.gmail.com> References: <21ee037f0908161508s71884902mb9126b5d338beb56@mail.gmail.com> <4A888A3C.8060604@v.loewis.de> <21ee037f0908161601u266787d2qc6d7b781e313aac1@mail.gmail.com> <1afaf6160908161819j249b3b28h82add0fe2696171f@mail.gmail.com> Message-ID: <4A88B4D8.2070209@canterbury.ac.nz> Benjamin Peterson wrote: > Why do you need to set Py_TPFLAGS_HEAPTYPE on your C type? I think he *doesn't* want to set Py_TPFLAGS_HEAPTYPE, but does want to create the type dynamically. But I suspect this is actually FUD, and that letting Py_TPFLAGS_HEAPTYPE be set wouldn't lead to anything disastrous happening. Note that by not giving instances a __dict__, they will be prevented from having arbitrary attributes set on them, which is the most noticeable distinction between built-in and user-defined types. -- Greg From greg.ewing at canterbury.ac.nz Mon Aug 17 04:13:01 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 17 Aug 2009 14:13:01 +1200 Subject: [Python-Dev] random number generator state In-Reply-To: <4A882D30.2070508@Acm.Org> References: <4A877131.8050509@canterbury.ac.nz> <4A882D30.2070508@Acm.Org> Message-ID: <4A88BCAD.9020401@canterbury.ac.nz> Scott David Daniels wrote: > No, I don't really need MT. The others would be fine. > I'd love further details. The one I've been working with is due to Pierre L'Ecuyer [1] and is known as MRG32k3a. It's a combined multiple recursive linear congruential generator with 6 words of state. The formulas are r1[i] = (a12 * r1[i-2] + a13 * r1[i-3]) % m1 r2[i] = (a21 * r2[i-1] + a23 * r2[i-3]) % m2 r[i] = (r1[i] - r2[i]) * m1 where m1 = 2**32 - 209 m2 = 2**32 - 22835 a12 = 1403580 a13 = -810728 a21 = 527612 a23 = -1370589 If you consider the state to be made up of two 3-word state vectors, then there are two 3x3 matrices which map a given state onto the next state. So to jump ahead n steps in the sequence, you raise these matrices to the power of n. I've attached some code implementing this generator together with the jumping-ahead. (Sorry it's in C++, I hadn't discovered Python when I wrote it.) [1] Pierre L'Ecuyer, Good Parameters and Implementations for Combined Multiple Recursive Random Number Generators, Operations Research v47 no1 Jan-Feb 1999 http://www.iro.umontreal.ca/~lecuyer/myftp/papers/combmrg2.ps -- Greg -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: cmr_random_generator.C URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: cmr_random_generator.H URL: From benjamin at python.org Mon Aug 17 04:55:14 2009 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 16 Aug 2009 21:55:14 -0500 Subject: [Python-Dev] [RELEASED] Python 3.1.1 Message-ID: <1afaf6160908161955k6968c378md76ec37cba4865aa@mail.gmail.com> On behalf of the Python development team, I'm happy to announce the first bugfix release of the Python 3.1 series, Python 3.1.1. This bug fix release fixes many normal bugs and several critical ones including potential data corruption in the io library. Python 3.1 focuses on the stabilization and optimization of the features and changes that Python 3.0 introduced. For example, the new I/O system has been rewritten in C for speed. 
File system APIs that use unicode strings now handle paths with undecodable bytes in them. Other features include an ordered dictionary implementation, a condensed syntax for nested with statements, and support for ttk Tile in Tkinter. For a more extensive list of changes in 3.1, see http://doc.python.org/3.1/whatsnew/3.1.html or Misc/NEWS in the Python distribution. Please note the Windows and Mac binaries are not available yet but will be in the coming days. To download Python 3.1.1 visit: http://www.python.org/download/releases/3.1.1/ The 3.1 documentation can be found at: http://docs.python.org/3.1 Bugs can always be reported to: http://bugs.python.org Enjoy! -- Benjamin Peterson Release Manager benjamin at python.org (on behalf of the entire python-dev team and 3.1.1's contributors) From martin at v.loewis.de Mon Aug 17 08:53:56 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 17 Aug 2009 08:53:56 +0200 Subject: [Python-Dev] another Py_TPFLAGS_HEAPTYPE question In-Reply-To: <21ee037f0908161601u266787d2qc6d7b781e313aac1@mail.gmail.com> References: <21ee037f0908161508s71884902mb9126b5d338beb56@mail.gmail.com> <4A888A3C.8060604@v.loewis.de> <21ee037f0908161601u266787d2qc6d7b781e313aac1@mail.gmail.com> Message-ID: <4A88FE84.3050903@v.loewis.de> > Thanks for the pointer. I noticed that subtype_dealloc is only called for types > that are allocated using type_new(). Does this mean that it is not > safe to create > types in C using just PyType_Ready() and set Py_TPFLAGS_HEAPTYPE on > them? The documentation is not clear on this point. As Benjamin says, this is getting off-topic - python-dev is not a place to ask for help in your project. I believe setting flags on a type is inherently unsafe. > Here is what I would like to do when I create my types dynamically: > > - implement tp_alloc and tp_dealloc() to INCREF and DECREF the type. > - not set Py_TPFLAGS_HEAPTYPE. > - set Py_TPFLAGS_HAVE_GC (because instances of my obj can create cycles) > > Does this seem safe? I notice that subtype_dealloc() does some funky > GC/trashcan stuff. Is it safe for me not to call subtype_dealloc? Can I > safely implement my tp_dealloc function like this? If you bypass documented API, you really need to study the code, understand its motivation, judge whether certain usage is "safe" wrt. to the current implementation, and judge the likelihood of this code not getting changed in future versions. Regards, Martin From martin at v.loewis.de Mon Aug 17 09:07:00 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 17 Aug 2009 09:07:00 +0200 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: <1250467670.5528.8.camel@localhost> References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> <200908162214.56684.steve@pearwood.info> <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECA@hornigold> <1250444359.5558.15.camel@localhost> <4A8883C8.4000109@v.loewis.de> <1250467670.5528.8.camel@localhost> Message-ID: <4A890194.6010000@v.loewis.de> >> PEP 309 was written, discussed, approved, and implemented - that's how >> partial ended up in the stdlib. > > Ok, I'm surprised that a single addition to a module needed a PEP in > order to be approved. A PEP is generally needed if there is no easy consent achievable. It's not (primarily) the size of a feature that determines the need for a formal process, but but whether the community considers a certain change "obviously" correct and desirable. 
>> def foo(data): >> return compose(a, b(data), c) >> > > Ok, here's my attempt without looking at the patch: > > def foo(data): > def bar(*args, **kwargs): > return a(b(data)(c(*args, **kwargs))) > return bar Ok, that's also what the patch has proposed. I was puzzled when I read l.sort(key=compose(itemgetter(1), itemgetter(0)))) because I expected it to mean l.sort(key=lambda x:x[1][0]) when it would really mean l.sort(key=lambda x:x[0][1]) > Whether or not it is easier to read to the "average Python programmer" > is not that important I think. I completely disagree. It is one of Python's strength that it is "executable pseudo-code", which originates from the code being easy to read, and meaning the obvious thing even to a reader not familiar with the language. The proposed compose function breaks this important property, in a way that allows misinterpretation (i.e. you think you know what it does, and it actually does something different). I, personally, was not able to understand the compose function correctly, so I remain opposed. > We have lots of things that certainly > aren't, and yet still exist (all of the functions in the operator > module, for example; or `partial` itself for that matter). They are > there for advanced programmers. It's quite ok if only advanced programmers know that they are there, and know how to write them. However, I still think it is desirable that "lesser" programmers are then able to read them, or atleast notice that they mean something that they will need to learn first (such as a keyword they had never seen before). Regards, Martin From scott+python-dev at scottdial.com Mon Aug 17 09:50:04 2009 From: scott+python-dev at scottdial.com (Scott Dial) Date: Mon, 17 Aug 2009 03:50:04 -0400 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: <4A88A98F.8080206@canterbury.ac.nz> References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> <200908162214.56684.steve@pearwood.info> <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECA@hornigold> <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECB@hornigold> <4A88A98F.8080206@canterbury.ac.nz> Message-ID: <4A890BAC.8030404@scottdial.com> Greg Ewing wrote: > Jason R. Coombs wrote: >> I had a use case that was compelling enough that I thought there >> should be something in functools to do what I wanted. > > I think this is one of those things that a small minority of > people would use frequently, but everyone else would use > very rarely or never. The decision on whether to include > something in the stdlib needs to be based on the wider > picture. > > In this case, it's trivial to write your own if you want > it. As they say, "not every one-line function needs to > be in the stdlib". > I have never found these arguments compelling. They are obviously not true (e.g., itertools.compress()[1] added in 2.7/3.1), and so what I really hear is: "I don't like it and I outrank you." I can't help invoke part of PEP309's justification for functools.partial()[2]: """ I agree that lambda is usually good enough, just not always. And I want the possibility of useful introspection and subclassing. """ The same reasoning would seem to apply here. In the OP's example, the meta-decorator becomes opaque due to the use of a lambda. If one could introspect a compose(), then introspection tools could actually know the set of decorators being applied. As it is, the "preferred" method of using a lambda actually makes it quite hard to know anything. 
class compose(): def __init__(self, *funcs): if not funcs: raise ValueError(funcs) self.funcs = funcs def __call__(self, *args, **kwargs): v = self.funcs[-1](*args, **kwargs) for func in reversed(self.funcs[:-1]): v = func(v) return v meta = functools.compose(decorator_a, decorator_b) print meta.funcs meta = lambda f: decorator_a(decorator_b(f)) # impossible, short of disassembling the lambda -Scott [1] http://docs.python.org/3.1/library/itertools.html#itertools.compress """ def compress(data, selectors): # compress('ABCDEF', [1,0,1,0,1,1]) --> A C E F return (d for d, s in zip(data, selectors) if s) """ From steve at pearwood.info Mon Aug 17 09:43:52 2009 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 17 Aug 2009 17:43:52 +1000 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: <4A8883C8.4000109@v.loewis.de> References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> <1250444359.5558.15.camel@localhost> <4A8883C8.4000109@v.loewis.de> Message-ID: <200908171743.53215.steve@pearwood.info> On Mon, 17 Aug 2009 08:10:16 am Martin v. L?wis wrote: > I don't think he did. Comparing it to the one obvious solution (use > a lambda expression), his only reasoning was "it is much easier to > read". I truly cannot believe that a compose function would be > easier to read to the average Python programmer: if you have > > def foo(data): > return compose(a, b(data), c) > > what would you expect that to mean? foo is a factory function that, given an argument `data`, generates a function b(data), then composes it with two other functions a and c, and returns the result, also a function. > Please rewrite it as a regular > Python expression, preferably without looking at the patch that > has been proposed first. I bet there is a 50% chance that you get > it wrong (because there are two possible interpretations). But surely only one of them will agree with the standard definition of function composition. Both Mathworld and Wikipedia agree that f?g(x) is equivalent to f(g(x)): http://mathworld.wolfram.com/Composition.html http://en.wikipedia.org/wiki/Function_composition and I don't see any reason why a compose() function shouldn't do the same. (Aside: how do I look at the patch? The only link I have is here: http://mail.python.org/pipermail/patches/2007-February/021687.html but I can't see how to get to the patch from there.) foo could be written as: def foo(data): return lambda *args, **kwargs: a(b(data)(c(*args, **kwargs))) Or without lambda: def foo(data): def composed(*args, **kwargs): return a(b(data)(c(*args, **kwargs))) return composed This soon gets unwieldy: def foo(arg1, arg2, arg3): return compose( f, g, h, factory(arg1), factory(arg2), factory(arg3) ) versus def foo(arg1, arg2, arg3): return lambda *a, **kw: ( f(g(h(factory(arg1)(factory(arg2)(factory(arg3)(*a, **kw)))))) ) but presumably composing six functions is rare. A further advantage of compose() is that one could, if desired, generate a sensible name and doc string for the returned function. Depends on how heavyweight you want compose() to become. I think the compose() version is far more readable and understandable, but another factor is the performance cost of the generated function compared to a hand-made lambda. 
For the record, Haskell makes compose a built-in operator: http://www.haskell.org/haskellwiki/Function_composition It doesn't appear to be standard in Ruby, but it seems to be commonly requested, and a version is on Facets: http://facets.rubyforge.org/apidoc/api/core/classes/Proc.html#M000161 -- Steven D'Aprano From stefan_ml at behnel.de Mon Aug 17 11:14:05 2009 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 17 Aug 2009 11:14:05 +0200 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: <1250444359.5558.15.camel@localhost> References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> <200908162214.56684.steve@pearwood.info> <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECA@hornigold> <1250444359.5558.15.camel@localhost> Message-ID: Antoine Pitrou wrote: > Raymond Hettinger rcn.com> writes: >> IMO, its only virtue is that people coming from functional languages >> are used to having compose. Otherwise, it's a YAGNI. > > Then I wonder how partial() ended up in the stdlib. It seems hardly more > useful than compose(). I would certainly consider it more useful, but that aside, it's also a lot simpler to understand and use than the proposed compose() function. I think the main difference is that compose() requires functional/math skills to be used and read correctly (and might still be surprising in some corner cases), whereas partial() only requires you to understand how to set a function argument. Totally different level of mental complexity, IMHO. Stefan From ncoghlan at gmail.com Mon Aug 17 12:53:20 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 17 Aug 2009 20:53:20 +1000 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: <1250467670.5528.8.camel@localhost> References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> <200908162214.56684.steve@pearwood.info> <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECA@hornigold> <1250444359.5558.15.camel@localhost> <4A8883C8.4000109@v.loewis.de> <1250467670.5528.8.camel@localhost> Message-ID: <4A8936A0.90808@gmail.com> Antoine Pitrou wrote: >> PEP 309 was written, discussed, approved, and implemented - that's how >> partial ended up in the stdlib. > > Ok, I'm surprised that a single addition to a module needed a PEP in > order to be approved. It makes a little more sense once you realise that there was no functools module before the implementation of PEP 309. The other functions it contains in Python 2.5 (update_wrapper() and wraps()) were added later in the development cycle and reduce() didn't get added to it until 2.6/3.0. If a concrete proposal is made that emphasises the improved introspection capabilities and raw speed increase that a function composition approach can offer over the use of lambda then I'd personally be willing to support this idea, since it was at least in part those two ideas that sold the idea of partial(). (partial() did have a big advantage over compose() in that the former's readability gains were far more obvious to most readers). Cheers, Nick. P.S. PEP 309 is wrong when it says a C version probably isn't worthwhile - between the time the PEP was first implemented and the time 2.5 was actually released, enough investigation was done to show that the speed gain from Hye-Shik Chang's C version was well worth the additional code complexity. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From solipsis at pitrou.net Mon Aug 17 12:59:40 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 17 Aug 2009 12:59:40 +0200 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: <4A890194.6010000@v.loewis.de> References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> <200908162214.56684.steve@pearwood.info> <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECA@hornigold> <1250444359.5558.15.camel@localhost> <4A8883C8.4000109@v.loewis.de> <1250467670.5528.8.camel@localhost> <4A890194.6010000@v.loewis.de> Message-ID: <1250506780.5706.11.camel@localhost> Le lundi 17 ao?t 2009 ? 09:07 +0200, "Martin v. L?wis" a ?crit : > Ok, that's also what the patch has proposed. I was puzzled when I read > > l.sort(key=compose(itemgetter(1), itemgetter(0)))) > > because I expected it to mean > > l.sort(key=lambda x:x[1][0]) But that's itemgetter's fault, not compose's. Because itemgetter's obvious equivalent (the [] operator) uses postfix notation, combining several itemgetters reverses the lexical order of appearance. Besides, the argument order is similar to the one in the function composition notation in mathematics (which isn't really advanced stuff and should have been taught to every former scientific/technical student out there). Regards Antoine. From solipsis at pitrou.net Mon Aug 17 13:01:31 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 17 Aug 2009 13:01:31 +0200 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: <4A8936A0.90808@gmail.com> References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> <200908162214.56684.steve@pearwood.info> <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECA@hornigold> <1250444359.5558.15.camel@localhost> <4A8883C8.4000109@v.loewis.de> <1250467670.5528.8.camel@localhost> <4A8936A0.90808@gmail.com> Message-ID: <1250506891.5706.12.camel@localhost> Le lundi 17 ao?t 2009 ? 20:53 +1000, Nick Coghlan a ?crit : > P.S. PEP 309 is wrong when it says a C version probably isn't worthwhile > - between the time the PEP was first implemented and the time 2.5 was > actually released, enough investigation was done to show that the speed > gain from Hye-Shik Chang's C version was well worth the additional code > complexity. Yes, one-line Python wrappers can kill performance of pure C code. Seen that in the IO lib, again. Regards Antoine. From xavier.morel at masklinn.net Mon Aug 17 12:38:57 2009 From: xavier.morel at masklinn.net (Xavier Morel) Date: Mon, 17 Aug 2009 12:38:57 +0200 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: <200908171743.53215.steve@pearwood.info> References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> <1250444359.5558.15.camel@localhost> <4A8883C8.4000109@v.loewis.de> <200908171743.53215.steve@pearwood.info> Message-ID: <1B07F2F7-563B-4C4C-AF5B-727F3504F79D@masklinn.net> On 17 Aug 2009, at 09:43 , Steven D'Aprano wrote: > On Mon, 17 Aug 2009 08:10:16 am Martin v. L?wis wrote: > >> I don't think he did. Comparing it to the one obvious solution (use >> a lambda expression), his only reasoning was "it is much easier to >> read". I truly cannot believe that a compose function would be >> easier to read to the average Python programmer: if you have >> >> def foo(data): >> return compose(a, b(data), c) >> >> what would you expect that to mean? 
> > foo is a factory function that, given an argument `data`, generates a > function b(data), then composes it with two other functions a and c, > and returns the result, also a function. > From his messages, I think Martin's issue with `compose` is with the composition order rather than the fact that it "pipes" functions: compose uses the mathematical order, (f ? g)(x) = f(g(x)) (so g, the last function of the composition, is applied first), rather than a "shell pipe" order of `(f >>> g)(x) = g(f(x))` (where g, the last function of the composition, is applied last). > For the record, Haskell makes compose a built-in operator: > > http://www.haskell.org/haskellwiki/Function_composition Yes, but Haskell also has a left-to-right composition, the (>>>) operator: http://haskell.org/ghc/docs/latest/html/libraries/base/Control-Arrow.html#v :>>> From chris at simplistix.co.uk Mon Aug 17 16:22:50 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Mon, 17 Aug 2009 15:22:50 +0100 Subject: [Python-Dev] VC++ versions to match python versions? Message-ID: <4A8967BA.1050102@simplistix.co.uk> Hi All, Is the Express Edition of Visual C++ 2008 suitable for compiling packages for Python 2.6 on Windows? (And Python 2.6 itself for that matter...) Ditto for 2.5, 3.1 and the trunk (which I guess becomes 3.2?) cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From phd at phd.pp.ru Mon Aug 17 16:30:49 2009 From: phd at phd.pp.ru (Oleg Broytmann) Date: Mon, 17 Aug 2009 18:30:49 +0400 Subject: [Python-Dev] VC++ versions to match python versions? In-Reply-To: <4A8967BA.1050102@simplistix.co.uk> References: <4A8967BA.1050102@simplistix.co.uk> Message-ID: <20090817143049.GD23475@phd.pp.ru> On Mon, Aug 17, 2009 at 03:22:50PM +0100, Chris Withers wrote: > Is the Express Edition of Visual C++ 2008 suitable for compiling > packages for Python 2.6 on Windows? > (And Python 2.6 itself for that matter...) > > Ditto for 2.5, 3.1 and the trunk (which I guess becomes 3.2?) These two I know for sure: Python 2.5: MSVC-7.1 (VC++ 2003) Python 2.6: MSVC-9.0 (VS 2008) Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From fuzzyman at voidspace.org.uk Mon Aug 17 16:31:49 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 17 Aug 2009 15:31:49 +0100 Subject: [Python-Dev] VC++ versions to match python versions? In-Reply-To: <4A8967BA.1050102@simplistix.co.uk> References: <4A8967BA.1050102@simplistix.co.uk> Message-ID: <4A8969D5.6020002@voidspace.org.uk> Chris Withers wrote: > Hi All, > > Is the Express Edition of Visual C++ 2008 suitable for compiling > packages for Python 2.6 on Windows? > (And Python 2.6 itself for that matter...) I would think so - all you really need is the compiler (which the express version definitely includes). You may need to manually add some directories to the path. I haven't actually tried it, but then nor have you from the sound of it. > > Ditto for 2.5, 3.1 and the trunk (which I guess becomes 3.2?) Python 3.1 / 3.2 are built with VS 2008. 2.5 is built with 2003 which is difficult to download unless you have an MSDN subscription. VS 2008 can't (reliably) be used to build extensions for 2005 I believe. I'm sure someone will correct me if this information is incorrect. 
Michael > > cheers, > > Chris > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From fuzzyman at voidspace.org.uk Mon Aug 17 16:34:25 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 17 Aug 2009 15:34:25 +0100 Subject: [Python-Dev] VC++ versions to match python versions? In-Reply-To: <4A8969D5.6020002@voidspace.org.uk> References: <4A8967BA.1050102@simplistix.co.uk> <4A8969D5.6020002@voidspace.org.uk> Message-ID: <4A896A71.9050400@voidspace.org.uk> Michael Foord wrote: > Chris Withers wrote: >> Hi All, >> >> Is the Express Edition of Visual C++ 2008 suitable for compiling >> packages for Python 2.6 on Windows? >> (And Python 2.6 itself for that matter...) > I would think so - all you really need is the compiler (which the > express version definitely includes). You may need to manually add > some directories to the path. > > I haven't actually tried it, but then nor have you from the sound of it. > >> >> Ditto for 2.5, 3.1 and the trunk (which I guess becomes 3.2?) > > Python 3.1 / 3.2 are built with VS 2008. 2.5 is built with 2003 which > is difficult to download unless you have an MSDN subscription. VS 2008 > can't (reliably) be used to build extensions for 2005 I believe. > D'oh. For 2.5 I mean. It may be *possible* though - just as you *can* build extensions for Python 2.5 on windows with mingw (with the appropriate distutils configuration), but there are pitfalls with doing this. Michael > I'm sure someone will correct me if this information is incorrect. > > Michael > >> >> cheers, >> >> Chris >> > > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From chris at simplistix.co.uk Mon Aug 17 16:36:22 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Mon, 17 Aug 2009 15:36:22 +0100 Subject: [Python-Dev] VC++ versions to match python versions? In-Reply-To: <4A896A71.9050400@voidspace.org.uk> References: <4A8967BA.1050102@simplistix.co.uk> <4A8969D5.6020002@voidspace.org.uk> <4A896A71.9050400@voidspace.org.uk> Message-ID: <4A896AE6.70507@simplistix.co.uk> Michael Foord wrote: > D'oh. For 2.5 I mean. It may be *possible* though - just as you *can* > build extensions for Python 2.5 on windows with mingw (with the > appropriate distutils configuration), but there are pitfalls with doing > this. Yes, in my case I'm trying to compile guppy (for heapy, which is an amazing tool) but that blows up with mingw... (But I'm also likely to want to do some python dev on windows, httplib download problems and all...) Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From eric.pruitt at gmail.com Mon Aug 17 17:19:10 2009 From: eric.pruitt at gmail.com (Eric Pruitt) Date: Mon, 17 Aug 2009 10:19:10 -0500 Subject: [Python-Dev] PEP Submission Message-ID: <171e8a410908170819q318b2f88s225847421408bf7f@mail.gmail.com> Several days ago, around the time the python.org servers went down, I submitted a PEP to editor at python.org. When things to have been worked, I submitted the PEP again. I have not seen any activity on the PEP in Python-Dev or any reply acknowledging that it was received. Did I misunderstand the process of submitting a PEP? From john at arbash-meinel.com Mon Aug 17 17:14:09 2009 From: john at arbash-meinel.com (John Arbash Meinel) Date: Mon, 17 Aug 2009 10:14:09 -0500 Subject: [Python-Dev] VC++ versions to match python versions? 
In-Reply-To: <4A896AE6.70507@simplistix.co.uk> References: <4A8967BA.1050102@simplistix.co.uk> <4A8969D5.6020002@voidspace.org.uk> <4A896A71.9050400@voidspace.org.uk> <4A896AE6.70507@simplistix.co.uk> Message-ID: <4A8973C1.10905@arbash-meinel.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Chris Withers wrote: > Michael Foord wrote: >> D'oh. For 2.5 I mean. It may be *possible* though - just as you *can* >> build extensions for Python 2.5 on windows with mingw (with the >> appropriate distutils configuration), but there are pitfalls with >> doing this. > > Yes, in my case I'm trying to compile guppy (for heapy, which is an > amazing tool) but that blows up with mingw... > > (But I'm also likely to want to do some python dev on windows, httplib > download problems and all...) > > Chris > Guppy doesn't compile on Windows. Pretty much full-stop. It uses static references to DLL functions, which on Windows is not allowed. I've tried patching it to remove such things, and I finally got it to compile, only to have it go "boom!" in actual use. If you can get it to work, certainly post something so that I can cheer. John =:-> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Cygwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkqJc8EACgkQJdeBCYSNAAPFWgCghbyZ4MDcA3xich0mBOO1/VoY 5mcAnjjv1kS8Ln3dhbG6/W75zmGacWQw =x6ZX -----END PGP SIGNATURE----- From chris at simplistix.co.uk Mon Aug 17 17:21:34 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Mon, 17 Aug 2009 16:21:34 +0100 Subject: [Python-Dev] VC++ versions to match python versions? In-Reply-To: <4A8973C1.10905@arbash-meinel.com> References: <4A8967BA.1050102@simplistix.co.uk> <4A8969D5.6020002@voidspace.org.uk> <4A896A71.9050400@voidspace.org.uk> <4A896AE6.70507@simplistix.co.uk> <4A8973C1.10905@arbash-meinel.com> Message-ID: <4A89757E.1030103@simplistix.co.uk> John Arbash Meinel wrote: > Guppy doesn't compile on Windows. Pretty much full-stop. It uses static > references to DLL functions, which on Windows is not allowed. This is no longer true as of the latest version of guppy... > I've tried patching it to remove such things, and I finally got it to > compile, only to have it go "boom!" in actual use. > > If you can get it to work, certainly post something so that I can cheer. Are you on the guppy list? Someone posted a patch to it (which may have been you?) which has made it into the latest release. I haven't tried to get it working myself yet, but John Machin (who maintains xlrd) has with the Express compiler. I was just checking I had the right version and what version I should use if I want to try with Python 2.5. I also wanted to know what versions I should be using for python core development should I be foolhardy enough to try any of that on Windows... (he says, staring down the barrel of slow httplib downloads on Windows :'( ) Will let you know what I find... Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Mon Aug 17 17:22:48 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Mon, 17 Aug 2009 16:22:48 +0100 Subject: [Python-Dev] FAO John Arbash Meinel In-Reply-To: <20090817152120.4DBC84F8091@server1.simplistix.co.uk> References: <20090817152120.4DBC84F8091@server1.simplistix.co.uk> Message-ID: <4A8975C8.8050709@simplistix.co.uk> Mail Delivery System wrote: > This is the mail system at host server1.simplistix.co.uk. 
> > I'm sorry to have to inform you that your message could not > be delivered to one or more recipients. It's attached below. > > For further assistance, please send mail to postmaster. > > If you do so, please include this problem report. You can > delete your own text from the attached returned message. > > The mail system > > : host mail2.domainpeople.com[204.174.223.74] said: 550 > relay not permitted (in reply to RCPT TO command) Looks like something is not happy with your mail setup... Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From guido at python.org Mon Aug 17 17:43:22 2009 From: guido at python.org (Guido van Rossum) Date: Mon, 17 Aug 2009 08:43:22 -0700 Subject: [Python-Dev] PEP Submission In-Reply-To: <171e8a410908170819q318b2f88s225847421408bf7f@mail.gmail.com> References: <171e8a410908170819q318b2f88s225847421408bf7f@mail.gmail.com> Message-ID: Hm... I thought the address was peps at python.org? On Mon, Aug 17, 2009 at 8:19 AM, Eric Pruitt wrote: > Several days ago, around the time the python.org servers went down, I > submitted a PEP to editor at python.org. When things to have been worked, > I submitted the PEP again. I have not seen any activity on the PEP in > Python-Dev or any reply acknowledging that it was received. Did I > misunderstand the process of submitting a PEP? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From joshua at reverberate.org Mon Aug 17 20:12:21 2009 From: joshua at reverberate.org (Joshua Haberman) Date: Mon, 17 Aug 2009 11:12:21 -0700 Subject: [Python-Dev] another Py_TPFLAGS_HEAPTYPE question In-Reply-To: <4A88FE84.3050903@v.loewis.de> References: <21ee037f0908161508s71884902mb9126b5d338beb56@mail.gmail.com> <4A888A3C.8060604@v.loewis.de> <21ee037f0908161601u266787d2qc6d7b781e313aac1@mail.gmail.com> <4A88FE84.3050903@v.loewis.de> Message-ID: <21ee037f0908171112x1c928fban90c1c8fc76f83655@mail.gmail.com> On Sun, Aug 16, 2009 at 11:53 PM, "Martin v. L?wis" wrote: >> Thanks for the pointer. ?I noticed that subtype_dealloc is only called for types >> that are allocated using type_new(). ?Does this mean that it is not >> safe to create >> types in C using just PyType_Ready() and set Py_TPFLAGS_HEAPTYPE on >> them? ?The documentation is not clear on this point. > > As Benjamin says, this is getting off-topic - python-dev is not a place > to ask for help in your project. Please let me know where is a more suitable place to discuss the implementation of the cPython as it pertains to C extensions. I wrote to python-dev only because the other lists appeared to be more focused on Python-the-language. > I believe setting flags on a type is inherently unsafe. Clearly this is not true in general. Take Py_TPFLAGS_BASETYPE, which C types are expected to set if they can be subclassed. Or Py_TPFLAGS_HAVE_GC, which C types set if they participate in cyclic reference collection. The docs do not distinguish (AFAICS) between flags that C types may set directly and those that they may not. My reading of the docs left me with the impression that a type could set Py_TPFLAGS_HEAPTYPE if it had allocated that type on the heap and wanted it INCREF'd and DECREF'd by instances. I now know that there is much more to this flag than I anticipated (see http://thread.gmane.org/gmane.comp.python.devel/105648), I am just giving you feedback about why the docs led me to this incorrect conclusion. 
In any case, I think I will experiment with a different approach, where instead of creating types in C dynamically at runtime, I will create a type whose instances "pretend" to be types (they will create instances when called). Still, I would appreciate knowing where I should direct further questions of this type, which are not questions about how to use Python but rather questions about how to properly implement extensions. >> Here is what I would like to do when I create my types dynamically: >> >> - implement tp_alloc and tp_dealloc() to INCREF and DECREF the type. >> - not set Py_TPFLAGS_HEAPTYPE. >> - set Py_TPFLAGS_HAVE_GC (because instances of my obj can create cycles) >> >> Does this seem safe? ?I notice that subtype_dealloc() does some funky >> GC/trashcan stuff. ?Is it safe for me not to call subtype_dealloc? ?Can I >> safely implement my tp_dealloc function like this? > > If you bypass documented API, you really need to study the code, > understand its motivation, judge whether certain usage is "safe" wrt. > to the current implementation, and judge the likelihood of this code > not getting changed in future versions. It was not my intention to bypass the documented API. Py_TPFLAGS_HEAPTYPE is documented here, with no note that the flag should not be set explicitly by C types: http://docs.python.org/c-api/typeobj.html#Py_TPFLAGS_HEAPTYPE Also, INCREF'ing and DECREF'ing my type from the tp_new and tp_dealloc functions doesn't seem outside of the documented API. Thanks, Josh From martin at v.loewis.de Mon Aug 17 22:41:44 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 17 Aug 2009 22:41:44 +0200 Subject: [Python-Dev] another Py_TPFLAGS_HEAPTYPE question In-Reply-To: <21ee037f0908171112x1c928fban90c1c8fc76f83655@mail.gmail.com> References: <21ee037f0908161508s71884902mb9126b5d338beb56@mail.gmail.com> <4A888A3C.8060604@v.loewis.de> <21ee037f0908161601u266787d2qc6d7b781e313aac1@mail.gmail.com> <4A88FE84.3050903@v.loewis.de> <21ee037f0908171112x1c928fban90c1c8fc76f83655@mail.gmail.com> Message-ID: <4A89C088.2030603@v.loewis.de> >> As Benjamin says, this is getting off-topic - python-dev is not a place >> to ask for help in your project. > > Please let me know where is a more suitable place to discuss the > implementation of the cPython as it pertains to C extensions. I wrote > to python-dev only because the other lists appeared to be more focused > on Python-the-language. The general python-list, or, more specifically, capi-sig. python-dev is exclusively reserved for current development of Python (including PEP discussions). It is out-of-scope to ask questions of the form "how do I do XYZ?". Regards, Martin From martin at v.loewis.de Mon Aug 17 22:53:14 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 17 Aug 2009 22:53:14 +0200 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: <200908171743.53215.steve@pearwood.info> References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> <1250444359.5558.15.camel@localhost> <4A8883C8.4000109@v.loewis.de> <200908171743.53215.steve@pearwood.info> Message-ID: <4A89C33A.80101@v.loewis.de> > and I don't see any reason why a compose() function shouldn't do the > same. I was tricked into reading it different when used with getters, i.e. l.sort(key=compose(attrgetter('name'),attrgetter('age'))) is too easy (IMO) to read as applying foo.name.age on all elements of the list. > (Aside: how do I look at the patch? 
The only link I have is here: > http://mail.python.org/pipermail/patches/2007-February/021687.html > but I can't see how to get to the patch from there.) It's best to search for "compose" in the bug tracker: http://bugs.python.org/issue1660179 Regards, Martin From martin at v.loewis.de Mon Aug 17 23:01:50 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 17 Aug 2009 23:01:50 +0200 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: <4A890BAC.8030404@scottdial.com> References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> <200908162214.56684.steve@pearwood.info> <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECA@hornigold> <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1ECB@hornigold> <4A88A98F.8080206@canterbury.ac.nz> <4A890BAC.8030404@scottdial.com> Message-ID: <4A89C53E.6090503@v.loewis.de> > I have never found these arguments compelling. They are obviously not > true (e.g., itertools.compress()[1] added in 2.7/3.1), and so what I > really hear is: "I don't like it and I outrank you." That certainly contributes to it - if you are not a committer, you have to find a committer that finds the feature important enough to work with you to integrate it. Fortunately, there is a process to overcome this problem: the PEP process. If you you really really want the feature, and can't find a committer that supports it yet, write a PEP. Then it will be up to Guido van Rossum to reject it. > The same reasoning would seem to apply here. In the OP's example, the > meta-decorator becomes opaque due to the use of a lambda. If one could > introspect a compose(), then introspection tools could actually know the > set of decorators being applied. As it is, the "preferred" method of > using a lambda actually makes it quite hard to know anything. That makes it even more necessary to write a PEP. I would have never guessed that introspection on the compose result is desirable. AFAICT, operator.attrgetter isn't introspectable, either, nor would the patch proposed in #7762 give you an introspectable getter. ISTM that people have fairly different requirements wrt. that feature. Regards, Martin From db3l.net at gmail.com Mon Aug 17 23:01:26 2009 From: db3l.net at gmail.com (David Bolen) Date: Mon, 17 Aug 2009 17:01:26 -0400 Subject: [Python-Dev] VC++ versions to match python versions? References: <4A8967BA.1050102@simplistix.co.uk> Message-ID: Chris Withers writes: > Is the Express Edition of Visual C++ 2008 suitable for compiling > packages for Python 2.6 on Windows? > (And Python 2.6 itself for that matter...) Yes - it's currently being used on my buildbot, for example, to build Python itself. Works for 2.6 and later. > Ditto for 2.5, 3.1 and the trunk (which I guess becomes 3.2?) 2.5 needs VS 2003. -- David From martin at v.loewis.de Mon Aug 17 23:06:20 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 17 Aug 2009 23:06:20 +0200 Subject: [Python-Dev] PEP Submission In-Reply-To: References: <171e8a410908170819q318b2f88s225847421408bf7f@mail.gmail.com> Message-ID: <4A89C64C.9050109@v.loewis.de> Guido van Rossum wrote: > Hm... I thought the address was peps at python.org? > > On Mon, Aug 17, 2009 at 8:19 AM, Eric Pruitt wrote: >> Several days ago, around the time the python.org servers went down, I >> submitted a PEP to editor at python.org. When things to have been worked, >> I submitted the PEP again. I have not seen any activity on the PEP in >> Python-Dev or any reply acknowledging that it was received. 
Did I >> misunderstand the process of submitting a PEP? > Correct - that's what PEP 1 says. editor at python.org goes to the HOWTO editor, which is amk at python.org. Not sure whether this alias is still useful. Regards, Martin From joshua at reverberate.org Tue Aug 18 00:26:37 2009 From: joshua at reverberate.org (Joshua Haberman) Date: Mon, 17 Aug 2009 15:26:37 -0700 Subject: [Python-Dev] another Py_TPFLAGS_HEAPTYPE question In-Reply-To: <4A89C088.2030603@v.loewis.de> References: <21ee037f0908161508s71884902mb9126b5d338beb56@mail.gmail.com> <4A888A3C.8060604@v.loewis.de> <21ee037f0908161601u266787d2qc6d7b781e313aac1@mail.gmail.com> <4A88FE84.3050903@v.loewis.de> <21ee037f0908171112x1c928fban90c1c8fc76f83655@mail.gmail.com> <4A89C088.2030603@v.loewis.de> Message-ID: <21ee037f0908171526n489dd81ao7906cf13c948ef5d@mail.gmail.com> On Mon, Aug 17, 2009 at 1:41 PM, "Martin v. L?wis" wrote: >>> As Benjamin says, this is getting off-topic - python-dev is not a place >>> to ask for help in your project. >> >> Please let me know where is a more suitable place to discuss the >> implementation of the cPython as it pertains to C extensions. I wrote >> to python-dev only because the other lists appeared to be more focused >> on Python-the-language. > > The general python-list, or, more specifically, capi-sig. > > python-dev is exclusively reserved for current development of Python > (including PEP discussions). It is out-of-scope to ask questions of > the form "how do I do XYZ?". Ok, I will direct future questions of this sort to one of those two places -- thanks and sorry for posting off-topic. Josh From steve at pearwood.info Tue Aug 18 10:01:05 2009 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 18 Aug 2009 18:01:05 +1000 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: References: <8B473FAE8A08C34C9F5666FD4B0A87B6997EED1EB6@hornigold> <1250444359.5558.15.camel@localhost> Message-ID: <200908181801.05660.steve@pearwood.info> On Mon, 17 Aug 2009 07:14:05 pm Stefan Behnel wrote: > Antoine Pitrou wrote: > > Raymond Hettinger rcn.com> writes: > >> IMO, its only virtue is that people coming from functional > >> languages are used to having compose. Otherwise, it's a YAGNI. > > > > Then I wonder how partial() ended up in the stdlib. It seems > > hardly more useful than compose(). > > I would certainly consider it more useful, but that aside, it's > also a lot simpler to understand and use than the proposed > compose() function. I think the main difference is that compose() > requires functional/math skills to be used and read correctly (and > might still be surprising in some corner cases), whereas partial() > only requires you to understand how to set a function argument. > Totally different level of mental complexity, IMHO. I find the opposite -- compose() seems completely simple and straight-forward to me, while partial() is still a mystery no matter how many times I use it. I always have to look it up to see which way it binds. Putting that aside, partial() too is easy enough to implement with lambda: partial(f, 2) is the same as lambda *args: f(2, *args). To my mind, there are two important reasons for preferring named functions like partial() and compose() over lambda solutions: * performance: a good C implementation should be better than a pure-Python lambda; and * specificity: there's only one thing compose() or partial() could do, whereas a lambda is so general it could do anything. 
Contrast: compose(f, g, h) lambda x: f(g(h(x))) You need to read virtually the entire lambda before you can distinguish it from some other arbitrary lambda: lambda x: f(g(h))(x) lambda x: f(g(x) or h(x)) lambda x: f(g(x)) + h(x) etc. -- Steven D'Aprano From martin at v.loewis.de Tue Aug 18 10:12:06 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 18 Aug 2009 10:12:06 +0200 Subject: [Python-Dev] Mercurial migration: help needed Message-ID: <4A8A6256.1030009@v.loewis.de> This is a repost from two weeks ago. It didn't get much feedback last time. I still keep trying, reposting to python-list also this time. In this thread, I'd like to collect things that ought to be done but where Dirkjan has indicated that he would prefer if somebody else did it. Item 1 ------ The first item is build identification. If you want to work on this, please either provide a patch (for trunk and/or py3k), or (if you are a committer) create a subversion branch. It seems that Barry and I agree that for the maintenance branches, sys.subversion should be frozen, so we need actually two sets of patches: one that removes sys.subversion entirely, and the other that freezes the branch to the respective one, and freezes the subversion revision to None. The patch should consider what Dirkjan proposes as the branching strategy: clones to separate 2.x and 3.x, as well as for features, and branches with the clones for releases and maintenance (see the PEP for details). Anybody working on this should have good knowledge of the Python source code, Mercurial, and either autoconf or Visual Studio (preferably both). Item 2 ------ The second item is line conversion hooks. Dj Gilcrease has posted a solution which he considers a hack himself. Mark Hammond has also volunteered, but it seems some volunteer needs to be "in charge", keeping track of a proposed solution until everybody agrees that it is a good solution. It may be that two solutions are necessary: a short-term one, that operates as a hook and has limitations, and a long-term one, that improves the hook system of Mercurial to implement the proper functionality (which then might get shipped with Mercurial in a cross-platform manner). Regards, Martin From dirkjan at ochtman.nl Tue Aug 18 10:20:15 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Tue, 18 Aug 2009 10:20:15 +0200 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <4A8A6256.1030009@v.loewis.de> References: <4A8A6256.1030009@v.loewis.de> Message-ID: On Tue, Aug 18, 2009 at 10:12, "Martin v. L?wis" wrote: > In this thread, I'd like to collect things that ought to be done > but where Dirkjan has indicated that he would prefer if somebody else > did it. I think the most important item here is currently the win32text stuff. Mark Hammond said he would work on this; Mark, when do you have time for this? Then I could set apart some time for it as well. Have stalled a bit on the fine-grained branch processing, hope to move that forward tomorrow. Cheers, Dirkjan From cournape at gmail.com Tue Aug 18 11:45:07 2009 From: cournape at gmail.com (David Cournapeau) Date: Tue, 18 Aug 2009 02:45:07 -0700 Subject: [Python-Dev] VC++ versions to match python versions? 
In-Reply-To: References: <4A8967BA.1050102@simplistix.co.uk> Message-ID: <5b8d13220908180245m1dae3530l568bb9667104a62a@mail.gmail.com> On Mon, Aug 17, 2009 at 2:01 PM, David Bolen wrote: > Chris Withers writes: > >> Is the Express Edition of Visual C++ 2008 suitable for compiling >> packages for Python 2.6 on Windows? >> (And Python 2.6 itself for that matter...) > > Yes - it's currently being used on my buildbot, for example, to build > Python itself. ?Works for 2.6 and later. > >> Ditto for 2.5, 3.1 and the trunk (which I guess becomes 3.2?) > > 2.5 needs VS 2003. The 64 bits version of 2.5 is built with VS 2005, though. cheers, David From mhammond at skippinet.com.au Tue Aug 18 13:32:23 2009 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 18 Aug 2009 21:32:23 +1000 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: References: <4A8A6256.1030009@v.loewis.de> Message-ID: <4A8A9147.7070109@skippinet.com.au> On 18/08/2009 6:20 PM, Dirkjan Ochtman wrote: > On Tue, Aug 18, 2009 at 10:12, "Martin v. L?wis" wrote: >> In this thread, I'd like to collect things that ought to be done >> but where Dirkjan has indicated that he would prefer if somebody else >> did it. > > I think the most important item here is currently the win32text stuff. > Mark Hammond said he would work on this; Mark, when do you have time > for this? Then I could set apart some time for it as well. I can make time, somewhat spasmodically, starting fairly soon. Might I suggest that as a first task I can resurrect my old stale patch, and you can arrange to install win32text locally and start experimenting with how mixed line-endings can work for you. Once we are all playing in the same ballpark I think we should be able to make good progress. I-said-ballpark-yet-I-call-myself-an-aussie? ly, Mark From dirkjan at ochtman.nl Tue Aug 18 13:46:36 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Tue, 18 Aug 2009 13:46:36 +0200 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <4A8A9147.7070109@skippinet.com.au> References: <4A8A6256.1030009@v.loewis.de> <4A8A9147.7070109@skippinet.com.au> Message-ID: On Tue, Aug 18, 2009 at 13:32, Mark Hammond wrote: > I can make time, somewhat spasmodically, starting fairly soon. ?Might I > suggest that as a first task I can resurrect my old stale patch, and you can > arrange to install win32text locally and start experimenting with how mixed > line-endings can work for you. ?Once we are all playing in the same ballpark > I think we should be able to make good progress. Sounds good to me. Cheers, Dirkjan From brett at python.org Tue Aug 18 21:46:13 2009 From: brett at python.org (Brett Cannon) Date: Tue, 18 Aug 2009 12:46:13 -0700 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: References: <4A8A6256.1030009@v.loewis.de> Message-ID: [stripping out python-list and Mark from the CC] On Tue, Aug 18, 2009 at 01:20, Dirkjan Ochtman wrote: > On Tue, Aug 18, 2009 at 10:12, "Martin v. L?wis" wrote: >> In this thread, I'd like to collect things that ought to be done >> but where Dirkjan has indicated that he would prefer if somebody else >> did it. > > I think the most important item here is currently the win32text stuff. > Mark Hammond said he would work on this; Mark, when do you have time > for this? Then I could set apart some time for it as well. > > Have stalled a bit on the fine-grained branch processing, hope to move > that forward tomorrow. > Can we possibly get these todo items in the PEP? 
I keep looking at the PEP out of habit to see what the blockers are and they are not there, at which point I have to dig up Martin's email. -Brett > Cheers, > > Dirkjan > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/brett%40python.org > From dirkjan at ochtman.nl Tue Aug 18 21:53:30 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Tue, 18 Aug 2009 21:53:30 +0200 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: References: <4A8A6256.1030009@v.loewis.de> Message-ID: On Tue, Aug 18, 2009 at 21:46, Brett Cannon wrote: > Can we possibly get these todo items in the PEP? I keep looking at the > PEP out of habit to see what the blockers are and they are not there, > at which point I have to dig up Martin's email. Will do. Cheers, Dirkjan From peter at hda3.com Tue Aug 18 22:00:06 2009 From: peter at hda3.com (Peter Moody) Date: Tue, 18 Aug 2009 13:00:06 -0700 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library Message-ID: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> Howdy folks, I have a first draft of a PEP for including an IP address manipulation library in the python stdlib. It seems like there are a lot of really smart folks with some, ahem, strong ideas about what an IP address module should and shouldn't be so I wanted to solicit your input on this pep. the pep can be found here: http://www.python.org/dev/peps/pep-3144/ the code can be found here: http://ipaddr-py.googlecode.com/svn/branches/2.0.x/ Please let me know if you have any comments (some already coming :) Cheers, /peter From phd at phd.pp.ru Tue Aug 18 22:34:27 2009 From: phd at phd.pp.ru (Oleg Broytmann) Date: Wed, 19 Aug 2009 00:34:27 +0400 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> Message-ID: <20090818203427.GA32172@phd.pp.ru> > http://ipaddr-py.googlecode.com/svn/branches/2.0.x/ipaddr.py : > def IP(address, host=False, version=None): > """Take an IP string/int and return an object of the correct type. > > Args: > ip_str: ... The arg is 'address', not 'ip_str'. There are two classes, IPv4 and IPv6 whose __new__ never create an instance of its class, instead they create instances of other classes. Why IPv4 and IPv6 are classes and not (factory) functions (like function IP)? Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From martin at v.loewis.de Tue Aug 18 22:50:53 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Tue, 18 Aug 2009 22:50:53 +0200 Subject: [Python-Dev] VC++ versions to match python versions? In-Reply-To: <5b8d13220908180245m1dae3530l568bb9667104a62a@mail.gmail.com> References: <4A8967BA.1050102@simplistix.co.uk> <5b8d13220908180245m1dae3530l568bb9667104a62a@mail.gmail.com> Message-ID: <4A8B142D.2060603@v.loewis.de> >>> Ditto for 2.5, 3.1 and the trunk (which I guess becomes 3.2?) >> 2.5 needs VS 2003. > > The 64 bits version of 2.5 is built with VS 2005, though. Not really - it is built with the compiler in the platform SDK. 
Regards, Martin From phd at phd.pp.ru Tue Aug 18 23:01:15 2009 From: phd at phd.pp.ru (Oleg Broytmann) Date: Wed, 19 Aug 2009 01:01:15 +0400 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908181353y5ddc3e15ne6461e41531c031d@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <20090818203427.GA32172@phd.pp.ru> <8517e9350908181353y5ddc3e15ne6461e41531c031d@mail.gmail.com> Message-ID: <20090818210115.GB32172@phd.pp.ru> On Tue, Aug 18, 2009 at 01:53:36PM -0700, Peter Moody wrote: > hold over from when I was trying to be too fancy. fixed as well. Thank you. The PEP and the code is Ok for me. Something like this should be in the stdlib. Currently I'm using IPy. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From digitalxero at gmail.com Wed Aug 19 00:21:26 2009 From: digitalxero at gmail.com (Dj Gilcrease) Date: Tue, 18 Aug 2009 16:21:26 -0600 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <4A8A6256.1030009@v.loewis.de> References: <4A8A6256.1030009@v.loewis.de> Message-ID: On Tue, Aug 18, 2009 at 2:12 AM, "Martin v. L?wis" wrote: > The second item is line conversion hooks. Dj Gilcrease has posted a > solution which he considers a hack himself. Mark Hammond has also > volunteered, but it seems some volunteer needs to be "in charge", > keeping track of a proposed solution until everybody agrees that it > is a good solution. It may be that two solutions are necessary: a > short-term one, that operates as a hook and has limitations, and > a long-term one, that improves the hook system of Mercurial to > implement the proper functionality (which then might get shipped > with Mercurial in a cross-platform manner). My solution is a hack because the hooks in Mercurial need to be modified to support it properly, I would be happy to help work on this as it is a situation I run into all the time in my own projects. I can never seem to get all the developers to enable the hooks, and one of them always commits with improper line endings =P From peter at hda3.com Tue Aug 18 22:53:36 2009 From: peter at hda3.com (Peter Moody) Date: Tue, 18 Aug 2009 13:53:36 -0700 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <20090818203427.GA32172@phd.pp.ru> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <20090818203427.GA32172@phd.pp.ru> Message-ID: <8517e9350908181353y5ddc3e15ne6461e41531c031d@mail.gmail.com> On Tue, Aug 18, 2009 at 1:34 PM, Oleg Broytmann wrote: >> http://ipaddr-py.googlecode.com/svn/branches/2.0.x/ipaddr.py : > >> def IP(address, host=False, version=None): >> ? ? """Take an IP string/int and return an object of the correct type. >> >> ? ? Args: >> ? ? ? ? ip_str: ... > > ? The arg is 'address', not 'ip_str'. d'oh, fixed. > ? There are two classes, IPv4 and IPv6 whose __new__ never create an > instance of its class, instead they create instances of other classes. Why > IPv4 and IPv6 are classes and not (factory) functions (like function IP)? hold over from when I was trying to be too fancy. fixed as well. Cheers, /peter > Oleg. > -- > ? ? Oleg Broytmann ? ? ? ? ? ?http://phd.pp.ru/ ? ? ? ? ? ?phd at phd.pp.ru > ? ? ? ? ? Programmers don't die, they just GOSUB without RETURN. 
> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/python-dev%40hda3.com > From solipsis at pitrou.net Wed Aug 19 12:20:14 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 19 Aug 2009 10:20:14 +0000 (UTC) Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> Message-ID: Le Tue, 18 Aug 2009 13:00:06 -0700, Peter Moody a ?crit?: > Howdy folks, > > I have a first draft of a PEP for including an IP address manipulation > library in the python stdlib. It seems like there are a lot of really > smart folks with some, ahem, strong ideas about what an IP address > module should and shouldn't be so I wanted to solicit your input on this > pep. When you say : ? the results of the first computation should be cached and only re-generated should the object properties change ? does it mean that the objects are mutable? Would it make sense to make them immutable and therefore hashable (such as, e.g., datetime objects)? From tino at wildenhain.de Wed Aug 19 15:47:19 2009 From: tino at wildenhain.de (Tino Wildenhain) Date: Wed, 19 Aug 2009 15:47:19 +0200 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> Message-ID: <4A8C0267.5020408@wildenhain.de> Antoine Pitrou wrote: > Le Tue, 18 Aug 2009 13:00:06 -0700, Peter Moody a ?crit : >> Howdy folks, >> >> I have a first draft of a PEP for including an IP address manipulation >> library in the python stdlib. It seems like there are a lot of really >> smart folks with some, ahem, strong ideas about what an IP address >> module should and shouldn't be so I wanted to solicit your input on this >> pep. > > When you say : > > ? the results of the first computation should be cached and only > re-generated should the object properties change ? > > does it mean that the objects are mutable? Would it make sense to make > them immutable and therefore hashable (such as, e.g., datetime objects)? They could impelement __hash__ to behave correctly in this case. In the examples however I see: >>> o.broadcast IPv4Address('1.1.1.255') this is often used but not the only valid broadcast address, in fact, any address between network address and max(address with given netmask) can be defined as broadcast. Maybe biggest or greatest would be better name for the attribute. User is then free to interpret it as broadcast if desired. The attribute network returned as address object also does not seem right. The performance hit you mention by translating the object upfront is neglegtible I'd say - for any sensible use of the object you'd need the binary form anyway. You can even use system (e.g. socket) funtions to make the translation very fast. This also safes space and allow vor verification of the input. (e.g. '255.255.255.255/32' is 18 bytes where it could be stored as 8 bytes instead (or even 5 if you use ip/prefixlength) I have a very very old implementation which even did the translation from cidr format to integer in python code (I don't say plain ;) but maybe worth a look: http://www.zope.org/Members/tino/IPPatternAuthentication/IPHelper.py/view Regards Tino -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/x-pkcs7-signature Size: 3241 bytes Desc: S/MIME Cryptographic Signature URL: From peter at hda3.com Wed Aug 19 17:19:46 2009 From: peter at hda3.com (Peter Moody) Date: Wed, 19 Aug 2009 08:19:46 -0700 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <4A8C0267.5020408@wildenhain.de> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <4A8C0267.5020408@wildenhain.de> Message-ID: <8517e9350908190819j137375h63c97e81ad55d71f@mail.gmail.com> On Wed, Aug 19, 2009 at 6:47 AM, Tino Wildenhain wrote: > Antoine Pitrou wrote: >> >> Le Tue, 18 Aug 2009 13:00:06 -0700, Peter Moody a ?crit : >>> >>> Howdy folks, >>> >>> I have a first draft of a PEP for including an IP address manipulation >>> library in the python stdlib. It seems like there are a lot of really >>> smart folks with some, ahem, strong ideas about what an IP address >>> module should and shouldn't be so I wanted to solicit your input on this >>> pep. >> >> When you say : >> >> ? the results of the first computation should be cached and only >> re-generated should the object properties change ? >> >> does it mean that the objects are mutable? Would it make sense to make >> them immutable and therefore hashable (such as, e.g., datetime objects)? > > They could impelement __hash__ to behave correctly in this case. > > In the examples however I see: > >>>> o.broadcast > ? ?IPv4Address('1.1.1.255') > > this is often used but not the only valid broadcast address, > in fact, any address between network address and max(address with given > netmask) can be defined as broadcast. Maybe biggest or greatest > would be better name for the attribute. User is then free to interpret > it as broadcast if desired. > > The attribute network returned as address object also does not seem > right. by convention, the highest address in a given network is called the broadcast address while the lowest address is called the network address. They're also distinct addresses, as opposed to networks, hence .broadcast/.network/etc returning IPvXAddress objects. calling them .biggest and .smallest would be confusing. am I misinterpreting what you mean? > The performance hit you mention by translating the object upfront > is neglegtible I'd say - for any sensible use of the object you'd > need the binary form anyway. You can even use system (e.g. socket) > funtions to make the translation very fast. This also safes space > and allow vor verification of the input. I'll look into using socket where I can, but the computational hit actually wasn't negligible. A common use for something like this library might be to verify that an addresses typed by a user is valid, '192.168.1.1' instead os '1921.68.1.1'; computing the extra attributes delays the return time and doesn't actually benefit the user or programmer. Cheers, /peter > (e.g. 
'255.255.255.255/32' is 18 bytes where it could > ?be stored as 8 bytes instead (or even 5 if you use > ip/prefixlength) > > I have a very very old implementation which even did the > translation from cidr format to integer in python code > (I don't say plain ;) but maybe worth a look: > > http://www.zope.org/Members/tino/IPPatternAuthentication/IPHelper.py/view > > Regards > Tino > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/python-dev%40hda3.com > > From peter at hda3.com Wed Aug 19 17:35:15 2009 From: peter at hda3.com (Peter Moody) Date: Wed, 19 Aug 2009 08:35:15 -0700 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> Message-ID: <8517e9350908190835x5137faf4r2ebc32510a9df108@mail.gmail.com> On Wed, Aug 19, 2009 at 3:20 AM, Antoine Pitrou wrote: > Le Tue, 18 Aug 2009 13:00:06 -0700, Peter Moody a ?crit?: >> Howdy folks, >> >> I have a first draft of a PEP for including an IP address manipulation >> library in the python stdlib. It seems like there are a lot of really >> smart folks with some, ahem, strong ideas about what an IP address >> module should and shouldn't be so I wanted to solicit your input on this >> pep. > > When you say : > > ? the results of the first computation should be cached and only > re-generated should the object properties change ? > > does it mean that the objects are mutable? Would it make sense to make > them immutable and therefore hashable (such as, e.g., datetime objects)? that's a good point. I'll implement __hash__ in the BaseIP class. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/python-dev%40hda3.com > From eric at trueblade.com Wed Aug 19 17:39:27 2009 From: eric at trueblade.com (Eric Smith) Date: Wed, 19 Aug 2009 11:39:27 -0400 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908190835x5137faf4r2ebc32510a9df108@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908190835x5137faf4r2ebc32510a9df108@mail.gmail.com> Message-ID: <4A8C1CAF.60305@trueblade.com> Peter Moody wrote: > On Wed, Aug 19, 2009 at 3:20 AM, Antoine Pitrou wrote: >> Le Tue, 18 Aug 2009 13:00:06 -0700, Peter Moody a ?crit : >>> Howdy folks, >>> >>> I have a first draft of a PEP for including an IP address manipulation >>> library in the python stdlib. It seems like there are a lot of really >>> smart folks with some, ahem, strong ideas about what an IP address >>> module should and shouldn't be so I wanted to solicit your input on this >>> pep. >> When you say : >> >> ? the results of the first computation should be cached and only >> re-generated should the object properties change ? >> >> does it mean that the objects are mutable? Would it make sense to make >> them immutable and therefore hashable (such as, e.g., datetime objects)? > > that's a good point. I'll implement __hash__ in the BaseIP class. But are the objects mutable? I haven't had time to deep dive on this yet, but I'd like to. I also use IPy and would like to some this in the stdlib. Eric. 
From tino at wildenhain.de Wed Aug 19 18:01:03 2009 From: tino at wildenhain.de (Tino Wildenhain) Date: Wed, 19 Aug 2009 18:01:03 +0200 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908190819j137375h63c97e81ad55d71f@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <4A8C0267.5020408@wildenhain.de> <8517e9350908190819j137375h63c97e81ad55d71f@mail.gmail.com> Message-ID: <4A8C21BF.4080305@wildenhain.de> Peter Moody wrote: > On Wed, Aug 19, 2009 at 6:47 AM, Tino Wildenhain wrote: >> Antoine Pitrou wrote: >>> Le Tue, 18 Aug 2009 13:00:06 -0700, Peter Moody a ?crit : >>>> Howdy folks, >>>> >>>> I have a first draft of a PEP for including an IP address manipulation >>>> library in the python stdlib. It seems like there are a lot of really >>>> smart folks with some, ahem, strong ideas about what an IP address >>>> module should and shouldn't be so I wanted to solicit your input on this >>>> pep. >>> When you say : >>> >>> ? the results of the first computation should be cached and only >>> re-generated should the object properties change ? >>> >>> does it mean that the objects are mutable? Would it make sense to make >>> them immutable and therefore hashable (such as, e.g., datetime objects)? >> They could impelement __hash__ to behave correctly in this case. >> >> In the examples however I see: >> >>>>> o.broadcast >> IPv4Address('1.1.1.255') >> >> this is often used but not the only valid broadcast address, >> in fact, any address between network address and max(address with given >> netmask) can be defined as broadcast. Maybe biggest or greatest >> would be better name for the attribute. User is then free to interpret >> it as broadcast if desired. >> >> The attribute network returned as address object also does not seem >> right. > > by convention, the highest address in a given network is called the > broadcast address while the lowest address is called the network > address. They're also distinct addresses, as opposed to networks, > hence .broadcast/.network/etc returning IPvXAddress objects. calling > them .biggest and .smallest would be confusing. > > am I misinterpreting what you mean? No, I just said its conventionally used as that but its not definition of a broadcast (in fact you can have any valid host address defined as broadcast as long as all members of the network agree on that) Since you dont want to call the attribute ususally_the_broadcast_address or something, other names which tell you about the data would seem more appropriate (like greatest) >> The performance hit you mention by translating the object upfront >> is neglegtible I'd say - for any sensible use of the object you'd >> need the binary form anyway. You can even use system (e.g. socket) >> funtions to make the translation very fast. This also safes space >> and allow vor verification of the input. > > I'll look into using socket where I can, but the computational hit > actually wasn't negligible. A common use for something like this > library might be to verify that an addresses typed by a user is valid, > '192.168.1.1' instead os '1921.68.1.1'; computing the extra attributes > delays the return time and doesn't actually benefit the user or > programmer. Maybe I don't quite understand your extra attributes stuff - the 32 bit integer for ipv4 IP and the netmask in either 32 bit or prefix length in 5 bits would be enough of a storage attribute. All others are just representation of the values. 
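E.g. roughly this -- just to illustrate the storage I mean, not the actual ipaddr code:

    import socket, struct

    def to_int(dotted):
        # inet_aton also rejects garbage like '1921.68.1.1' for free
        return struct.unpack('!I', socket.inet_aton(dotted))[0]

    def to_dotted(number):
        return socket.inet_ntoa(struct.pack('!I', number))

    ip, prefixlen = to_int('192.168.1.1'), 24   # all that really needs storing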
Storing the data as string seems a bit suboptimal since for any sensible operation with the data you'd need to do the conversion anyway. Regards Tino -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 3241 bytes Desc: S/MIME Cryptographic Signature URL: From rdmurray at bitdance.com Wed Aug 19 18:21:40 2009 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 19 Aug 2009 12:21:40 -0400 (EDT) Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908190819j137375h63c97e81ad55d71f@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <4A8C0267.5020408@wildenhain.de> <8517e9350908190819j137375h63c97e81ad55d71f@mail.gmail.com> Message-ID: On Wed, 19 Aug 2009 at 08:19, Peter Moody wrote: > On Wed, Aug 19, 2009 at 6:47 AM, Tino Wildenhain wrote: >>> Le Tue, 18 Aug 2009 13:00:06 -0700, Peter Moody a ?crit : >> >>>>> o.broadcast >> ? ?IPv4Address('1.1.1.255') >> >> this is often used but not the only valid broadcast address, >> in fact, any address between network address and max(address with given >> netmask) can be defined as broadcast. Maybe biggest or greatest >> would be better name for the attribute. User is then free to interpret >> it as broadcast if desired. >> >> The attribute network returned as address object also does not seem >> right. > > by convention, the highest address in a given network is called the > broadcast address while the lowest address is called the network > address. They're also distinct addresses, as opposed to networks, > hence .broadcast/.network/etc returning IPvXAddress objects. calling > them .biggest and .smallest would be confusing. > > am I misinterpreting what you mean? Possibly. Tino means exactly what he said: the broadcast address does not _have_ to be the last IP, nor does the last IP _have_ to be a broadcast, though in practice they almost always are (and using the last IP as a host IP almost never works in practice in a heterogeneous network). Check out the 'broadcast' option of the ifconfig command for confirmation that the broadcast address can be any IP in the network. Of course, for that to work all hosts on the network have to agree on what the broadcast is, hence the normal convention that the broadcast is the last IP in the network. As for the 'network' attribute, if you call it 'network' IMO it should be a network data type, which would make it rather redundant. What you are actually returning is what I have normally heard called either the 'zero' of the network, or the "network number" or "network identifier"; but never just "network" (a network has to have at least an implicit netmask to be meaningful, IMO). Since you are dealing with networks as a list of addresses, perhaps you should drop the 'network' attribute, make the 'broadcast' attribute settable with a default equal to self[-1], and let the user refer to the zero element to get the zero of the network if they want it. 
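Something like this property sketch would do it (illustrative only, not tied to ipaddr's actual internals; it assumes the network class already supports indexing):

    class BroadcastMixin(object):
        """Overridable broadcast, defaulting to the last address."""
        _broadcast = None

        @property
        def broadcast(self):
            return self._broadcast if self._broadcast is not None else self[-1]

        @broadcast.setter
        def broadcast(self, addr):
            self._broadcast = addr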
--David From peter at hda3.com Wed Aug 19 18:17:21 2009 From: peter at hda3.com (Peter Moody) Date: Wed, 19 Aug 2009 09:17:21 -0700 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <4A8C1CAF.60305@trueblade.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908190835x5137faf4r2ebc32510a9df108@mail.gmail.com> <4A8C1CAF.60305@trueblade.com> Message-ID: <8517e9350908190917k32775edard9915805f3f15a8b@mail.gmail.com> On Wed, Aug 19, 2009 at 8:39 AM, Eric Smith wrote: > Peter Moody wrote: >> >> On Wed, Aug 19, 2009 at 3:20 AM, Antoine Pitrou >> wrote: >>> >>> Le Tue, 18 Aug 2009 13:00:06 -0700, Peter Moody a ?crit : >>>> >>>> Howdy folks, >>>> >>>> I have a first draft of a PEP for including an IP address manipulation >>>> library in the python stdlib. It seems like there are a lot of really >>>> smart folks with some, ahem, strong ideas about what an IP address >>>> module should and shouldn't be so I wanted to solicit your input on this >>>> pep. >>> >>> When you say : >>> >>> ? the results of the first computation should be cached and only >>> re-generated should the object properties change ? >>> >>> does it mean that the objects are mutable? Would it make sense to make >>> them immutable and therefore hashable (such as, e.g., datetime objects)? >> >> that's a good point. I'll implement __hash__ in the BaseIP class. > > But are the objects mutable? I haven't had time to deep dive on this yet, > but I'd like to. I also use IPy and would like to some this in the stdlib. you can't set them directly, if that's what you mean. >>> import ipaddr >>> o = ipaddr.IPv4Network('1.1.1.0/24') >>> o.broadcast IPv4Address('1.1.1.255') >>> o.network IPv4Address('1.1.1.0') >>> o.broadcast = ipaddr.IPv4Address('1.1.1.127') Traceback (most recent call last): File "", line 1, in AttributeError: can't set attribute >>> o.prefixlen = 25 >>> o.broadcast IPv4Address('1.1.1.127') > Eric. > From solipsis at pitrou.net Wed Aug 19 18:29:08 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 19 Aug 2009 16:29:08 +0000 (UTC) Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908190835x5137faf4r2ebc32510a9df108@mail.gmail.com> Message-ID: Le Wed, 19 Aug 2009 08:35:15 -0700, Peter Moody a ?crit?: >> >> does it mean that the objects are mutable? Would it make sense to make >> them immutable and therefore hashable (such as, e.g., datetime >> objects)? > > that's a good point. I'll implement __hash__ in the BaseIP class. It is a common practice that only immutable objects define a meaningful __hash__ method. The reason is that dicts and sets (and perhaps other structures) cache the hash value instead of calling __hash__ again and again. If you stick a mutable with a meaningful __hash__ in a dict, and then modify the mutable object, lookups will give the wrong results (they will be based on the old, stale hash value). It seems to me that hashability is a more desireable property of IP objects than modifiability. I don't see any reason to modify an IP object after having created it (rather than creating a new object). Regards Antoine. 
From peter at hda3.com Wed Aug 19 18:55:46 2009 From: peter at hda3.com (Peter Moody) Date: Wed, 19 Aug 2009 09:55:46 -0700 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <4A8C0267.5020408@wildenhain.de> <8517e9350908190819j137375h63c97e81ad55d71f@mail.gmail.com> Message-ID: <8517e9350908190955y393435d9w89fa25980155a7fc@mail.gmail.com> On Wed, Aug 19, 2009 at 9:21 AM, R. David Murray wrote: > On Wed, 19 Aug 2009 at 08:19, Peter Moody wrote: >> >> On Wed, Aug 19, 2009 at 6:47 AM, Tino Wildenhain >> wrote: >>>> >>>> Le Tue, 18 Aug 2009 13:00:06 -0700, Peter Moody a ?crit : >>> >>>>>> o.broadcast >>> >>> ? ?IPv4Address('1.1.1.255') >>> >>> this is often used but not the only valid broadcast address, >>> in fact, any address between network address and max(address with given >>> netmask) can be defined as broadcast. Maybe biggest or greatest >>> would be better name for the attribute. User is then free to interpret >>> it as broadcast if desired. >>> >>> The attribute network returned as address object also does not seem >>> right. >> >> by convention, the highest address in a given network is called the >> broadcast address while the lowest address is called the network >> address. They're also distinct addresses, as opposed to networks, >> hence .broadcast/.network/etc returning IPvXAddress objects. calling >> them .biggest and .smallest would be confusing. >> >> am I misinterpreting what you mean? > > Possibly. ?Tino means exactly what he said: ?the broadcast address > does not _have_ to be the last IP, nor does the last IP _have_ to be > a broadcast, though in practice they almost always are (and using the > last IP as a host IP almost never works in practice in a heterogeneous > network). ?Check out the 'broadcast' option of the ifconfig command for > confirmation that the broadcast address can be any IP in the network. > Of course, for that to work all hosts on the network have to agree on > what the broadcast is, hence the normal convention that the broadcast > is the last IP in the network. > > As for the 'network' attribute, if you call it 'network' IMO it should > be a network data type, which would make it rather redundant. ?What you > are actually returning is what I have normally heard called either the > 'zero' of the network, or the "network number" or "network identifier"; > but never just "network" (a network has to have at least an implicit > netmask to be meaningful, IMO). > > Since you are dealing with networks as a list of addresses, perhaps > you should drop the 'network' attribute, make the 'broadcast' attribute > settable with a default equal to self[-1], and let the user refer to > the zero element to get the zero of the network if they want it. making the broadcast address settable (with a default to self[-1]) might be reasonable, though it is different from just about every other python implementation I've seen (IPy, ipv4.py, netaddr). I'm not sure I understand your point about the network attribute. what I'm returning with network is the subnet-id/base address of the given network. Again, .network seems to be fairly standard for naming. 
> --David From fwierzbicki at gmail.com Wed Aug 19 20:10:35 2009 From: fwierzbicki at gmail.com (Frank Wierzbicki) Date: Wed, 19 Aug 2009 14:10:35 -0400 Subject: [Python-Dev] Two laments about CPython's AST Nodes Message-ID: <4dab5f760908191110x548a3037pbb11e54e77154f2f@mail.gmail.com> Before I start complaining, I want to mention what a huge help it has been to be able to directly compare the AST exposed by ast.py in making Jython a better Python. Thanks for that! Now on to the complaints: Though I recently added support for this in Jython, I don't like that nodes can be defined without required attributes, for example: node = ast.Assign() Is valid, even though it requires "node.targets" and "node.value" (I'm less concerned about the required lineno and col_offset, as I can understand holding off on these so that you can just use fix_missing_locations to fill these in for you). My other (bigger) gripe is that you can define these values with arbitrary objects that will blow up at parse time. So for example you can write: node = ast.Assign() node.targets = "whatever" Which, when you try to parse, blows up with "TypeError: Assign field "targets" must be a list, not a str". I'd be much happier if this blew up right away when you try to make the assignment. At the moment, this is how it works in Jython (though I could support this with some contorting). BTW -- I *am* volunteering to attempt to implement these things in CPython if there is support :) -Frank From eric at trueblade.com Wed Aug 19 20:20:20 2009 From: eric at trueblade.com (Eric Smith) Date: Wed, 19 Aug 2009 14:20:20 -0400 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908190955y393435d9w89fa25980155a7fc@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <4A8C0267.5020408@wildenhain.de> <8517e9350908190819j137375h63c97e81ad55d71f@mail.gmail.com> <8517e9350908190955y393435d9w89fa25980155a7fc@mail.gmail.com> Message-ID: <4A8C4264.2040708@trueblade.com> Peter Moody wrote: > On Wed, Aug 19, 2009 at 9:21 AM, R. David Murray wrote: >> Possibly. Tino means exactly what he said: the broadcast address >> does not _have_ to be the last IP, nor does the last IP _have_ to be >> a broadcast, though in practice they almost always are (and using the >> last IP as a host IP almost never works in practice in a heterogeneous >> network). Check out the 'broadcast' option of the ifconfig command for >> confirmation that the broadcast address can be any IP in the network. >> Of course, for that to work all hosts on the network have to agree on >> what the broadcast is, hence the normal convention that the broadcast >> is the last IP in the network. >> >> As for the 'network' attribute, if you call it 'network' IMO it should >> be a network data type, which would make it rather redundant. What you >> are actually returning is what I have normally heard called either the >> 'zero' of the network, or the "network number" or "network identifier"; >> but never just "network" (a network has to have at least an implicit >> netmask to be meaningful, IMO). >> >> Since you are dealing with networks as a list of addresses, perhaps >> you should drop the 'network' attribute, make the 'broadcast' attribute >> settable with a default equal to self[-1], and let the user refer to >> the zero element to get the zero of the network if they want it. 
> > making the broadcast address settable (with a default to self[-1]) > might be reasonable, though it is different from just about every > other python implementation I've seen (IPy, ipv4.py, netaddr). > > I'm not sure I understand your point about the network attribute. > what I'm returning with network is the subnet-id/base address of the > given network. Again, .network seems to be fairly standard for naming. I think using .network and .broadcast are pretty well understood to be the [0] and [-1] of the network address block. I don't think we want to start creating new terms or access patterns here. +1 on leaving .network and .broadcast as-is (including returning a IPvXAddress object). Eric. From glyph at twistedmatrix.com Wed Aug 19 20:28:38 2009 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Wed, 19 Aug 2009 14:28:38 -0400 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <4A8C4264.2040708@trueblade.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <4A8C0267.5020408@wildenhain.de> <8517e9350908190819j137375h63c97e81ad55d71f@mail.gmail.com> <8517e9350908190955y393435d9w89fa25980155a7fc@mail.gmail.com> <4A8C4264.2040708@trueblade.com> Message-ID: On Wed, Aug 19, 2009 at 2:20 PM, Eric Smith wrote: > I think using .network and .broadcast are pretty well understood to be the > [0] and [-1] of the network address block. I don't think we want to start > creating new terms or access patterns here. > > +1 on leaving .network and .broadcast as-is (including returning a > IPvXAddress object). > -1. I think 'network.number' or 'network.zero' is a lot clearer than 'network.network'. Maybe '.broadcast' would be okay, as long as it *can* be adjusted for those unusual, or maybe even only hypothetical, networks where it is not the [-1]. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Aug 19 20:34:46 2009 From: brett at python.org (Brett Cannon) Date: Wed, 19 Aug 2009 11:34:46 -0700 Subject: [Python-Dev] Two laments about CPython's AST Nodes In-Reply-To: <4dab5f760908191110x548a3037pbb11e54e77154f2f@mail.gmail.com> References: <4dab5f760908191110x548a3037pbb11e54e77154f2f@mail.gmail.com> Message-ID: On Wed, Aug 19, 2009 at 11:10, Frank Wierzbicki wrote: > Before I start complaining, I want to mention what a huge help it has > been to be able to directly compare the AST exposed by ast.py in > making Jython a better Python. ?Thanks for that! > > Now on to the complaints: Though I recently added support for this in > Jython, I don't like that nodes can be defined without required > attributes, for example: > > node = ast.Assign() > > Is valid, even though it requires "node.targets" and "node.value" (I'm > less concerned about the required lineno and col_offset, as I can > understand holding off on these so that you can just use > fix_missing_locations to fill these in for you). > > My other (bigger) gripe is that you can define these values with > arbitrary objects that will blow up at parse time. ?So for example you > can write: > > node = ast.Assign() > node.targets = "whatever" > > Which, when you try to parse, blows up with "TypeError: Assign field > "targets" must be a list, not a str". ?I'd be much happier if this > blew up right away when you try to make the assignment. ?At the > moment, this is how it works in Jython (though I could support this > with some contorting). 
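Building bottom-up already works, e.g.:

    import ast

    targets = [ast.Name(id='x', ctx=ast.Store())]
    value = ast.Num(n=1)
    assign = ast.Assign(targets=targets, value=value)  # children supplied up front

(lineno/col_offset can still be left to fix_missing_locations(), as Frank notes).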
> > BTW -- I *am* volunteering to attempt to implement these things in > CPython if there is support :) +1 from me for adding this support. While I can see people wanting to create the node as soon as it is known to be needed and then plug in the parts as they get parsed, postponing the node creation to later once the subnodes have been done is not exactly a challenge. -Brett From eric at trueblade.com Wed Aug 19 20:37:11 2009 From: eric at trueblade.com (Eric Smith) Date: Wed, 19 Aug 2009 14:37:11 -0400 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <4A8C0267.5020408@wildenhain.de> <8517e9350908190819j137375h63c97e81ad55d71f@mail.gmail.com> <8517e9350908190955y393435d9w89fa25980155a7fc@mail.gmail.com> <4A8C4264.2040708@trueblade.com> Message-ID: <4A8C4657.2080209@trueblade.com> Glyph Lefkowitz wrote: > On Wed, Aug 19, 2009 at 2:20 PM, Eric Smith > wrote: > > > I think using .network and .broadcast are pretty well understood to > be the [0] and [-1] of the network address block. I don't think we > want to start creating new terms or access patterns here. > > +1 on leaving .network and .broadcast as-is (including returning a > IPvXAddress object). > > > -1. I think 'network.number' or 'network.zero' is a lot clearer than > 'network.network'. Maybe '.broadcast' would be okay, as long as it > /can/ be adjusted for those unusual, or maybe even only hypothetical, > networks where it is not the [-1]. Is there some existing library that uses .number or .zero? IPy uses .net (and .broadcast for [-1]). From benjamin at python.org Wed Aug 19 22:06:52 2009 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 19 Aug 2009 15:06:52 -0500 Subject: [Python-Dev] Two laments about CPython's AST Nodes In-Reply-To: <4dab5f760908191110x548a3037pbb11e54e77154f2f@mail.gmail.com> References: <4dab5f760908191110x548a3037pbb11e54e77154f2f@mail.gmail.com> Message-ID: <1afaf6160908191306k4e67b659if0b6c00e089d33fe@mail.gmail.com> 2009/8/19 Frank Wierzbicki : > Before I start complaining, I want to mention what a huge help it has > been to be able to directly compare the AST exposed by ast.py in > making Jython a better Python. ?Thanks for that! > > Now on to the complaints: Though I recently added support for this in > Jython, I don't like that nodes can be defined without required > attributes, for example: > > node = ast.Assign() > > Is valid, even though it requires "node.targets" and "node.value" (I'm > less concerned about the required lineno and col_offset, as I can > understand holding off on these so that you can just use > fix_missing_locations to fill these in for you). +1 > > My other (bigger) gripe is that you can define these values with > arbitrary objects that will blow up at parse time. ?So for example you > can write: > > node = ast.Assign() > node.targets = "whatever" > > Which, when you try to parse, blows up with "TypeError: Assign field > "targets" must be a list, not a str". ?I'd be much happier if this > blew up right away when you try to make the assignment. ?At the > moment, this is how it works in Jython (though I could support this > with some contorting). I also think this is a good idea, but this also causes an asymmetry. I would still be able to do this: node = ast.Module([]) node.body.append("random stuff") and not have it type checked until it is compiled. This would be hard to fix, though, and I think it is worth living with. 
> > BTW -- I *am* volunteering to attempt to implement these things in > CPython if there is support :) Very generous. :) -- Regards, Benjamin From martin at v.loewis.de Wed Aug 19 22:45:23 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 19 Aug 2009 22:45:23 +0200 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <4A8C21BF.4080305@wildenhain.de> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <4A8C0267.5020408@wildenhain.de> <8517e9350908190819j137375h63c97e81ad55d71f@mail.gmail.com> <4A8C21BF.4080305@wildenhain.de> Message-ID: <4A8C6463.8020200@v.loewis.de> > No, I just said its conventionally used as that but its not definition > of a broadcast (in fact you can have any valid host address defined > as broadcast as long as all members of the network agree on that) You could, but then you are violating existing protocol specifications. RFC 1122 mandates, in sections 3.2.1.3 and 3.3.6, that certain addresses MUST be understood as broadcast addresses, by all nodes (independent of configuration). I think a Python IP address library should conform to all relevant RFCs. > Since you dont want to call the attribute ususally_the_broadcast_address > or something, other names which tell you about the data would seem more > appropriate (like greatest) No. I think setting the broadcast address to something else just does not need to be supported. Regards, Martin From martin at v.loewis.de Wed Aug 19 23:00:33 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 19 Aug 2009 23:00:33 +0200 Subject: [Python-Dev] Two laments about CPython's AST Nodes In-Reply-To: <4dab5f760908191110x548a3037pbb11e54e77154f2f@mail.gmail.com> References: <4dab5f760908191110x548a3037pbb11e54e77154f2f@mail.gmail.com> Message-ID: <4A8C67F1.80106@v.loewis.de> > Now on to the complaints: Though I recently added support for this in > Jython, I don't like that nodes can be defined without required > attributes, for example: > > node = ast.Assign() I think we disagree in two points in our evaluation of this behavior: a) ast.Assign is *not* a node of the CPython AST. The CPython AST is a set of C structures in Include/Python-ast.h. ast.Assign is merely a mirror structure of that. b) it is, IMO, not reasonable to require users who create AST trees out of nothing to have correct trees at all times. I.e. it must be possible to represent incorrect trees. c) the AST is *not* part of the Python language or library. It may change at any time without notice, and Jython is not required to duplicate its behavior exactly. [so that's three items - as there should be in any good list of two items :-] > Which, when you try to parse, blows up with "TypeError: Assign field > "targets" must be a list, not a str". I'd be much happier if this > blew up right away when you try to make the assignment. At the > moment, this is how it works in Jython (though I could support this > with some contorting). What precisely is it that makes this difficult to implement. If you would follow CPython's implementation strategy (i.e. generate glue code out of ASDL), I feel that it should be straight-forward to provide exactly the same behavior in Jython. > BTW -- I *am* volunteering to attempt to implement these things in > CPython if there is support :) I'm not sure I can support such a change. Giving the child nodes at creation time, optionally, would be fine with me. 
Requiring the tree to conform to the grammar at all times is unreasonable, IMO (you can't simultaneously create all nodes in the tree and glue them together, so you have to create the tree in steps - which means that it will be intermittently incorrect). You seem to propose some middle ground, but I'm not sure I understand what that middle ground is. Regards, Martin From glyph at twistedmatrix.com Wed Aug 19 23:02:11 2009 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Wed, 19 Aug 2009 17:02:11 -0400 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <4A8C6463.8020200@v.loewis.de> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <4A8C0267.5020408@wildenhain.de> <8517e9350908190819j137375h63c97e81ad55d71f@mail.gmail.com> <4A8C21BF.4080305@wildenhain.de> <4A8C6463.8020200@v.loewis.de> Message-ID: On Wed, Aug 19, 2009 at 4:45 PM, "Martin v. L?wis" wrote: > > No, I just said its conventionally used as that but its not definition > > of a broadcast (in fact you can have any valid host address defined > > as broadcast as long as all members of the network agree on that) > > You could, but then you are violating existing protocol specifications. > > RFC 1122 mandates, in sections 3.2.1.3 and 3.3.6, that certain addresses > MUST be understood as broadcast addresses, by all nodes (independent of > configuration). > > I think a Python IP address library should conform to all relevant RFCs. > Yes, but section 3.3.6 also states: There is a class of hosts (4.2BSD Unix and its derivatives, but not 4.3BSD) that use non-standard broadcast address forms, substituting 0 for -1. All hosts SHOULD recognize and accept any of these non-standard broadcast addresses as the destination address of an incoming datagram. A host MAY optionally have a configuration option to choose the 0 or the -1 form of broadcast address, for each physical interface, but this option SHOULD default to the standard (-1) form. So it sounds like doing what I suggested earlier (default to [-1], allow for customization) is actually required by the RFC :-). Although it does sound like the RFC only requires that you be able to customize to [0] rather than [-1], rather than any address. In practical terms though I believe it is possible to do as Tino suggests and configure any crazy address you want to be the broadcast address (or addresses, even) for a network. I think setting the broadcast address to something else just does not need > to be supported. It is unusual, but frankly, needing to actually do operations on broadcast addresses at all is also a pretty unusual task. Broadcast itself is a somewhat obscure corner of networking. I suspect that in many deployments that need to write significant code to deal with broadcast addresses, rather than the usual default stuff, funky configurations will actually be quite common. I would not be surprised to find that there are still some 4.2BSD VAXes somewhere doing something important, and some Python may one day be called upon to manage their networks. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From eric at trueblade.com Wed Aug 19 23:13:05 2009 From: eric at trueblade.com (Eric Smith) Date: Wed, 19 Aug 2009 17:13:05 -0400 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <4A8C0267.5020408@wildenhain.de> <8517e9350908190819j137375h63c97e81ad55d71f@mail.gmail.com> <4A8C21BF.4080305@wildenhain.de> <4A8C6463.8020200@v.loewis.de> Message-ID: <4A8C6AE1.4000203@trueblade.com> [Glyph] > So it sounds like doing what I suggested earlier (default to [-1], allow > for customization) is actually required by the RFC :-). Although it > does sound like the RFC only requires that you be able to customize to > [0] rather than [-1], rather than any address. In practical terms > though I believe it is possible to do as Tino suggests and configure any > crazy address you want to be the broadcast address (or addresses, even) > for a network. If you're doing this, are you really going to be specifying the broadcast address as something like network.use_broadcast_index(-2) (or even 0) and then using network.broadcast somewhere else? I just don't see that happening. [Martin] > > I think setting the broadcast address to something else just does > not need to be supported. I agree. [Glyph] > It is unusual, but frankly, needing to actually do operations on > broadcast addresses at all is also a pretty unusual task. Broadcast > itself is a somewhat obscure corner of networking. I suspect that in > many deployments that need to write significant code to deal with > broadcast addresses, rather than the usual default stuff, funky > configurations will actually be quite common. I use .broadcast from IPy, and I'm not doing anything funky. All of my broadcast addresses are network[-1]. From ncoghlan at gmail.com Wed Aug 19 23:17:04 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 20 Aug 2009 07:17:04 +1000 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <4A8C0267.5020408@wildenhain.de> <8517e9350908190819j137375h63c97e81ad55d71f@mail.gmail.com> <4A8C21BF.4080305@wildenhain.de> <4A8C6463.8020200@v.loewis.de> Message-ID: <4A8C6BD0.9010606@gmail.com> Glyph Lefkowitz wrote: > It is unusual, but frankly, needing to actually do operations on > broadcast addresses at all is also a pretty unusual task. Broadcast > itself is a somewhat obscure corner of networking. I suspect that in > many deployments that need to write significant code to deal with > broadcast addresses, rather than the usual default stuff, funky > configurations will actually be quite common. > > I would not be surprised to find that there are still some 4.2BSD VAXes > somewhere doing something important, and some Python may one day be > called upon to manage their networks. If using a custom broadcast address rather than the standard one, don't use the ipnet.broadcast property? I'm with Martin and the PEP author here - the property can quite happily just use the conventional meaning without causing any real problems. People doing something more unusual will still be free to either create an appropriate IPAddress instance or else create an IPNetwork subclass that defines the broadcast property differently (e.g. making it the same as the network address). Cheers, Nick. 
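P.S. The subclass really is tiny -- an untested sketch against the draft API:

    import ipaddr

    class ZeroBroadcastNetwork(ipaddr.IPv4Network):
        @property
        def broadcast(self):
            # broadcast on the network address, 4.2BSD style
            return self.network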
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Wed Aug 19 23:24:12 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 20 Aug 2009 07:24:12 +1000 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <4A8C0267.5020408@wildenhain.de> <8517e9350908190819j137375h63c97e81ad55d71f@mail.gmail.com> <8517e9350908190955y393435d9w89fa25980155a7fc@mail.gmail.com> <4A8C4264.2040708@trueblade.com> Message-ID: <4A8C6D7C.5080307@gmail.com> Glyph Lefkowitz wrote: > -1. I think 'network.number' or 'network.zero' is a lot clearer than > 'network.network'. Maybe '.broadcast' would be okay, as long as it > /can/ be adjusted for those unusual, or maybe even only hypothetical, > networks where it is not the [-1]. Maybe this is something that differs by country, but I have *never* heard the first address in an IP network (i.e. every bit not covered by the netmask set to zero) referred to as anything other than the "network address". Similarly, the last address (every bit not covered by the netmask set to one) is referred to as the "broadcast address", even if the relevant RFCs don't actually guarantee that. Anyone tasked to deal with a network that is sufficient unusual to break those two conventions is almost certainly going to have bigger problems than the proposed meanings for ipnet.network and ipnet.broadcast not giving the correct answer for their specific situation. And if someone does need to deal with that, then they create an appropriate subclass or use a less lightweight IP addressing library. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Wed Aug 19 23:44:17 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 20 Aug 2009 07:44:17 +1000 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908190917k32775edard9915805f3f15a8b@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908190835x5137faf4r2ebc32510a9df108@mail.gmail.com> <4A8C1CAF.60305@trueblade.com> <8517e9350908190917k32775edard9915805f3f15a8b@mail.gmail.com> Message-ID: <4A8C7231.70504@gmail.com> Peter Moody wrote: > you can't set them directly, if that's what you mean. > >>>> import ipaddr >>>> o = ipaddr.IPv4Network('1.1.1.0/24') >>>> o.broadcast > IPv4Address('1.1.1.255') >>>> o.network > IPv4Address('1.1.1.0') >>>> o.broadcast = ipaddr.IPv4Address('1.1.1.127') > Traceback (most recent call last): > File "", line 1, in > AttributeError: can't set attribute >>>> o.prefixlen = 25 >>>> o.broadcast > IPv4Address('1.1.1.127') IPAddress instances should definitely be hashable, but as long as "prefixlen" is mutable, IPNetwork instances should *not* be hashable, since their value can change. If prefixlen was made read only, then IPNetwork instances could also be made hashable. In that case, changing the prefix length would then have to be done by creating a new instance. Cheers, Nick. 
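P.S. Concretely, with an immutable prefixlen, "changing" it would just mean constructing a new object (sketch against the draft API):

    import ipaddr

    net = ipaddr.IPv4Network('1.1.1.0/24')
    narrower = ipaddr.IPv4Network('%s/25' % net.network)  # new instance; net untouched
    seen = set([net, narrower])  # both now safe as set members / dict keys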
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From peter at hda3.com Thu Aug 20 00:01:42 2009 From: peter at hda3.com (Peter Moody) Date: Wed, 19 Aug 2009 15:01:42 -0700 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <4A8C7231.70504@gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908190835x5137faf4r2ebc32510a9df108@mail.gmail.com> <4A8C1CAF.60305@trueblade.com> <8517e9350908190917k32775edard9915805f3f15a8b@mail.gmail.com> <4A8C7231.70504@gmail.com> Message-ID: <8517e9350908191501laf80e15r6213b05149f898ee@mail.gmail.com> On Wed, Aug 19, 2009 at 2:44 PM, Nick Coghlan wrote: > Peter Moody wrote: >> you can't set them directly, if that's what you mean. >> >>>>> import ipaddr >>>>> o = ipaddr.IPv4Network('1.1.1.0/24') >>>>> o.broadcast >> IPv4Address('1.1.1.255') >>>>> o.network >> IPv4Address('1.1.1.0') >>>>> o.broadcast = ipaddr.IPv4Address('1.1.1.127') >> Traceback (most recent call last): >> ? File "", line 1, in >> AttributeError: can't set attribute >>>>> o.prefixlen = 25 >>>>> o.broadcast >> IPv4Address('1.1.1.127') > > IPAddress instances should definitely be hashable, but as long as > "prefixlen" is mutable, IPNetwork instances should *not* be hashable, > since their value can change. > > If prefixlen was made read only, then IPNetwork instances could also be > made hashable. In that case, changing the prefix length would then have > to be done by creating a new instance. ah, I see. I'll make this fix and I think this might actually simplify the code. just to double check, it's fine for IPNetwork to remain hashable if set_prefix() actually returned a new object, correct? > Cheers, > Nick. > > -- > Nick Coghlan ? | ? ncoghlan at gmail.com ? | ? Brisbane, Australia > --------------------------------------------------------------- > From solipsis at pitrou.net Thu Aug 20 00:05:24 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 19 Aug 2009 22:05:24 +0000 (UTC) Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <4A8C0267.5020408@wildenhain.de> <8517e9350908190819j137375h63c97e81ad55d71f@mail.gmail.com> <8517e9350908190955y393435d9w89fa25980155a7fc@mail.gmail.com> <4A8C4264.2040708@trueblade.com> <4A8C4657.2080209@trueblade.com> Message-ID: Eric Smith trueblade.com> writes: > > Is there some existing library that uses .number or .zero? IPy uses .net > (and .broadcast for [-1]). Why not be explicit? Use .first and .last. From fdrake at acm.org Thu Aug 20 00:10:56 2009 From: fdrake at acm.org (Fred Drake) Date: Wed, 19 Aug 2009 18:10:56 -0400 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908191501laf80e15r6213b05149f898ee@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908190835x5137faf4r2ebc32510a9df108@mail.gmail.com> <4A8C1CAF.60305@trueblade.com> <8517e9350908190917k32775edard9915805f3f15a8b@mail.gmail.com> <4A8C7231.70504@gmail.com> <8517e9350908191501laf80e15r6213b05149f898ee@mail.gmail.com> Message-ID: On Aug 19, 2009, at 6:01 PM, Peter Moody wrote: > just to double check, it's fine for IPNetwork to remain hashable if > set_prefix() actually returned a new object, correct? The name would be confusing, though. 
Perhaps using_prefix() would be more clear. -Fred -- Fred Drake From eric at trueblade.com Thu Aug 20 06:07:51 2009 From: eric at trueblade.com (Eric Smith) Date: Thu, 20 Aug 2009 00:07:51 -0400 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908190835x5137faf4r2ebc32510a9df108@mail.gmail.com> <4A8C1CAF.60305@trueblade.com> <8517e9350908190917k32775edard9915805f3f15a8b@mail.gmail.com> <4A8C7231.70504@gmail.com> <8517e9350908191501laf80e15r6213b05149f898ee@mail.gmail.com> Message-ID: <4A8CCC17.90704@trueblade.com> Fred Drake wrote: > On Aug 19, 2009, at 6:01 PM, Peter Moody wrote: >> just to double check, it's fine for IPNetwork to remain hashable if >> set_prefix() actually returned a new object, correct? > > > The name would be confusing, though. Perhaps using_prefix() would be > more clear. I think you'd be better off either doing this with an optional parameter to __init__, or a class method factory function (maybe from_prefix or similar). I don't see why it should be a method on an existing object. From peter at hda3.com Thu Aug 20 06:55:33 2009 From: peter at hda3.com (Peter Moody) Date: Wed, 19 Aug 2009 21:55:33 -0700 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <4A8CCC17.90704@trueblade.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908190835x5137faf4r2ebc32510a9df108@mail.gmail.com> <4A8C1CAF.60305@trueblade.com> <8517e9350908190917k32775edard9915805f3f15a8b@mail.gmail.com> <4A8C7231.70504@gmail.com> <8517e9350908191501laf80e15r6213b05149f898ee@mail.gmail.com> <4A8CCC17.90704@trueblade.com> Message-ID: <8517e9350908192155y5823bc50le738b15f2d37ca77@mail.gmail.com> On Wed, Aug 19, 2009 at 9:07 PM, Eric Smith wrote: > Fred Drake wrote: >> >> On Aug 19, 2009, at 6:01 PM, Peter Moody wrote: >>> >>> just to double check, it's fine for IPNetwork to remain hashable if >>> set_prefix() actually returned a new object, correct? >> >> >> The name would be confusing, though. ?Perhaps using_prefix() would be more >> clear. > > I think you'd be better off either doing this with an optional parameter to > __init__, or a class method factory function (maybe from_prefix or similar). > I don't see why it should be a method on an existing object. > while not the the prettiest, you can already (ignoring the set_prefix) do something like: >>> newobject = ipaddr.IP(str(o.network) + "/new prefix") Is this sufficient? From doko at ubuntu.com Thu Aug 20 11:47:03 2009 From: doko at ubuntu.com (Matthias Klose) Date: Thu, 20 Aug 2009 11:47:03 +0200 Subject: [Python-Dev] [Distutils] request for comments - standardization of python's purelib and platlib In-Reply-To: <94bdd2610908140102y5ef04116i52fb7071bb83aab9@mail.gmail.com> References: <94bdd2610908140102y5ef04116i52fb7071bb83aab9@mail.gmail.com> Message-ID: <4A8D1B97.4060108@ubuntu.com> On 14.08.2009 10:02, Tarek Ziad? wrote: > On Thu, Aug 13, 2009 at 9:22 PM, Brett Cannon wrote: >> >> >> On Thu, Aug 13, 2009 at 11:23, Jan Matejek wrote: >>> >>> Hello, >>> >>> I'm cross-posting this to distributions at freedesktop and python-dev, >>> because the topic is relevant to both groups and should be solved in >>> cooperation. 
>>> >>> The issue: >>> >>> In Python's default configuration (on linux), both purelib (location for >>> pure python modules) and platlib (location for platform-dependent binary >>> extensions) point to $prefix/lib/pythonX.Y/site-packages. >>> That is no good for two main reasons. >>> >>> One, python depends on the "lib" directory. (from distro's point of >>> view, prefix is /usr, so let's talk /usr/lib) Due to this, it's >>> impossible to install python under /usr/lib64 without heavy patching. >>> Repeated attempts to bring python developers to acknowledge and rectify >>> the situation have all failed (common argument here is "that would mean >>> redesign of distutils and huge parts of whatnot"). >> >> This is now Tarek's call, so this may or may not have changed in terms of >> what the (now) distutils maintainer thinks. >> > > I don't recall those repeated attempts , but I've been around for less > than two years. > > You are very welcome to come in the Distutils-SIG ML to discuss these matters. > I'm moving the discussion there. > > Among the proposals you have detailed, the sharedir way seems like the > most simple/interesting > one (depending on you answer to Brett's question ) The approach of splitting the installation into two different locations seems to be wrong, it changes the semantics for imports of python packages which are not installed in the same location. Simplest counter example is the use of relative imports, which will fail if the imported module/extension is not found in the same paths. Other languages like Perl or Java don't have relative imports, or they map all components on the "path" into one logical path so you don't have this kind of problem. I don't see an explict statement that code really has to live inside /usr/share, and even generated .py files differ depending on the architecture you build for (e.g. sip, qt bindings). Matthias From fwierzbicki at gmail.com Thu Aug 20 15:20:49 2009 From: fwierzbicki at gmail.com (Frank Wierzbicki) Date: Thu, 20 Aug 2009 09:20:49 -0400 Subject: [Python-Dev] Two laments about CPython's AST Nodes In-Reply-To: <4A8C67F1.80106@v.loewis.de> References: <4dab5f760908191110x548a3037pbb11e54e77154f2f@mail.gmail.com> <4A8C67F1.80106@v.loewis.de> Message-ID: <4dab5f760908200620o463a26d4lc2a15ec23319f201@mail.gmail.com> On Wed, Aug 19, 2009 at 5:00 PM, "Martin v. L?wis" wrote: >> Now on to the complaints: Though I recently added support for this in >> Jython, I don't like that nodes can be defined without required >> attributes, for example: >> >> node = ast.Assign() > > I think we disagree in two points in our evaluation of this behavior: > > a) ast.Assign is *not* a node of the CPython AST. The CPython AST is > ? a set of C structures in Include/Python-ast.h. ast.Assign is merely > ? a mirror structure of that. Ah -- that is different from Jython's current design (The nodes that are constructed by ast.Assign() in Jython actually are the exact nodes that are used in real parsing) > b) it is, IMO, not reasonable to require users who create AST trees > ? out of nothing to have correct trees at all times. I.e. it must be > ? possible to represent incorrect trees. That does seem reasonable. Luckily it was easy to implement for me :) > c) the AST is *not* part of the Python language or library. It may > ? change at any time without notice, and Jython is not required to > ? duplicate its behavior exactly. Sure, I'm really just talking about ast.py (though for Jython ATM they are the same thing). 
So that I understand better: when this call is made in Python: x = compile("def foo():pass", "foo", "exec", _ast.PyCF_ONLY_AST) Does x contain real AST nodes or does it contain mirror structures (feel free to just tell me: don't be lazy, go read the code). If it contains real nodes, this is where I have some implementation trouble. If a tree of real nodes is then manipulated so that you end up with a mix, I don't want to walk the entire thing over again to find the mirror objects (that might be incomplete) and replace them with real nodes. If this creates a tree of mirror nodes, then I may want to consider doing the same thing on the Jython side (it makes sense, now that I understand CPython better I realize that the cost I am incurring is probably due to having the real and mirror AST as the same beast). > [so that's three items - as there should be in any good list of > two items :-] :) > What precisely is it that makes this difficult to implement. If you > would follow CPython's implementation strategy (i.e. generate glue > code out of ASDL), I feel that it should be straight-forward to provide > exactly the same behavior in Jython. I do use the ASDL to generate this stuff, but again, the real and the mirror nodes are not separated ATM, and that is what makes it difficult. >> BTW -- I *am* volunteering to attempt to implement these things in >> CPython if there is support :) > > I'm not sure I can support such a change. Giving the child nodes at > creation time, optionally, would be fine with me. Requiring the > tree to conform to the grammar at all times is unreasonable, IMO > (you can't simultaneously create all nodes in the tree and glue > them together, so you have to create the tree in steps - which means > that it will be intermittently incorrect). That is quite reasonable, I'll withdraw gripe #1 -- in fact the reason I have already implemented this in Jython is that there is already real world use out there. I still need to understand gripe #2 a little better before I back down on that one. -Frank From ncoghlan at gmail.com Thu Aug 20 15:39:17 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 20 Aug 2009 23:39:17 +1000 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908192155y5823bc50le738b15f2d37ca77@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908190835x5137faf4r2ebc32510a9df108@mail.gmail.com> <4A8C1CAF.60305@trueblade.com> <8517e9350908190917k32775edard9915805f3f15a8b@mail.gmail.com> <4A8C7231.70504@gmail.com> <8517e9350908191501laf80e15r6213b05149f898ee@mail.gmail.com> <4A8CCC17.90704@trueblade.com> <8517e9350908192155y5823bc50le738b15f2d37ca77@mail.gmail.com> Message-ID: <4A8D5205.90708@gmail.com> Peter Moody wrote: > while not the the prettiest, you can already (ignoring the set_prefix) > do something like: > >>>> newobject = ipaddr.IP(str(o.network) + "/new prefix") > > Is this sufficient? At this point, that is probably fine. If it comes up often enough to be worth providing a cleaner interface then it is easier to add that later. Cheers, Nick. 
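For anyone wanting a slightly tidier spelling of that string round-trip, it can be wrapped in a small helper. This is only a sketch built on the calls quoted above (ipaddr.IP and the string form of the network address); with_prefix is a made-up name, not part of the library:

    import ipaddr

    def with_prefix(net, new_prefixlen):
        # Build a brand-new network object instead of mutating prefixlen
        # in place, so existing objects (and their hashes) stay untouched.
        return ipaddr.IP('%s/%d' % (net.network, new_prefixlen))

    o = ipaddr.IP('1.1.1.0/24')
    narrower = with_prefix(o, 25)    # a new /25 network; o is unchanged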
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From asmodai at in-nomine.org Thu Aug 20 15:46:51 2009 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Thu, 20 Aug 2009 15:46:51 +0200 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> Message-ID: <20090820134651.GB80266@nexus.in-nomine.org> -On [20090818 22:15], Peter Moody (peter at hda3.com) wrote: >I have a first draft of a PEP for including an IP address manipulation >library in the python stdlib. It seems like there are a lot of really >smart folks with some, ahem, strong ideas about what an IP address >module should and shouldn't be so I wanted to solicit your input on >this pep. > >the pep can be found here: > > http://www.python.org/dev/peps/pep-3144/ No chance at the moment to test/look through the code, so please excuse any obvious ones, I'm basing my comments on the PEP. Some elaboration on handling ipv4 mapped addresses would be nice, e.g. ::ffff:c000:280 and/or ::ffff:192.168.0.128 Some IPv6 examples would also help the PEP I think. Especially on how 0 compression is handled in addresses. Maybe show ipv4 examples on non-class boundaries, e.g. /23 instead of /24, so people are more convinced it handles CIDR properly. Clarification on whether this library will support converting a sequence of networks into another sequence where the networks which comprise consecutive netblocks will be collapsed in a new entry. E.g. 2 /24s that are neighbours will be represented as one /23. I realise some might be answered by the last paragraph of your PEP, but it would be nice to know what you consider essential and what not. -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B They have learned nothing, and forgotten nothing... From p.f.moore at gmail.com Thu Aug 20 18:25:52 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 20 Aug 2009 17:25:52 +0100 Subject: [Python-Dev] Microsoft MSDN In-Reply-To: <4A8402D0.2050906@cheimes.de> References: <4A79ADB7.5080809@holdenweb.com> <4A8402D0.2050906@cheimes.de> Message-ID: <79990c6b0908200925g146acee4jc1452433cb2c48b3@mail.gmail.com> 2009/8/13 Christian Heimes : > Steve Holden wrote: >> >> I sent fourteen requests for licenses in to Microsoft. I've asked them >> to let me know which they grant (since they may choose to limit the >> number) and will inform you all personally when I hear their decision. > > I've received my MSDN subscription today. Everybody watch out for a message > from MSDN! I almost confused the email with spam. > > Thanks for your work and please forward my gratitude to James Rice. I've received mine, too, and my thanks to all as well. Paul. From p.f.moore at gmail.com Thu Aug 20 18:35:18 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 20 Aug 2009 17:35:18 +0100 Subject: [Python-Dev] standard library mimetypes module pathologically broken? 
In-Reply-To: <4A8513A0.9010200@gmail.com> References: <79990c6b0908020445q5d4bcc82vf93d2af4493554a9@mail.gmail.com> <1afaf6160908112022p3464d0e4u739d067922cf64b@mail.gmail.com> <4A82B40A.9030608@gmail.com> <4A82B4AF.2000607@gmail.com> <4A8513A0.9010200@gmail.com> Message-ID: <79990c6b0908200935heba72c5t69158dda0bc2b392@mail.gmail.com> 2009/8/14 Nick Coghlan : > Georg Brandl wrote: >> Nick Coghlan schrieb: >>> P.S. For anyone else that is slow like me, take a close look at PEP 387... >> >> What should we see, other than that we have two PEPs on the same topic that >> should be merged? > > Benjamin wrote the second one, so he obviously knows there's a written > deprecation policy in place, and hence his mini-rant probably wasn't > meant to be taken literally - a point I completely missed on first reading. IIRC, the point is probably the fact that PEP 387 has status "Draft" rather than "Accepted" - Benjamin proposed the PEP, met with a fair bit of discussion but no consensus, and everything fizzled out before a conclusion was reached. > I agree the two PEPs should probably be consolidated into one, but > absent a volunteer for that task, leaving them as is doesn't really hurt > anything. Agreed, if both were accepted... Paul. PS I personally have no firm opinion on PEP 387. The idea seems good, but I don't feel qualified to say whether the proposed approach is sound. From peter at hda3.com Thu Aug 20 20:11:00 2009 From: peter at hda3.com (Peter Moody) Date: Thu, 20 Aug 2009 11:11:00 -0700 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <20090820134651.GB80266@nexus.in-nomine.org> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <20090820134651.GB80266@nexus.in-nomine.org> Message-ID: <8517e9350908201111q6a48b647g8a478ae228e284ab@mail.gmail.com> On Thu, Aug 20, 2009 at 6:46 AM, Jeroen Ruigrok van der Werven wrote: > -On [20090818 22:15], Peter Moody (peter at hda3.com) wrote: >>I have a first draft of a PEP for including an IP address manipulation >>library in the python stdlib. It seems like there are a lot of really >>smart folks with some, ahem, strong ideas about what an IP address >>module should and shouldn't be so I wanted to solicit your input on >>this pep. >> >>the pep can be found here: >> >> http://www.python.org/dev/peps/pep-3144/ > > No chance at the moment to test/look through the code, so please excuse any > obvious ones, I'm basing my comments on the PEP. > > Some elaboration on handling ipv4 mapped addresses would be nice, e.g. > ::ffff:c000:280 and/or ::ffff:192.168.0.128 > > Some IPv6 examples would also help the PEP I think. Especially on how 0 > compression is handled in addresses. > > Maybe show ipv4 examples on non-class boundaries, e.g. /23 instead of /24, > so people are more convinced it handles CIDR properly. > > Clarification on whether this library will support converting a sequence of > networks into another sequence where the networks which comprise consecutive > netblocks will be collapsed in a new entry. E.g. 2 /24s that are neighbours > will be represented as one /23. > > I realise some might be answered by the last paragraph of your PEP, but it > would be nice to know what you consider essential and what not. I've updated the pep with lots of examples; most of the stuff you're asking for is already supported, I just didn't do a good job explaining it. A few things are pending review. Cheers, /peter > -- > Jeroen Ruigrok van der Werven / asmodai > ????? ?????? ??? ?? ?????? 
> http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B > They have learned nothing, and forgotten nothing... > From brett at python.org Thu Aug 20 22:27:31 2009 From: brett at python.org (Brett Cannon) Date: Thu, 20 Aug 2009 13:27:31 -0700 Subject: [Python-Dev] Two laments about CPython's AST Nodes In-Reply-To: <4dab5f760908200620o463a26d4lc2a15ec23319f201@mail.gmail.com> References: <4dab5f760908191110x548a3037pbb11e54e77154f2f@mail.gmail.com> <4A8C67F1.80106@v.loewis.de> <4dab5f760908200620o463a26d4lc2a15ec23319f201@mail.gmail.com> Message-ID: On Thu, Aug 20, 2009 at 06:20, Frank Wierzbicki wrote: > On Wed, Aug 19, 2009 at 5:00 PM, "Martin v. L?wis" wrote: >>> Now on to the complaints: Though I recently added support for this in >>> Jython, I don't like that nodes can be defined without required >>> attributes, for example: >>> >>> node = ast.Assign() >> >> I think we disagree in two points in our evaluation of this behavior: >> >> a) ast.Assign is *not* a node of the CPython AST. The CPython AST is >> ? a set of C structures in Include/Python-ast.h. ast.Assign is merely >> ? a mirror structure of that. > Ah -- that is different from Jython's current design (The nodes that > are constructed by ast.Assign() in Jython actually are the exact nodes > that are used in real parsing) > >> b) it is, IMO, not reasonable to require users who create AST trees >> ? out of nothing to have correct trees at all times. I.e. it must be >> ? possible to represent incorrect trees. > That does seem reasonable. ?Luckily it was easy to implement for me :) > >> c) the AST is *not* part of the Python language or library. It may >> ? change at any time without notice, and Jython is not required to >> ? duplicate its behavior exactly. > Sure, I'm really just talking about ast.py (though for Jython ATM they > are the same thing). ?So that I understand better: when this call is > made in Python: > > x = compile("def foo():pass", "foo", "exec", _ast.PyCF_ONLY_AST) > > Does x contain real AST nodes or does it contain mirror structures > (feel free to just tell me: don't be lazy, go read the code). If it > contains real nodes, this is where I have some implementation trouble. > ?If a tree of real nodes is then manipulated so that you end up with a > mix, I don't want to walk the entire thing over again to find the > mirror objects (that might be incomplete) and replace them with real > nodes. ?If this creates a tree of mirror nodes, then I may want to > consider doing the same thing on the Jython side (it makes sense, now > that I understand CPython better I realize that the cost I am > incurring is probably due to having the real and mirror AST as the > same beast). > I am using PEP 339 to help me with this. Looking at Python/bltinmodule.c:builtin_compile() you will notice that when you take an AST and compile it the call goes to Python/Python-ast.c:PyAST_obj2mod(). That function calls obj2ast_mod() which turns a ast.Module object into an mod_ty (defined in Python/Python-ast.c). The reverse is done with ast2obj_mod() when you get the AST out. And looking at those conversion functions it seems that the PyObject values convert as needed or reuse constants like ints and strings (see obj2ast_unaryop() to see a conversion from object AST to internal AST). But I would double-check me on all of this. =) -Brett >> [so that's three items - as there should be in any good list of >> two items :-] > :) > >> What precisely is it that makes this difficult to implement. 
If you >> would follow CPython's implementation strategy (i.e. generate glue >> code out of ASDL), I feel that it should be straight-forward to provide >> exactly the same behavior in Jython. > I do use the ASDL to generate this stuff, but again, the real and the > mirror nodes are not separated ATM, and that is what makes it > difficult. > >>> BTW -- I *am* volunteering to attempt to implement these things in >>> CPython if there is support :) >> >> I'm not sure I can support such a change. Giving the child nodes at >> creation time, optionally, would be fine with me. Requiring the >> tree to conform to the grammar at all times is unreasonable, IMO >> (you can't simultaneously create all nodes in the tree and glue >> them together, so you have to create the tree in steps - which means >> that it will be intermittently incorrect). > That is quite reasonable, I'll withdraw gripe #1 -- in fact the reason > I have already implemented this in Jython is that there is already > real world use out there. ?I still need to understand gripe #2 a > little better before I back down on that one. > > -Frank > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/brett%40python.org > From peter at hda3.com Thu Aug 20 23:00:06 2009 From: peter at hda3.com (Peter Moody) Date: Thu, 20 Aug 2009 14:00:06 -0700 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> Message-ID: <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> The pep has been updated with the excellent suggestions thus far. Are there any more? Cheers, /peter On Tue, Aug 18, 2009 at 1:00 PM, Peter Moody wrote: > Howdy folks, > > I have a first draft of a PEP for including an IP address manipulation > library in the python stdlib. It seems like there are a lot of really > smart folks with some, ahem, strong ideas about what an IP address > module should and shouldn't be so I wanted to solicit your input on > this pep. > > the pep can be found here: > > ?http://www.python.org/dev/peps/pep-3144/ > > the code can be found here: > > ?http://ipaddr-py.googlecode.com/svn/branches/2.0.x/ > > Please let me know if you have any comments (some already coming :) > > Cheers, > /peter > From jjb5 at cornell.edu Thu Aug 20 22:34:46 2009 From: jjb5 at cornell.edu (Joel Bender) Date: Thu, 20 Aug 2009 16:34:46 -0400 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <4A8C6D7C.5080307@gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <4A8C0267.5020408@wildenhain.de> <8517e9350908190819j137375h63c97e81ad55d71f@mail.gmail.com> <8517e9350908190955y393435d9w89fa25980155a7fc@mail.gmail.com> <4A8C4264.2040708@trueblade.com> <4A8C6D7C.5080307@gmail.com> Message-ID: <4A8DB366.4010809@cornell.edu> Nick Coghlan wrote: > Maybe this is something that differs by country, but I have *never* > heard the first address in an IP network (i.e. every bit not covered by > the netmask set to zero) referred to as anything other than the "network > address". Ah! A change to interject a mostly pointless comment... Prior to IEN-212 [1] it wasn't standardized, the 'zero' was used and supported by the Berkeley socket library. 
This was a number of years ago, however (!), and I dare say the sample code is lost to antiquity. > And if someone does need to deal with that, then they create an > appropriate subclass or use a less lightweight IP addressing library. Indeed. Joel [1] From martin at v.loewis.de Fri Aug 21 00:11:03 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 21 Aug 2009 00:11:03 +0200 Subject: [Python-Dev] Two laments about CPython's AST Nodes In-Reply-To: <4dab5f760908200620o463a26d4lc2a15ec23319f201@mail.gmail.com> References: <4dab5f760908191110x548a3037pbb11e54e77154f2f@mail.gmail.com> <4A8C67F1.80106@v.loewis.de> <4dab5f760908200620o463a26d4lc2a15ec23319f201@mail.gmail.com> Message-ID: <4A8DC9F7.2040904@v.loewis.de> > x = compile("def foo():pass", "foo", "exec", _ast.PyCF_ONLY_AST) > > Does x contain real AST nodes or does it contain mirror structures > (feel free to just tell me: don't be lazy, go read the code). It only contains a mirror structure. See pythonrun.c:Py_CompileStringFlags, and the (generated) PyAST_mod2obj function. There is no way for a Python script to get hold of the real AST. > I do use the ASDL to generate this stuff, but again, the real and the > mirror nodes are not separated ATM, and that is what makes it > difficult. Couldn't you just generate a check function for your tree that would be invoked before you try to process a tree that a script got access to? > That is quite reasonable, I'll withdraw gripe #1 -- in fact the reason > I have already implemented this in Jython is that there is already > real world use out there. I still need to understand gripe #2 a > little better before I back down on that one. If you are asking that a type check is made on assigning a value to these fields - I'm not quite sure whether you could implement that check reliably. Wouldn't it be possible to bypass it by filling a value directly into __dict__? If you can come up with a patch that checks in a reliable manner, I would be in favor of adding that (in 2.7 and 3.2), taking out the corresponding checks when converting to the internal AST. Regards, Martin From casevh at gmail.com Fri Aug 21 07:15:59 2009 From: casevh at gmail.com (Case Vanhorsen) Date: Thu, 20 Aug 2009 22:15:59 -0700 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> Message-ID: <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> >On Thu, Aug 20, 2009 at 2:00 PM, Peter Moody wrote: > The pep has been updated with the excellent suggestions thus far. > > Are there any more? Thanks for writing the PEP. I tried a few of the common scenarios that I use at work. Disclaimer: my comments are based on my work environment. I was surprised that IP('172.16.1.1') returned IPv4Address('172.16.1.1/32') instead of IPv4Address('172.16.1.1'). I know I can change the behavior by using host=True, but then IP('172.16.1.1/24', host=True) will raise an error. It makes more sense, at least to me, that if I input just an IP address, I get an IP address back. I would prefer that IP('172.16.1.1/32') return an IPv4Network and IP('172.16.1.1') return an IPv4Address. Would it be possible to provide an iterator that returns just the valid host IP addresses (minus the network and broadcast addresses)? 
"for i in IPv4Network('172.16.1.0/28')" and "for in in IPv4Network('172.16.1.0/28').iterhosts()" both return all 16 IP addresses. I normally describe 172.16.1.0/28 as consisting of one network address, 14 host addresses, and one broadcast address. I would prefer that "for i in IPv4Network('172.16.1.0/28')" return all IP addresses and that "for in in IPv4Network('172.16.1.0/28').iterhosts()" exclude the network and broadcast addresses. I think creating a list of IP addresses that can be assigned to devices on a network is a common task. Can .subnet() be enhanced to accept masks? For example, IPv4Network('172.16.0.0/16').subnet('/19') would return the eight /19 subnets. What about supporting multiple parameters to subnet? I frequently need to create complex subnet layouts. The following subnet layout is NOT made up! 172.16.0.0/22 >172.16.0.0/23 >>>172.16.2.0/25 >>>>172.16.2.128/26 >>>>172.16.2.192/26 >>>>>>172.16.3.0/28 >>>>>>172.16.3.16/28 >>>>>>>>172.16.3.32/30 >>>>>>>>172.16.3.36/30 >>>>>>>>172.16.3.40/30 >>>>>>>>172.16.3.44/30 >>>>>>>>172.16.3.48/30 >>>>>>>>172.16.3.52/30 >>>>>>>>172.16.3.56/30 >>>>>>>>172.16.3.60/30 >>>>>>>>>>172.16.3.64/32 .... >>>>>>>>>>172.16.3.79/32 >>>>>>172.16.3.80/28 >>>>>>172.16.3.96/28 >>>>>>172.16.3.112/28 >>>>>172.16.3.128/27 >>>>>172.16.3.160/27 >>>>172.16.3.192/26 A possible syntax would be: .subnet((1,'/23'),(1,'/25'),(2,'/26'),(2,'/28'),(8,'/30'),(16,'/32'),(3,'/28'),(2,'/27'),(1,'/26')) Note: I am willing to provide patches to implement my suggestions. I just won't have much time over the next couple weeks. casevh > Cheers, > /peter > > On Tue, Aug 18, 2009 at 1:00 PM, Peter Moody wrote: >> Howdy folks, >> >> I have a first draft of a PEP for including an IP address manipulation >> library in the python stdlib. It seems like there are a lot of really >> smart folks with some, ahem, strong ideas about what an IP address >> module should and shouldn't be so I wanted to solicit your input on >> this pep. >> >> the pep can be found here: >> >> ?http://www.python.org/dev/peps/pep-3144/ >> >> the code can be found here: >> >> ?http://ipaddr-py.googlecode.com/svn/branches/2.0.x/ >> >> Please let me know if you have any comments (some already coming :) >> >> Cheers, >> /peter >> > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/casevh%40gmail.com > From skippy.hammond at gmail.com Fri Aug 21 09:16:18 2009 From: skippy.hammond at gmail.com (Mark Hammond) Date: Fri, 21 Aug 2009 17:16:18 +1000 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: References: <4A8A6256.1030009@v.loewis.de> Message-ID: <4A8E49C2.5060805@gmail.com> [Adjusted the CCs...] On 19/08/2009 8:21 AM, Dj Gilcrease wrote: > On Tue, Aug 18, 2009 at 2:12 AM, "Martin v. L?wis" wrote: >> The second item is line conversion hooks. Dj Gilcrease has posted a >> solution which he considers a hack himself. Mark Hammond has also >> volunteered, but it seems some volunteer needs to be "in charge", >> keeping track of a proposed solution until everybody agrees that it >> is a good solution. It may be that two solutions are necessary: a >> short-term one, that operates as a hook and has limitations, and >> a long-term one, that improves the hook system of Mercurial to >> implement the proper functionality (which then might get shipped >> with Mercurial in a cross-platform manner). 
> > > My solution is a hack because the hooks in Mercurial need to be > modified to support it properly, I would be happy to help work on this > as it is a situation I run into all the time in my own projects. I can > never seem to get all the developers to enable the hooks, and one of > them always commits with improper line endings =P Maybe you can enumerate what you think needs to change in mercurial, then once we have a plan in place it will be clearer who can do what. I'm resurrecting my patch to support a filter called 'none' (which is turning out to be harder than I thought). Off the top of my head, it would the following would give us a pretty solid solution: * Finish my patch for 'none' as a filter, so '**=cleverencode' can be reasonably used (currently you can't specify specific files *not* have cleverencode, making it unsuitable in practice without the concept of 'none') * Add support for versioned 'filter rules' - eg, /.hgfilters or similar. * This might be pushing my luck, but: add 'defensive' support to core hg for this feature - if /.hgfilters exists, hg should refuse to operate on the working tree unless the win32text extension is enabled. Note that this last point still leaves win32text optional for hg itself - but if the owner of a repository has explicitly 'opted in' for win32text support, hg can still assist in refusing to screw the tree. The hg user has the option of enabling that extension, declining to use that repository, or arguing with the owner of the repo about use of the feature in the first place. Is there something I'm missing? Or maybe a better way to have hg enforce a repository's policy while not inflicting pain on hg users who don't want to ever think about windows? Cheers, Mark From dirkjan at ochtman.nl Fri Aug 21 09:48:03 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Fri, 21 Aug 2009 09:48:03 +0200 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <4A8E49C2.5060805@gmail.com> References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> Message-ID: On Fri, Aug 21, 2009 at 09:16, Mark Hammond wrote: > I'm resurrecting my patch to support a filter called 'none' (which is > turning out to be harder than I thought). ?Off the top of my head, it would > the following would give us a pretty solid solution: > > * Finish my patch for 'none' as a filter, so '**=cleverencode' can be > reasonably used (currently you can't specify specific files *not* have > cleverencode, making it unsuitable in practice without the concept of > 'none') > > * Add support for versioned 'filter rules' - eg, /.hgfilters or similar. > > * This might be pushing my luck, but: add 'defensive' support to core hg for > this feature - if /.hgfilters exists, hg should refuse to operate on the > working tree unless the win32text extension is enabled. Sounds great to me. The latter might indeed be hard to get into the core, but seems like a good idea to try. Cheers, Dirkjan From stephen at xemacs.org Fri Aug 21 10:50:00 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 21 Aug 2009 17:50:00 +0900 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <4A8E49C2.5060805@gmail.com> References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> Message-ID: <87prap4klj.fsf@uwakimon.sk.tsukuba.ac.jp> Mark Hammond writes: > * Add support for versioned 'filter rules' - eg, /.hgfilters or similar. 
> > * This might be pushing my luck, but: add 'defensive' support to core hg > for this feature - if /.hgfilters exists, hg should refuse to operate on > the working tree unless the win32text extension is enabled. The name ".hgfilters" should be changed, then. That's way too generic to be used to "enforce" something as specific as win32text. I can imagine all kinds of things wanting to use rules or filters. How about a scheme where an extension reserves a filter file for itself in .hgfilters? In this case the win32text filters would live in .hgfilters/win32text, and if that file exists hg checks that the corresponding extension has been enabled, and if not, refuses to run (and tells you that if you really want to override, you rename the file to win32text.disabled and commit). Note that Bazaar is currently discussing some similar policies. I think the name they have settled on is ".bzrrules". Maybe .hgrules is a better name. -- ________________________________________________________________________ ________________________________________________________________________ Q: What are those straight lines? A: "XEmacs rules." From ncoghlan at gmail.com Fri Aug 21 11:57:38 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 21 Aug 2009 19:57:38 +1000 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <87prap4klj.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> <87prap4klj.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4A8E6F92.5050402@gmail.com> Stephen J. Turnbull wrote: > Note that Bazaar is currently discussing some similar policies. I > think the name they have settled on is ".bzrrules". Maybe .hgrules is > a better name. So it would be .hgrules/? With the extension then defining the contents of the rule file? An alternative would be to go one level deeper and have: .hgrules/required/ .hgrules/optional/ If an extension rule file appeared in the first subdirectory then hg would refuse to operate on the repository without that extension being enabled. I guess something like that might be nice to have, but the support for negative filtering and versioned rule definitions is all we really need from a python-dev point of view. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Fri Aug 21 12:02:19 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 21 Aug 2009 20:02:19 +1000 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <4A8DB366.4010809@cornell.edu> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <4A8C0267.5020408@wildenhain.de> <8517e9350908190819j137375h63c97e81ad55d71f@mail.gmail.com> <8517e9350908190955y393435d9w89fa25980155a7fc@mail.gmail.com> <4A8C4264.2040708@trueblade.com> <4A8C6D7C.5080307@gmail.com> <4A8DB366.4010809@cornell.edu> Message-ID: <4A8E70AB.80605@gmail.com> Joel Bender wrote: > Nick Coghlan wrote: > >> Maybe this is something that differs by country, but I have *never* >> heard the first address in an IP network (i.e. every bit not covered by >> the netmask set to zero) referred to as anything other than the "network >> address". > > Ah! A change to interject a mostly pointless comment... > > Prior to IEN-212 [1] it wasn't standardized, the 'zero' was used and > supported by the Berkeley socket library. 
This was a number of years > ago, however (!), and I dare say the sample code is lost to antiquity. Ah, that would be me showing my (lack of) age then :) I was still six years or so away from getting my first computer and more than 15 years away from any formal networking training when that note was published... Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From phd at phd.pp.ru Fri Aug 21 13:00:02 2009 From: phd at phd.pp.ru (Oleg Broytmann) Date: Fri, 21 Aug 2009 15:00:02 +0400 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> Message-ID: <20090821110002.GB27402@phd.pp.ru> http://ipaddr-py.googlecode.com/svn/branches/2.0.x/ipaddr.py > _compat_has_real_bytes = bytes != str Wouldn't it be nicer "bytes is not str"? Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From asmodai at in-nomine.org Fri Aug 21 13:54:21 2009 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Fri, 21 Aug 2009 13:54:21 +0200 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908201111q6a48b647g8a478ae228e284ab@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <20090820134651.GB80266@nexus.in-nomine.org> <8517e9350908201111q6a48b647g8a478ae228e284ab@mail.gmail.com> Message-ID: <20090821115421.GF80266@nexus.in-nomine.org> -On [20090820 20:19], Peter Moody (peter at hda3.com) wrote: >I've updated the pep with lots of examples; most of the stuff you're >asking for is already supported, I just didn't do a good job >explaining it. A few things are pending review. Thanks for that Peter! -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Earth to earth, ashes to ashes, dust to dust... From stephen at xemacs.org Fri Aug 21 15:00:02 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 21 Aug 2009 22:00:02 +0900 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <4A8E6F92.5050402@gmail.com> References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> <87prap4klj.fsf@uwakimon.sk.tsukuba.ac.jp> <4A8E6F92.5050402@gmail.com> Message-ID: <87ocq9490t.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > Stephen J. Turnbull wrote: > > Note that Bazaar is currently discussing some similar policies. I > > think the name they have settled on is ".bzrrules". Maybe .hgrules is > > a better name. > > So it would be .hgrules/? With the extension then > defining the contents of the rule file? Yes. > An alternative would be to go one level deeper and have: > > .hgrules/required/ > .hgrules/optional/ I thought briefly about that kind of thing. However, this way would require deciding the semantics of the subdirectories, and while "optional" vs "required" is pretty appealing, how about "required" vs. "requisite"? (As Dave Barry would say, "I am *still* not kidding." See: http://www.kernel.org/pub/linux/libs/pam/Linux-PAM-html/sag-configuration-file.html Of course anything related to Python would do a better job of naming, but such semantic fine points might very well be important. 
And yes, there are people who take their VCS as seriously as they take authenticating as root.) So what I thought was that extensions would provide a policy function, which would make such judgments when called. But then I realized I had no clue what the semantics should be, so I didn't mention it. From fwierzbicki at gmail.com Fri Aug 21 15:46:35 2009 From: fwierzbicki at gmail.com (Frank Wierzbicki) Date: Fri, 21 Aug 2009 09:46:35 -0400 Subject: [Python-Dev] Two laments about CPython's AST Nodes In-Reply-To: <4A8DC9F7.2040904@v.loewis.de> References: <4dab5f760908191110x548a3037pbb11e54e77154f2f@mail.gmail.com> <4A8C67F1.80106@v.loewis.de> <4dab5f760908200620o463a26d4lc2a15ec23319f201@mail.gmail.com> <4A8DC9F7.2040904@v.loewis.de> Message-ID: <4dab5f760908210646mebda6e2reea9ab68378f5893@mail.gmail.com> On Thu, Aug 20, 2009 at 6:11 PM, "Martin v. L?wis" wrote: > Couldn't you just generate a check function for your tree that > would be invoked before you try to process a tree that a > script got access to? That would be one way, though now that I understand CPython's AST design better, I am tempted to follow the lead. If I had a private AST and and a public mirror for ast.py, the design could become much simpler, and probably faster for normal parsing. > If you are asking that a type check is made on assigning a value to > these fields - I'm not quite sure whether you could implement that > check reliably. Wouldn't it be possible to bypass it by filling a > value directly into __dict__? > > If you can come up with a patch that checks in a reliable manner, > I would be in favor of adding that (in 2.7 and 3.2), taking out > the corresponding checks when converting to the internal AST. Great, I may give it a try, but changing the AST impl for Jython 2.6 will probably be my short term answer. -Frank From digitalxero at gmail.com Fri Aug 21 16:10:55 2009 From: digitalxero at gmail.com (Dj Gilcrease) Date: Fri, 21 Aug 2009 08:10:55 -0600 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <4A8E49C2.5060805@gmail.com> References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> Message-ID: On Fri, Aug 21, 2009 at 1:16 AM, Mark Hammond wrote: > Maybe you can enumerate what you think needs to change in mercurial, then > once we have a plan in place it will be clearer who can do what. The encode/decode hooks need to be passed the filename they are working on so you can have an ignore list, this is why I consider my method a hack since I am using a precommit hook to do conversion since I am able to find out which file I am working on and make sure it is not in an ignore list. There also needs to be a way to have required and version controlled extensions. This weekend I plan on digging into Mercurials hook code and doing up a patch so the encode/decode hooks accept the filename they are working on in a backwards compatible way > An alternative would be to go one level deeper and have: > > .hgrules/required/ > .hgrules/optional/ I like this, though maybe .hgextensions since it would contain versioned rules and the actual required extension. The extra sub directories are not really required IMHO, you just have a hgrc file that works the same as the local hgrc file except it only looks in the .hgextensions directory for the correct extension so for python we could have something like [extensions] format_enforcer = [encode] **=format_enforcer.cleverencode [ignore] *.sln= ... 
[hooks] pretxncommit.crlf = python:format_enforcer.forbidcrlf pretxncommit.cr = python:format_enforcer.forbidcr From dirkjan at ochtman.nl Fri Aug 21 16:19:50 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Fri, 21 Aug 2009 16:19:50 +0200 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> Message-ID: On Fri, Aug 21, 2009 at 16:10, Dj Gilcrease wrote: > I like this, though maybe .hgextensions since it would contain > versioned rules and the actual required extension. The extra sub > directories are not really required IMHO, you just have a hgrc file > that works the same as the local hgrc file except it only looks in the > .hgextensions directory for the correct extension so for python we > could have something like > > [extensions] > format_enforcer = Enabling extensions in a versioned file is not going to fly. Cheers, Dirkjan From digitalxero at gmail.com Fri Aug 21 16:42:29 2009 From: digitalxero at gmail.com (Dj Gilcrease) Date: Fri, 21 Aug 2009 08:42:29 -0600 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> Message-ID: On Fri, Aug 21, 2009 at 8:19 AM, Dirkjan Ochtman wrote: > Enabling extensions in a versioned file is not going to fly. any specific reason? From status at bugs.python.org Fri Aug 21 18:07:59 2009 From: status at bugs.python.org (Python tracker) Date: Fri, 21 Aug 2009 18:07:59 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20090821160759.3DECE78620@psf.upfronthosting.co.za> ACTIVITY SUMMARY (08/14/09 - 08/21/09) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 2353 open (+40) / 16226 closed (+15) / 18579 total (+55) Open issues with patches: 928 Average duration of open issues: 656 days. Median duration of open issues: 411 days. Open Issues Breakdown open 2322 (+40) pending 30 ( +0) Issues Created Or Reopened (57) _______________________________ Allow buffering for HTTPResponse 08/18/09 CLOSED http://bugs.python.org/issue4879 reopened r.david.murray patch, patch Py3.1 hangs in coroutine and eats up all memory 08/17/09 http://bugs.python.org/issue6673 reopened scoder 64bit cross platform failure and silly test in doctest 08/14/09 http://bugs.python.org/issue6703 created cjw296 better col_offset for AST in statements like "for a,b in ..." 08/14/09 CLOSED http://bugs.python.org/issue6704 created fwierzbicki patch '''3,5'''.strip(r''',''') does not strip comma, returns '3,5' 08/14/09 CLOSED http://bugs.python.org/issue6705 created mgruen asyncore's accept() is broken 08/14/09 http://bugs.python.org/issue6706 created giampaolo.rodola dir() on __new__'d module w/o dict crashes 2.6.2 08/14/09 CLOSED http://bugs.python.org/issue6707 created DinoV raw_input() calls generate compile errors. 
08/15/09 CLOSED http://bugs.python.org/issue6708 created starz It's possible to create TryExcept with no handlers 08/15/09 http://bugs.python.org/issue6709 created benjamin.peterson hotshot stats load causes TypeError when multiple files are load 08/15/09 http://bugs.python.org/issue6710 created j1m macurl2path has typos that raise AttributeError 08/16/09 CLOSED http://bugs.python.org/issue6711 created joe.amenta patch sys._getframe is not available on all Python implementations 08/16/09 http://bugs.python.org/issue6712 created johannes.janssen Integer & Long types: Performance improvement of 1.6x to 2x for 08/16/09 http://bugs.python.org/issue6713 created gawain memmove fails with unicode strings 08/16/09 CLOSED http://bugs.python.org/issue6714 created verigak xz compressor support 08/17/09 http://bugs.python.org/issue6715 created devurandom Windows install error when choosing to compile .py files 08/17/09 http://bugs.python.org/issue6716 created pds Some problem with recursion handling 08/17/09 http://bugs.python.org/issue6717 created gregorlingl ValueError in 21.17.4.2. SocketServer.UDPServer Example 08/17/09 http://bugs.python.org/issue6718 created ericpope pdb messes up when debugging an non-ascii program 08/17/09 http://bugs.python.org/issue6719 created smu multiprocessing logging 08/17/09 http://bugs.python.org/issue6720 created benliles Locks in python standard library should be sanitized on fork 08/17/09 http://bugs.python.org/issue6721 created gregory.p.smith collections.namedtuple: confusing example 08/18/09 http://bugs.python.org/issue6722 created ash csv.writer: example does not work 08/18/09 CLOSED http://bugs.python.org/issue6723 created nicolasg r74463 causes failures in test_xmlrpc 08/18/09 http://bugs.python.org/issue6724 created r.david.murray Inconsistency in Documentation: "Name Spaces" vs "Namespaces" 08/18/09 http://bugs.python.org/issue6725 created CarGuy37 pyuic4.bat has bad path to python.exe and pyuic.py 08/18/09 CLOSED http://bugs.python.org/issue6726 created maar ImportError when package is symlinked on Windows 08/18/09 http://bugs.python.org/issue6727 created jaraco To avoid hang up in using CGIXMLRPCRequestHandler under IIS 7.x 08/18/09 http://bugs.python.org/issue6728 created sjtuer Add support for ssize_t 08/18/09 http://bugs.python.org/issue6729 created Nikratio dict.fromkeys() should not cross reference mutable value by defa 08/18/09 CLOSED http://bugs.python.org/issue6730 created maxlem Add option of non-zero exit status of setup.py when building of 08/18/09 http://bugs.python.org/issue6731 created Arfrever patch PyInit_shoddy() in shoddy.c does not return anything on success 08/18/09 http://bugs.python.org/issue6732 created kajiyama curses line wrap broken when mixing full- and half-width unicode 08/19/09 http://bugs.python.org/issue6733 created fugounashi Imap lib implicit conversion from bytes to string 08/19/09 http://bugs.python.org/issue6734 created surprising42 patch restype pointer to Structure subclass never initialized 08/19/09 CLOSED http://bugs.python.org/issue6735 created dontbugme File handling in Python 08/19/09 CLOSED http://bugs.python.org/issue6736 created SallaS07 PEP 372 odict.__eq__ behaves incorrectly 08/19/09 CLOSED http://bugs.python.org/issue6737 created cegner Wrong doc strings in itertools 08/20/09 http://bugs.python.org/issue6738 created hagen patch IDLE window won't start or show up after assgining new key in op 08/20/09 http://bugs.python.org/issue6739 created CaribbeanCruise patch Compounded expressions with lambda 
functions are evaluated incor 08/20/09 CLOSED http://bugs.python.org/issue6740 created mvyskocil Garbage collector release method 08/20/09 http://bugs.python.org/issue6741 created gardster Embedding python into shared library crash on AIX 08/20/09 http://bugs.python.org/issue6742 created damahay123 pprint.pprint should support no objects to print blank lines & a 08/20/09 http://bugs.python.org/issue6743 created marystern calling kevent repr raises a TypeError 08/20/09 http://bugs.python.org/issue6744 created jesstess patch (curses) addstr() takes str in Python 3 08/20/09 http://bugs.python.org/issue6745 created Trundle ValueError raised by IDLE during tooltip open on 64-bit Centos 5 08/20/09 CLOSED http://bugs.python.org/issue6746 created srid test test_multiprocessing failed 08/21/09 http://bugs.python.org/issue6747 created LinuxDonald test test_telnetlib failed 08/21/09 http://bugs.python.org/issue6748 created LinuxDonald Support for encrypted zipfiles when interpreting zipfile as scri 08/21/09 http://bugs.python.org/issue6749 created manis threading issue in __builtins__.print 08/21/09 http://bugs.python.org/issue6750 created nullnil patch, needs review Default return value in ConfigParser 08/21/09 http://bugs.python.org/issue6751 created jjdominguezm -1**2=-1 08/21/09 CLOSED http://bugs.python.org/issue6752 created rahul1618 Python 3.1.1 test_cmd_line fails on Fedora 11 08/21/09 http://bugs.python.org/issue6753 created Pif Non-existent member 'nb_inplace_divide' in PyNumberMethods 08/21/09 http://bugs.python.org/issue6754 created kajiyama Patch: new method get_wch for ncurses bindings: accept wide char 08/21/09 http://bugs.python.org/issue6755 created inigoserna patch ftplib documentation does not document what the acct parameter i 08/21/09 http://bugs.python.org/issue6756 created tarjei Marshal's documentation incomplete (Bools) 08/21/09 http://bugs.python.org/issue6757 created serprex Issues Now Closed (38) ______________________ Allow buffering for HTTPResponse 0 days http://bugs.python.org/issue4879 r.david.murray patch, patch Unable to launch IDLE on Windows 151 days http://bugs.python.org/issue5528 srid Missing labelside option for Tix option menu (fix included) 103 days http://bugs.python.org/issue5961 gpolo Fix O(n**2) performance problem in socket._fileobject 81 days http://bugs.python.org/issue6117 gregory.p.smith patch, patch, easy, needs review Support for tcl 8.6 69 days http://bugs.python.org/issue6244 gpolo patch Bug in hashlib 64 days http://bugs.python.org/issue6281 gregory.p.smith multiline exception logging via syslog handler 43 days http://bugs.python.org/issue6444 vsajip IDLE with Tk-Cocoa: Edit, format menus hang 37 days http://bugs.python.org/issue6463 wordtech "HOME" is not a standard environment variable on Windows 29 days http://bugs.python.org/issue6556 tarek urllib.urlopen creates bad requests when location header of 301 23 days http://bugs.python.org/issue6557 orsenthil test_pickle fails on AIX -- 6.9999999999999994e-308 != 6.9999999 10 days http://bugs.python.org/issue6646 marketdickinson codecs documentation does not mention surrogateescape 13 days http://bugs.python.org/issue6648 ash logging config - using of FileHandler's delay argument? 
14 days http://bugs.python.org/issue6667 vsajip CGI module documentation references method 'toupper'; should be 6 days http://bugs.python.org/issue6685 r.david.murray New functions in site.py to get user/global site packages paths 8 days http://bugs.python.org/issue6693 tarek patch better col_offset for AST in statements like "for a,b in ..." 1 days http://bugs.python.org/issue6704 benjamin.peterson patch '''3,5'''.strip(r''',''') does not strip comma, returns '3,5' 0 days http://bugs.python.org/issue6705 eric.smith dir() on __new__'d module w/o dict crashes 2.6.2 1 days http://bugs.python.org/issue6707 benjamin.peterson raw_input() calls generate compile errors. 0 days http://bugs.python.org/issue6708 benjamin.peterson macurl2path has typos that raise AttributeError 4 days http://bugs.python.org/issue6711 orsenthil patch memmove fails with unicode strings 0 days http://bugs.python.org/issue6714 eric.smith csv.writer: example does not work 0 days http://bugs.python.org/issue6723 skip.montanaro pyuic4.bat has bad path to python.exe and pyuic.py 0 days http://bugs.python.org/issue6726 benjamin.peterson dict.fromkeys() should not cross reference mutable value by defa 0 days http://bugs.python.org/issue6730 r.david.murray restype pointer to Structure subclass never initialized 0 days http://bugs.python.org/issue6735 theller File handling in Python 0 days http://bugs.python.org/issue6736 amaury.forgeotdarc PEP 372 odict.__eq__ behaves incorrectly 0 days http://bugs.python.org/issue6737 rhettinger Compounded expressions with lambda functions are evaluated incor 0 days http://bugs.python.org/issue6740 eric.smith ValueError raised by IDLE during tooltip open on 64-bit Centos 5 0 days http://bugs.python.org/issue6746 gpolo -1**2=-1 0 days http://bugs.python.org/issue6752 amaury.forgeotdarc ScrolledText allows Frame.bbox to hide Text.bbox 1651 days http://bugs.python.org/issue1119673 gpolo patch Tix: PanedWindow.panes nonfunctional 1477 days http://bugs.python.org/issue1250469 gpolo patch Tix CheckList 'radio' option cannot be changed 1465 days http://bugs.python.org/issue1259434 gpolo patch Tix.py class HList missing info_bbox, info_dragsite and info_dro 1373 days http://bugs.python.org/issue1356969 gpolo patch, easy urllib.FancyURLopener.redirect_internal looses data on POST! 1293 days http://bugs.python.org/issue1424148 orsenthil easy Tix.Grid patch 1131 days http://bugs.python.org/issue1522587 gpolo patch Fix numerous bugs in unittest 1084 days http://bugs.python.org/issue1550273 jonozzz patch functools.compose to chain functions together 912 days http://bugs.python.org/issue1660179 rhettinger patch Top Issues Most Discussed (10) ______________________________ 17 Add option of non-zero exit status of setup.py when building of 3 days open http://bugs.python.org/issue6731 15 PyXXX_ClearFreeList for dict, set, and list 8 days open http://bugs.python.org/issue6695 7 Imap lib implicit conversion from bytes to string 2 days open http://bugs.python.org/issue6734 7 sys._getframe is not available on all Python implementations 5 days open http://bugs.python.org/issue6712 7 Python 3.1 segfaults when invalid UTF-8 characters are passed f 8 days open http://bugs.python.org/issue6697 7 c_char_p return value returns string, not bytes 74 days open http://bugs.python.org/issue6239 7 Conversion of longs to bytes and vice-versa. 
1810 days open http://bugs.python.org/issue1023290 6 httplib read() very slow due to lack of socket buffer 501 days open http://bugs.python.org/issue2576 6 urllib.FancyURLopener.redirect_internal looses data on POST! 1293 days closed http://bugs.python.org/issue1424148 5 fnmatch fails on filenames containing \n character 14 days open http://bugs.python.org/issue6665 From peter at hda3.com Fri Aug 21 18:24:46 2009 From: peter at hda3.com (Peter Moody) Date: Fri, 21 Aug 2009 09:24:46 -0700 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> Message-ID: <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> On Thu, Aug 20, 2009 at 10:15 PM, Case Vanhorsen wrote: >>On Thu, Aug 20, 2009 at 2:00 PM, Peter Moody wrote: >> The pep has been updated with the excellent suggestions thus far. >> >> Are there any more? > > Thanks for writing the PEP. > > I tried a few of the common scenarios that I use at work. Disclaimer: > my comments are based on my work environment. > > I was surprised that IP('172.16.1.1') returned > IPv4Address('172.16.1.1/32') instead of IPv4Address('172.16.1.1'). I > know I can change the behavior by using host=True, but then > IP('172.16.1.1/24', host=True) will raise an error. It makes more > sense, at least to me, that if I input just an IP address, I get an IP > address back. I would prefer that IP('172.16.1.1/32') return an > IPv4Network and IP('172.16.1.1') return an IPv4Address. I think you mean that it returns an IPv4Network object (not IPv4Address). My suggestion there is that if you know you're dealing with an address, use one of the IPvXAddress classes (or pass host=True to the IP function). IP is just a helper function and defaulting to a network with a /32 prefix seems relatively common. Knowing that my experience may not always be the most common, I can change this behavior if it's indeed confusing, but in my conversations with others and in checking out the current state of ip address libraries, this seems to be a perfectly acceptable default. > Would it be possible to provide an iterator that returns just the > valid host IP addresses (minus the network and broadcast addresses)? > "for i in IPv4Network('172.16.1.0/28')" and "for in in > IPv4Network('172.16.1.0/28').iterhosts()" both return all 16 IP > addresses. I normally describe 172.16.1.0/28 as consisting of one > network address, 14 host addresses, and one broadcast address. I would > prefer that "for i in IPv4Network('172.16.1.0/28')" return all IP > addresses and that "for in in > IPv4Network('172.16.1.0/28').iterhosts()" exclude the network and > broadcast addresses. I think creating a list of IP addresses that can > be assigned to devices on a network is a common task. this is a good idea and I'll implement this. .iterhosts() for subnet - (network|broadcast) and .iterallhosts() for the entire subnet (in my testing, looping over an iterator was actually reasonably faster than just for i in IP(network):, so I'll support iterators for both) > Can .subnet() be enhanced to accept masks? For example, > IPv4Network('172.16.0.0/16').subnet('/19') would return the eight /19 > subnets. This seems like an easy win. I'll implement this too. > What about supporting multiple parameters to subnet? 
I frequently need > to create complex subnet layouts. The following subnet layout is NOT > made up! I believe it, we have equally odd subnet assignments at work. [snip] > > A possible syntax would be: > .subnet((1,'/23'),(1,'/25'),(2,'/26'),(2,'/28'),(8,'/30'),(16,'/32'),(3,'/28'),(2,'/27'),(1,'/26')) > > Note: I am willing to provide patches to implement my suggestions. I > just won't have much time over the next couple weeks. I'm happy reviewing/accepting patches to ipaddr. I'd worry a bit about the complexity required for this, but I'm open-minded. > casevh > > >> Cheers, >> /peter >> >> On Tue, Aug 18, 2009 at 1:00 PM, Peter Moody wrote: >>> Howdy folks, >>> >>> I have a first draft of a PEP for including an IP address manipulation >>> library in the python stdlib. It seems like there are a lot of really >>> smart folks with some, ahem, strong ideas about what an IP address >>> module should and shouldn't be so I wanted to solicit your input on >>> this pep. >>> >>> the pep can be found here: >>> >>> http://www.python.org/dev/peps/pep-3144/ >>> >>> the code can be found here: >>> >>> http://ipaddr-py.googlecode.com/svn/branches/2.0.x/ >>> >>> Please let me know if you have any comments (some already coming :) >>> >>> Cheers, >>> /peter >>> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: http://mail.python.org/mailman/options/python-dev/casevh%40gmail.com >> >

From ncoghlan at gmail.com Sat Aug 22 01:08:05 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 22 Aug 2009 09:08:05 +1000 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> Message-ID: <4A8F28D5.70406@gmail.com>

Peter Moody wrote:
> this is a good idea and I'll implement this. .iterhosts() for subnet
> - (network|broadcast) and .iterallhosts() for the entire subnet (in my
> testing, looping over an iterator was actually reasonably faster than
> just for i in IP(network):, so I'll support iterators for both)

I would suggest just changing __iter__ to be the equivalent of the current iterhosts() and then changing iterhosts() as described. Such a change would also fix the thread safety and nested iteration problems suffered by the current __iter__ implementation.

I haven't executed the following, but from reading the code I am confident they would behave as I describe in the comments:

    # With the current implementation, this is an infinite loop
    net = IPv4Network("192.168.2.0")
    for x in net:
        iter(net)

    # And this only runs the inner loop once
    for x in net:
        for y in net:
            pass

Cheers,
Nick.
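For reference, a rough sketch (untested, written against the ipaddr 2.0.x branch linked above) of what the API under discussion might look like; the exact names and defaults are still being settled in this thread:

    from ipaddr import IPv4Network

    net = IPv4Network('172.16.1.0/28')

    # agreed so far: iterhosts() would skip the network and broadcast
    # addresses, so a /28 yields 14 hosts, .1 through .14
    hosts = list(net.iterhosts())

    # whether plain iteration ("for addr in net") yields all 16 addresses
    # or only the hosts is exactly the point being debated above

    # also proposed: subnet() accepting an explicit new prefix, e.g. the
    # eight /19 networks inside a /16
    subnets = IPv4Network('172.16.0.0/16').subnet('/19')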
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Sat Aug 22 01:41:49 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 22 Aug 2009 09:41:49 +1000 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> Message-ID: <4A8F30BD.5000407@gmail.com> Peter Moody wrote: > On Thu, Aug 20, 2009 at 10:15 PM, Case Vanhorsen wrote: >> I was surprised that IP('172.16.1.1') returned >> IPv4Address('172.16.1.1/32') instead of IPv4Address('172.16.1.1'). I >> know I can change the behavior by using host=True, but then >> IP('172.16.1.1/24', host=True) will raise an error. It makes more >> sense, at least to me, that if I input just an IP address, I get an IP >> address back. I would prefer that IP('172.16.1.1/32') return an >> IPv4Network and IP('172.16.1.1') return an IPv4Address. > > I think you mean that it returns an IPv4Network object (not > IPv4Address). My suggestion there is that if you know you're dealing > with an address, use one of the IPvXAddress classes (or pass host=True > to the IP function). IP is just a helper function and defaulting to a > network with a /32 prefix seems relatively common. > > Knowing that my experience may not always be the most common, I can > change this behavior if it's indeed confusing, but in my conversations > with others and in checking out the current state of ip address > libraries, this seems to be a perfectly acceptable default. The IP() helper function actually bothers me a bit - it's a function where the return *type* depends on the parameter *value*. While switching between IPv4 and IPv6 based on value is probably a necessary behaviour, perhaps it would be possible to get rid of the "host=True" ugliness and instead have two separate helper functions: IP() - returns either IPv4Address or IPv6Address IPNetwork() - returns either IPv4Network or IPv6Network Both would still accept a version argument, allowing programmatic control of which version to accept. If an unknown version is passed then some kind of warning or error should be emitted rather than the current silent fallback to attempting to guess the version based on the value. I would suggest removing the corresponding IPv4 and IPv6 helper functions altogether. My rationale for the above is that hosts and networks are *not* the same thing. For any given operation, the programmer should know whether they want a host or a network and ask for whichever one they want. The IPv4/IPv6 distinction, on the other hand, is something that a lot of operations are going to be neutral about, so it makes sense to deal with the difference implicitly. Other general comments: - the module appears to have quite a few isinstance() checks against non-abstract base classes. Either these instance checks should all be removed (relying on pure duck-typing instead) or else the relevant classes should be turned into ABCs. (Note: this comment doesn't apply to the type dispatch in the constructor methods) - the reference implementation has aliased "CamelCase" names with the heading "backwards compatibility". 
This is inappropriate for a standard library submission (although I can see how it would be useful if you were already using a different IP address library). - isinstance() accepts a tuple of types, so isinstance(address, (int, long)) is a shorter way of writing "isinstance(address, int) or isinstance(address, long)". The former also has the virtue of executing faster. However, an even better approach would be to use operator.index() in order to accept all integral types rather than just the builtin ones. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From mg at lazybytes.net Sat Aug 22 01:17:53 2009 From: mg at lazybytes.net (Martin Geisler) Date: Sat, 22 Aug 2009 01:17:53 +0200 Subject: [Python-Dev] Mercurial migration: help needed References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> Message-ID: <87ws4wkb8e.fsf@hbox.dyndns.org> Dj Gilcrease writes: > On Fri, Aug 21, 2009 at 8:19 AM, Dirkjan Ochtman wrote: >> Enabling extensions in a versioned file is not going to fly. > > any specific reason? In the general case, you can specify an extension to be enabled by filename: [extensions] foo = ~/src/foo So if I can enable an extension like that on your system, I might be evil and commit a bad extension *and* enable it at the same time. You might argue that one should then limit which extensions one can enable in a versioned file, but it seems hard to come up with a good mechanism for this. The current "mechanism" is the users own ~/.hgrc file which can be seen as a whitelist of extensions he trust. An alternative could be the new %include syntax for configuration files, which was introduced in Mercurial 1.3. If you add %include ../config to your .hg/hgrc file, the (versioned!) file named 'config' from the root of your repository will be included on the spot. The catch is that you have to add such a line to all your Python clones. -- Martin Geisler VIFF (Virtual Ideal Functionality Framework) brings easy and efficient SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available URL: From ben+python at benfinney.id.au Sat Aug 22 02:16:17 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Sat, 22 Aug 2009 10:16:17 +1000 Subject: [Python-Dev] =?utf-8?q?Issue_1170=3A_Unicode_for_=E2=80=98shlex?= =?utf-8?b?4oCZ?= Message-ID: <874os0k8j2.fsf@benfinney.id.au> Howdy all, What is the procedure for finding out why an issue hasn't progressed? I don't want to fill the bug database with such noise. In the case of (?shlex have problems with parsing unicode?), the problem is apparently addressed by a patch, assigned to that issue since 2007-12-22. There is no indication in the report why it's not yet applied. I'd really like this fixed in the 2.x series if possible. -- \ ?? whoever claims any right that he is unwilling to accord to | `\ his fellow-men is dishonest and infamous.? ?Robert G. 
| _o__) Ingersoll, _The Liberty of Man, Woman and Child_, 1877 | Ben Finney From skippy.hammond at gmail.com Sat Aug 22 02:56:36 2009 From: skippy.hammond at gmail.com (Mark Hammond) Date: Sat, 22 Aug 2009 10:56:36 +1000 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> Message-ID: <4A8F4244.7070204@gmail.com> On 22/08/2009 12:19 AM, Dirkjan Ochtman wrote: > On Fri, Aug 21, 2009 at 16:10, Dj Gilcrease wrote: >> I like this, though maybe .hgextensions since it would contain >> versioned rules and the actual required extension. The extra sub >> directories are not really required IMHO, you just have a hgrc file >> that works the same as the local hgrc file except it only looks in the >> .hgextensions directory for the correct extension so for python we >> could have something like >> >> [extensions] >> format_enforcer = > > Enabling extensions in a versioned file is not going to fly. I like Stephen and Nick's discussion higher in this thread, but wonder if some middle ground couldn't work. Instead of [extensions], just have a place to list the required extensions - eg; Something like ~/.hgrules having: [config] # or maybe [rules] ? required_extensions = win32text, some_pydev_specific_extension [encode] {rules for encoding} [pydev] some_custom_property_for_our_custom_ext = 1 ... etc ... (Note I am not proposing we need out own pydev_specific_extension, I just included it here to try and show the more general concept) This way you aren't *enabling* extensions in this versioned file, just listing rules about what extensions must be enabled. From core hg's POV, it doesn't care if the required extensions relate to windows line endings or re-encoding images - it just honours the wishes of the repo owner. From earlier in the thread, Dirkjan writes: > The [concept of hg enforing required extensions] might indeed be > hard to get into the core, but seems like a good idea to try. From my POV, this would be required in some form or another before such a scheme could actually work. Without it we end up with an improved win32text (good!) but in practice still have the same problems we have discussed in this thread which would make it unsuitable for us who actually try and use it, particularly as a general solution for projects with any kind of windows focus or community. Given you are a core hg committer and well known in the community, would you be willing to start a thread with the hg developers about this issue? If something like this can't get into the core, I will drop any expectations of it becoming a viable general solution for windows focused projects, so would limit the work I am willing to invest to the commitments I've made here. Thanks! Mark From mhammond at skippinet.com.au Sat Aug 22 02:58:40 2009 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sat, 22 Aug 2009 10:58:40 +1000 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> Message-ID: <4A8F42C0.8010005@skippinet.com.au> On 22/08/2009 12:10 AM, Dj Gilcrease wrote: > On Fri, Aug 21, 2009 at 1:16 AM, Mark Hammond wrote: >> Maybe you can enumerate what you think needs to change in mercurial, then >> once we have a plan in place it will be clearer who can do what. 
> > The encode/decode hooks need to be passed the filename they are > working on so you can have an ignore list, this is why I consider my > method a hack since I am using a precommit hook to do conversion since > I am able to find out which file I am working on and make sure it is > not in an ignore list. There also needs to be a way to have required > and version controlled extensions. I think this is the exact issue my 'none' patch addresses. Your filters can say: [encode] *.dsp=none: **=cleverencode: The end result should be that anything with 'none:' forms what you call an ignore list. Would that not meet your requirements? Cheers, Mark From stephen at xemacs.org Sat Aug 22 06:46:43 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 22 Aug 2009 13:46:43 +0900 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <4A8F4244.7070204@gmail.com> References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> <4A8F4244.7070204@gmail.com> Message-ID: <87fxbk4frg.fsf@uwakimon.sk.tsukuba.ac.jp> Mark Hammond writes: > Something like ~/.hgrules having: Surely you mean $PROJECTROOT/.hgrules? > [config] # or maybe [rules] ? > required_extensions = win32text, some_pydev_specific_extension [extensions] required_for_commit = win32text,some_other_ext That might require a change to hg's ini file semantics if currently it refuses to parse [extension] sections in versioned hgrcs. Note the change in name: I'm not sure exactly what the semantics should be, but surely we want to allow browsing the repository, branching, etc without enabling any extensions. > [Encode] > {rules for encoding} No, there must be a way to indicate that "this is a section for a specific extension". Bare [Encode] will be seen as polluting the global namespace, and will get a lot of pushback, I think. > This way you aren't *enabling* extensions in this versioned file, True, but how many people will just download the extension and enable it? This would open a door to "social engineering". (Personally, *I* am not opposed to it on those grounds, but as devil's advocate I do want to mention that as an argument you might run into.) > just listing rules about what extensions must be enabled. From > core hg's POV, it doesn't care if the required extensions relate to > windows line endings or re-encoding images - it just honours the > wishes of the repo owner. If it refuses the user's request, it should issue a message to the effect of "Please enable win32text, which is required in ." From mhammond at skippinet.com.au Sat Aug 22 07:02:19 2009 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sat, 22 Aug 2009 15:02:19 +1000 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <87fxbk4frg.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> <4A8F4244.7070204@gmail.com> <87fxbk4frg.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4A8F7BDB.30008@skippinet.com.au> On 22/08/2009 2:46 PM, Stephen J. Turnbull wrote: > Mark Hammond writes: > > > Something like ~/.hgrules having: > > Surely you mean $PROJECTROOT/.hgrules? Indeed. > > [config] # or maybe [rules] ? > > required_extensions = win32text, some_pydev_specific_extension > > [extensions] > required_for_commit = win32text,some_other_ext > > That might require a change to hg's ini file semantics if currently it > refuses to parse [extension] sections in versioned hgrcs. 
Yes - I'm not proposing specific names for sections etc - I'm more interested in getting the concepts across, and fully expect the hg guys will have their own opinions and make final decisions on the exact spelling. > Note the change in name: I'm not sure exactly what the semantics > should be, but surely we want to allow browsing the repository, > branching, etc without enabling any extensions. > > > [Encode] > > {rules for encoding} > > No, there must be a way to indicate that "this is a section for a > specific extension". Bare [Encode] will be seen as polluting the > global namespace, and will get a lot of pushback, I think. Possibly - although I would expect the existing section names be reused when applied to a versioned file, I'd be more than happy for the hg guys to declare new names are appropriate for this. > > > This way you aren't *enabling* extensions in this versioned file, > > True, but how many people will just download the extension and enable > it? In the ideal world, exactly as many people who would read the Python developer guide, then download and install the extension based purely on that. IOW, it is Python itself setting the policy, so people need to make their own decisions based on that, regardless of whether the tool enforces it or not. > This would open a door to "social engineering". (Personally, *I* > am not opposed to it on those grounds, but as devil's advocate I do > want to mention that as an argument you might run into.) > > > just listing rules about what extensions must be enabled. From > > core hg's POV, it doesn't care if the required extensions relate to > > windows line endings or re-encoding images - it just honours the > > wishes of the repo owner. > > If it refuses the user's request, it should issue a message to the > effect of "Please enable win32text, which is required in name of .hgrules>." Agreed. Thanks, Mark From benjamin at python.org Sat Aug 22 08:07:52 2009 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 22 Aug 2009 01:07:52 -0500 Subject: [Python-Dev] =?utf-8?q?Issue_1170=3A_Unicode_for_=E2=80=98shlex?= =?utf-8?b?4oCZ?= In-Reply-To: <874os0k8j2.fsf@benfinney.id.au> References: <874os0k8j2.fsf@benfinney.id.au> Message-ID: <1afaf6160908212307p11e8510elb0b8595829556fa3@mail.gmail.com> 2009/8/21 Ben Finney : > Howdy all, > > What is the procedure for finding out why an issue hasn't progressed? I > don't want to fill the bug database with such noise. In this case, it's probably because no one officially maintains the shlex module at the moment. > > In the case of (?shlex have > problems with parsing unicode?), the problem is apparently addressed by > a patch, assigned to that issue since 2007-12-22. There is no indication > in the report why it's not yet applied. I will leave a few initial comments. > > I'd really like this fixed in the 2.x series if possible. 
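In the meantime, a workaround that is often suggested for 2.x is to hand shlex a byte string and decode the pieces afterwards. A rough, untested sketch (the helper name is made up, and it assumes input that round-trips cleanly through UTF-8):

    import shlex

    def split_unicode(s, encoding='utf-8'):
        # shlex in 2.x is only reliable on byte strings, so encode first,
        # split, then decode each token back to unicode
        return [token.decode(encoding)
                for token in shlex.split(s.encode(encoding))]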
-- Regards, Benjamin From dirkjan at ochtman.nl Sat Aug 22 09:35:13 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Sat, 22 Aug 2009 09:35:13 +0200 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <87ws4wkb8e.fsf@hbox.dyndns.org> References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> <87ws4wkb8e.fsf@hbox.dyndns.org> Message-ID: On Sat, Aug 22, 2009 at 01:17, Martin Geisler wrote: > In the general case, you can specify an extension to be enabled by > filename: > > ?[extensions] > ?foo = ~/src/foo > > So if I can enable an extension like that on your system, I might be > evil and commit a bad extension *and* enable it at the same time. > > You might argue that one should then limit which extensions one can > enable in a versioned file, but it seems hard to come up with a good > mechanism for this. The current "mechanism" is the users own ~/.hgrc > file which can be seen as a whitelist of extensions he trust. Thanks for explaining that bit, Martin. Everyone: Martin is also a hg crew member. It sounds to me like somehow requiring extensions to be enabled (without actually enabling them) would help mitigate the issues somehow, although it's still a distributed system and so clients cannot be trusted (e.g. I might put a win32text stub in there somewhere that does nothing). Cheers, Dirkjan From stephen at xemacs.org Sat Aug 22 10:52:47 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 22 Aug 2009 17:52:47 +0900 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <4A8F7BDB.30008@skippinet.com.au> References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> <4A8F4244.7070204@gmail.com> <87fxbk4frg.fsf@uwakimon.sk.tsukuba.ac.jp> <4A8F7BDB.30008@skippinet.com.au> Message-ID: <874os044dc.fsf@uwakimon.sk.tsukuba.ac.jp> Mark Hammond writes: > On 22/08/2009 2:46 PM, Stephen J. Turnbull wrote: > Possibly - although I would expect the existing section names be reused > when applied to a versioned file, I'd be more than happy for the hg guys > to declare new names are appropriate for this. If there's already an [Encode] section, that's different. (I don't details, I'm not that big a Mercurial fan.) But you'd still need a way to differentiate win32text rules from other encoding rules. > > > This way you aren't *enabling* extensions in this versioned file, > > > > True, but how many people will just download the extension and enable > > it? > > In the ideal world, exactly as many people who would read the Python > developer guide, then download and install the extension based purely on > that. IOW, it is Python itself setting the policy, so people need to > make their own decisions based on that, regardless of whether the tool > enforces it or not. You're missing the point. I'm not talking about whether it will work for Python, I'm talking about the worry that somebody will post a way cool Python branch and require a private extension, which everybody will just automatically install and enable, which extension then proceeds to phone home to Spammer Haven, Inc. with the contents of your email contact list. That's what I mean by "social engineering," and why I worry about policy pushback from Mercurial HQ. Maybe that's more paranoid than they are.... But it can't hurt your cause to be ready for that kind of worry. From stephen at xemacs.org Sat Aug 22 10:59:37 2009 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Sat, 22 Aug 2009 17:59:37 +0900 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> <87ws4wkb8e.fsf@hbox.dyndns.org> Message-ID: <873a7k441y.fsf@uwakimon.sk.tsukuba.ac.jp>

Dirkjan Ochtman writes:
> [Clients] cannot be trusted (e.g. I might put a win32text stub in
> there somewhere that does nothing).

Heck, just edit the .hgrules file, and do a Houdini on any and all handcuffs. Don't trust software, trust people -- but help them avoid thoughtless mistakes.

From martin at v.loewis.de Sat Aug 22 11:09:21 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 22 Aug 2009 11:09:21 +0200 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <4A8F4244.7070204@gmail.com> References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> <4A8F4244.7070204@gmail.com> Message-ID: <4A8FB5C1.1000903@v.loewis.de>

> From my POV, this would be required in some form or another before such
> a scheme could actually work. Without it we end up with an improved
> win32text (good!)

I still think this would be actually bad. Instead, a new extension should be written, with a name that does not have "win32" as a substring, and that has no provision for guessing line breaks by inspecting files. Regards, Martin

From martin at v.loewis.de Sat Aug 22 11:16:47 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sat, 22 Aug 2009 11:16:47 +0200 Subject: [Python-Dev] =?utf-8?q?Issue_1170=3A_Unicode_for_=E2=80=98shlex?= =?utf-8?b?4oCZ?= In-Reply-To: <874os0k8j2.fsf@benfinney.id.au> References: <874os0k8j2.fsf@benfinney.id.au> Message-ID: <4A8FB77F.7090100@v.loewis.de>

> What is the procedure for finding out why an issue hasn't progressed?

It's fairly simple: just read through the issue, and it should be obvious. In the specific case, no committer has ever commented on the issue, so chances are high that no committer has ever *seen* the issue.

> I don't want to fill the bug database with such noise.

And likely, posting to the issue won't be a way to find out, since no committer would see your comment.

> In the case of ('shlex have
> problems with parsing unicode'), the problem is apparently addressed by
> a patch, assigned to that issue since 2007-12-22.

Apparently, or really? Did you review the patch? Regards, Martin

From mg at lazybytes.net Sat Aug 22 11:57:14 2009 From: mg at lazybytes.net (Martin Geisler) Date: Sat, 22 Aug 2009 11:57:14 +0200 Subject: [Python-Dev] Mercurial migration: help needed References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> <4A8F4244.7070204@gmail.com> <87fxbk4frg.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <8763cgjhmt.fsf@hbox.dyndns.org>

"Stephen J. Turnbull" writes:
> Mark Hammond writes:
>
> [extensions]
> required_for_commit = win32text,some_other_ext
>
> That might require a change to hg's ini file semantics if currently it
> refuses to parse [extension] sections in versioned hgrcs.

It doesn't refuse anything like that. When Mercurial starts, it reads these configuration files: http://www.selenic.com/mercurial/hgrc.5.html#files Notice that they are all outside the clone's working directory, the closest one is the /.hg/hgrc file. As I wrote somewhere else in this thread, you can add %include ../.repo-settings in your /.hg/hgrc file, and this will result in /.repo-settings being loaded (and this file *is* in the working copy and can thus be put under revision control).
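To make the "required extensions" idea from earlier in the thread slightly more concrete, a small extension could read such a versioned file and warn when something it lists is not enabled. This is only a rough sketch from memory of the 1.3-era API, and the .hg-required file name is invented for illustration:

    def reposetup(ui, repo):
        # look for a (hypothetical) versioned file listing required extensions
        try:
            required = open(repo.wjoin('.hg-required')).read().split()
        except IOError:
            return
        for name in required:
            # enabled extensions show up under the [extensions] config section
            if ui.config('extensions', name) is None:
                ui.warn('warning: this repository expects the %s extension '
                        'to be enabled\n' % name)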
-- Martin Geisler VIFF (Virtual Ideal Functionality Framework) brings easy and efficient SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available URL: From mg at lazybytes.net Sat Aug 22 11:48:58 2009 From: mg at lazybytes.net (Martin Geisler) Date: Sat, 22 Aug 2009 11:48:58 +0200 Subject: [Python-Dev] Mercurial migration: help needed References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> <4A8F4244.7070204@gmail.com> <87fxbk4frg.fsf@uwakimon.sk.tsukuba.ac.jp> <4A8F7BDB.30008@skippinet.com.au> <874os044dc.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87ab1sji0l.fsf@hbox.dyndns.org> "Stephen J. Turnbull" writes: > Mark Hammond writes: > > On 22/08/2009 2:46 PM, Stephen J. Turnbull wrote: > > > Possibly - although I would expect the existing section names be reused > > when applied to a versioned file, I'd be more than happy for the hg guys > > to declare new names are appropriate for this. > > If there's already an [Encode] section, that's different. (I don't > details, I'm not that big a Mercurial fan.) But you'd still need a > way to differentiate win32text rules from other encoding rules. There is a [decode] and an [encode] section: http://www.selenic.com/mercurial/hgrc.5.html#decode-encode The win32text extension works by defining new filters which can then be used like this: [encode] ** = cleverencode: [decode] ** = cleverdecode: (they are "clever" because they skip binary files) >>> True, but how many people will just download the extension and >>> enable it? >> >> In the ideal world, exactly as many people who would read the Python >> developer guide, then download and install the extension based purely >> on that. IOW, it is Python itself setting the policy, so people need >> to make their own decisions based on that, regardless of whether the >> tool enforces it or not. > > You're missing the point. I'm not talking about whether it will work > for Python, I'm talking about the worry that somebody will post a way > cool Python branch and require a private extension, which everybody > will just automatically install and enable, which extension then > proceeds to phone home to Spammer Haven, Inc. with the contents of > your email contact list. That's what I mean by "social engineering," > and why I worry about policy pushback from Mercurial HQ. > > Maybe that's more paranoid than they are.... But it can't hurt your > cause to be ready for that kind of worry. Oh, we try to be very paranoid in Mercurial :-) That's why you don't see any support for copying hgrc files when you clone and why hg wont trust hgrc files not owned by you: it should be safe to do cd ~collegue/src/python hg tip -- Martin Geisler VIFF (Virtual Ideal Functionality Framework) brings easy and efficient SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available URL: From p.f.moore at gmail.com Sat Aug 22 13:18:35 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 22 Aug 2009 12:18:35 +0100 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <87ab1sji0l.fsf@hbox.dyndns.org> References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> <4A8F4244.7070204@gmail.com> <87fxbk4frg.fsf@uwakimon.sk.tsukuba.ac.jp> <4A8F7BDB.30008@skippinet.com.au> <874os044dc.fsf@uwakimon.sk.tsukuba.ac.jp> <87ab1sji0l.fsf@hbox.dyndns.org> Message-ID: <79990c6b0908220418u7d838b7and5668d984cdd0d24@mail.gmail.com> 2009/8/22 Martin Geisler : > Oh, we try to be very paranoid in Mercurial :-) That's why you don't see > any support for copying hgrc files when you clone and why hg wont trust > hgrc files not owned by you: it should be safe to do > > ?cd ~collegue/src/python > ?hg tip So, is the implication therefore that there would be resistance to having some way of making a setting which *is* copied on clone, which says that you can't commit in this repository unless you have the following extensions enabled? Or is the fact that it's only saying "you must have an extension called win32text enabled" and not actually enabling code directly, sufficiently secure to make it acceptable? Paul. From mg at lazybytes.net Sat Aug 22 15:35:05 2009 From: mg at lazybytes.net (Martin Geisler) Date: Sat, 22 Aug 2009 15:35:05 +0200 Subject: [Python-Dev] Mercurial migration: help needed References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> <4A8F4244.7070204@gmail.com> <87fxbk4frg.fsf@uwakimon.sk.tsukuba.ac.jp> <4A8F7BDB.30008@skippinet.com.au> <874os044dc.fsf@uwakimon.sk.tsukuba.ac.jp> <87ab1sji0l.fsf@hbox.dyndns.org> <79990c6b0908220418u7d838b7and5668d984cdd0d24@mail.gmail.com> Message-ID: <87vdkghsza.fsf@hbox.dyndns.org> Paul Moore writes: > 2009/8/22 Martin Geisler : >> Oh, we try to be very paranoid in Mercurial :-) That's why you don't >> see any support for copying hgrc files when you clone and why hg wont >> trust hgrc files not owned by you: it should be safe to do >> >> ?cd ~collegue/src/python >> ?hg tip > > So, is the implication therefore that there would be resistance to > having some way of making a setting which *is* copied on clone, which > says that you can't commit in this repository unless you have the > following extensions enabled? It sounds somewhat invasive to forbid commits. Moreover, repository owners should remember that clients can do whatever they want, so this can only be a hint, never a requirement. I don't think this has been mentioned: When you clone you move history (changesets) only and I'm pretty sure you cannot even read the configuration settings over the "wire protocol". So cloning from a HTTP URL wont copy a setting found in the /.hg/hgrc file. This implies that the settings should live in a version controlled file. I think that is sensible under all circumstances. So if the win32text extension (horrible name, I agree... it should have been made more general and called eolconvert or something like that) would just read a configuration file from the repository, then all you should ask people is to enable win32text. > Or is the fact that it's only saying "you must have an extension > called win32text enabled" and not actually enabling code directly, > sufficiently secure to make it acceptable? It is definitely secure enough to be included. 
There should be a way to turn off those hints, though: I might want to clone the Python repository and play around with it without enabling win32text. -- Martin Geisler VIFF (Virtual Ideal Functionality Framework) brings easy and efficient SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available URL: From ncoghlan at gmail.com Sat Aug 22 17:47:41 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 23 Aug 2009 01:47:41 +1000 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <873a7k441y.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> <87ws4wkb8e.fsf@hbox.dyndns.org> <873a7k441y.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4A90131D.6040002@gmail.com> Stephen J. Turnbull wrote: > Dirkjan Ochtman writes: > > > [Clients] cannot be trusted (e.g. I might put a win32text stub in > > there somewhere that does nothing). > > Heck, just edit the .hgrules file, and do a Houdini on any and all > handcuffs. > > Don't trust software, trust people -- but help them avoid thoughtless > mistakes. Yes, on the client side we're not trying to prevent someone doing the wrong thing deliberately - just nudging them towards doing the right thing so they won't run afoul of the server side checks that will actually *enforce* the line ending rules for the main repository. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From p.f.moore at gmail.com Sat Aug 22 21:28:55 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 22 Aug 2009 20:28:55 +0100 Subject: [Python-Dev] Setting up a buildbot Message-ID: <79990c6b0908221228y46f2f592web53226cc33473c4@mail.gmail.com> I've just had a look on python.org, but couldn't immediately see a pointer to instructions on what the process is to set up a buildbot. There's a not on setting things up for pybots, but nothing on the core buildbot setup. The reason I'm asking is that I'm thinking of seeing if I could set up a Windows buildbot of some sort, to offer extra coverage. It's early days, yet, but I wonder if someone could answer a few questions for me: - Is there any documentation on how to set up a buildbot? If so, can someone give me a pointer? - What configurations would be most useful? (I've got a 64-bit PC, so I can theoretically set up 32 or 64 bit VMs with VMWare, and with my shiny new MSDN subscription, I can set up whatever OS is most useful). - Is it possible to set up the pull/build/test side of the process separately, before linking it into the full buildbot farm? That would let me try things out on my own, and iron out any configuration glitches before dumping it on the world. Thanks for any pointers. It's early days yet, so it may be a while before I have anything properly set up, but I'd like to see what I can do. Paul. 
From asmodai at in-nomine.org Sat Aug 22 21:40:49 2009 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Sat, 22 Aug 2009 21:40:49 +0200 Subject: [Python-Dev] Setting up a buildbot In-Reply-To: <79990c6b0908221228y46f2f592web53226cc33473c4@mail.gmail.com> References: <79990c6b0908221228y46f2f592web53226cc33473c4@mail.gmail.com> Message-ID: <20090822194049.GE19190@nexus.in-nomine.org> -On [20090822 21:30], Paul Moore (p.f.moore at gmail.com) wrote: >I've just had a look on python.org, but couldn't immediately see a >pointer to instructions on what the process is to set up a buildbot. http://wiki.python.org/moin/BuildBot comes to mind. -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Success is satisfaction with yourself... From p.f.moore at gmail.com Sat Aug 22 22:08:02 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 22 Aug 2009 21:08:02 +0100 Subject: [Python-Dev] Setting up a buildbot In-Reply-To: <20090822194049.GE19190@nexus.in-nomine.org> References: <79990c6b0908221228y46f2f592web53226cc33473c4@mail.gmail.com> <20090822194049.GE19190@nexus.in-nomine.org> Message-ID: <79990c6b0908221308k7d5c87b9m673ea6183946586c@mail.gmail.com> 2009/8/22 Jeroen Ruigrok van der Werven : > -On [20090822 21:30], Paul Moore (p.f.moore at gmail.com) wrote: >>I've just had a look on python.org, but couldn't immediately see a >>pointer to instructions on what the process is to set up a buildbot. > > http://wiki.python.org/moin/BuildBot comes to mind. Ah, thanks. I'll take a look. Paul. From digitalxero at gmail.com Sun Aug 23 00:35:48 2009 From: digitalxero at gmail.com (Dj Gilcrease) Date: Sat, 22 Aug 2009 16:35:48 -0600 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <4A8F42C0.8010005@skippinet.com.au> References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> <4A8F42C0.8010005@skippinet.com.au> Message-ID: On Fri, Aug 21, 2009 at 6:58 PM, Mark Hammond wrote: > [encode] > *.dsp=none: > **=cleverencode: > > The end result should be that anything with 'none:' forms what you call an > ignore list. > > Would that not meet your requirements? It would, so I guess I'll hold off on digging into the hook code From mhammond at skippinet.com.au Sun Aug 23 01:17:57 2009 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun, 23 Aug 2009 09:17:57 +1000 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <874os044dc.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> <4A8F4244.7070204@gmail.com> <87fxbk4frg.fsf@uwakimon.sk.tsukuba.ac.jp> <4A8F7BDB.30008@skippinet.com.au> <874os044dc.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4A907CA5.4050202@skippinet.com.au> On 22/08/2009 6:52 PM, Stephen J. Turnbull wrote: > Mark Hammond writes: > > On 22/08/2009 2:46 PM, Stephen J. Turnbull wrote: > > > Possibly - although I would expect the existing section names be reused > > when applied to a versioned file, I'd be more than happy for the hg guys > > to declare new names are appropriate for this. > > If there's already an [Encode] section, that's different. (I don't > details, I'm not that big a Mercurial fan.) But you'd still need a > way to differentiate win32text rules from other encoding rules. As mentioned in my previous post, I'm trying to avoid bike-shedding what the hg guys are better placed to decree. 
How they choose to spell these options is something for hg to decide, and I doubt my opinion matters enough to bother sharing, let alone advocating. > > > > > This way you aren't *enabling* extensions in this versioned file, > > > > > > True, but how many people will just download the extension and enable > > > it? > > > > In the ideal world, exactly as many people who would read the Python > > developer guide, then download and install the extension based purely on > > that. IOW, it is Python itself setting the policy, so people need to > > make their own decisions based on that, regardless of whether the tool > > enforces it or not. > > You're missing the point. I'm not talking about whether it will work > for Python, I'm talking about the worry that somebody will post a way > cool Python branch and require a private extension, which everybody > will just automatically install and enable, which extension then > proceeds to phone home to Spammer Haven, Inc. with the contents of > your email contact list. That's what I mean by "social engineering," > and why I worry about policy pushback from Mercurial HQ. No, you are missing the point - social engineering doesn't require tool support - tools simply make certain things easier. > Maybe that's more paranoid than they are.... But it can't hurt your > cause to be ready for that kind of worry. If this becomes seen as 'my' cause, I suspect it will run out of steam very quickly. I truly hope python-dev, as a community, takes some ownership of this issue or I predict the effort will fizzle out without a workable solution. There seem to be a number of people who agree the status-quo isn't acceptable, so I'm not sure what would happen in that case... Cheers, Mark From mhammond at skippinet.com.au Sun Aug 23 01:37:43 2009 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun, 23 Aug 2009 09:37:43 +1000 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <4A8FB5C1.1000903@v.loewis.de> References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> <4A8F4244.7070204@gmail.com> <4A8FB5C1.1000903@v.loewis.de> Message-ID: <4A908147.30907@skippinet.com.au> On 22/08/2009 7:09 PM, "Martin v. L?wis" wrote: >> From my POV, this would be required in some form or another before such >> a scheme could actually work. Without it we end up with an improved >> win32text (good!) > > I still think this would be actually bad. > > Instead, a new extension should be written, with a name that does not > have "win32" as a substring, and that has no provision for guessing > line breaks by inspecting files. To be clear, you are suggesting: * Having hg enforce an extension as required is good. * Python adopting win32text as that extension would be bad - instead another extension with different semantics (ie, no guessing based on file content) should be used, and enforced, instead. Or have I misunderstood? Assuming I am correct, I am inclined to agree - win32text may be "good enough" in the short term, but it is far from ideal. Cheers, Mark From ben+python at benfinney.id.au Sun Aug 23 03:52:14 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Sun, 23 Aug 2009 11:52:14 +1000 Subject: [Python-Dev] =?utf-8?q?Issue_1170=3A_Unicode_for_=E2=80=98shlex?= =?utf-8?b?4oCZ?= References: <874os0k8j2.fsf@benfinney.id.au> <1afaf6160908212307p11e8510elb0b8595829556fa3@mail.gmail.com> Message-ID: <87r5v3i9f5.fsf@benfinney.id.au> Benjamin Peterson writes: > I will leave a few initial comments. Thank you. "Martin v. 
L?wis" writes: > > In the case of (?shlex have > > problems with parsing unicode?), the problem is apparently addressed > > by a patch, assigned to that issue since 2007-12-22. > > Apparently, or really? Did you review the patch? No. The bug report showed that others had already tried it and said it worked; and also, I don't consider myself qualified to review that particular patch. > > What is the procedure for finding out why an issue hasn't progressed? > > It's fairly simple: just read through the issue, and it should be > obvious. In the specific case, no committer has ever commented on the > issue, so chances are high that no committer has ever *seen* the > issue. Okay, so not obvious to someone (like me) who doesn't immediately know who is or is not a committer. Thanks for the clarification. > > I don't want to fill the bug database with such noise. > > And likely, posting to the issue won't be a way to find out, since no > committer would see your comment. I'll take that as confirmation that asking in this forum is the right procedure. -- \ ?We spend the first twelve months of our children's lives | `\ teaching them to walk and talk and the next twelve years | _o__) telling them to sit down and shut up.? ?Phyllis Diller | Ben Finney From martin at v.loewis.de Sun Aug 23 09:16:49 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 23 Aug 2009 09:16:49 +0200 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <4A908147.30907@skippinet.com.au> References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> <4A8F4244.7070204@gmail.com> <4A8FB5C1.1000903@v.loewis.de> <4A908147.30907@skippinet.com.au> Message-ID: <4A90ECE1.2080202@v.loewis.de> >>> From my POV, this would be required in some form or another before such >>> a scheme could actually work. Without it we end up with an improved >>> win32text (good!) >> >> I still think this would be actually bad. >> >> Instead, a new extension should be written, with a name that does not >> have "win32" as a substring, and that has no provision for guessing >> line breaks by inspecting files. > > To be clear, you are suggesting: > > * Having hg enforce an extension as required is good. I have no opinion on that. > * Python adopting win32text as that extension would be bad - instead > another extension with different semantics (ie, no guessing based on > file content) should be used, and enforced, instead. Yes. The functionality being discussed should not be added to win32text. > Assuming I am correct, I am inclined to agree - win32text may be "good > enough" in the short term, but it is far from ideal. I also feel that an extension that is inherently platform independent and has a clear specification has much higher chances of becoming a standard feature of Mercurial one day. Regards, Martin From martin at v.loewis.de Sun Aug 23 09:25:56 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 23 Aug 2009 09:25:56 +0200 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <4A907CA5.4050202@skippinet.com.au> References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> <4A8F4244.7070204@gmail.com> <87fxbk4frg.fsf@uwakimon.sk.tsukuba.ac.jp> <4A8F7BDB.30008@skippinet.com.au> <874os044dc.fsf@uwakimon.sk.tsukuba.ac.jp> <4A907CA5.4050202@skippinet.com.au> Message-ID: <4A90EF04.8040708@v.loewis.de> > If this becomes seen as 'my' cause, I suspect it will run out of steam > very quickly. 
I truly hope python-dev, as a community, takes some > ownership of this issue That certainly won't happen. python-dev, as a community, has never ever taken ownership of anything. It's always individuals who take ownership. So you essentially say that you want somebody else (but not you) take ownership - which, of course, is certainly fine. Hence my call for volunteers. > There seem to be a number of people who agree the > status-quo isn't acceptable, so I'm not sure what would happen in that > case... My prediction is that it will depend on whether workable code is available by the time a decision is made to migrate. If code is available, then migration will happen (no matter whether the code has an owner); if no code is available, migration will stall. Regards, Martin From shashank.sunny.singh at gmail.com Sun Aug 23 18:09:54 2009 From: shashank.sunny.singh at gmail.com (Shashank Singh) Date: Sun, 23 Aug 2009 21:39:54 +0530 Subject: [Python-Dev] Support for Encrypted Zip as python scripts Message-ID: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> There is an interesting suggestion (http://bugs.python.org/issue6749). to add support to run encrypted zip files as python scripts. No doubt this is a useful functionality to have but it would be great to have some comments on whether this can be(or even should be) feasibly added as an inbuilt support. -- Regards Shashank Singh Senior Undergraduate, Department of Computer Science and Engineering Indian Institute of Technology Bombay shashank.sunny.singh at gmail.com http://www.cse.iitb.ac.in/~shashanksingh -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Aug 23 22:09:33 2009 From: guido at python.org (Guido van Rossum) Date: Sun, 23 Aug 2009 13:09:33 -0700 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> Message-ID: On Sun, Aug 23, 2009 at 9:09 AM, Shashank Singh wrote: > There is an interesting suggestion (http://bugs.python.org/issue6749). > to add support to run encrypted zip files as python scripts. > > No doubt this is a useful functionality to have but it would be great to > have some comments on whether > this can be(or even should be) feasibly added as an inbuilt support. MvL already asked for a patch so I suppose that means he thinks it's useful. Personally I've never encountered an encrypted zipfile, so I just have questions: is there a standard encryption algorithm? What is encrypted? The entire file or individual members? How are you supposed to give the password? Also, I suppose there could be (US) export problems with the code, so it would have to be optional (and we might not be able to build it into binaries we distribute from python.org). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Sun Aug 23 22:24:48 2009 From: brett at python.org (Brett Cannon) Date: Sun, 23 Aug 2009 13:24:48 -0700 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> Message-ID: There is a standard for encrypting entire zip files. And I was looking at the zip docs the other day and zipfile can already decrypt but not encrypt (assuming my memory is accurate; doing this from my phone on vacation). 
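For context, the decryption support that is already in zipfile only covers the old password-based scheme, and only for reading; it is used roughly like this (an untested sketch, with the archive and member names made up):

    from zipfile import ZipFile

    zf = ZipFile('archive.zip')
    zf.setpassword(b'secret')        # default password for subsequent reads
    data = zf.read('member.txt')     # or: zf.read('member.txt', pwd=b'secret')
    zf.close()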
On Aug 23, 2009 2:10 PM, "Guido van Rossum" wrote: On Sun, Aug 23, 2009 at 9:09 AM, Shashank Singh< shashank.sunny.singh at gmail.com> wrote: > There is an... MvL already asked for a patch so I suppose that means he thinks it's useful. Personally I've never encountered an encrypted zipfile, so I just have questions: is there a standard encryption algorithm? What is encrypted? The entire file or individual members? How are you supposed to give the password? Also, I suppose there could be (US) export problems with the code, so it would have to be optional (and we might not be able to build it into binaries we distribute from python.org). -- --Guido van Rossum (home page: http://www.python.org/~guido/) _______________________________________________ Python-Dev mailing list Python-Dev at python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/brett%40python.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Aug 23 23:16:20 2009 From: guido at python.org (Guido van Rossum) Date: Sun, 23 Aug 2009 14:16:20 -0700 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> Message-ID: > On Aug 23, 2009 2:10 PM, "Guido van Rossum" wrote: > MvL already asked for a patch so I suppose that means he thinks it's > useful. Personally I've never encountered an encrypted zipfile, so I > just have questions: is there a standard encryption algorithm? What is > encrypted? The entire file or individual members? How are you supposed > to give the password? Also, I suppose there could be (US) export > problems with the code, so it would have to be optional (and we might > not be able to build it into binaries we distribute from python.org). On Sun, Aug 23, 2009 at 1:24 PM, Brett Cannon wrote: > There is a standard for encrypting entire zip files. And I was looking at > the zip docs the other day and zipfile can already decrypt but not encrypt > (assuming my memory is accurate; doing this from my phone on vacation). Ah, cool. Then the only issue for the patch presumably is an API to provide the password. Passing it as a command-line flag seems very insecure (though in some cases there may be no choice), so presumably it needs to be prompted and read from stdin. (Though it appears from skimming zipfile.py that it support encrypted individual archive members, not the zipfile as a whole. Also the docs mention that decryption is "extremely slow as it is implemented in native python rather than C.") Anyway it looks like if someone wants to try this, only the code in runpy.py needs to be touched. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Sun Aug 23 23:24:11 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 23 Aug 2009 23:24:11 +0200 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> Message-ID: <4A91B37B.2050306@v.loewis.de> >> No doubt this is a useful functionality to have but it would be great to >> have some comments on whether >> this can be(or even should be) feasibly added as an inbuilt support. > > MvL already asked for a patch so I suppose that means he thinks it's > useful. I am actual skeptical that it is implementable in a reasonable way; if implemented, I'd say: why not? 
> Personally I've never encountered an encrypted zipfile, so I > just have questions: is there a standard encryption algorithm? In principle, yes. There are several aspects of encryption described in http://www.pkware.com/documents/casestudies/APPNOTE.TXT There are several encryption algorithms defined, such as "traditional PKWARE", DES, 3DES, "original RC2", RC4, AES, "corrected RC2", "corrected RC2-64", blowfish, twofish. In the file header general purpose bits , bit 0 indicates "file is encrypted" (which means "traditional PKWARE"), bit 6 indicates "strong encryption" (an additional header then giving details). > What is encrypted? The entire file or individual members? Traditionally, only individual files. With strong encryption (only?), the central directory can also be encrypted. > How are you supposed to give the password? In pkzip: interactively. In the import support: this remains to be seen in the patch. I assume people requesting that feature have a plan. > Also, I suppose there could be (US) export > problems with the code, so it would have to be optional (and we might > not be able to build it into binaries we distribute from python.org). The zipfile module already supports decryption. I forgot whether we determined that support for decryption only doesn't fall under the export restrictions, or whether we reported the module to the BXA as well. Regards, Martin From greg at krypto.org Mon Aug 24 02:59:53 2009 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 23 Aug 2009 17:59:53 -0700 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: <4A91B37B.2050306@v.loewis.de> References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> <4A91B37B.2050306@v.loewis.de> Message-ID: <52dc1c820908231759v4157b44g56d2572b9a87269a@mail.gmail.com> On Sun, Aug 23, 2009 at 2:24 PM, "Martin v. L?wis" wrote: > >> No doubt this is a useful functionality to have but it would be great to > >> have some comments on whether > >> this can be(or even should be) feasibly added as an inbuilt support. > > > > MvL already asked for a patch so I suppose that means he thinks it's > > useful. > > I am actual skeptical that it is implementable in a reasonable way; > if implemented, I'd say: why not? > > > Personally I've never encountered an encrypted zipfile, so I > > just have questions: is there a standard encryption algorithm? > > In principle, yes. There are several aspects of encryption described in > > http://www.pkware.com/documents/casestudies/APPNOTE.TXT > > There are several encryption algorithms defined, such as > "traditional PKWARE", DES, 3DES, "original RC2", RC4, AES, > "corrected RC2", "corrected RC2-64", blowfish, twofish. > > In the file header general purpose bits , bit 0 indicates "file is > encrypted" (which means "traditional PKWARE"), bit 6 indicates "strong > encryption" (an additional header then giving details). > > > What is encrypted? The entire file or individual members? > > Traditionally, only individual files. With strong encryption (only?), > the central directory can also be encrypted. > > > How are you supposed to give the password? > > In pkzip: interactively. In the import support: this remains to be seen > in the patch. I assume people requesting that feature have a plan. > > > Also, I suppose there could be (US) export > > problems with the code, so it would have to be optional (and we might > > not be able to build it into binaries we distribute from python.org). > > The zipfile module already supports decryption. 
I forgot whether we > determined that support for decryption only doesn't fall under the > export restrictions, or whether we reported the module to the BXA as > well. > I doubt you can even classify the zipfile module's "decryption" support as encryption. It is trivially stupid, easily cracked (a 32bit crc based "cipher"). The zipfile module does not support the various later encryption schemes that use actual crypto algorithms. I do not think we should support execution of python scripts or importing of modules from encrypted zips. I do not see a valid use case. -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Aug 24 03:52:26 2009 From: guido at python.org (Guido van Rossum) Date: Sun, 23 Aug 2009 18:52:26 -0700 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: <52dc1c820908231759v4157b44g56d2572b9a87269a@mail.gmail.com> References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> <4A91B37B.2050306@v.loewis.de> <52dc1c820908231759v4157b44g56d2572b9a87269a@mail.gmail.com> Message-ID: On Sun, Aug 23, 2009 at 5:59 PM, Gregory P. Smith wrote: > I doubt you can even classify the zipfile module's "decryption" support as > encryption.? It is trivially stupid, easily cracked (a 32bit crc based > "cipher").? The zipfile module does not support the various later encryption > schemes that use actual crypto algorithms. Oops. I guess this is what Martin called "traditional PKWARE". Quite separate from the current thread it might make sense to support the stronger encryption schemes in zipfile. > I do not think we should support execution of python scripts or importing of > modules from encrypted zips.? I do not see a valid use case. I am still awaiting a use case too (for running an encrypted script). I notice that the OP hasn't replied yet. Let's give them a chance. (I added Shashank back to the thread just in case.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ncoghlan at gmail.com Mon Aug 24 04:09:02 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 24 Aug 2009 12:09:02 +1000 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> Message-ID: <4A91F63E.6060303@gmail.com> Guido van Rossum wrote: > Anyway it looks like if someone wants to try this, only the code in > runpy.py needs to be touched. The necessary work would actually be in zipimport. runpy doesn't know anything about the details of where the module code comes from, it just asks the relevant importer for the details. For zipfile and directory execution, they get added to the start of sys.path and then runpy is invoked to look for the module "__main__". From that point on most of the heavy lifting is handled by the regular import machinery (aside from using the pkgutil emulation for the basic import behaviour that isn't fully exposed by the imp module). I added a -1 to the tracker issue as well. That's due both to my opinion on the inherent idiocy of DRM though (since shared secrets don't provide any security when the attacker in your threat model is one of the people you are sharing the secret with) and to the fact that associating passwords with the relevant zipfile entries on sys.path would get messy fairly quickly. Cheers. Nick. 
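In rough outline, the execution path Nick describes amounts to something like the following -- a simplification using the public runpy API rather than the private helper the interpreter really calls, with a hypothetical archive name:

    import runpy
    import sys

    archive = 'app.zip'            # the zipfile named on the command line
    sys.path.insert(0, archive)    # zipimport will now serve modules from it
    runpy.run_module('__main__', run_name='__main__', alter_sys=True)

Any decryption support would therefore have to sit in zipimport, which is where the archive's bytes are actually read.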
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From shashank.sunny.singh at gmail.com Mon Aug 24 04:39:00 2009 From: shashank.sunny.singh at gmail.com (Shashank Singh) Date: Mon, 24 Aug 2009 08:09:00 +0530 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: <4A91F63E.6060303@gmail.com> References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> <4A91F63E.6060303@gmail.com> Message-ID: <332972a20908231939i15f1fd5bh9cc61188eb159e8e@mail.gmail.com> On Mon, Aug 24, 2009 at 7:39 AM, Nick Coghlan wrote: > Guido van Rossum wrote: > > Anyway it looks like if someone wants to try this, only the code in > > runpy.py needs to be touched. > > The necessary work would actually be in zipimport. runpy doesn't know > anything about the details of where the module code comes from, it just > asks the relevant importer for the details. For zipfile and directory > execution, they get added to the start of sys.path and then runpy is > invoked to look for the module "__main__". From that point on most of > the heavy lifting is handled by the regular import machinery (aside from > using the pkgutil emulation for the basic import behaviour that isn't > fully exposed by the imp module). > > I added a -1 to the tracker issue as well. That's due both to my opinion > on the inherent idiocy of DRM though (since shared secrets don't provide > any security when the attacker in your threat model is one of the people > you are sharing the secret with) and to the fact that associating > passwords with the relevant zipfile entries on sys.path would get messy > fairly quickly. > > Cheers. > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > --------------------------------------------------------------- > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/shashank.sunny.singh%40gmail.com > -- Regards Shashank Singh Senior Undergraduate, Department of Computer Science and Engineering Indian Institute of Technology Bombay shashank.sunny.singh at gmail.com http://www.cse.iitb.ac.in/~shashanksingh -------------- next part -------------- An HTML attachment was scrubbed... URL: From shashank.sunny.singh at gmail.com Mon Aug 24 04:40:02 2009 From: shashank.sunny.singh at gmail.com (Shashank Singh) Date: Mon, 24 Aug 2009 08:10:02 +0530 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: <332972a20908231939i15f1fd5bh9cc61188eb159e8e@mail.gmail.com> References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> <4A91F63E.6060303@gmail.com> <332972a20908231939i15f1fd5bh9cc61188eb159e8e@mail.gmail.com> Message-ID: <332972a20908231940u3de3967ak108ad404db94f73f@mail.gmail.com> oops..sorry for the empty mail :P On Mon, Aug 24, 2009 at 8:09 AM, Shashank Singh < shashank.sunny.singh at gmail.com> wrote: > > > On Mon, Aug 24, 2009 at 7:39 AM, Nick Coghlan wrote: > >> Guido van Rossum wrote: >> > Anyway it looks like if someone wants to try this, only the code in >> > runpy.py needs to be touched. >> >> The necessary work would actually be in zipimport. runpy doesn't know >> anything about the details of where the module code comes from, it just >> asks the relevant importer for the details. 
For zipfile and directory >> execution, they get added to the start of sys.path and then runpy is >> invoked to look for the module "__main__". From that point on most of >> the heavy lifting is handled by the regular import machinery (aside from >> using the pkgutil emulation for the basic import behaviour that isn't >> fully exposed by the imp module). > > That is where I see the problem in creating a natural approach. Correct me if I am wrong here but since runpy doesn't know anything about the script being a zip file to add such a support we will have to break the current delegation mechanism and bring runpy in the loop too. Also, since a zip file is automatically checked for (I believe there are no switches to specify that the script is a zip) will it not be a two trip mechanism: You naively try a to run a zip; get an error (say ERR_ZIP_ENCRYPTED) and then ask for password? > >> >> I added a -1 to the tracker issue as well. That's due both to my opinion >> on the inherent idiocy of DRM though (since shared secrets don't provide >> any security when the attacker in your threat model is one of the people >> you are sharing the secret with) and to the fact that associating >> passwords with the relevant zipfile entries on sys.path would get messy >> fairly quickly. >> >> Cheers. >> Nick. >> >> -- >> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia >> --------------------------------------------------------------- >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> http://mail.python.org/mailman/options/python-dev/shashank.sunny.singh%40gmail.com >> > > > > -- > Regards > Shashank Singh > Senior Undergraduate, Department of Computer Science and Engineering > Indian Institute of Technology Bombay > shashank.sunny.singh at gmail.com > http://www.cse.iitb.ac.in/~shashanksingh > -- Regards Shashank Singh Senior Undergraduate, Department of Computer Science and Engineering Indian Institute of Technology Bombay shashank.sunny.singh at gmail.com http://www.cse.iitb.ac.in/~shashanksingh -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Aug 24 04:41:22 2009 From: guido at python.org (Guido van Rossum) Date: Sun, 23 Aug 2009 19:41:22 -0700 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: <4A91F63E.6060303@gmail.com> References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> <4A91F63E.6060303@gmail.com> Message-ID: OMG, the use case is actually running a script without giving the user access to the script's source? Agreed that's a big -1. I thought it was just for running a zip containing code so secret you don't want to leave it around on your hard drive without encryption (say, the program you use to compute your employee's bonuses, or perhaps a patented algoritm for detecting spam). That use case would make a small amount of sense, though I personally don't care enough to write the code to support it. --Guido On Sun, Aug 23, 2009 at 7:09 PM, Nick Coghlan wrote: > Guido van Rossum wrote: >> Anyway it looks like if someone wants to try this, only the code in >> runpy.py needs to be touched. > > The necessary work would actually be in zipimport. runpy doesn't know > anything about the details of where the module code comes from, it just > asks the relevant importer for the details. 
For zipfile and directory > execution, they get added to the start of sys.path and then runpy is > invoked to look for the module "__main__". From that point on most of > the heavy lifting is handled by the regular import machinery (aside from > using the pkgutil emulation for the basic import behaviour that isn't > fully exposed by the imp module). > > I added a -1 to the tracker issue as well. That's due both to my opinion > on the inherent idiocy of DRM though (since shared secrets don't provide > any security when the attacker in your threat model is one of the people > you are sharing the secret with) and to the fact that associating > passwords with the relevant zipfile entries on sys.path would get messy > fairly quickly. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Mon Aug 24 04:59:16 2009 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon, 24 Aug 2009 12:59:16 +1000 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <4A90EF04.8040708@v.loewis.de> References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> <4A8F4244.7070204@gmail.com> <87fxbk4frg.fsf@uwakimon.sk.tsukuba.ac.jp> <4A8F7BDB.30008@skippinet.com.au> <874os044dc.fsf@uwakimon.sk.tsukuba.ac.jp> <4A907CA5.4050202@skippinet.com.au> <4A90EF04.8040708@v.loewis.de> Message-ID: <4A920204.6020009@skippinet.com.au> On 23/08/2009 5:25 PM, "Martin v. L?wis" wrote: >> If this becomes seen as 'my' cause, I suspect it will run out of steam >> very quickly. I truly hope python-dev, as a community, takes some >> ownership of this issue > > That certainly won't happen. python-dev, as a community, has never ever > taken ownership of anything. It's always individuals who take ownership. I believe ownership of a task and ownership of a cause are somewhat different. In other words, I'm happy to take ownership of a number as tasks relating to this cause, but if the general feeling is that it is my cause rather than *our* cause, then I will probably opt-out - I'm taking these tasks on at this moment purely because I believe it *is* a common cause. > So you essentially say that you want somebody else (but not you) take > ownership - which, of course, is certainly fine. Hence my call for > volunteers. Hence my volunteering and the time I am currently spending. >> There seem to be a number of people who agree the >> status-quo isn't acceptable, so I'm not sure what would happen in that >> case... > > My prediction is that it will depend on whether workable code is > available by the time a decision is made to migrate. If code is > available, then migration will happen (no matter whether the code > has an owner); if no code is available, migration will stall. Right - I guess we are all still struggling with exactly what "workable code" means in this context. Cheers, Mark From ncoghlan at gmail.com Mon Aug 24 05:15:25 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 24 Aug 2009 13:15:25 +1000 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> <4A91F63E.6060303@gmail.com> Message-ID: <4A9205CD.2020706@gmail.com> Guido van Rossum wrote: > OMG, the use case is actually running a script without giving the user > access to the script's source? Agreed that's a big -1. 
> > I thought it was just for running a zip containing code so secret you > don't want to leave it around on your hard drive without encryption > (say, the program you use to compute your employee's bonuses, or > perhaps a patented algoritm for detecting spam). That use case would > make a small amount of sense, though I personally don't care enough to > write the code to support it. Actually, the issue posting doesn't say either way - it doesn't provide any real use cases at all. For local protection of confidential information there are already much better solutions out there (e.g. whole disk encryption, OS file permissions, OS folder encryption), so a poor-man's DRM was the only remaining remotely plausible use case I could see (and that's a bad idea for all the reasons that DRM is almost always a bad idea). Now, that could just be a failure of imagination on my part, but genuine use case suggestions for the feature have been non existent so far. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Mon Aug 24 05:20:34 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 24 Aug 2009 13:20:34 +1000 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <4A920204.6020009@skippinet.com.au> References: <4A8A6256.1030009@v.loewis.de> <4A8E49C2.5060805@gmail.com> <4A8F4244.7070204@gmail.com> <87fxbk4frg.fsf@uwakimon.sk.tsukuba.ac.jp> <4A8F7BDB.30008@skippinet.com.au> <874os044dc.fsf@uwakimon.sk.tsukuba.ac.jp> <4A907CA5.4050202@skippinet.com.au> <4A90EF04.8040708@v.loewis.de> <4A920204.6020009@skippinet.com.au> Message-ID: <4A920702.50207@gmail.com> Mark Hammond wrote: > On 23/08/2009 5:25 PM, "Martin v. L?wis" wrote: >>> If this becomes seen as 'my' cause, I suspect it will run out of steam >>> very quickly. I truly hope python-dev, as a community, takes some >>> ownership of this issue >> >> That certainly won't happen. python-dev, as a community, has never ever >> taken ownership of anything. It's always individuals who take ownership. > > I believe ownership of a task and ownership of a cause are somewhat > different. > > In other words, I'm happy to take ownership of a number as tasks > relating to this cause, but if the general feeling is that it is my > cause rather than *our* cause, then I will probably opt-out - I'm taking > these tasks on at this moment purely because I believe it *is* a common > cause. If by ownership of the cause you just mean "acceptable handling of line conversions" as being one of the criteria that must be dealt with before the switch to hg actually happens, then I think you have that agreement already. We're not going to accept a regression in line handling from what SVN provides. Your proposed improvements to win32text (possibly in the form of a new extension based on win32text rather than a new version of win32text itself) along with server side enforcement sound like they will meet the need. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From shashank.sunny.singh at gmail.com Mon Aug 24 06:23:24 2009 From: shashank.sunny.singh at gmail.com (Shashank Singh) Date: Mon, 24 Aug 2009 09:53:24 +0530 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: <4A9205CD.2020706@gmail.com> References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> <4A91F63E.6060303@gmail.com> <4A9205CD.2020706@gmail.com> Message-ID: <332972a20908232123x984ba26t2c3fb9a17e8c7668@mail.gmail.com> A litle off topic but the zipfile doc says: "Decryption is extremely slow as it is implemented in native python rather than C". Why is this limitation there? I mean, is there any specific reason for not implementing it in C? On Mon, Aug 24, 2009 at 8:45 AM, Nick Coghlan wrote: > Guido van Rossum wrote: > > OMG, the use case is actually running a script without giving the user > > access to the script's source? Agreed that's a big -1. > > > > I thought it was just for running a zip containing code so secret you > > don't want to leave it around on your hard drive without encryption > > (say, the program you use to compute your employee's bonuses, or > > perhaps a patented algoritm for detecting spam). That use case would > > make a small amount of sense, though I personally don't care enough to > > write the code to support it. > > Actually, the issue posting doesn't say either way - it doesn't provide > any real use cases at all. > > For local protection of confidential information there are already much > better solutions out there (e.g. whole disk encryption, OS file > permissions, OS folder encryption), so a poor-man's DRM was the only > remaining remotely plausible use case I could see (and that's a bad idea > for all the reasons that DRM is almost always a bad idea). > > Now, that could just be a failure of imagination on my part, but genuine > use case suggestions for the feature have been non existent so far. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > --------------------------------------------------------------- > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/shashank.sunny.singh%40gmail.com > -- Regards Shashank Singh Senior Undergraduate, Department of Computer Science and Engineering Indian Institute of Technology Bombay shashank.sunny.singh at gmail.com http://www.cse.iitb.ac.in/~shashanksingh -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Aug 24 06:31:49 2009 From: guido at python.org (Guido van Rossum) Date: Sun, 23 Aug 2009 21:31:49 -0700 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: <332972a20908232123x984ba26t2c3fb9a17e8c7668@mail.gmail.com> References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> <4A91F63E.6060303@gmail.com> <4A9205CD.2020706@gmail.com> <332972a20908232123x984ba26t2c3fb9a17e8c7668@mail.gmail.com> Message-ID: Because it is easier to write in Python, and (as Greg explained) the encryption is so lousy that you're unlikely to find heavy use of it. Therefore nobody (so far) has cared to write an accelerator in C. 
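For the curious, the "traditional PKWARE" scheme being dismissed here costs several Python-level operations per byte, which is the real source of the slowness. A condensed paraphrase, for illustration only, of what the stdlib's internal decrypter does (the three keys start at 0x12345678, 0x23456789, 0x34567890 and are first fed the password bytes):

    def make_crc_table():
        # Standard CRC-32 table, as used by the zip "encryption" key schedule.
        table = []
        for i in range(256):
            crc = i
            for _ in range(8):
                crc = (crc >> 1) ^ 0xedb88320 if crc & 1 else crc >> 1
            table.append(crc)
        return table

    def crc32_step(crc, byte, table):
        return ((crc >> 8) & 0xffffff) ^ table[(crc ^ byte) & 0xff]

    def decrypt_byte(c, keys, table):
        k0, k1, k2 = keys
        k = k2 | 2
        plain = c ^ (((k * (k ^ 1)) >> 8) & 0xff)   # recover the plaintext byte
        k0 = crc32_step(k0, plain, table)           # then mix it back into the keys
        k1 = (k1 + (k0 & 0xff)) & 0xffffffff
        k1 = (k1 * 134775813 + 1) & 0xffffffff
        k2 = crc32_step(k2, (k1 >> 24) & 0xff, table)
        return plain, (k0, k1, k2)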
On Sun, Aug 23, 2009 at 9:23 PM, Shashank Singh wrote: > A litle off topic but the zipfile doc says: "Decryption is extremely slow as > it > is implemented in native python rather than C". > > Why is this limitation there? I mean, is there any specific reason for not > implementing > it in C? > > On Mon, Aug 24, 2009 at 8:45 AM, Nick Coghlan wrote: >> >> Guido van Rossum wrote: >> > OMG, the use case is actually running a script without giving the user >> > access to the script's source? Agreed that's a big -1. >> > >> > I thought it was just for running a zip containing code so secret you >> > don't want to leave it around on your hard drive without encryption >> > (say, the program you use to compute your employee's bonuses, or >> > perhaps a patented algoritm for detecting spam). That use case would >> > make a small amount of sense, though I personally don't care enough to >> > write the code to support it. >> >> Actually, the issue posting doesn't say either way - it doesn't provide >> any real use cases at all. >> >> For local protection of confidential information there are already much >> better solutions out there (e.g. whole disk encryption, OS file >> permissions, OS folder encryption), so a poor-man's DRM was the only >> remaining remotely plausible use case I could see (and that's a bad idea >> for all the reasons that DRM is almost always a bad idea). >> >> Now, that could just be a failure of imagination on my part, but genuine >> use case suggestions for the feature have been non existent so far. >> >> Cheers, >> Nick. >> >> -- >> Nick Coghlan ? | ? ncoghlan at gmail.com ? | ? Brisbane, Australia >> --------------------------------------------------------------- >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> http://mail.python.org/mailman/options/python-dev/shashank.sunny.singh%40gmail.com > > > > -- > Regards > Shashank Singh > Senior Undergraduate, Department of Computer Science and Engineering > Indian Institute of Technology Bombay > shashank.sunny.singh at gmail.com > http://www.cse.iitb.ac.in/~shashanksingh > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From asmodai at in-nomine.org Mon Aug 24 10:54:05 2009 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Mon, 24 Aug 2009 10:54:05 +0200 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> Message-ID: <20090824085404.GG19190@nexus.in-nomine.org> -On [20090823 22:10], Guido van Rossum (guido at python.org) wrote: >Also, I suppose there could be (US) export problems with the code, so it >would have to be optional (and we might not be able to build it into >binaries we distribute from python.org). For all I know the website and repository are both located @ the XS4All colocation, so how does US export problems apply? It would apply if the box would've been hosted in the USA, but they're not for all I know. It's one of the reasons FreeBSD had their ebones repository located in Zuid Afrika back in the day. Nowadays they can just include all the relevant bits in the repository. 
So I wonder how applicable the entire US export restriction still is nowadays. In short: I don't think we have much to worry about. -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Wisdom is the difference between knowledge and experience... From guido at python.org Mon Aug 24 16:46:09 2009 From: guido at python.org (Guido van Rossum) Date: Mon, 24 Aug 2009 07:46:09 -0700 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: <20090824085404.GG19190@nexus.in-nomine.org> References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> <20090824085404.GG19190@nexus.in-nomine.org> Message-ID: On Mon, Aug 24, 2009 at 1:54 AM, Jeroen Ruigrok van der Werven wrote: > -On [20090823 22:10], Guido van Rossum (guido at python.org) wrote: >>Also, I suppose there could be (US) export problems with the code, so it >>would have to be optional (and we might not be able to build it into >>binaries we distribute from python.org). > > For all I know the website and repository are both located @ the XS4All > colocation, so how does US export problems apply? It would apply if the box > would've been hosted in the USA, but they're not for all I know. > > It's one of the reasons FreeBSD had their ebones repository located in Zuid > Afrika back in the day. Nowadays they can just include all the relevant bits > in the repository. So I wonder how applicable the entire US export > restriction still is nowadays. > > In short: I don't think we have much to worry about. Are you a lawyer? Do you know the legal history of Python distributions and the US export laws? It's not so easy -- for one, the PSF (a US foundation) owns the copyright. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From chris at simplistix.co.uk Mon Aug 24 18:03:15 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Mon, 24 Aug 2009 17:03:15 +0100 Subject: [Python-Dev] runpy.py In-Reply-To: References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> Message-ID: <4A92B9C3.5040701@simplistix.co.uk> Guido van Rossum wrote: > Anyway it looks like if someone wants to try this, only the code in > runpy.py needs to be touched. Where is runpy.py to be found? I'm trying to find whatever implements python -m and the other python command line options... Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From benjamin at python.org Mon Aug 24 18:07:49 2009 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 24 Aug 2009 11:07:49 -0500 Subject: [Python-Dev] runpy.py In-Reply-To: <4A92B9C3.5040701@simplistix.co.uk> References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> <4A92B9C3.5040701@simplistix.co.uk> Message-ID: <1afaf6160908240907y2bb8fabbud044171de72ae052@mail.gmail.com> 2009/8/24 Chris Withers : > Guido van Rossum wrote: >> >> Anyway it looks like if someone wants to try this, only the code in >> runpy.py needs to be touched. > > Where is runpy.py to be found? $ find . 
-name "runpy.py" ./Lib/runpy.py -- Regards, Benjamin From peter at hda3.com Mon Aug 24 19:57:09 2009 From: peter at hda3.com (Peter Moody) Date: Mon, 24 Aug 2009 10:57:09 -0700 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <4A8F30BD.5000407@gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> Message-ID: <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> On Fri, Aug 21, 2009 at 4:41 PM, Nick Coghlan wrote: > Peter Moody wrote: >> On Thu, Aug 20, 2009 at 10:15 PM, Case Vanhorsen wrote: >>> I was surprised that IP('172.16.1.1') returned >>> IPv4Address('172.16.1.1/32') instead of IPv4Address('172.16.1.1'). I >>> know I can change the behavior by using host=True, but then >>> IP('172.16.1.1/24', host=True) will raise an error. It makes more >>> sense, at least to me, that if I input just an IP address, I get an IP >>> address back. I would prefer that IP('172.16.1.1/32') return an >>> IPv4Network and IP('172.16.1.1') return an IPv4Address. >> >> I think you mean that it returns an IPv4Network object (not >> IPv4Address). ?My suggestion there is that if you know you're dealing >> with an address, use one of the IPvXAddress classes (or pass host=True >> to the IP function). IP is just a helper function and defaulting to a >> network with a /32 prefix seems relatively common. >> >> Knowing that my experience may not always be the most common, I can >> change this behavior if it's indeed confusing, but in my conversations >> with others and in checking out the current state of ip address >> libraries, this seems to be a perfectly acceptable default. > > The IP() helper function actually bothers me a bit - it's a function > where the return *type* depends on the parameter *value*. While > switching between IPv4 and IPv6 based on value is probably a necessary > behaviour, perhaps it would be possible to get rid of the "host=True" > ugliness and instead have two separate helper functions: > > ?IP() - returns either IPv4Address or IPv6Address > ?IPNetwork() - returns either IPv4Network or IPv6Network IPAddress() and IPNetwork seem a little less confusing. I'll get rid of IP() since it seems to be the source of a fair bit of confusion. I've added this to the pep3144 change. need to change the pep to reflect this now. > Both would still accept a version argument, allowing programmatic > control of which version to accept. If an unknown version is passed then > some kind of warning or error should be emitted rather than the current > silent fallback to attempting to guess the version based on the value. > > I would suggest removing the corresponding IPv4 and IPv6 helper > functions altogether. done. > My rationale for the above is that hosts and networks are *not* the same > thing. For any given operation, the programmer should know whether they > want a host or a network and ask for whichever one they want. The > IPv4/IPv6 distinction, on the other hand, is something that a lot of > operations are going to be neutral about, so it makes sense to deal with > the difference implicitly. makes sense to me. > Other general comments: > > - the module appears to have quite a few isinstance() checks against > non-abstract base classes. 
Either these instance checks should all be > removed (relying on pure duck-typing instead) or else the relevant > classes should be turned into ABCs. (Note: this comment doesn't apply to > the type dispatch in the constructor methods) I'll look through this. > - the reference implementation has aliased "CamelCase" names with the > heading "backwards compatibility". This is inappropriate for a standard > library submission (although I can see how it would be useful if you > were already using a different IP address library). this is for legacy reasons (how it's used at work). it's fully expected that those will disappear if/when this is accepted. > - isinstance() accepts a tuple of types, so isinstance(address, (int, > long)) is a shorter way of writing "isinstance(address, int) or > isinstance(address, long)". The former also has the virtue of executing > faster. However, an even better approach would be to use > operator.index() in order to accept all integral types rather than just > the builtin ones. I can easily make the change to checking tuples. I'd have to look further at the operator.index() to see what's required. > Cheers, > Nick. > > -- > Nick Coghlan ? | ? ncoghlan at gmail.com ? | ? Brisbane, Australia > --------------------------------------------------------------- > From larry.bugbee at boeing.com Mon Aug 24 20:52:55 2009 From: larry.bugbee at boeing.com (Bugbee, Larry) Date: Mon, 24 Aug 2009 11:52:55 -0700 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: References: Message-ID: <9418DB6C0B9D434190E54A78E931C3D109F0C827@XCH-NW-7V1.nw.nos.boeing.com> I like the idea, but... Here is a quick list of things to think about and if some of this has already been mentioned, sorry. Speed: Encryption speed has been mentioned. For short scripts this may not be a problem, although algorithms implemented in C would be faster. Strength: Passwords are [very] weak, especially if of the 6-10 alphanumeric variety. True secret keys where all bit combinations are used is stronger. Entering passwords has been mentioned but I believe only passwords were assumed. It is better to not provide any encryption than to lure novices into believing they are secure when they are not. Algorithms: Be sure to choose good ones and allow for changing later. Key distribution: How to distribute secret keys beyond a small group of friends is problematic. In short it doesn't scale. Looking to public-private key pairs can be equally problematic. This can get you into encryption certs, but *how* you use them correctly differs from signing certs. More on this later if you want. ZIP: Look beyond just zip files. A scheme that works for any/all files in the distribution, not just ZIPs, would be better. (IIRC there have been problems with encrypted zips, but that was years ago. Those issues may have been fixed.) Short version: Doing this right is hard. Simply supporting a password based ZIP file is, in my opinion, not real protection. Gotta go. Later. Larry From guido at python.org Mon Aug 24 21:09:36 2009 From: guido at python.org (Guido van Rossum) Date: Mon, 24 Aug 2009 12:09:36 -0700 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: <9418DB6C0B9D434190E54A78E931C3D109F0C827@XCH-NW-7V1.nw.nos.boeing.com> References: <9418DB6C0B9D434190E54A78E931C3D109F0C827@XCH-NW-7V1.nw.nos.boeing.com> Message-ID: On Mon, Aug 24, 2009 at 11:52 AM, Bugbee, Larry wrote: > I like the idea, but... For what use case? 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From larry.bugbee at boeing.com Mon Aug 24 22:01:07 2009 From: larry.bugbee at boeing.com (Bugbee, Larry) Date: Mon, 24 Aug 2009 13:01:07 -0700 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: References: <9418DB6C0B9D434190E54A78E931C3D109F0C827@XCH-NW-7V1.nw.nos.boeing.com> Message-ID: <9418DB6C0B9D434190E54A78E931C3D109F0C8F2@XCH-NW-7V1.nw.nos.boeing.com> > > I like the idea, but... > For what use case? I don't have a specific case in mind. In general, however, it would be nice to be able to protect intellectual property, but without addressing the problem from a holistic view, there is little protection afforded and perhaps a lot of unrewarded work. And I forgot one, Distribution of crypto across certain international borders. Export/import laws by itself can be a showstopper. From martin at v.loewis.de Mon Aug 24 22:39:17 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 24 Aug 2009 22:39:17 +0200 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: <9418DB6C0B9D434190E54A78E931C3D109F0C8F2@XCH-NW-7V1.nw.nos.boeing.com> References: <9418DB6C0B9D434190E54A78E931C3D109F0C827@XCH-NW-7V1.nw.nos.boeing.com> <9418DB6C0B9D434190E54A78E931C3D109F0C8F2@XCH-NW-7V1.nw.nos.boeing.com> Message-ID: <4A92FA75.90609@v.loewis.de> Bugbee, Larry wrote: >>> I like the idea, but... >> For what use case? > > I don't have a specific case in mind. In general, however, it would be > nice to be able to protect intellectual property This I'm also unclear about. How does it protect intellectual property? Won't the person running the zipfile have to enter the password? Whom would you protect the IP from? Regards, Martin From digitalxero at gmail.com Mon Aug 24 23:01:11 2009 From: digitalxero at gmail.com (Dj Gilcrease) Date: Mon, 24 Aug 2009 15:01:11 -0600 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: <9418DB6C0B9D434190E54A78E931C3D109F0C8F2@XCH-NW-7V1.nw.nos.boeing.com> References: <9418DB6C0B9D434190E54A78E931C3D109F0C827@XCH-NW-7V1.nw.nos.boeing.com> <9418DB6C0B9D434190E54A78E931C3D109F0C8F2@XCH-NW-7V1.nw.nos.boeing.com> Message-ID: On Mon, Aug 24, 2009 at 2:01 PM, Bugbee, Larry wrote: > I don't have a specific case in mind. ?In general, however, it would be > nice to be able to protect intellectual property, but without addressing > the problem from a holistic view, there is little protection afforded > and perhaps a lot of unrewarded work. I would think just distributing pyc files would achieve that goal From larry.bugbee at boeing.com Mon Aug 24 23:16:42 2009 From: larry.bugbee at boeing.com (Bugbee, Larry) Date: Mon, 24 Aug 2009 14:16:42 -0700 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: <4A92FA75.90609@v.loewis.de> References: <9418DB6C0B9D434190E54A78E931C3D109F0C827@XCH-NW-7V1.nw.nos.boeing.com> <9418DB6C0B9D434190E54A78E931C3D109F0C8F2@XCH-NW-7V1.nw.nos.boeing.com> <4A92FA75.90609@v.loewis.de> Message-ID: <9418DB6C0B9D434190E54A78E931C3D109F0C9D0@XCH-NW-7V1.nw.nos.boeing.com> >>>> I like the idea, but... >>> For what use case? >> I don't have a specific case in mind. In general, however, it >> would be nice to be able to protect intellectual property > This I'm also unclear about. How does it protect intellectual > property? Won't the person running the zipfile have to enter the > password? Whom would you protect the IP from? 
I agree the IP will have to be exposed at some point to be useful, but let's not overlook other things that could be in play like PIA agreements and the like. Also, something stronger than a password will be needed to be secure, and secret key distribution does not scale. There is a lot more to consider and we are only scratching the surface. Confidentiality in-the-large will take far more than encrypted ZIP files. Please know that I am not pushing for the encryption of ZIP files and this thread is going down a path I did not intend, or desire pursuing. My original post was intended to increase the awareness in those thinking encrypted ZIP files will 1) be easy, 2) afford the protection they desire, and 3) not lead others into a sense of false security. Encryption sounds good, but doing it right can be a landmine. A quick fix to support ZIP files will likely create more problems than it will solve. I still say it would be *nice* if there was some way to protect IP. I have no expectations that it will be easy, and least of all, solved by encrypted ZIP files and a simple patch to Python. ...but that should not diminish the desire. Larry From drkjam at gmail.com Tue Aug 25 00:24:48 2009 From: drkjam at gmail.com (DrKJam) Date: Mon, 24 Aug 2009 23:24:48 +0100 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> Message-ID: <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> Good evening fellow Pythonistas, Considering a PEP is now available I'd like to join this discussion and raise several points with regard to both the PEP and the ipaddr reference implementation put forward with it. 1) Firstly, an offering of code. I'd like to bring to your attention an example implementation of an IP address library and interface for general discussion to compare and contrast with ipaddr 2.0.x :- http://netaddr.googlecode.com/svn/branches/exp_0.7.x_ip_only It is based on netaddr 0.7.2 which I threw together earlier today. In essence, I've stripped out all of what could be considered non-essential code for a purely IP related library. This branch should be suitable for *theoretical* consideration of inclusion into some future version of the Python standard library (with a little work). It is a pure subset of netaddr release 0.7.2, *minus* the following :- - all IEEE layer-2 code - some fairly non-essential IANA IP data files and lookup code - IP globbing code (fairly niche) Aside: Just a small mention here that I listened carefully to Clay McClure's and others criticisms of the previous incarnation of ipaddr. The 0.7.x series of netaddr breaks backward compatibility with previous netaddr releases and is an "answer" of sorts to that discussion and issue raised within the Python community. I hope you like what I've done with it. For the purposes of this discussion consider this branch the "Firefox to netaddr's Mozilla" or maybe just plain old "netaddr-ip-lite" ;-) 2) I refute bold claim in the PEP that :- "Finding a good library for performing those tasks can be somewhat more difficult." 
On the contrary, I wager that netaddr is now a perfectly decent alternative implementation to ipaddr, containing quite a few more features with little of the slowness for most common operations, 2/3x faster in a lot of cases, not that we're counting. What a difference a year makes! I also rate IPy quite highly even if it is getting a little "long in the tooth". For a lot of users, IPy could also be considered a nice, stable API! By the same token I'm happy to note some convergence between the ipaddr and netaddr's various interfaces, particularly in light of discussions and arguments put forward by Clay McClure and others. A satisfactory compromise between the two however still seems a way off. 3) I also disagree with the PEP's claim that :- "attempts to combine [IPv4 and IPv6] into one object would be like trying to force a round peg into a square hole (or vice versa)". netaddr (and for that matter IPy) cope with this perceived problem admirably. netaddr employs a simple variant of the GoF Strategy design pattern (with added Python sensibility). In the rare cases where ambiguity exists between IPv4 and IPv6 addresses a version parameter may be passed to the constructor of the IPAddress class to differentiate between them. Providing an IP address version to the constructor also provides a small performance improvement. IPv4 and IPv6 addresses can be used interchangably throughout netaddr without causing issue during operations such as sorting, merging (known in the PEP as "address collapsing") or address exclusion. Don't try and do this with the current reference implementation of ipaddr :- >>> collapse_address_list([IPv4Address('1.1.1.1'), IPv6Address('::1.1.1.1')]) [IPv4Network('1.1.1.1/32')] OUCH! Even if this isn't allowed (according to the documentation), it should raise an Exception rather than silently passing through. I actually raised this back in May on the ipaddr bug tracker but it hasn't received any attention so far :- http://code.google.com/p/ipaddr-py/issues/detail?id=18 Compare this with netaddr's behaviour :- >>> cidr_merge([IPAddress('1.1.1.1'), IPAddress('::1.1.1.1')]) [IPNetwork('1.1.1.1/32'), IPNetwork('::1.1.1.1/128')] That's more like it. 4) It may just be me but the design of this latest incarnation of ipaddr seems somewhat complicated for so few lines of code. Compared with ipaddr, netaddr doesn't use or require multiple inheritance nor a seemingly convoluted inheritance heirarchy. There isn't a need for an IP() type 'multiplexer' function either (although I might be missing an important use case here). But, then again, this may just be my personal preference talking here. I prefer composition over inheritance in most cases. In netaddr, if a user wants to represent an IP address (without netmask), they should use the IPAddress class, if they want to represent and IP address with some form of mask, they should use the IPNetwork class. 5) The ipaddr library is also missing options for expanding various (exceedingly common) IP abbreviations. >>> from netaddr import IPNetwork >>> IPNetwork('10/8', True) IPNetwork('10.0.0.0/8') netaddr also handles classful IP address logic, still pervasive throughout modern IP stacks :- >>> IPNetwork('192.168.0.1', True) IPNetwork('192.168.0.1/24') Note that these options are disabled by default, to keep up the speed of the IPNetwork constructor up for more normal cases. 6) netaddr currently contains a lot of useful features absent in ipaddr that would be extremely useful in a general, "lightweight" IP library. 
For example, it already contains routines for :- - arbitrary address range calculations - full assistance for IPv4-mapped/compatible IPv6 addressing - a fully function IPSet class which allows you to perform operations such as unions, intersections and symmetric differences between lists of IPNetwork (CIDR) objects. The last one is actually really handy and is based on an idea for an IPv4 only library by Heiko Wundram posted to the ASPN Python Cookbook some years ago (details can be found in the netaddr THANKS file). There is a lot more to consider here than I can cram into this initial message, so I'll hand over to you all for some (hopefully) serious debate. Regards, David P. D. Moss netaddr author and maintainer PS - Why does the References section in the PEP contain links to patches already applied to the ipaddr 2.0.x reference implementation? -------------- next part -------------- An HTML attachment was scrubbed... URL: From peter at hda3.com Tue Aug 25 06:54:58 2009 From: peter at hda3.com (Peter Moody) Date: Mon, 24 Aug 2009 21:54:58 -0700 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> Message-ID: <8517e9350908242154n30fc8191g4e6d0b0ac4f7c817@mail.gmail.com> On Mon, Aug 24, 2009 at 3:24 PM, DrKJam wrote: > Good evening fellow Pythonistas, > > Considering a PEP is now available I'd like to join this discussion and > raise several points with regard to both the PEP and the ipaddr reference > implementation put forward with it. Hi David, is this what passes for serious debate? there's more passive aggressive condescension in here than a teenager's diary. I'll try to respond with a little more civility than you managed (apologies, present paragraph excluded). As it was left in early June, a pep and design modifications were requested before ipaddr would be considered for inclusion, but if this is going to start *another* drawn out ipaddr/netaddr thread, perhaps the mailman admin(s) could setup a new SIG list for this. I personally hope that's not required; yours has been the only dissenting email and I believe I respond to all of your major points here. > 1) Firstly, an offering of code. > > I'd like to bring to your attention an example implementation of an IP > address library and interface for general discussion to compare and contrast > with ipaddr 2.0.x :- > > ??? http://netaddr.googlecode.com/svn/branches/exp_0.7.x_ip_only > > It is based on netaddr 0.7.2 which I threw together earlier today. > > In essence, I've stripped out all of what could be considered non-essential > code for a purely IP related library. This branch should be suitable for > *theoretical* consideration of inclusion into some future version of the > Python standard library (with a little work). 
> > It is a pure subset of netaddr release 0.7.2, *minus* the following :- > > - all IEEE layer-2 code > - some fairly non-essential IANA IP data files and lookup code > - IP globbing code (fairly niche) > > Aside: Just a small mention here that I listened carefully to Clay McClure's > and others criticisms of the previous incarnation of ipaddr. The 0.7.x > series of netaddr breaks backward compatibility with previous netaddr > releases and is an "answer" of sorts to that discussion and issue raised > within the Python community. I hope you like what I've done with it. > > For the purposes of this discussion consider this branch the "Firefox to > netaddr's Mozilla" or maybe just plain old "netaddr-ip-lite" ;-) > > 2) I refute bold claim in the PEP that :- > > ??? "Finding a good library for performing those tasks can be somewhat more > difficult." > > On the contrary, I wager that netaddr is now a perfectly decent alternative > implementation to ipaddr, containing quite a few more features with little > of the slowness for most common operations, I think you mean refuse, b/c this certainly wasn't the case when I started writing ipaddr. IPy existed, but it was far too heavyweight and restrictive for what I needed (no disrespect to the author(s) intended). I believe I've an email or two from you wherein you indicate the same. > 2/3x faster in a lot of cases, > not that we're counting. What a difference a year makes! > I also rate IPy quite highly even if it is getting a little "long in the tooth". > For a lot of users, IPy could also be considered a nice, stable API! yes, netaddr has sped up quite a bit. It's still slower in many cases as well. But again, who's timing? > By the same token I'm happy to note some convergence between the ipaddr and > netaddr's various interfaces, particularly in light of discussions and > arguments put forward by Clay McClure and others. A satisfactory compromise > between the two however still seems a way off. > > > 3) I also disagree with the PEP's claim that :- > > ??? "attempts to combine [IPv4 and IPv6] into one object would be like > trying to force a round peg into a square hole (or vice versa)". > > netaddr (and for that matter IPy) cope with this perceived problem > admirably. > > netaddr employs a simple variant of the GoF Strategy design pattern (with > added Python sensibility). In the rare cases where ambiguity exists between > IPv4 and IPv6 addresses a version parameter may be passed to the constructor > of the IPAddress class to differentiate between them. Providing an IP > address version to the constructor also provides a small performance > improvement. I'm not sure what point you're trying to make here. I didn't say it was impossible, I inferred that there are easier ways. having used code which crams both types into one object, I found it to be cludgey and complicated so I designed something different. and as a hardly partial observer, I'll add the explicit address version you can pass to the IPAddress class, but not the IPNetwork class, is, odd. it actually seems to slow down object creation (~5%) except in the case of an int arg (your default is about twice as slow). > IPv4 and IPv6 addresses can be used interchangably throughout netaddr > without causing issue during operations such as sorting, merging (known in > the PEP as "address collapsing") or address exclusion. 
> > Don't try and do this with the current reference implementation of ipaddr :- > >>>> collapse_address_list([IPv4Address('1.1.1.1'), >>>> IPv6Address('::1.1.1.1')]) > [IPv4Network('1.1.1.1/32')] > > OUCH! Even if this isn't allowed (according to the documentation), it should > raise an Exception rather than silently passing through. > > I actually raised this back in May on the ipaddr bug tracker but it hasn't > received any attention so far :- > > ??? http://code.google.com/p/ipaddr-py/issues/detail?id=18 > > Compare this with netaddr's behaviour :- > >>>> cidr_merge([IPAddress('1.1.1.1'), IPAddress('::1.1.1.1')]) > [IPNetwork('1.1.1.1/32'), IPNetwork('::1.1.1.1/128')] > > That's more like it. OUCH! indeed. I'm not even sure that this is a nice corner case feature, summarizing a single list of mixed ip type objects. with an extra line or two, this can be done in ipaddr, though 'tis true that we should now raise an exception and don't (it appears to be something that was introduced recently). If this is a feature for which developers are clamoring, I'm all over it. Yours is the first email I've heard mention it. > 4) It may just be me but the design of this latest incarnation of ipaddr > seems somewhat complicated for so few lines of code. Compared with ipaddr, > netaddr doesn't use or require multiple inheritance nor a seemingly > convoluted inheritance heirarchy. There isn't a need for an IP() type > 'multiplexer' function either (although I might be missing an important use > case here). But, then again, this may just be my personal preference talking > here. I prefer composition over inheritance in most cases. this basically smacks of more petty attackery from the start. so I'll reply with, "it's just you". if you want to debate the merits of GOF strategy vs. multiple inheritance, fine. the class inheritance in ipaddr is very clean, and leaves very little code duplication. The classes are very clearly named and laid out, and in general are much easier to follow than the strategy method you've chosen for netaddr. > In netaddr, if a user wants to represent an IP address (without netmask), > they should use the IPAddress class, if they want to represent and IP > address with some form of mask, they should use the IPNetwork class. you might've missed the discussions thus far, but that's basically what ipaddr does at this point. > 5) The ipaddr library is also missing options for expanding various > (exceedingly common) IP abbreviations. > >>>> from netaddr import IPNetwork > >>>> IPNetwork('10/8', True) > IPNetwork('10.0.0.0/8') > > netaddr also handles classful IP address logic, still pervasive throughout > modern IP stacks :- > >>>> IPNetwork('192.168.0.1', True) > IPNetwork('192.168.0.1/24') > > Note that these options are disabled by default, to keep up the speed of the > IPNetwork constructor up for more normal cases. these seem like corner case features for the sake of having features, you don't even seem to put much stock in them. FWIW, I've never seen a request for something similar. I may say '10 slash 8', but I mean, '10.0.0.0/8'. I'm missing the utility here, but I'm open to reasoned arguments. > 6) netaddr currently contains a lot of useful features absent in ipaddr that > would be extremely useful in a general, "lightweight" IP library. 
> > For example, it already contains routines for :- > > - arbitrary address range calculations > - full assistance for IPv4-mapped/compatible IPv6 addressing > - a fully function IPSet class which allows you to perform operations such > as unions, intersections and symmetric differences between lists of > IPNetwork (CIDR) objects. > > The last one is actually really handy and is based on an idea for an IPv4 > only library by Heiko Wundram posted to the ASPN Python Cookbook some years > ago (details can be found in the netaddr THANKS file). > > There is a lot more to consider here than I can cram into this initial > message, so I'll hand over to you all for some (hopefully) serious debate. I'm always open to serious debate, and patches/bug reports (apologies for missing your earlier issue. I'm not sure if you were aware, but ipaddr was undergoing a major re-write at the time and I never got around to following up). Your email however, like many of your previous ones, was not an opening to a serious debate. > Regards, > > David P. D. Moss > netaddr author and maintainer > > PS - Why does the References section in the PEP contain links to patches > already applied to the ipaddr 2.0.x reference implementation? There's A link to A patch (singular, both times), which has already been applied. This link exists b/c, at the time I last updated the PEP, the patch hadn't been applied as it was still being reviewed. I prefer having changes to ipaddr reviewed by people before submitting them (as opposed to your lone submitter model); in general, that leads to fewer bugs like the following: >>> help(netaddr.IPNetwork.__init__) Help on method __init__ in module netaddr.ip: __init__(self, addr, implicit_prefix=False) unbound netaddr.ip.IPNetwork method Constructor. @param addr: an IPv4 or IPv6 address with optional CIDR prefix, netmask or hostmask. May be an IP address in representation (string) format, an integer or another IP object (copy construction). @param implicit_prefix: if True, the constructor uses classful IPv4 rules to select a default prefix when one is not provided. If False it uses the length of the IP address version. (default: False). >>> netaddr.IPNetwork(1) Traceback (most recent call last): File "", line 1, in File "./netaddr/ip/__init__.py", line 632, in __init__ prefix, suffix = addr.split('/') AttributeError: 'int' object has no attribute 'split' vs. >>> import ipaddr >>> ipaddr.IPNetwork(1) IPv4Network('0.0.0.1/32') Did you have any other comments on the PEP? Cheers, /peter > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/python-dev%40hda3.com > > From stephen at xemacs.org Tue Aug 25 07:10:15 2009 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 25 Aug 2009 14:10:15 +0900 Subject: [Python-Dev] Support for Encrypted Zip as python scripts In-Reply-To: <9418DB6C0B9D434190E54A78E931C3D109F0C9D0@XCH-NW-7V1.nw.nos.boeing.com> References: <9418DB6C0B9D434190E54A78E931C3D109F0C827@XCH-NW-7V1.nw.nos.boeing.com> <9418DB6C0B9D434190E54A78E931C3D109F0C8F2@XCH-NW-7V1.nw.nos.boeing.com> <4A92FA75.90609@v.loewis.de> <9418DB6C0B9D434190E54A78E931C3D109F0C9D0@XCH-NW-7V1.nw.nos.boeing.com> Message-ID: <877hws32dk.fsf@uwakimon.sk.tsukuba.ac.jp> Bugbee, Larry writes: > My original post was intended to increase the awareness in those > thinking encrypted ZIP files will 1) be easy, 2) afford the > protection they desire, and 3) not lead others into a sense of > false security. All good points, but note that (even without the DMCA) at least in the U.S. copyright law provides for *criminal* penalties for willful infringement. If you needed to type in a password to copy, you can't argue "I didn't know this was private property", just as crossing a fence is stronger evidence of criminal trespass than ignoring "Posted" signs. For patents, the password prompt could notice of patent protection, with similar effect of strengthening penalties for infringing. So even weak encryption strengthens the available legal protection. That is not sufficient reason to consider putting encrypted zips or anything similar into the stdlib. It's relevant to users' decisions should such features become available, that's all. > I still say it would be *nice* if there was some way to protect IP. -1. Intellectual assets *can* give benefits with zero further costs of production and almost negligible costs of distribution. But IP, like any other property that requires a temporary transfer of possession to give economic benefit (eg, rental cars), is going to involve substantial transaction cost for consumers (search for the product, license negotiation[1]), as well as the usual excess burden of monopoly. The current state where only legal protection is feasible is arguably a good compromise. Since it involves substantial costs of enforcement borne by the rightsholder, it's only going to be invoked where the total social benefit (net consumer value plus vendor profit) is large enough to swamp the small transaction costs. I think that Python should spend zero effort on implementing technical means of IP protection. Any side effects of privacy protection devices should be more than enough to serve. Regards, Footnotes: [1] Not necessarily bargaining, but also including studying the terms of take it or leave it offers, etc. From chris at simplistix.co.uk Tue Aug 25 10:22:39 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 25 Aug 2009 09:22:39 +0100 Subject: [Python-Dev] runpy.py In-Reply-To: <1afaf6160908240907y2bb8fabbud044171de72ae052@mail.gmail.com> References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> <4A92B9C3.5040701@simplistix.co.uk> <1afaf6160908240907y2bb8fabbud044171de72ae052@mail.gmail.com> Message-ID: <4A939F4F.9060306@simplistix.co.uk> Benjamin Peterson wrote: > 2009/8/24 Chris Withers : >> Guido van Rossum wrote: >>> Anyway it looks like if someone wants to try this, only the code in >>> runpy.py needs to be touched. >> Where is runpy.py to be found? > > $ find . -name "runpy.py" > ./Lib/runpy.py Heh, grep beats Mk I eyeball ;-) (I did actually look in Lib...) Anyway, so how is the stuff in runpy.py wired up to the command line options passed to the interpretter? 
Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From benjamin at python.org Tue Aug 25 10:25:15 2009 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 25 Aug 2009 10:25:15 +0200 Subject: [Python-Dev] runpy.py In-Reply-To: <4A939F4F.9060306@simplistix.co.uk> References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> <4A92B9C3.5040701@simplistix.co.uk> <1afaf6160908240907y2bb8fabbud044171de72ae052@mail.gmail.com> <4A939F4F.9060306@simplistix.co.uk> Message-ID: <1afaf6160908250125h4cc16ff7h2c33f1162cd25cca@mail.gmail.com> 2009/8/25 Chris Withers : > Anyway, so how is the stuff in runpy.py wired up to the command line options > passed to the interpretter? Modules/main.c -- Regards, Benjamin From ncoghlan at gmail.com Tue Aug 25 11:01:17 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 25 Aug 2009 19:01:17 +1000 Subject: [Python-Dev] runpy.py In-Reply-To: <1afaf6160908250125h4cc16ff7h2c33f1162cd25cca@mail.gmail.com> References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> <4A92B9C3.5040701@simplistix.co.uk> <1afaf6160908240907y2bb8fabbud044171de72ae052@mail.gmail.com> <4A939F4F.9060306@simplistix.co.uk> <1afaf6160908250125h4cc16ff7h2c33f1162cd25cca@mail.gmail.com> Message-ID: <4A93A85D.3010108@gmail.com> Benjamin Peterson wrote: > 2009/8/25 Chris Withers : > >> Anyway, so how is the stuff in runpy.py wired up to the command line options >> passed to the interpretter? > > Modules/main.c The most relevant functions in there are "RunMainFromImporter()" (attempting zipfile/directory execution) and "RunModule()" (-m switch and also called for zipfile/directory execution). The latter function just uses normal C API calls to actually invoke the runpy code (specifically "runpy._run_module_as_main()"). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From martin.zugnoni at gmail.com Tue Aug 25 17:17:50 2009 From: martin.zugnoni at gmail.com (Martin Zugnoni) Date: Tue, 25 Aug 2009 12:17:50 -0300 Subject: [Python-Dev] Problems with events in a numeric keyboard Message-ID: Hi! I'm trying to catch the triple zero (000) key from a numeric keyboard but, I found that it's the same id that the single zero. So, when I press the triple zero key once, I receive three events from the single zero key. I need to make a disctintion between these keys, and use them to different functionalities. How can I do? Mart?n. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Tue Aug 25 17:32:47 2009 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 26 Aug 2009 01:32:47 +1000 Subject: [Python-Dev] Problems with events in a numeric keyboard In-Reply-To: References: Message-ID: <200908260132.48387.steve@pearwood.info> On Wed, 26 Aug 2009 01:17:50 am Martin Zugnoni wrote: > Hi! I'm trying to catch the triple zero (000) key from a numeric > keyboard... This list is for the development *of* the Python language, not development *with* Python. You should probably try the comp.lang.python newsgroup, also available as a mailing list: http://www.python.org/mailman/listinfo/python-list Good luck. 
-- Steven D'Aprano From alexander.belopolsky at gmail.com Tue Aug 25 17:50:01 2009 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 25 Aug 2009 11:50:01 -0400 Subject: [Python-Dev] runpy.py In-Reply-To: <4A93A85D.3010108@gmail.com> References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> <4A92B9C3.5040701@simplistix.co.uk> <1afaf6160908240907y2bb8fabbud044171de72ae052@mail.gmail.com> <4A939F4F.9060306@simplistix.co.uk> <1afaf6160908250125h4cc16ff7h2c33f1162cd25cca@mail.gmail.com> <4A93A85D.3010108@gmail.com> Message-ID: Take a look at two PEPs referenced in runpy doc, http://docs.python.org/3.1/library/runpy.html : PEP 338 - Executing modules as scripts PEP written and implemented by Nick Coghlan. PEP 366 - Main module explicit relative imports PEP written and implemented by Nick Coghlan. (Nick is too modest to self-reference, but these two PEPs give an excellent exposition. :-) On Tue, Aug 25, 2009 at 5:01 AM, Nick Coghlan wrote: > Benjamin Peterson wrote: >> 2009/8/25 Chris Withers : >> >>> Anyway, so how is the stuff in runpy.py wired up to the command line options >>> passed to the interpretter? >> >> Modules/main.c > > The most relevant functions in there are "RunMainFromImporter()" > (attempting zipfile/directory execution) and "RunModule()" (-m switch > and also called for zipfile/directory execution). The latter function > just uses normal C API calls to actually invoke the runpy code > (specifically "runpy._run_module_as_main()"). > > Cheers, > Nick. > > -- > Nick Coghlan ? | ? ncoghlan at gmail.com ? | ? Brisbane, Australia > --------------------------------------------------------------- > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/alexander.belopolsky%40gmail.com > From chris at simplistix.co.uk Tue Aug 25 17:59:44 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 25 Aug 2009 16:59:44 +0100 Subject: [Python-Dev] Excluding the current path from module search path? Message-ID: <4A940A70.9070008@simplistix.co.uk> Hi All, I'm being bitten by this issue: http://bugs.python.org/issue1734860 I'm not sure I agree with Daniel's closing of it so thought I'd ask here... Am I right in thinking that the general idea is that "the current working directory at the time of invoking a script or interpreter ends up on the python path" or should I be thinking "the directory that a script exists in should end up on the python path"? If the latter, then what happens in the case of just starting up an interpreter? If neither, then how come when I have two .py files in a directory, I can import one as a module from the other? In any case, as a parting comment, http://bugs.python.org/issue1232023 seems to have been committed with no tests and the only documentation being a one liner in the NEWS.txt file. Was there other discussion of this? (Incidentally, export PYTHONPATH= or its Windows equivalent circumvents whatever the patch was trying to achieve, so the change doesn't seem to make sense anyway...) 
cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From peter at hda3.com Tue Aug 25 18:04:16 2009 From: peter at hda3.com (Peter Moody) Date: Tue, 25 Aug 2009 09:04:16 -0700 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <20090821110002.GB27402@phd.pp.ru> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <20090821110002.GB27402@phd.pp.ru> Message-ID: <8517e9350908250904q288149d1l9a07d47253383f45@mail.gmail.com> On Fri, Aug 21, 2009 at 4:00 AM, Oleg Broytmann wrote: > http://ipaddr-py.googlecode.com/svn/branches/2.0.x/ipaddr.py > >> ? ? _compat_has_real_bytes = bytes != str > > ? Wouldn't it be nicer "bytes is not str"? it is. fixing this. Cheers, /peter > Oleg. > -- > ? ? Oleg Broytmann ? ? ? ? ? ?http://phd.pp.ru/ ? ? ? ? ? ?phd at phd.pp.ru > ? ? ? ? ? Programmers don't die, they just GOSUB without RETURN. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/python-dev%40hda3.com > From chris at simplistix.co.uk Tue Aug 25 18:08:05 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Tue, 25 Aug 2009 17:08:05 +0100 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil Message-ID: <4A940C65.9010200@simplistix.co.uk> Hi All, Would anyone object if I removed the deletion of of sys.setdefaultencoding in site.py? I'm guessing "yes!" so thought I'd state my reasons now: This deletion appears to be pretty flimsy; reload(sys) and you have it back. Which is lucky, because I need it after it's been deleted... Why? Well, because you can no longer put sitecustomize.py in a project-specific location (http://bugs.python.org/issue1734860) and because for some projects the only way I can deal with encoded strings sensibly is to use setdefaultencoding, in my case at the start of a script generated by zc.buildout's zc.recipe.egg (I *know* all the encodings in this project are utf-8, but I don't want to go playing whack-a-mole with whatever modules this rather large project uses that haven't been made properly unicode aware). Yes, it needs to be used as early as possible, and the docs should say this, but deleting it seems to be petty in terms of stopping its use when sitecustomize.py is too early and too system-wide and spraying .decode('utf-8')'s all over a code base made up of a load of eggs managed by buildout simply isn't feasible... Thoughts? Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From exarkun at twistedmatrix.com Tue Aug 25 18:23:05 2009 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Tue, 25 Aug 2009 16:23:05 -0000 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: <4A940C65.9010200@simplistix.co.uk> References: <4A940C65.9010200@simplistix.co.uk> Message-ID: <20090825162305.7657.1157242584.divmod.xquotient.112@localhost.localdomain> On 04:08 pm, chris at simplistix.co.uk wrote: >Hi All, > >Would anyone object if I removed the deletion of of >sys.setdefaultencoding in site.py? > >I'm guessing "yes!" so thought I'd state my reasons now: > >This deletion appears to be pretty flimsy; reload(sys) and you have it >back. Which is lucky, because I need it after it's been deleted... 
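(For anyone who has not seen it, the trick being referred to looks roughly like this on a stock Python 2 interpreter, where site.py deletes the function during startup; illustrative session only:)

    >>> import sys
    >>> hasattr(sys, 'setdefaultencoding')
    False
    >>> reload(sys)
    <module 'sys' (built-in)>
    >>> sys.setdefaultencoding('utf-8')
    >>> sys.getdefaultencoding()
    'utf-8'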
The ability to change the default encoding is a misfeature. There's essentially no way to write correct Python code in the presence of this feature. Using setdefaultencoding is never the sensible way to deal with encoded strings. Actually exposing this function in the sys module would lead all kinds of people who haven't fully grasped the way str, unicode, and encodings work to doing horrible things to create broken programs. It's bad enough that it's already possible to get this function back with the reload(sys) trick. > >Why? Well, because you can no longer put sitecustomize.py in a project- >specific location (http://bugs.python.org/issue1734860) and because for >some projects the only way I can deal with encoded strings sensibly is >to use setdefaultencoding, in my case at the start of a script >generated by zc.buildout's zc.recipe.egg (I *know* all the encodings in >this project are utf-8, but I don't want to go playing whack-a-mole >with whatever modules this rather large project uses that haven't been >made properly unicode aware). > >Yes, it needs to be used as early as possible, and the docs should say >this, but deleting it seems to be petty in terms of stopping its use >when sitecustomize.py is too early and too system-wide and spraying >.decode('utf-8')'s all over a code base made up of a load of eggs >managed by buildout simply isn't feasible... > >Thoughts? It may be a major task, but the best thing you can do is find each str and unicode operation in the software you're working with and make them correct with respect to your inputs and outputs. Flipping a giant switch for the entire process is just going to change which things are wrong. Jean-Paul From pje at telecommunity.com Tue Aug 25 18:30:00 2009 From: pje at telecommunity.com (P.J. Eby) Date: Tue, 25 Aug 2009 12:30:00 -0400 Subject: [Python-Dev] Excluding the current path from module search path? In-Reply-To: <4A940A70.9070008@simplistix.co.uk> References: <4A940A70.9070008@simplistix.co.uk> Message-ID: <20090825163028.0C9C23A408D@sparrow.telecommunity.com> At 04:59 PM 8/25/2009 +0100, Chris Withers wrote: >Hi All, > >I'm being bitten by this issue: > >http://bugs.python.org/issue1734860 > >I'm not sure I agree with Daniel's closing of it so thought I'd ask here... > >Am I right in thinking that the general idea is that "the current >working directory at the time of invoking a script or interpreter >ends up on the python path" or should I be thinking "the directory >that a script exists in should end up on the python path"? > >If the latter, then what happens in the case of just starting up an >interpreter? It's the latter. In the case where there is no script, then the current directory is considered to be the directory of the script. From benjamin at python.org Tue Aug 25 18:31:01 2009 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 25 Aug 2009 18:31:01 +0200 Subject: [Python-Dev] Excluding the current path from module search path? In-Reply-To: <4A940A70.9070008@simplistix.co.uk> References: <4A940A70.9070008@simplistix.co.uk> Message-ID: <1afaf6160908250931x7381383cn8c142eacf64346ed@mail.gmail.com> 2009/8/25 Chris Withers : > Hi All, > > I'm being bitten by this issue: > > http://bugs.python.org/issue1734860 > > I'm not sure I agree with Daniel's closing of it so thought I'd ask here... 
> > Am I right in thinking that the general idea is that "the current working > directory at the time of invoking a script or interpreter ends up on the > python path" or should I be thinking "the directory that a script exists in > should end up on the python path"? The latter. > > If the latter, then what happens in the case of just starting up an > interpreter? Because '' is prepended to sys.path then. -- Regards, Benjamin From rdmurray at bitdance.com Tue Aug 25 18:43:03 2009 From: rdmurray at bitdance.com (R. David Murray) Date: Tue, 25 Aug 2009 12:43:03 -0400 (EDT) Subject: [Python-Dev] Excluding the current path from module search path? In-Reply-To: <4A940A70.9070008@simplistix.co.uk> References: <4A940A70.9070008@simplistix.co.uk> Message-ID: On Tue, 25 Aug 2009 at 16:59, Chris Withers wrote: > In any case, as a parting comment, http://bugs.python.org/issue1232023 seems > to have been committed with no tests and the only documentation being a one > liner in the NEWS.txt file. Was there other discussion of this? It probably should have gone into What's New as well, but it was too late for that at the time the bug was filed. > (Incidentally, export PYTHONPATH= or its Windows equivalent circumvents > whatever the patch was trying to achieve, so the change doesn't seem to make > sense anyway...) The change was fixing a clear bug: blank path elements were being introduced into the path _unintentionally_ and unexpectedly. Setting PYTHONPATH would be a way to do it intentionally. --David From mal at egenix.com Tue Aug 25 18:49:21 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 25 Aug 2009 18:49:21 +0200 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: <4A940C65.9010200@simplistix.co.uk> References: <4A940C65.9010200@simplistix.co.uk> Message-ID: <4A941611.9000301@egenix.com> Chris Withers wrote: > Hi All, > > Would anyone object if I removed the deletion of of > sys.setdefaultencoding in site.py? > > I'm guessing "yes!" so thought I'd state my reasons now: > > This deletion appears to be pretty flimsy; reload(sys) and you have it > back. Which is lucky, because I need it after it's been deleted... > > Why? Well, because you can no longer put sitecustomize.py in a > project-specific location (http://bugs.python.org/issue1734860) and > because for some projects the only way I can deal with encoded strings > sensibly is to use setdefaultencoding, in my case at the start of a > script generated by zc.buildout's zc.recipe.egg (I *know* all the > encodings in this project are utf-8, but I don't want to go playing > whack-a-mole with whatever modules this rather large project uses that > haven't been made properly unicode aware). > > Yes, it needs to be used as early as possible, and the docs should say > this, but deleting it seems to be petty in terms of stopping its use > when sitecustomize.py is too early and too system-wide and spraying > .decode('utf-8')'s all over a code base made up of a load of eggs > managed by buildout simply isn't feasible... > > Thoughts? Let's look at this from another angle: sys.setdefaultencoding() is only made available for use in site.py. This is documented and by design (since a site may want to set the default encoding based on the locale or to "utf-8"). If you use it anywhere else, you're on your own. Such usage is not supported and may very well break your interpreter or cause data corruption (the default encoded versions of Unicode objects are cached inside the objects). 
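(The setencoding() hook referred to below looks roughly like this; this is paraphrased from memory rather than quoted, the real site.py differs in details and ships with the optional branches disabled:)

    import sys

    def setencoding():
        """Pick the default string encoding; only ever run from site.py."""
        encoding = "ascii"              # the compiled-in default
        if 0:                           # flip to enable locale-based selection
            import locale
            loc = locale.getdefaultlocale()
            if loc[1]:
                encoding = loc[1]
        if encoding != "ascii":
            sys.setdefaultencoding(encoding)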
Now, in your particular case, you're probably better off just tweaking site.py directly in your custom Python interpreter rather than relying on sitecustomize.py (see setencoding() in site.py). To answer your question: yes, this particular API may not be used outside site.py. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 25 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From guido at python.org Tue Aug 25 19:37:24 2009 From: guido at python.org (Guido van Rossum) Date: Tue, 25 Aug 2009 10:37:24 -0700 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: <4A941611.9000301@egenix.com> References: <4A940C65.9010200@simplistix.co.uk> <4A941611.9000301@egenix.com> Message-ID: In retrospect, it should have been called sys._setdefaultencoding(). That sends an extra signal that it's not meant for general use. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From mcguire at google.com Tue Aug 25 19:41:55 2009 From: mcguire at google.com (Jake McGuire) Date: Tue, 25 Aug 2009 10:41:55 -0700 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908242154n30fc8191g4e6d0b0ac4f7c817@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> <8517e9350908242154n30fc8191g4e6d0b0ac4f7c817@mail.gmail.com> Message-ID: <77c780b40908251041o28b88f2bl4cdf5f4eb8c61e07@mail.gmail.com> On Mon, Aug 24, 2009 at 9:54 PM, Peter Moody wrote: > I personally hope that's not required; yours has been the only > dissenting email and I believe I respond to all of your major points > here. Silence is not assent. ipaddr looks like a reasonable library from here, but AFAIK it's not widely used outside of google. I don't know if it's reasonable to want some amount public usage before a brand-new API goes into the standard library, but such use is more likely to uncover API flaws or quirks than a PEP. -jake From robert.kern at gmail.com Tue Aug 25 20:10:20 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 25 Aug 2009 13:10:20 -0500 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: References: <4A940C65.9010200@simplistix.co.uk> <4A941611.9000301@egenix.com> Message-ID: On 2009-08-25 12:37 PM, Guido van Rossum wrote: > In retrospect, it should have been called sys._setdefaultencoding(). > That sends an extra signal that it's not meant for general use. Considering all of the sys._getframe() hacks out there, I suspect that this would encourage more abuse of the function than the current situation. 
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From guido at python.org Tue Aug 25 20:29:37 2009 From: guido at python.org (Guido van Rossum) Date: Tue, 25 Aug 2009 11:29:37 -0700 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: References: <4A940C65.9010200@simplistix.co.uk> <4A941611.9000301@egenix.com> Message-ID: On Tue, Aug 25, 2009 at 11:10 AM, Robert Kern wrote: > On 2009-08-25 12:37 PM, Guido van Rossum wrote: >> >> In retrospect, it should have been called sys._setdefaultencoding(). >> That sends an extra signal that it's not meant for general use. > > Considering all of the sys._getframe() hacks out there, I suspect that this > would encourage more abuse of the function than the current situation. Why? It would still be deleted by site.py. The abuse of sys._getframe() exists because it fills a real need. (As does abuse of sys.setdefaultencoding(). However abusing it is actually more troublesome, because the problems are much less theoretical.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From robert.kern at gmail.com Tue Aug 25 20:35:52 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 25 Aug 2009 13:35:52 -0500 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: References: <4A940C65.9010200@simplistix.co.uk> <4A941611.9000301@egenix.com> Message-ID: On 2009-08-25 13:29 PM, Guido van Rossum wrote: > On Tue, Aug 25, 2009 at 11:10 AM, Robert Kern wrote: >> On 2009-08-25 12:37 PM, Guido van Rossum wrote: >>> >>> In retrospect, it should have been called sys._setdefaultencoding(). >>> That sends an extra signal that it's not meant for general use. >> >> Considering all of the sys._getframe() hacks out there, I suspect that this >> would encourage more abuse of the function than the current situation. > > Why? It would still be deleted by site.py. Ah, yes. You're right. For whatever reason I thought it lived as site.setdefaultencoding() when I read your message and thought that you were proposing to move it to sys. Never mind me. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From martin at v.loewis.de Tue Aug 25 23:37:34 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 25 Aug 2009 23:37:34 +0200 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <77c780b40908251041o28b88f2bl4cdf5f4eb8c61e07@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> <8517e9350908242154n30fc8191g4e6d0b0ac4f7c817@mail.gmail.com> <77c780b40908251041o28b88f2bl4cdf5f4eb8c61e07@mail.gmail.com> Message-ID: <4A94599E.4010602@v.loewis.de> > ipaddr looks like a reasonable library from here, but AFAIK it's not > widely used outside of google. 
I don't know if it's reasonable to > want some amount public usage before a brand-new API goes into the > standard library, but such use is more likely to uncover API flaws or > quirks than a PEP. OTOH, the PEP process *is* the stronger of the two approaches, allowing people to provide explicit opinions even if (and especially if) they dislike the technology entirely (whereas for an external module, they would just ignore it). If they refuse the comment, they can't complain when it gets added to the standard library - they can still chose to ignore it, then, of course (just as many people ignore xml.dom). In the specific case, I'm not worried about timing. Either 2.7 or 3.2 are still a year ahead, which should give people plenty of time to experiment. OTTH, I *like* people to comment strongly on the PEP, in particular if they are authors of competing libraries. It's no surprise that they get emotional when their hard work won't be appropriately honored in the long run - and if they believe there is something wrong with the technology being proposed (rather than just the words used to describe it), they are probably right. I said it before - this is not going to be a fast acceptance path of a library that gets accepted just because GvR works at google. People of competing libraries *could* write competing PEPs if they wanted to see their library incorporated instead - or they can just state that they don't want *this* library to be incorporated for specific technical reasons. Regards, Martin From ncoghlan at gmail.com Wed Aug 26 13:28:03 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 26 Aug 2009 21:28:03 +1000 Subject: [Python-Dev] runpy.py In-Reply-To: References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> <4A92B9C3.5040701@simplistix.co.uk> <1afaf6160908240907y2bb8fabbud044171de72ae052@mail.gmail.com> <4A939F4F.9060306@simplistix.co.uk> <1afaf6160908250125h4cc16ff7h2c33f1162cd25cca@mail.gmail.com> <4A93A85D.3010108@gmail.com> Message-ID: <4A951C43.8050707@gmail.com> Alexander Belopolsky wrote: > Take a look at two PEPs referenced in runpy doc, > http://docs.python.org/3.1/library/runpy.html : > > PEP 338 - Executing modules as scripts > PEP written and implemented by Nick Coghlan. > PEP 366 - Main module explicit relative imports > PEP written and implemented by Nick Coghlan. > > (Nick is too modest to self-reference, but these two PEPs give an > excellent exposition. :-) The PEPs don't go into the process of how we actually hook the command line up to the runpy module though - that's something you need to dig into the main.c code to really understand. The command line documentation is also relevant since it defines the intended behaviour: http://docs.python.org/dev/using/cmdline.html#command-line (Drop the /dev from the URL to see the defined behaviour for 2.6) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Wed Aug 26 13:44:12 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 26 Aug 2009 21:44:12 +1000 Subject: [Python-Dev] Excluding the current path from module search path? In-Reply-To: <4A940A70.9070008@simplistix.co.uk> References: <4A940A70.9070008@simplistix.co.uk> Message-ID: <4A95200C.2050907@gmail.com> Chris Withers wrote: > Hi All, > > I'm being bitten by this issue: > > http://bugs.python.org/issue1734860 > > I'm not sure I agree with Daniel's closing of it so thought I'd ask here... 
> > Am I right in thinking that the general idea is that "the current > working directory at the time of invoking a script or interpreter ends > up on the python path" or should I be thinking "the directory that a > script exists in should end up on the python path"? > > If the latter, then what happens in the case of just starting up an > interpreter? > > If neither, then how come when I have two .py files in a directory, I > can import one as a module from the other? The details of the sys.path manipulation at program startup are documented here: http://docs.python.org/using/cmdline.html#command-line The directory prepended to sys.path is based on the code executed by the command line. stdin, -c, -m or nothing specified: current directory Filesystem path pointing to script (source or compiled): directory containing script Filesystem path pointing to directory or zipfile: the named directory or zipfile Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From solipsis at pitrou.net Wed Aug 26 16:44:02 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 26 Aug 2009 14:44:02 +0000 (UTC) Subject: [Python-Dev] copyright ownership References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> <20090824085404.GG19190@nexus.in-nomine.org> Message-ID: Guido van Rossum python.org> writes: > > Are you a lawyer? Do you know the legal history of Python > distributions and the US export laws? It's not so easy -- for one, the > PSF (a US foundation) owns the copyright. Does it? As far as I understand, the contributor agreement is not a copyright transfer agreement (? PSF understands and agrees that Contributor retains copyright in its Contributions ?). Not that it makes the issue easier of course :) Antoine. From guido at python.org Wed Aug 26 17:00:57 2009 From: guido at python.org (Guido van Rossum) Date: Wed, 26 Aug 2009 08:00:57 -0700 Subject: [Python-Dev] copyright ownership In-Reply-To: References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> <20090824085404.GG19190@nexus.in-nomine.org> Message-ID: On Wed, Aug 26, 2009 at 7:44 AM, Antoine Pitrou wrote: > Guido van Rossum python.org> writes: >> >> Are you a lawyer? Do you know the legal history of Python >> distributions and the US export laws? It's not so easy -- for one, the >> PSF (a US foundation) owns the copyright. > > Does it? As far as I understand, the contributor agreement is not a copyright > transfer agreement (? PSF understands and agrees that Contributor retains > copyright in its Contributions ?). The rights in the individual contributions are retained by the contributor. However the rights in the distributions as a whole are most definitely claimed by the PSF. Read the LICENSE file in the distro. :-) > Not that it makes the issue easier of course :) Nothing that involves lawyers is ever easy. That's why well-meaning suggestions like "but python.org is outside the US" are so aggravating -- it's so hard to explain why it doesn't work that way. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From solipsis at pitrou.net Wed Aug 26 19:42:02 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 26 Aug 2009 17:42:02 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?PEP_3144=3A_IP_Address_Manipulation_Librar?= =?utf-8?q?y_for_the=09Python_Standard_Library?= References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> Message-ID: DrKJam gmail.com> writes: > netaddr employs a simple variant of the GoF Strategy design pattern (with added Python sensibility). It would be nice if you could avoid employing this kind of acronyms without explaining them. Not everybody drinks the design pattern kool-aid. (Google tells me that GoF seems to mean "Gang of Four", which is of course as meaningful as a hip-hop band name can be :-)) In any case, if you think netaddr's implementation strategy is better than ipaddr's, a detailed comparison would be welcome. Regards Antoine. From martin at v.loewis.de Wed Aug 26 20:35:21 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 26 Aug 2009 20:35:21 +0200 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> Message-ID: <4A958069.5030602@v.loewis.de> >> netaddr employs a simple variant of the GoF Strategy design pattern (with > added Python sensibility). > > It would be nice if you could avoid employing this kind of acronyms without > explaining them. Not everybody drinks the design pattern kool-aid. > (Google tells me that GoF seems to mean "Gang of Four", which is of course as > meaningful as a hip-hop band name can be :-)) > > In any case, if you think netaddr's implementation strategy is better than > ipaddr's, a detailed comparison would be welcome. Just in case it still isn't clear "Strategy" above doesn't refer to "implementation strategy" (i.e. "way of implementing things"). Instead, "Strategy" is a specific design pattern, originally defined by one of Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides. See http://en.wikipedia.org/wiki/Strategy_pattern for a detailed description. The basic idea is that you can switch algorithms at run-time, by merely replacing one object with another. To use it, you define an abstract base class with a method called "execute". You then create as many subclasses as you want, redefining "execute" to provide specific *strategies*. You create a (often global) variable pointing to one instance of the base class, and dynamically assign this variable with an instance of the appropriate strategy class. Users of the strategy don't need to know what strategy is chosen. 
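(A minimal Python sketch of the idea, with invented names purely for illustration; nothing below comes from netaddr itself:)

    class DottedQuad(object):
        """Strategy: render a 32-bit value as an IPv4 dotted quad."""
        def to_str(self, value):
            return '.'.join(str((value >> s) & 0xff) for s in (24, 16, 8, 0))

    class PlainHex(object):
        """Strategy: render the same value as hexadecimal."""
        def to_str(self, value):
            return '%#x' % value

    class Address(object):
        """Context object: delegates formatting to whichever strategy it holds."""
        def __init__(self, value, strategy):
            self._value = value
            self._strategy = strategy
        def __str__(self):
            return self._strategy.to_str(self._value)

    >>> str(Address(167772161, DottedQuad()))
    '10.0.0.1'
    >>> str(Address(167772161, PlainHex()))
    '0xa000001'

(The Address object never inspects which strategy it was given; swapping DottedQuad for PlainHex changes the behaviour without touching Address at all.)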
The wikipedia article points (IMO correctly) out that the Strategy pattern is mostly useless in languages that have true function pointers, such as Python. You can still have the global variable part, but you don't need the abstract base class part at all. In the specific case of netaddr, I think David is referring to the netaddr.strategy package, which has modules called ipv4 and ipv6, both implementing functions like valid_str and str_to_arpa. Then, class IPAddress has a method reverse_dns, which is defined as def reverse_dns(self): """The reverse DNS lookup record for this IP address""" return self._module.int_to_arpa(self._value) So IPv4 addresses and IPv6 addresses share the same class, but instances have different values of _module. IPAddress.__init__ looks at the version keyword parameter if given, otherwise, it tries str_to_int first for v4, then for v6. Whether that's better or not than using subclasses, I don't know. Regards, Martin From ben+python at benfinney.id.au Wed Aug 26 20:49:38 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Thu, 27 Aug 2009 04:49:38 +1000 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> Message-ID: <87hbvue7gd.fsf@benfinney.id.au> Antoine Pitrou writes: > DrKJam gmail.com> writes: > > netaddr employs a simple variant of the GoF Strategy design pattern (with > added Python sensibility). > > It would be nice if you could avoid employing this kind of acronyms > without explaining them. Not everybody drinks the design pattern > kool-aid. A pity, since the entire point of Design Patterns is to give us a vocabulary of terms to use that enable these concepts to be communicated *without* continually re-defining them. To that extent, then, they fail their purpose. -- \ ?Our wines leave you nothing to hope for.? ?restaurant menu, | `\ Switzerland | _o__) | Ben Finney From solipsis at pitrou.net Wed Aug 26 21:02:28 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 26 Aug 2009 19:02:28 +0000 (UTC) Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> <4A958069.5030602@v.loewis.de> Message-ID: Martin v. L?wis v.loewis.de> writes: > [...] > Then, class IPAddress has a method reverse_dns, which is defined > as > > def reverse_dns(self): > """The reverse DNS lookup record for this IP address""" > return self._module.int_to_arpa(self._value) > > So IPv4 addresses and IPv6 addresses share the same class, but instances > have different values of _module. Ok, thanks for the explanation. It looks like an inheritance-based approach would allow for easier and more traditional introspection (e.g. `isinstance(ip, IPv4Address)`). 
Not to mention that avoiding an indirection level can make things faster. Regards Antoine. From martin at v.loewis.de Wed Aug 26 21:03:52 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 26 Aug 2009 21:03:52 +0200 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <87hbvue7gd.fsf@benfinney.id.au> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> <87hbvue7gd.fsf@benfinney.id.au> Message-ID: <4A958718.4060802@v.loewis.de> >> DrKJam gmail.com> writes: >>> netaddr employs a simple variant of the GoF Strategy design pattern (with >> added Python sensibility). >> >> It would be nice if you could avoid employing this kind of acronyms >> without explaining them. Not everybody drinks the design pattern >> kool-aid. > > A pity, since the entire point of Design Patterns is to give us a > vocabulary of terms to use that enable these concepts to be communicated > *without* continually re-defining them. To that extent, then, they fail > their purpose. I think it's too early to tell. It may be that they have not yet achieved their purpose - just let's wait fifty more years (and I'm only half-joking). Regards, Martin From fuzzyman at voidspace.org.uk Wed Aug 26 21:18:40 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 26 Aug 2009 20:18:40 +0100 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> Message-ID: <4A958A90.8020805@voidspace.org.uk> Antoine Pitrou wrote: > DrKJam gmail.com> writes: > >> netaddr employs a simple variant of the GoF Strategy design pattern (with >> > added Python sensibility). > > It would be nice if you could avoid employing this kind of acronyms without > explaining them. Not everybody drinks the design pattern kool-aid. > (Google tells me that GoF seems to mean "Gang of Four", which is of course as > meaningful as a hip-hop band name can be :-)) > Really? Discussing the GoF design patterns by name seems to be prevalent amongst the programmers I know (yourself excluded of course...). Michael > In any case, if you think netaddr's implementation strategy is better than > ipaddr's, a detailed comparison would be welcome. > > Regards > > Antoine. 
> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From solipsis at pitrou.net Wed Aug 26 21:36:46 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 26 Aug 2009 19:36:46 +0000 (UTC) Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> <4A958A90.8020805@voidspace.org.uk> Message-ID: Michael Foord voidspace.org.uk> writes: > > Really? Discussing the GoF design patterns by name seems to be prevalent > amongst the programmers I know (yourself excluded of course...). Ah? I still haven't understood what "Gang of Four" is supposed to be, however. Is it a design pattern? Besides, saying "I use the strategy design pattern" doesn't tell a lot, while an ad hoc description is much more informational (witness Martin's explanation for example). It's like those frameworks who have a class simply named "Factory" ;) Regards Antoine. From fuzzyman at voidspace.org.uk Wed Aug 26 21:43:27 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 26 Aug 2009 20:43:27 +0100 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> <4A958A90.8020805@voidspace.org.uk> Message-ID: <4A95905F.7040608@voidspace.org.uk> Antoine Pitrou wrote: > Michael Foord voidspace.org.uk> writes: > >> Really? Discussing the GoF design patterns by name seems to be prevalent >> amongst the programmers I know (yourself excluded of course...). >> > > Ah? I still haven't understood what "Gang of Four" is supposed to be, however. > Is it a design pattern? > The gang of four are the four folk who wrote the classic design patterns book: http://en.wikipedia.org/wiki/Design_Patterns_%28book%29 Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides. > Besides, saying "I use the strategy design pattern" doesn't tell a lot, while an > ad hoc description is much more informational (witness Martin's explanation for > example). > > It's like those frameworks who have a class simply named "Factory" ;) > Well, depending on the circumstances it can convey some to no information. :-) Michael > Regards > > Antoine. 
> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From amentajo at msu.edu Wed Aug 26 21:29:25 2009 From: amentajo at msu.edu (Joe Amenta) Date: Wed, 26 Aug 2009 15:29:25 -0400 Subject: [Python-Dev] 3to2 0.1 alpha 1 released Message-ID: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> Hello all, I have released the first alpha version of 3to2 after finishing it for my Google Summer of Code 2009(tm) project. You can get the tarball for this release at http://bitbucket.org/amentajo/lib3to2/downloads/3to2_0.1-alpha1.tar.gz. This requires python 2.7, because it requires a newer version of 2to3 than what comes with 2.6. Release notes are in the RELEASE file. Development happens at http://bitbucket.org/amentajo/lib3to2/, and the source code for this release lives at http://bitbucket.org/amentajo/3to2-01-alpha-1. Report bugs at http://bitbucket.org/amentajo/lib3to2/issues/, please. Additional notes and comments can (for now) be found at http://www.startcodon.com/wordpress/?cat=4. --Joe -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at v.loewis.de Wed Aug 26 22:20:05 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 26 Aug 2009 22:20:05 +0200 Subject: [Python-Dev] No 2.4.7 release Message-ID: <4A9598F5.5050008@v.loewis.de> I once announced that I would be working on releasing 2.4.7 this month. However, since no patches have been committed to 2.4.6, there is little point in making a release. As 2.4 is nearing its end-of-life soon, there likely won't be any 2.4.7 release. Python 2.5 has seen only two patches since 2.5.4. However, since several months have been passed since that release, I'll be creating a 2.5.5 release candidate within a few days. Further security releases of Python 2.5 will be made until September 2011. Regards, Martin From martin at v.loewis.de Wed Aug 26 22:26:49 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 26 Aug 2009 22:26:49 +0200 Subject: [Python-Dev] 3to2 0.1 alpha 1 released In-Reply-To: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> References: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> Message-ID: <4A959A89.4000000@v.loewis.de> > I have released the first alpha version of 3to2 after finishing it for > my Google Summer of Code 2009(tm) project. Congratulations! I understand SoC is basically over, but I would still like to request two things: - can you please register it with PyPI? - can you please announce/report some plans for the future of this project? In particular, will you continue to work on it? 
Thanks, Martin From skip at pobox.com Wed Aug 26 23:25:24 2009 From: skip at pobox.com (skip at pobox.com) Date: Wed, 26 Aug 2009 16:25:24 -0500 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <4A958718.4060802@v.loewis.de> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> <87hbvue7gd.fsf@benfinney.id.au> <4A958718.4060802@v.loewis.de> Message-ID: <19093.43076.369750.56503@montanaro.dyndns.org> Martin> I think it's too early to tell. It may be that they have not yet Martin> achieved their purpose - just let's wait fifty more years (and Martin> I'm only half-joking). So what you're really saying is we only have to wait 25 years... Skip From amentajo at msu.edu Thu Aug 27 00:55:54 2009 From: amentajo at msu.edu (Joe Amenta) Date: Wed, 26 Aug 2009 18:55:54 -0400 Subject: [Python-Dev] 3to2 0.1 alpha 1 released In-Reply-To: <4A959A89.4000000@v.loewis.de> References: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> <4A959A89.4000000@v.loewis.de> Message-ID: <4dc473a50908261555o2bc4cdbexff1b2dbabb4bab0@mail.gmail.com> On Wed, Aug 26, 2009 at 4:26 PM, "Martin v. L?wis" wrote: > > I have released the first alpha version of 3to2 after finishing it for > > my Google Summer of Code 2009(tm) project. > > Congratulations! I understand SoC is basically over, but I would still > like to request two things: > > - can you please register it with PyPI? > - can you please announce/report some plans for the future of this > project? In particular, will you continue to work on it? > > Thanks, > Martin -- 3to2 is now registered with PyPI. Did I do it right? -- I plan to continue to work on 3to2 in my free time, though I have one of those social lives, so I could certainly use some help; in particular, I could use some quality bug reports. My long-term plans for the future are: - Bugfixes - Keep up with new features added in newer versions of py3k - Ensure syntactical correctness with a more robust test suite My short-term plans for the future are: - Fixes imports and imports2 need to work properly - Continue to build a suitable test suite that tests common cases of all fixers - print fixer refactors the syntax into print statements rather than imports print_function from __future__ Thanks for the acknowledgement, --Joe -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris at simplistix.co.uk Thu Aug 27 01:48:40 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Thu, 27 Aug 2009 00:48:40 +0100 Subject: [Python-Dev] Excluding the current path from module search path? In-Reply-To: <4A95200C.2050907@gmail.com> References: <4A940A70.9070008@simplistix.co.uk> <4A95200C.2050907@gmail.com> Message-ID: <4A95C9D8.7090104@simplistix.co.uk> Nick Coghlan wrote: > The details of the sys.path manipulation at program startup are > documented here: > http://docs.python.org/using/cmdline.html#command-line > > The directory prepended to sys.path is based on the code executed by the > command line. It's more subtle than that though... 
The OP in http://bugs.python.org/issue1734860 is being bitten by the same expectation that I am: sitecustomize.py should be found somewhere on the sys.path present at the start of the script/module/command/etc being executed. (The bug referenced in that report makes things worse, because this used to work, at least on Windows ;-) ) The problem is that site.py (and therefore sitecustomize.py) is imported early in main.c on line 516 as part of Py_Initialize(), but the path of the current script only gets added later on in RunMainFromImporter called on line 569. Strictly speaking, the docs at http://docs.python.org/library/site.html aren't lying, but it takes an understanding of when site.py is imported that isn't available to anyone who doesn't read C to know why a path that is present on sys.path when the user's script starts isn't being searched for sitecustomize.py What do people feel about this? At the very least, I'd like to add a warning box in site.html to explain why sitecustomize might not be found where people expect. I'd *like* to have the paths be the same for site.py as they are for the subsequent code that's executed, but would that make too much of a mess of main.c and runpython.c? cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From drkjam at gmail.com Thu Aug 27 01:48:46 2009 From: drkjam at gmail.com (DrKJam) Date: Thu, 27 Aug 2009 00:48:46 +0100 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <19093.43076.369750.56503@montanaro.dyndns.org> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> <87hbvue7gd.fsf@benfinney.id.au> <4A958718.4060802@v.loewis.de> <19093.43076.369750.56503@montanaro.dyndns.org> Message-ID: <538a660a0908261648g7d48740dnb174f916208d571b@mail.gmail.com> I've started a very basic (work in progress) entry on the netaddr wiki to track various aspects of this discussion that might not be in a format suitable for publishing to the list or are too lengthy. It will also allow my ascii art diagrams to render correctly ;-) http://code.google.com/p/netaddr/wiki/PEP3144 I will be updating it as free time become available to me over the coming days. Feel free to make comments on the wiki page itself if you want me to make any changes. Duncan McGreggor should be able to make changes if I am not available for whatever reason :- If anyone has suggestions for a better place to put this, please shout (but not too loudly please Peter M. ;-) Thanks, Dave M. PS - Can't wait for Google Wave which would make this kind of thing so much easier ;-) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris at simplistix.co.uk Thu Aug 27 01:51:51 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Thu, 27 Aug 2009 00:51:51 +0100 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: <20090825162305.7657.1157242584.divmod.xquotient.112@localhost.localdomain> References: <4A940C65.9010200@simplistix.co.uk> <20090825162305.7657.1157242584.divmod.xquotient.112@localhost.localdomain> Message-ID: <4A95CA97.1000505@simplistix.co.uk> exarkun at twistedmatrix.com wrote: > The ability to change the default encoding is a misfeature. There's > essentially no way to write correct Python code in the presence of this > feature. How so? If every single piece of text in your project is encoded in a superset of ascii (such as utf-8), why would this be a problem? Even if you were evil/stupid and mixed encodings, surely all you'd get is different unicode errors or mayvbe the odd strange character during display? > It may be a major task, but the best thing you can do is find each str > and unicode operation in the software you're working with and make them > correct with respect to your inputs and outputs. Flipping a giant > switch for the entire process is just going to change which things are > wrong. Well, flipping that giant switch has worked in production for the past 5 years, so I'm afraid I'll respectfully disagree. I'd suspect the pragmatics of real world software are with that function even exists, and it's extremely useful when used correctly... Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Thu Aug 27 01:59:35 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Thu, 27 Aug 2009 00:59:35 +0100 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: <4A941611.9000301@egenix.com> References: <4A940C65.9010200@simplistix.co.uk> <4A941611.9000301@egenix.com> Message-ID: <4A95CC67.20107@simplistix.co.uk> M.-A. Lemburg wrote: > Let's look at this from another angle: sys.setdefaultencoding() > is only made available for use in site.py. ...see this: http://mail.python.org/pipermail/python-dev/2009-August/091391.html I would like to use sitecustomize.py for all the very good reasons given in this thread: - I don't want to change the default encoding for every project that uses the python installation in question - I don't even want to change the default encoding for every python script run by the current user - I only want to change the default encoding for one particular project. Sadly, for the reasons I describe in the thread, site.py won't find a sitecustomize.py in this situation... > If you use it anywhere else, you're on your own. No problem with that. To be specific, this is a Zope 2.12 instance driven by this buildout: [instance] recipe = zc.recipe.egg eggs = ${buildout:eggs} interpreter = py entry-points= runzope=Zope2.Startup.run:run zopectl=Zope2.Startup.zopectl:main scripts = runzope zopectl initialization = import sys reload(sys) sys.setdefaultencoding('utf-8') sys.argv[1:1] = ['-C','${buildout:directory}/etc/instance.conf'] The call to sys.setdefaultencoding is *very* early in the scheme of things... The runzope script that gets run only has some sys.path manipulation before sys.setdefaultencoding gets called. What problems could there be by calling sys.setdefaultencoding there? > Such usage > is not supported and may very well break your interpreter Can you give an example? 
> or > cause data corruption (the default encoded versions of Unicode > objects are cached inside the objects). When called as early as in the above script, what objects would have encoded strings cached in them? > Now, in your particular case, you're probably better off just > tweaking site.py directly in your custom Python interpreter > rather than relying on sitecustomize.py (see setencoding() in > site.py). Why? Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Thu Aug 27 02:00:16 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Thu, 27 Aug 2009 01:00:16 +0100 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: References: <4A940C65.9010200@simplistix.co.uk> <4A941611.9000301@egenix.com> Message-ID: <4A95CC90.8000402@simplistix.co.uk> Guido van Rossum wrote: > In retrospect, it should have been called sys._setdefaultencoding(). > That sends an extra signal that it's not meant for general use. Crazy idea: how about mutating it into sys._setdefaultencoding rather than deleting it? Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From greg.ewing at canterbury.ac.nz Thu Aug 27 02:30:33 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 27 Aug 2009 12:30:33 +1200 Subject: [Python-Dev] Problems with events in a numeric keyboard In-Reply-To: References: Message-ID: <4A95D3A9.2000304@canterbury.ac.nz> Martin Zugnoni wrote: > when I press > the triple zero key once, I receive three events from the single zero key. > I need to make a disctintion between these keys Sounds like you can't, except perhaps by detecting three '0' key events arriving at almost the same time. -- Greg From martin at v.loewis.de Thu Aug 27 08:47:59 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 27 Aug 2009 08:47:59 +0200 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: <4A95CA97.1000505@simplistix.co.uk> References: <4A940C65.9010200@simplistix.co.uk> <20090825162305.7657.1157242584.divmod.xquotient.112@localhost.localdomain> <4A95CA97.1000505@simplistix.co.uk> Message-ID: <4A962C1F.5020009@v.loewis.de> >> The ability to change the default encoding is a misfeature. There's >> essentially no way to write correct Python code in the presence of >> this feature. > > How so? If every single piece of text in your project is encoded in a > superset of ascii (such as utf-8), why would this be a problem? What is "every single piece of text"? Every string occurring in source code? or also every single string that may be read from a file, a socket, out of a database, or from a user interface? How can you be certain that any string is UTF-8 when doing any reasonable IO? > Even if you were evil/stupid and mixed encodings, surely all you'd get > is different unicode errors or mayvbe the odd strange character during > display? One specific problem is dictionaries will stop working correctly if you set the default encoding to anything but ASCII. The reason is that with UTF-8 as the default encoding, you get py> u"\u20ac" == u"\u20ac".encode("utf-8") True py> hash(u"\u20ac") == hash(u"\u20ac".encode("utf-8")) False So objects that compare equal will not hash equal. As a consequence, you may have two different values for what should be the same key in a dictionary. 
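Spelled out as a short, self-contained sketch (this assumes a Python 2 interpreter whose default encoding has already been switched to UTF-8, e.g. via the reload(sys) trick discussed earlier in the thread; it is illustrative only, not something to rely on):

    # EURO SIGN, once as a unicode object and once as UTF-8 bytes
    key_u = u"\u20ac"
    key_b = key_u.encode("utf-8")

    d = {}
    d[key_u] = "first"
    d[key_b] = "second"

    # With the default encoding set to UTF-8, key_u == key_b is True,
    # but hash(key_u) != hash(key_b), so the dictionary ends up holding
    # two entries for what the application considers a single key.
    print len(d)    # 2, not 1
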
> Well, flipping that giant switch has worked in production for the past 5 > years, so I'm afraid I'll respectfully disagree. I'd suspect the > pragmatics of real world software are with that function even exists, > and it's extremely useful when used correctly... It has worked in your application. See my example above: it is very easy to create applications that stop working correctly if you use setdefaultencoding (at all - the only supported value is "latin-1", since Unicode strings hash the same as byte strings if all characters are in row 0). Regards, Martin From martin at v.loewis.de Thu Aug 27 08:53:02 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 27 Aug 2009 08:53:02 +0200 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: <4A95CC90.8000402@simplistix.co.uk> References: <4A940C65.9010200@simplistix.co.uk> <4A941611.9000301@egenix.com> <4A95CC90.8000402@simplistix.co.uk> Message-ID: <4A962D4E.4060604@v.loewis.de> >> In retrospect, it should have been called sys._setdefaultencoding(). >> That sends an extra signal that it's not meant for general use. > > Crazy idea: how about mutating it into sys._setdefaultencoding rather > than deleting it? Please don't post crazy ideas unless you really mean them. This specific crazy idea must be rejected; it would break backwards compatibility, for no good reason. Regards, Martin From chris at simplistix.co.uk Thu Aug 27 09:27:01 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Thu, 27 Aug 2009 08:27:01 +0100 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: <4A962D4E.4060604@v.loewis.de> References: <4A940C65.9010200@simplistix.co.uk> <4A941611.9000301@egenix.com> <4A95CC90.8000402@simplistix.co.uk> <4A962D4E.4060604@v.loewis.de> Message-ID: <4A963545.8010205@simplistix.co.uk> Martin v. L?wis wrote: >>> In retrospect, it should have been called sys._setdefaultencoding(). >>> That sends an extra signal that it's not meant for general use. >> Crazy idea: how about mutating it into sys._setdefaultencoding rather >> than deleting it? > > Please don't post crazy ideas unless you really mean them. > > This specific crazy idea must be rejected; it would break backwards > compatibility, for no good reason. How is it breaking backwards compatibility? - If people were somehow relying on sys.setdefaultencoding to be deleted, that's fine, it's still gone - If people were somehow relying on sys not having an attribute called _setdefaultencoding, or were relying on stuffing an attribute into sys called _setdefaultencoding then... well... that seems pretty unlikely ;-) Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Thu Aug 27 09:42:51 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Thu, 27 Aug 2009 08:42:51 +0100 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: <4A962C1F.5020009@v.loewis.de> References: <4A940C65.9010200@simplistix.co.uk> <20090825162305.7657.1157242584.divmod.xquotient.112@localhost.localdomain> <4A95CA97.1000505@simplistix.co.uk> <4A962C1F.5020009@v.loewis.de> Message-ID: <4A9638FB.4000503@simplistix.co.uk> Martin v. L?wis wrote: >>> The ability to change the default encoding is a misfeature. There's >>> essentially no way to write correct Python code in the presence of >>> this feature. >> How so? 
If every single piece of text in your project is encoded in a >> superset of ascii (such as utf-8), why would this be a problem? I guess I should have said "every single piece of text in your project is encoded in a superset of ascii (such as utf-8) or is decoded into a unicode object at the application boundaries, such as an incoming http request or in the process of parsing a file off disk", in which case: > What is "every single piece of text"? Every string occurring in source > code? Yes. > or also every single string that may be read from a file, Yes. > a > socket, Yes. > out of a database, Yes. > or from a user interface? Yes. Any others I can say Yes to? ;-) > How can you be certain that any string is UTF-8 when doing any > reasonable IO? Careful checking, and a knowledge for people working on the app's development that anything else will result in severe pain, both physical and mental ;-) >> Even if you were evil/stupid and mixed encodings, surely all you'd get >> is different unicode errors or mayvbe the odd strange character during >> display? > > One specific problem is dictionaries will stop working correctly if you > set the default encoding to anything but ASCII. ...except they haven't. > The reason is that > with UTF-8 as the default encoding, you get > > py> u"\u20ac" == u"\u20ac".encode("utf-8") > True > py> hash(u"\u20ac") == hash(u"\u20ac".encode("utf-8")) > False > > So objects that compare equal will not hash equal. As a consequence, you > may have two different values for what should be the same key in a > dictionary. Indeed, but this doesn't happen because the app never has a situation where strings and unicodes are put in the same dict. However, it does have plenty of situations where lists containing a mixture of utf-8 encoded strings and unicodes exist, where changing the default encoding removes a *lot* of pain. > It has worked in your application. See my example above: it is very easy > to create applications that stop working correctly if you use > setdefaultencoding (at all - the only supported value is "latin-1", > since Unicode strings hash the same as byte strings if all characters > are in row 0). Would anyone object if I added this snippet to the .rst that generates: http://docs.python.org/library/sys.html It doesn't seem to be recorded anywhere anyone who's likely to use setdefaultencoding is likely to find it... Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From martin at v.loewis.de Thu Aug 27 09:53:23 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 27 Aug 2009 09:53:23 +0200 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: <4A963545.8010205@simplistix.co.uk> References: <4A940C65.9010200@simplistix.co.uk> <4A941611.9000301@egenix.com> <4A95CC90.8000402@simplistix.co.uk> <4A962D4E.4060604@v.loewis.de> <4A963545.8010205@simplistix.co.uk> Message-ID: <4A963B73.3050104@v.loewis.de> Chris Withers wrote: > Martin v. L?wis wrote: >>>> In retrospect, it should have been called sys._setdefaultencoding(). >>>> That sends an extra signal that it's not meant for general use. >>> Crazy idea: how about mutating it into sys._setdefaultencoding rather >>> than deleting it? >> >> Please don't post crazy ideas unless you really mean them. >> >> This specific crazy idea must be rejected; it would break backwards >> compatibility, for no good reason. > > How is it breaking backwards compatibility? 
> > - If people were somehow relying on sys.setdefaultencoding to be > deleted, that's fine, it's still gone > > - If people were somehow relying on sys not having an attribute called > _setdefaultencoding, or were relying on stuffing an attribute into sys > called _setdefaultencoding then... well... that seems pretty unlikely ;-) If people were using the reload trickery, that would break if the function changed its name. Regards, Martin From chris at simplistix.co.uk Thu Aug 27 10:01:33 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Thu, 27 Aug 2009 09:01:33 +0100 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: <4A963B73.3050104@v.loewis.de> References: <4A940C65.9010200@simplistix.co.uk> <4A941611.9000301@egenix.com> <4A95CC90.8000402@simplistix.co.uk> <4A962D4E.4060604@v.loewis.de> <4A963545.8010205@simplistix.co.uk> <4A963B73.3050104@v.loewis.de> Message-ID: <4A963D5D.4080906@simplistix.co.uk> Martin v. L?wis wrote: >> - If people were somehow relying on sys not having an attribute called >> _setdefaultencoding, or were relying on stuffing an attribute into sys >> called _setdefaultencoding then... well... that seems pretty unlikely ;-) > > If people were using the reload trickery, that would break if the > function changed its name. No it doesn't: $ svn diff Index: Lib/site.py =================================================================== --- Lib/site.py (revision 74552) +++ Lib/site.py (working copy) @@ -540,6 +540,7 @@ if hasattr(sys, "setdefaultencoding"): + sys._setdefaultencoding = sys.setdefaultencoding del sys.setdefaultencoding >>> import sys >>> sys._setdefaultencoding >>> sys.setdefaultencoding Traceback (most recent call last): File "", line 1, in AttributeError: 'module' object has no attribute 'setdefaultencoding' >>> reload(sys) >>> sys.setdefaultencoding >>> sys._setdefaultencoding Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From martin at v.loewis.de Thu Aug 27 10:02:47 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 27 Aug 2009 10:02:47 +0200 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: <4A9638FB.4000503@simplistix.co.uk> References: <4A940C65.9010200@simplistix.co.uk> <20090825162305.7657.1157242584.divmod.xquotient.112@localhost.localdomain> <4A95CA97.1000505@simplistix.co.uk> <4A962C1F.5020009@v.loewis.de> <4A9638FB.4000503@simplistix.co.uk> Message-ID: <4A963DA7.5030708@v.loewis.de> >> One specific problem is dictionaries will stop working correctly if you >> set the default encoding to anything but ASCII. > > ...except they haven't. In your application. Can you please agree that this a semantical problem that is completely unacceptable for language design? > Indeed, but this doesn't happen because the app never has a situation > where strings and unicodes are put in the same dict. However, it does > have plenty of situations where lists containing a mixture of utf-8 > encoded strings and unicodes exist, where changing the default encoding > removes a *lot* of pain. So you should convert all byte strings to UTF-8 before adding them to the list. Assuming you have used proper encapsulation and object-oriented design, it shouldn't be too difficult to find, for each such list, where the places are that modify the list. > Would anyone object if I added this snippet to the .rst that generates: > http://docs.python.org/library/sys.html The snippet explaining the problem? 
I don't mind, but Raymond is on record for objecting to any addition of a warning box to the documentation, because it gives the impression that Python is full of problems, when many these warnings really refer to boundary cases only. Regards, Martin From stephen at xemacs.org Thu Aug 27 10:12:30 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 27 Aug 2009 17:12:30 +0900 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <87hbvue7gd.fsf@benfinney.id.au> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> <87hbvue7gd.fsf@benfinney.id.au> Message-ID: <87k50p1xqp.fsf@uwakimon.sk.tsukuba.ac.jp> Ben Finney writes: > A pity, since the entire point of Design Patterns is to give us a > vocabulary of terms to use that enable these concepts to be communicated > *without* continually re-defining them. To that extent, then, they fail > their purpose. Of course not! There will always be children ... I hope. The point of Design Patterns is to give a vocabulary that can be used in an "in group" context without any explanation at all, and in the stylized form This implementation follows the "Strategy" design pattern of Ralf, Joel, Yen-shih, and Gundarmambanagoong, which is ... which educates the young and allows the cognoscenti to daydream for a minute. From martin at v.loewis.de Thu Aug 27 10:06:40 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 27 Aug 2009 10:06:40 +0200 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: <4A963D5D.4080906@simplistix.co.uk> References: <4A940C65.9010200@simplistix.co.uk> <4A941611.9000301@egenix.com> <4A95CC90.8000402@simplistix.co.uk> <4A962D4E.4060604@v.loewis.de> <4A963545.8010205@simplistix.co.uk> <4A963B73.3050104@v.loewis.de> <4A963D5D.4080906@simplistix.co.uk> Message-ID: <4A963E90.1050305@v.loewis.de> > if hasattr(sys, "setdefaultencoding"): > + sys._setdefaultencoding = sys.setdefaultencoding > del sys.setdefaultencoding Ah, so you didn't want to rename the function. I agree that this would not break backwards compatibility. I guess the basic objection remains: making it so would make _setdefaultencoding a supported feature, which would then mean that we should fix all the bugs that it causes - when we already know (because we thought many years about this) that it is not possible to implement setdefaultencoding correctly and efficiently (so the current implementation is only efficient, but not correct). Regards, Martin From mal at egenix.com Thu Aug 27 10:34:36 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 27 Aug 2009 10:34:36 +0200 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: <4A95CC67.20107@simplistix.co.uk> References: <4A940C65.9010200@simplistix.co.uk> <4A941611.9000301@egenix.com> <4A95CC67.20107@simplistix.co.uk> Message-ID: <4A96451C.9060205@egenix.com> Chris Withers wrote: > M.-A. Lemburg wrote: >> Let's look at this from another angle: sys.setdefaultencoding() >> is only made available for use in site.py. 
> > ...see this: > > http://mail.python.org/pipermail/python-dev/2009-August/091391.html > > I would like to use sitecustomize.py for all the very good reasons given > in this thread: > > - I don't want to change the default encoding for every project that > uses the python installation in question > > - I don't even want to change the default encoding for every python > script run by the current user > > - I only want to change the default encoding for one particular project. > > Sadly, for the reasons I describe in the thread, site.py won't find a > sitecustomize.py in this situation... > >> If you use it anywhere else, you're on your own. > > No problem with that. To be specific, this is a Zope 2.12 instance > driven by this buildout: > > [instance] > recipe = zc.recipe.egg > eggs = ${buildout:eggs} > interpreter = py > entry-points= > runzope=Zope2.Startup.run:run > zopectl=Zope2.Startup.zopectl:main > scripts = runzope zopectl > initialization = > import sys > reload(sys) > sys.setdefaultencoding('utf-8') > sys.argv[1:1] = ['-C','${buildout:directory}/etc/instance.conf'] > > The call to sys.setdefaultencoding is *very* early in the scheme of > things... The runzope script that gets run only has some sys.path > manipulation before sys.setdefaultencoding gets called. What problems > could there be by calling sys.setdefaultencoding there? > >> Such usage >> is not supported and may very well break your interpreter > > Can you give an example? You can get strange effects caused by the fact that some string objects will now compare equal while not necessarily having the same hash value. Unicode objects and strings have the same hash value provided that they are both ASCII. With the ASCII default encoding, a non-ASCII string cannot be compared to a Unicode object, so the problem does not occur. >> or >> cause data corruption (the default encoded versions of Unicode >> objects are cached inside the objects). > > When called as early as in the above script, what objects would have > encoded strings cached in them? Difficult to say. This depends a lot on the environment where you are running the script. Note that the codecs are loaded at a very early stage in the interpreter startup and a lot of them do use Unicode strings. This wasn't the case in Python 1.6 when the whole site.py approach to setting the default encoding was designed, but added later on, in Python 2.1 IIRC, when noone really considered using a different default encoding anymore. Using UTF-8 as new default encoding will not cause much trouble with this, since it is an ASCII superset. However, changing it more than once will cause the earlier Unicode objects to still use the old default encoding value. Using a different non-ASCII compatible encoding, such as UTF-16, will cause breakage for the same reason. The default encoded string version of a Unicode object is cached in the object and never recreated after it has first been successfully encoded. When only changing the default encoding once and using UTF-8 as the new default encoding, you'll only run into the hash value problem. If that's not an issue for your application, e.g. you don't mix Unicode and string key objects in your dictionaries and don't rely on the special relationship between hashes and comparisons elsewhere, you should be fine. >> Now, in your particular case, you're probably better off just >> tweaking site.py directly in your custom Python interpreter >> rather than relying on sitecustomize.py (see setencoding() in >> site.py). > > Why? 
To get the job done :-) You could rewrite setencoding() to get the encoding information from e.g. an os.environ variable or some config file. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 27 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ncoghlan at gmail.com Thu Aug 27 11:08:39 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 27 Aug 2009 19:08:39 +1000 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <87hbvue7gd.fsf@benfinney.id.au> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> <87hbvue7gd.fsf@benfinney.id.au> Message-ID: <4A964D17.2080907@gmail.com> Ben Finney wrote: > Antoine Pitrou writes: > >> DrKJam gmail.com> writes: >>> netaddr employs a simple variant of the GoF Strategy design pattern (with >> added Python sensibility). >> >> It would be nice if you could avoid employing this kind of acronyms >> without explaining them. Not everybody drinks the design pattern >> kool-aid. > > A pity, since the entire point of Design Patterns is to give us a > vocabulary of terms to use that enable these concepts to be communicated > *without* continually re-defining them. To that extent, then, they fail > their purpose. My experience with them the named design patterns is that: 1. An awful lot of them seem to be just about working around some of the limitations of static typing, a lack of functions as first class objects, or are in some other way a lot less useful outside a C++/Java environment. 2. Others are used intuitively by a lot of developers that have never even heard of any of the GoF authors or the Design Patterns book (the Adapter pattern being the most common example that occurs to me). I still think most people (even those that primarily work in dynamic languages) can learn something by reading it, but following their examples too slavishly creates evils of its own (especially when attempting to apply patterns that don't really suit the language being used). When the inappropriate use of named patterns runs counter to the accepted idioms of a language is when I see terms like the "design pattern kool-aid" mentioned above getting thrown around :) Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Thu Aug 27 11:18:10 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 27 Aug 2009 19:18:10 +1000 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <19093.43076.369750.56503@montanaro.dyndns.org> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> <87hbvue7gd.fsf@benfinney.id.au> <4A958718.4060802@v.loewis.de> <19093.43076.369750.56503@montanaro.dyndns.org> Message-ID: <4A964F52.3040506@gmail.com> skip at pobox.com wrote: > Martin> I think it's too early to tell. It may be that they have not yet > Martin> achieved their purpose - just let's wait fifty more years (and > Martin> I'm only half-joking). > > So what you're really saying is we only have to wait 25 years... :) Martin has a really good point though - software development is still pretty immature as a discipline, being around for mere decades as opposed to the centuries (or more) that other activities like mathematics, physics or construction have behind them. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Thu Aug 27 11:37:25 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 27 Aug 2009 19:37:25 +1000 Subject: [Python-Dev] Excluding the current path from module search path? In-Reply-To: <4A95C9D8.7090104@simplistix.co.uk> References: <4A940A70.9070008@simplistix.co.uk> <4A95200C.2050907@gmail.com> <4A95C9D8.7090104@simplistix.co.uk> Message-ID: <4A9653D5.209@gmail.com> Chris Withers wrote: > Nick Coghlan wrote: >> The details of the sys.path manipulation at program startup are >> documented here: >> http://docs.python.org/using/cmdline.html#command-line >> >> The directory prepended to sys.path is based on the code executed by the >> command line. > > It's more subtle than that though... > > The OP in http://bugs.python.org/issue1734860 is being bitten by the > same expectation that I am: sitecustomize.py should be found somewhere > on the sys.path present at the start of the script/module/command/etc > being executed. (The bug referenced in that report makes things worse, > because this used to work, at least on Windows ;-) ) > > The problem is that site.py (and therefore sitecustomize.py) is imported > early in main.c on line 516 as part of Py_Initialize(), but the path of > the current script only gets added later on in RunMainFromImporter > called on line 569. > > Strictly speaking, the docs at http://docs.python.org/library/site.html > aren't lying, but it takes an understanding of when site.py is imported > that isn't available to anyone who doesn't read C to know why a path > that is present on sys.path when the user's script starts isn't being > searched for sitecustomize.py > > What do people feel about this? > > At the very least, I'd like to add a warning box in site.html to explain > why sitecustomize might not be found where people expect. 
> > I'd *like* to have the paths be the same for site.py as they are for the > subsequent code that's executed, but would that make too much of a mess > of main.c and runpython.c? Ah, OK - I see the problem now. However, I think the current behaviour is correct, it just needs to be documented better (probably noted in both the command line doco regarding sys.path manipulation and in the doco for site.py). The reason I think the current behaviour is correct is that site.py and sitecustomize.py are meant to be about customising the *site* (i.e. the installation of Python that is being executed) rather than about customizing a particular application. Importing them before the script specific directories are prepended to sys.path goes a long way towards achieving that. Also, as was pointed out on the tracker item, having a script that can automatically be executed when running an arbitrary Python script without any request from or notification to the user is not a good idea from a security standpoint. When it comes to adding additional paths for specific applications, you can either bundle the relevant packages into a single directory and use 2.6's directory execution feature or else look into the assorted application environment customisation tools that are out there like virtualenv. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From stephen at xemacs.org Thu Aug 27 12:29:52 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 27 Aug 2009 19:29:52 +0900 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: <4A9638FB.4000503@simplistix.co.uk> References: <4A940C65.9010200@simplistix.co.uk> <20090825162305.7657.1157242584.divmod.xquotient.112@localhost.localdomain> <4A95CA97.1000505@simplistix.co.uk> <4A962C1F.5020009@v.loewis.de> <4A9638FB.4000503@simplistix.co.uk> Message-ID: <87fxbd1rdr.fsf@uwakimon.sk.tsukuba.ac.jp> Chris Withers writes: > > How can you be certain that any string is UTF-8 when doing any > > reasonable IO? > > Careful checking, and a knowledge for people working on the app's > development that anything else will result in severe pain, both physical > and mental ;-) If you're *that* careful, the additional effort to hack around this is negligible. The problem is that most people are *never* that careful, and *all* people are rarely that careful. I understand your use case, but I don't see a case for exposing this to the general public. 
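To make "that careful" concrete, the boundary-decoding style being advocated in this thread looks roughly like the sketch below (the file name and encoding are illustrative only):

    import codecs

    def read_names(path):
        # decode at the point where bytes enter the program...
        f = codecs.open(path, "r", encoding="utf-8")
        try:
            return [line.strip() for line in f]   # unicode objects from here on
        finally:
            f.close()

    names = read_names("names.txt")               # hypothetical input file
    report = u", ".join(names).encode("utf-8")    # ...and encode only on the way out
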
From drkjam at gmail.com Thu Aug 27 15:07:59 2009 From: drkjam at gmail.com (DrKJam) Date: Thu, 27 Aug 2009 14:07:59 +0100 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908242154n30fc8191g4e6d0b0ac4f7c817@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> <8517e9350908242154n30fc8191g4e6d0b0ac4f7c817@mail.gmail.com> Message-ID: <538a660a0908270607g731cf3e5o8bf4e20ef75b290e@mail.gmail.com> 2009/8/25 Peter Moody > On Mon, Aug 24, 2009 at 3:24 PM, DrKJam wrote: > [SNIP] > As it was left in early June, a pep and design modifications were > requested before ipaddr would be considered for inclusion, but if this > is going to start *another* drawn out ipaddr/netaddr thread, perhaps > the mailman admin(s) could setup a new SIG list for this. I > personally hope that's not required; yours has been the only > dissenting email and I believe I respond to all of your major points > here. The PEP process is the perfect forum for spending some time scrutinizing and discussing this topic in more detail. I will be raising further points in future when I've had time to fully evaluate both the PEP and the reference implementation of ipaddr. At this stage, it is premature to assume the reference implementation provided along with the PEP is necessarily complete, only requiring a few bug fixes to get through the approval process. > > 1) Firstly, an offering of code. > > > > I'd like to bring to your attention an example implementation of an IP > > address library and interface for general discussion to compare and > contrast > > with ipaddr 2.0.x :- > > > > http://netaddr.googlecode.com/svn/branches/exp_0.7.x_ip_only > > > > It is based on netaddr 0.7.2 which I threw together earlier today. > > > > In essence, I've stripped out all of what could be considered > non-essential > > code for a purely IP related library. This branch should be suitable for > > *theoretical* consideration of inclusion into some future version of the > > Python standard library (with a little work). > > > > It is a pure subset of netaddr release 0.7.2, *minus* the following :- > > > > - all IEEE layer-2 code > > - some fairly non-essential IANA IP data files and lookup code > > - IP globbing code (fairly niche) > > > > Aside: Just a small mention here that I listened carefully to Clay > McClure's > > and others criticisms of the previous incarnation of ipaddr. The 0.7.x > > series of netaddr breaks backward compatibility with previous netaddr > > releases and is an "answer" of sorts to that discussion and issue raised > > within the Python community. I hope you like what I've done with it. > > > > For the purposes of this discussion consider this branch the "Firefox to > > netaddr's Mozilla" or maybe just plain old "netaddr-ip-lite" ;-) > > > > 2) I refute bold claim in the PEP that :- > > > > "Finding a good library for performing those tasks can be somewhat > more > > difficult." 
> > > > On the contrary, I wager that netaddr is now a perfectly decent > alternative > > implementation to ipaddr, containing quite a few more features with > little > > of the slowness for most common operations, > > I think you mean refuse, No, I meant refute. > b/c this certainly wasn't the case when I > started writing ipaddr. IPy existed, but it was far too heavyweight > and restrictive for what I needed (no disrespect to the author(s) > intended). I believe I've an email or two from you wherein you > indicate the same. > The comment made on IPy, to which I believe you are referring, was in response to you incorrectly comparing netaddr and IPy's implementation (assuming conditional logic was used within each method to support IP versioning). As already stated netaddr gets around this with a strategy design pattern approach (apologies to readers for using the "Gang of Four" acronym with regard to this). IPy is heavyweight? How so? It is a mere 1200 lines including comments and deals with IPv4 and IPv6 addressing, much like ipaddr (albeit with fewer features). There are certainly issues you could raise against it (otherwise we wouldn't be here), but being heavyweight is not one of them. I would actively encourage authors of said library (Victor Stinner is listed as the current maintainer) to get involved in the discussion of this PEP. It is their legacy that this work is picking up from. Incidentally, I've noticed a few bug fix releases come through for IPy on PyPI in the last month so that project certainly seems alive and well. I think the PEP currently doesn't provide appropriate weight to the efforts of others in this area. FYI, here is a wiki entry I've been maintaining for a while now to this end :- http://code.google.com/p/netaddr/wiki/YetAnotherPythonIPModule > > > 2/3x faster in a lot of cases, > > not that we're counting. What a difference a year makes! > > I also rate IPy quite highly even if it is getting a little "long in the > tooth". > > For a lot of users, IPy could also be considered a nice, stable API! > > yes, netaddr has sped up quite a bit. It's still slower in many cases > as well. But again, who's timing? > I mention speed and timings as the PEP cites this as one of the benefits of considering the ipaddr reference implementation. > > > By the same token I'm happy to note some convergence between the ipaddr > and > > netaddr's various interfaces, particularly in light of discussions and > > arguments put forward by Clay McClure and others. A satisfactory > compromise > > between the two however still seems a way off. > > > > > > 3) I also disagree with the PEP's claim that :- > > > > "attempts to combine [IPv4 and IPv6] into one object would be like > > trying to force a round peg into a square hole (or vice versa)". > > > > netaddr (and for that matter IPy) cope with this perceived problem > > admirably. > > > > netaddr employs a simple variant of the GoF Strategy design pattern (with > > added Python sensibility). In the rare cases where ambiguity exists > between > > IPv4 and IPv6 addresses a version parameter may be passed to the > constructor > > of the IPAddress class to differentiate between them. Providing an IP > > address version to the constructor also provides a small performance > > improvement. > > I'm not sure what point you're trying to make here. I didn't say it > was impossible, I inferred that there are easier ways. 
having used > code which crams both types into one object, I found it to be cludgey > and complicated so I designed something different. > Let me clarify. I am +1 on the specific item in the PEP regarding the need for separate and distinct IPAddress and IPNetwork class interfaces that are not conflated into a single interface. Clay McClure made this point very eloquently. I've done a good bit of experimentation on this since it was mentioned so I am fully aware of the pros and cons of each approach. A brief look at netaddr.ip.lite confirms that on this we both agree. Where I disagree is on the need to have yet another split in the interface to support different IP versions (and a set of Factory functions to pull it all together again). Hey, another design pattern, also known as the "Factory Method" a.k.a. Virtual Constructor (or in this case a Python function). > > and as a hardly partial observer, I'll add the explicit address > version you can pass to the IPAddress class, but not the IPNetwork > class, is, odd. it actually seems to slow down object creation (~5%) > except in the case of an int arg (your default is about twice as > slow). > Ah, the issue of speed and timings again. Let's concentrate on getting the interface right before we spend too much effort on optimization. I'm quite happy to do a full speed comparison of major features in both libraries but I don't think that would be a worthwhile use of time just now. Currently I'm ambivalent on whether an IP(vX)Network class constructor should accept a numerical (i.e. integer) value at all *unless* you explicit state somehow that you want the network aspect to be inferred in some specific way. It isn't a case of just choosing /32 or /128 and having this as the only option. IP (v4) classful rules are still pervasive in the real world. A general case IP library available to the whole Python community should certainly take this into account. > > IPv4 and IPv6 addresses can be used interchangably throughout netaddr > > without causing issue during operations such as sorting, merging (known > in > > the PEP as "address collapsing") or address exclusion. > > > > Don't try and do this with the current reference implementation of ipaddr > :- > > > >>>> collapse_address_list([IPv4Address('1.1.1.1'), > >>>> IPv6Address('::1.1.1.1')]) > > [IPv4Network('1.1.1.1/32' )] > > > > OUCH! Even if this isn't allowed (according to the documentation), it > should > > raise an Exception rather than silently passing through. > > > > I actually raised this back in May on the ipaddr bug tracker but it > hasn't > > received any attention so far :- > > > > http://code.google.com/p/ipaddr-py/issues/detail?id=18 > > > > Compare this with netaddr's behaviour :- > > > >>>> cidr_merge([IPAddress('1.1.1.1'), IPAddress('::1.1.1.1')]) > > [IPNetwork('1.1.1.1/32' ), IPNetwork(':: > 1.1.1.1/128' )] > > > > That's more like it. > > OUCH! indeed. I'm not even sure that this is a nice corner case > feature, summarizing a single list of mixed ip type objects. with an > extra line or two, this can be done in ipaddr, though 'tis true that > we should now raise an exception and don't (it appears to be something > that was introduced recently). If this is a feature for which > developers are clamoring, I'm all over it. Yours is the first email > I've heard mention it. > I may be the only one raising issues but that shouldn't mean they are any less relevant. There is a whole different feel and thrust behind both interfaces each with their own merits. 
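For what it is worth, the safeguard I am arguing for amounts to little more than the following (purely illustrative, not the actual ipaddr or netaddr code; it assumes the address objects expose a .version attribute):

    def _check_single_version(addrs):
        # refuse mixed IPv4/IPv6 input instead of silently discarding addresses,
        # which is the failure mode shown for collapse_address_list() above
        versions = set(addr.version for addr in addrs)
        if len(versions) > 1:
            raise TypeError("cannot summarise a mix of IPv4 and IPv6 addresses")
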
> > > 4) It may just be me but the design of this latest incarnation of ipaddr > > seems somewhat complicated for so few lines of code. Compared with > ipaddr, > > netaddr doesn't use or require multiple inheritance nor a seemingly > > convoluted inheritance heirarchy. There isn't a need for an IP() type > > 'multiplexer' function either (although I might be missing an important > use > > case here). But, then again, this may just be my personal preference > talking > > here. I prefer composition over inheritance in most cases. > > this basically smacks of more petty attackery from the start. so I'll > reply with, "it's just you". > > if you want to debate the merits of GOF strategy vs. multiple > inheritance, fine. the class inheritance in ipaddr is very clean, and > leaves very little code duplication. The classes are very clearly > named and laid out, and in general are much easier to follow than the > strategy method you've chosen for netaddr. > I realise you've done a lot of work on ipaddr and my observations are not intended as a "petty attackery" as you put it. It was merely to question whether the shift in approach from earlier incarnations of ipaddr to this is the correct path to be taking. I don't think that solely relying on "IS A" via multiple inheritance necessarily brings clarity to this code which, as stated in the PEP, is intended to be simple for other to understand and possibly use as a basis for their own extensions. More on this in future posts. If you missed it I have diagrammed the class hierarchy and internal layout of each library here for consideration :- http://code.google.com/p/netaddr/wiki/PEP3144 [SNIP] > > 5) The ipaddr library is also missing options for expanding various > > (exceedingly common) IP abbreviations. > > > >>>> from netaddr import IPNetwork > > > >>>> IPNetwork('10/8', True) > > IPNetwork('10.0.0.0/8') > > > > netaddr also handles classful IP address logic, still pervasive > throughout > > modern IP stacks :- > > > >>>> IPNetwork('192.168.0.1', True) > > IPNetwork('192.168.0.1/24') > > > > Note that these options are disabled by default, to keep up the speed of > the > > IPNetwork constructor up for more normal cases. > > these seem like corner case features for the sake of having features, > you don't even seem to put much stock in them. FWIW, I've never seen a > request for something similar. I may say '10 slash 8', but I mean, > '10.0.0.0/8'. I'm missing the utility here, but I'm open to reasoned > arguments. > I don't see why genuine features should be automatically dismissed as "corner cases". If you need proof, here an excerpt from RFC 1918 :- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3. Private Address Space The Internet Assigned Numbers Authority (IANA) has reserved the following three blocks of the IP address space for private internets: 10.0.0.0 - 10.255.255.255 (10/8 prefix) 172.16.0.0 - 172.31.255.255 (172.16/12 prefix) 192.168.0.0 - 192.168.255.255 (192.168/16 prefix) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ I've also had specific requests from users about this feature, one just in the last week (which only required me to point them towards the available switch argument in IPNetwork constructor to enable the required behaviour). In netaddr 0.7.x I have chosen *not* to make this expansion the default case because it provides a not insignificant construction penalty for those that are not interested in it (as you have already noted and of which I am aware). 
I believe strongly that this *is* an important option for a general use IP address library. [SNIP] > > There is a lot more to consider here than I can cram into this initial > > message, so I'll hand over to you all for some (hopefully) serious > debate. > > I'm always open to serious debate, and patches/bug reports (apologies > for missing your earlier issue. I'm not sure if you were aware, but > ipaddr was undergoing a major re-write at the time and I never got > around to following up). I note your response to this on the ipaddr bug tracker today, thanks. [SNIP] > PS - Why does the References section in the PEP contain links to patches > > already applied to the ipaddr 2.0.x reference implementation? > > There's A link to A patch (singular, both times), which has already > been applied. This link exists b/c, at the time I last updated the > PEP, the patch hadn't been applied as it was still being reviewed. Thanks for the clarification. [SNIP] in general, that leads > to fewer bugs like the following: > > >>> help(netaddr.IPNetwork.__init__) > Help on method __init__ in module netaddr.ip: > > __init__(self, addr, implicit_prefix=False) unbound netaddr.ip.IPNetwork > method > Constructor. > > @param addr: an IPv4 or IPv6 address with optional CIDR prefix, > netmask or hostmask. May be an IP address in representation > (string) format, an integer or another IP object (copy > construction). > > @param implicit_prefix: if True, the constructor uses classful IPv4 > rules to select a default prefix when one is not provided. > If False it uses the length of the IP address version. > (default: False). > > >>> netaddr.IPNetwork(1) > Traceback (most recent call last): > File "", line 1, in > File "./netaddr/ip/__init__.py", line 632, in __init__ > prefix, suffix = addr.split('/') > AttributeError: 'int' object has no attribute 'split' > > vs. > > >>> import ipaddr > >>> ipaddr.IPNetwork(1) > IPv4Network('0.0.0.1/32') > Thanks for raising this on the netaddr bug tracker. I'll take a look at it. Did you have any other comments on the PEP? > Yes I do but they will be coming through in stages unfortunately as I get time to look at this further. David Moss -------------- next part -------------- An HTML attachment was scrubbed... URL: From exarkun at twistedmatrix.com Thu Aug 27 15:08:57 2009 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Thu, 27 Aug 2009 13:08:57 -0000 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: <4A95CA97.1000505@simplistix.co.uk> References: <4A940C65.9010200@simplistix.co.uk> <20090825162305.7657.1157242584.divmod.xquotient.112@localhost.localdomain> <4A95CA97.1000505@simplistix.co.uk> Message-ID: <20090827130857.7475.1053558531.divmod.xquotient.8@localhost.localdomain> On 26 Aug, 11:51 pm, chris at simplistix.co.uk wrote: >exarkun at twistedmatrix.com wrote: >>The ability to change the default encoding is a misfeature. There's >>essentially no way to write correct Python code in the presence of >>this feature. > >How so? If every single piece of text in your project is encoded in a >superset of ascii (such as utf-8), why would this be a problem? >Even if you were evil/stupid and mixed encodings, surely all you'd get >is different unicode errors or mayvbe the odd strange character during >display? This is what I meant when I said what I said about correct code. If you're happy to have encoding errors and corrupt data, then I guess you're happy to have a function like setdefaultencoding. 
>>It may be a major task, but the best thing you can do is find each str >>and unicode operation in the software you're working with and make >>them correct with respect to your inputs and outputs. Flipping a >>giant switch for the entire process is just going to change which >>things are wrong. > >Well, flipping that giant switch has worked in production for the past >5 years, so I'm afraid I'll respectfully disagree. I'd suspect the >pragmatics of real world software are with that function even exists, >and it's extremely useful when used correctly... I suppose it's fortunate for you that the function exists, then. For my part, I have managed to write and operate a lot of code in production for at least as long without ever touching it. Generally speaking, I also don't find that I encounter lots of unicode errors or corrupted data (*sometimes* I do; in those cases, I fix the broken code and it doesn't happen again). Jean-Paul From ncoghlan at gmail.com Thu Aug 27 15:51:30 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 27 Aug 2009 23:51:30 +1000 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <538a660a0908270607g731cf3e5o8bf4e20ef75b290e@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908201400r3fb2e77enb5c742c6ffea3dca@mail.gmail.com> <99e0b9530908202215t76b7853dwbb627d8dd85dacaa@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> <8517e9350908242154n30fc8191g4e6d0b0ac4f7c817@mail.gmail.com> <538a660a0908270607g731cf3e5o8bf4e20ef75b290e@mail.gmail.com> Message-ID: <4A968F62.1090105@gmail.com> DrKJam wrote: > Currently I'm ambivalent on whether an IP(vX)Network class constructor > should accept a numerical (i.e. integer) value at all *unless* you > explicit state somehow that you want the network aspect to be inferred > in some specific way. It isn't a case of just choosing /32 or /128 and > having this as the only option. IP (v4) classful rules are still > pervasive in the real world. A general case IP library available to the > whole Python community should certainly take this into account. Don't forget that separating the "from_int" construction behaviour out to a separate class method is an available option rather than type-based behaviour switching in the main constructor. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From peter at hda3.com Thu Aug 27 15:52:20 2009 From: peter at hda3.com (Peter Moody) Date: Thu, 27 Aug 2009 06:52:20 -0700 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> Message-ID: <8517e9350908270652w4846dfdal8278ce71e3d238b4@mail.gmail.com> Howdy folks, the reference code has been updated per your comments; specifically, there's no more IP/IPv4/IPv6 factory functions, it's all IPAddress() and IPNetwork constructors. I've submitted a patch to the PEP with updated examples and a lengthy description of the class inheritance and the benefits from that design choice. hopefully that should go live soon. If there are any more suggestions on the PEP or the code, please let me know. 
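(For reference, the separate-classmethod construction Nick suggested earlier in the thread might look roughly like the sketch below; the class body, method name and details are illustrative and are not part of the current reference implementation.)

    class IPv4Network(object):
        def __init__(self, addr):
            # minimal stand-in for the real constructor, which parses the string
            self.addr = addr

        @classmethod
        def from_int(cls, value, prefixlen=32):
            # alternative constructor: build the dotted-quad text form and
            # delegate to the normal string-based constructor
            octets = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
            return cls("%d.%d.%d.%d/%d" % tuple(octets + [prefixlen]))

    print IPv4Network.from_int(1).addr    # 0.0.0.1/32
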
Cheers, /peter On Tue, Aug 18, 2009 at 1:00 PM, Peter Moody wrote: > Howdy folks, > > I have a first draft of a PEP for including an IP address manipulation > library in the python stdlib. It seems like there are a lot of really > smart folks with some, ahem, strong ideas about what an IP address > module should and shouldn't be so I wanted to solicit your input on > this pep. > > the pep can be found here: > > ?http://www.python.org/dev/peps/pep-3144/ > > the code can be found here: > > ?http://ipaddr-py.googlecode.com/svn/branches/2.0.x/ > > Please let me know if you have any comments (some already coming :) > > Cheers, > /peter > From peter at hda3.com Thu Aug 27 16:15:33 2009 From: peter at hda3.com (Peter Moody) Date: Thu, 27 Aug 2009 07:15:33 -0700 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <538a660a0908261648g7d48740dnb174f916208d571b@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> <87hbvue7gd.fsf@benfinney.id.au> <4A958718.4060802@v.loewis.de> <19093.43076.369750.56503@montanaro.dyndns.org> <538a660a0908261648g7d48740dnb174f916208d571b@mail.gmail.com> Message-ID: <8517e9350908270715w158b81f5l9cea345defe2ef39@mail.gmail.com> On Wed, Aug 26, 2009 at 4:48 PM, DrKJam wrote: > I've started a very basic (work in progress) entry on the netaddr wiki to > track various aspects of this discussion that might not be in a format > suitable for publishing to the list or are too lengthy. It will also allow > my ascii art diagrams to render correctly ;-) > > http://code.google.com/p/netaddr/wiki/PEP3144 > > I will be updating it as free time become available to me over the coming > days. Feel free to make comments on the wiki page itself if you want me to > make any changes. Duncan McGreggor should be able to make changes if I am > not available for whatever reason :- > > If anyone has suggestions for a better place to put this, please shout (but > not too loudly please Peter M. ;-) you know, Dave, I'm actually pretty tired of your snide remarks. Your passive aggressive emails, your inability to remember when or be honest about why you unsubbed me from the netaddr list, etc. are not the actions of someone seeking serious debate about the best possible library for python. To answer your 'where' question, http://www.python.org/dev/peps/pep-0001/ is pretty clear on where peps should be submitted: specifically, "We try to build consensus around a PEP, but if that's not possible, you can always submit a competing PEP." I'm not personally looking forward waiting around for more of your free time to become available before you can get to updating your wiki (which, btw, seems to be both out of date and otherwise incorrect wrt ipaddr and shockingly under-representative of the complexity of netaddr). Hopefully the code you've written is currently in a more complete state than the wiki and can be judged as it is. Cheers, /peter > Thanks, > > Dave M. 
> > PS - Can't wait for Google Wave which would make this kind of thing so much > easier ;-) > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/python-dev%40hda3.com > > From barry at python.org Thu Aug 27 16:17:27 2009 From: barry at python.org (Barry Warsaw) Date: Thu, 27 Aug 2009 10:17:27 -0400 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: <20090827130857.7475.1053558531.divmod.xquotient.8@localhost.localdomain> References: <4A940C65.9010200@simplistix.co.uk> <20090825162305.7657.1157242584.divmod.xquotient.112@localhost.localdomain> <4A95CA97.1000505@simplistix.co.uk> <20090827130857.7475.1053558531.divmod.xquotient.8@localhost.localdomain> Message-ID: On Aug 27, 2009, at 9:08 AM, exarkun at twistedmatrix.com wrote: > This is what I meant when I said what I said about correct code. If > you're happy to have encoding errors and corrupt data, then I guess > you're happy to have a function like setdefaultencoding. Whatever happened to "we're all adults here"[1]? I have no problem with making it difficult but possible to write buggy but practical code. Software engineering is a messy business. -Barry [1] That may not be literally true any more, but still :) -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 832 bytes Desc: This is a digitally signed message part URL: From fuzzyman at voidspace.org.uk Thu Aug 27 16:28:45 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Thu, 27 Aug 2009 15:28:45 +0100 Subject: [Python-Dev] 3to2 0.1 alpha 1 released In-Reply-To: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> References: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> Message-ID: <4A96981D.9030004@voidspace.org.uk> Congratulations and thank you - this is *great* news. Michael Joe Amenta wrote: > Hello all, > > I have released the first alpha version of 3to2 after finishing it for > my Google Summer of Code 2009(tm) project. You can get the tarball > for this release at > http://bitbucket.org/amentajo/lib3to2/downloads/3to2_0.1-alpha1.tar.gz. > This requires python 2.7, because it requires a newer version of 2to3 > than what comes with 2.6. > > Release notes are in the RELEASE file. Development happens at > http://bitbucket.org/amentajo/lib3to2/, and the source code for this > release lives at http://bitbucket.org/amentajo/3to2-01-alpha-1 > . > Report bugs at http://bitbucket.org/amentajo/lib3to2/issues/, please. > > Additional notes and comments can (for now) be found at > http://www.startcodon.com/wordpress/?cat=4. 
> > --Joe > ------------------------------------------------------------------------ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ From guido at python.org Thu Aug 27 18:50:38 2009 From: guido at python.org (Guido van Rossum) Date: Thu, 27 Aug 2009 09:50:38 -0700 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: References: <4A940C65.9010200@simplistix.co.uk> <20090825162305.7657.1157242584.divmod.xquotient.112@localhost.localdomain> <4A95CA97.1000505@simplistix.co.uk> <20090827130857.7475.1053558531.divmod.xquotient.8@localhost.localdomain> Message-ID: 2009/8/27 Barry Warsaw : > On Aug 27, 2009, at 9:08 AM, exarkun at twistedmatrix.com wrote: > >> This is what I meant when I said what I said about correct code. ?If you're happy to have encoding errors and corrupt data, then I guess you're happy to have a function like setdefaultencoding. > > Whatever happened to "we're all adults here"[1]? ?I have no problem with making it difficult but possible to write buggy but practical code. ?Software engineering is a messy business. Being adults about it also means when to give up. Chris, please stop arguing about this. There are plenty of techniques you can use to get what you want without changing Python, for example virtualenv, which allows you to create a custom Python environment for each project. Or you could switch to Python 3.1, whose different approach to distinguishing between encoded and decoded string means that you won't have to worry about the default encoding quite as much (and you are free to change the default *filesystem* encoding in Py3k). Or you could invoke python -S, which skips site.py and sitecustomize.py, so you are free to mess up any way you want. The fundamental reason the designers of Python's 2.x standard library don't want you to be able to set the default encoding in your app, is that the standard library is written with the assumption that the default encoding is fixed, and no guarantees about the correct workings of the standard library can be made when you change it. There are no tests for this situation. Nobody knows what will fail when. And you (or worse, your users) *will* come back to us with complaints if the standard library suddenly starts doing things you didn't expect. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From drkjam at gmail.com Thu Aug 27 19:24:35 2009 From: drkjam at gmail.com (David Moss) Date: Thu, 27 Aug 2009 18:24:35 +0100 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908270715w158b81f5l9cea345defe2ef39@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908210924o417c32a5sa0a68dd9f9c9488b@mail.gmail.com> <4A8F30BD.5000407@gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> <87hbvue7gd.fsf@benfinney.id.au> <4A958718.4060802@v.loewis.de> <19093.43076.369750.56503@montanaro.dyndns.org> <538a660a0908261648g7d48740dnb174f916208d571b@mail.gmail.com> <8517e9350908270715w158b81f5l9cea345defe2ef39@mail.gmail.com> Message-ID: <033178A5-DE01-4C91-8591-E13517B4E67F@gmail.com> Peter, I would like to apologise if I have caused you any offense. 
Please can we put the animosity behind us and stick to pulling together the best IP library possible as part of this PEP? Regards, Dave M. From peter at hda3.com Thu Aug 27 19:37:14 2009 From: peter at hda3.com (Peter Moody) Date: Thu, 27 Aug 2009 10:37:14 -0700 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <033178A5-DE01-4C91-8591-E13517B4E67F@gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908241057v12e305d3v818cf700a2500465@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> <87hbvue7gd.fsf@benfinney.id.au> <4A958718.4060802@v.loewis.de> <19093.43076.369750.56503@montanaro.dyndns.org> <538a660a0908261648g7d48740dnb174f916208d571b@mail.gmail.com> <8517e9350908270715w158b81f5l9cea345defe2ef39@mail.gmail.com> <033178A5-DE01-4C91-8591-E13517B4E67F@gmail.com> Message-ID: <8517e9350908271037x133e0db7he594fac44ddb3469@mail.gmail.com> On Thu, Aug 27, 2009 at 10:24 AM, David Moss wrote: > Peter, > > I would like to apologise if I have caused you any offense. Thanks. Accepted. > Please can we > put the animosity behind us and stick to pulling together the best IP > library possible as part of this PEP? pep-3144 should hopefully soon be updated on python.org/dev/peps with this past week's suggestions (including a discussion on the ipaddr class design). The updated ipaddr reference code should also still be available for 'svn co' at https://ipaddr-py.googlecode.com/svn/branches/2.0.x Cheers, /peter > Regards, > > Dave M. > From peter at hda3.com Thu Aug 27 19:39:50 2009 From: peter at hda3.com (Peter Moody) Date: Thu, 27 Aug 2009 10:39:50 -0700 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908271037x133e0db7he594fac44ddb3469@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <538a660a0908241524r2a5d211fj4968a8c0e3bb9bef@mail.gmail.com> <87hbvue7gd.fsf@benfinney.id.au> <4A958718.4060802@v.loewis.de> <19093.43076.369750.56503@montanaro.dyndns.org> <538a660a0908261648g7d48740dnb174f916208d571b@mail.gmail.com> <8517e9350908270715w158b81f5l9cea345defe2ef39@mail.gmail.com> <033178A5-DE01-4C91-8591-E13517B4E67F@gmail.com> <8517e9350908271037x133e0db7he594fac44ddb3469@mail.gmail.com> Message-ID: <8517e9350908271039i32e1a19at880fa662ded2eae2@mail.gmail.com> On Thu, Aug 27, 2009 at 10:37 AM, Peter Moody wrote: > On Thu, Aug 27, 2009 at 10:24 AM, David Moss wrote: >> Peter, >> >> I would like to apologise if I have caused you any offense. > > Thanks. Accepted. > >> Please can we >> put the animosity behind us and stick to pulling together the best IP >> library possible as part of this PEP? > > pep-3144 should hopefully soon be updated on python.org/dev/peps with > this past week's suggestions (including a discussion on the ipaddr > class design). ?The updated ipaddr reference code should also still be > available for 'svn co' at > https://ipaddr-py.googlecode.com/svn/branches/2.0.x er, make that http://ipaddr-py.googlecode.com/svn/branches/2.0.x https seems to ask for a password. > Cheers, > /peter > >> Regards, >> >> Dave M. 
>> > From fwierzbicki at gmail.com Thu Aug 27 20:52:28 2009 From: fwierzbicki at gmail.com (Frank Wierzbicki) Date: Thu, 27 Aug 2009 14:52:28 -0400 Subject: [Python-Dev] 3to2 0.1 alpha 1 released In-Reply-To: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> References: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> Message-ID: <4dab5f760908271152g2f893592q7df4659b50d3b216@mail.gmail.com> On Wed, Aug 26, 2009 at 3:29 PM, Joe Amenta wrote: > Hello all, > > I have released the first alpha version of 3to2 after finishing it for my > Google Summer of Code 2009(tm) project. Wow, congratulations! -Frank From benjamin at python.org Thu Aug 27 22:08:25 2009 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 27 Aug 2009 15:08:25 -0500 Subject: [Python-Dev] 3to2 0.1 alpha 1 released In-Reply-To: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> References: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> Message-ID: <1afaf6160908271308u588ece9bo3272eb1044b923e5@mail.gmail.com> 2009/8/26 Joe Amenta : > Hello all, > > I have released the first alpha version of 3to2 after finishing it for my > Google Summer of Code 2009(tm) project.? You can get the tarball for this > release at > http://bitbucket.org/amentajo/lib3to2/downloads/3to2_0.1-alpha1.tar.gz. > This requires python 2.7, because it requires a newer version of 2to3 than > what comes with 2.6. Great work and congratulations on your first release! Have you posted this to python-list and python-announce-list, too? -- Regards, Benjamin From brett at python.org Thu Aug 27 23:12:06 2009 From: brett at python.org (Brett Cannon) Date: Thu, 27 Aug 2009 14:12:06 -0700 Subject: [Python-Dev] 3to2 0.1 alpha 1 released In-Reply-To: <1afaf6160908271308u588ece9bo3272eb1044b923e5@mail.gmail.com> References: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> <1afaf6160908271308u588ece9bo3272eb1044b923e5@mail.gmail.com> Message-ID: What are the plans to merge this into Python's repository so we can all help out on this? On Thu, Aug 27, 2009 at 13:08, Benjamin Peterson wrote: > 2009/8/26 Joe Amenta : >> Hello all, >> >> I have released the first alpha version of 3to2 after finishing it for my >> Google Summer of Code 2009(tm) project.? You can get the tarball for this >> release at >> http://bitbucket.org/amentajo/lib3to2/downloads/3to2_0.1-alpha1.tar.gz. >> This requires python 2.7, because it requires a newer version of 2to3 than >> what comes with 2.6. > > Great work and congratulations on your first release! > > Have you posted this to python-list and python-announce-list, too? > > > -- > Regards, > Benjamin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/brett%40python.org > From benjamin at python.org Thu Aug 27 23:47:35 2009 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 27 Aug 2009 16:47:35 -0500 Subject: [Python-Dev] 3to2 0.1 alpha 1 released In-Reply-To: References: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> <1afaf6160908271308u588ece9bo3272eb1044b923e5@mail.gmail.com> Message-ID: <1afaf6160908271447o54bd7b67t3e50a6560a10a82a@mail.gmail.com> 2009/8/27 Brett Cannon : > What are the plans to merge this into Python's repository so we can > all help out on this? None at the moment. 
I think the community needs to show its interest in it and Joe his willingness to maintain it in the future in order for it to qualify for addition to the stdlib. I don't see why having it merged into Python's repo is a requirement for contribution, since he's using Mecurial. :) -- Regards, Benjamin From mcguire at google.com Thu Aug 27 23:15:16 2009 From: mcguire at google.com (Jake McGuire) Date: Thu, 27 Aug 2009 14:15:16 -0700 Subject: [Python-Dev] deprecated methods on array objects Message-ID: <77c780b40908271415o91f368do451074337d4115b2@mail.gmail.com> The python documentation says that the read() and write() methods on array objects have been deprecated since 1.5.1. I assume this is because their semantics are almost the exact opposite of read() and write() on a file-like object; array.read() reads data from a file into the array and array.write() writes data from the array to a file. This causes fatal confusion in code that checks for the existence of read() and write() to determine whether an object is file-like. Code such as httplib. What is the timeline for removing these methods from array? It has been 11 years now. -jake -------------- next part -------------- An HTML attachment was scrubbed... URL: From jnoller at gmail.com Fri Aug 28 00:39:50 2009 From: jnoller at gmail.com (Jesse Noller) Date: Thu, 27 Aug 2009 18:39:50 -0400 Subject: [Python-Dev] 3to2 0.1 alpha 1 released In-Reply-To: <1afaf6160908271447o54bd7b67t3e50a6560a10a82a@mail.gmail.com> References: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> <1afaf6160908271308u588ece9bo3272eb1044b923e5@mail.gmail.com> <1afaf6160908271447o54bd7b67t3e50a6560a10a82a@mail.gmail.com> Message-ID: <4222a8490908271539g2dff2c71x39fa941904492a4d@mail.gmail.com> On Thu, Aug 27, 2009 at 5:47 PM, Benjamin Peterson wrote: > 2009/8/27 Brett Cannon : >> What are the plans to merge this into Python's repository so we can >> all help out on this? > > None at the moment. I think the community needs to show its interest > in it and Joe his willingness to maintain it in the future in order > for it to qualify for addition to the stdlib. Is that how 2to3 got in? If I remember correctly, this was a huge request from the language summit - and by huge, I mean really, really big. In fact, I think you were the guy who was heading the SoC project, right? ;) From brett at python.org Fri Aug 28 00:48:28 2009 From: brett at python.org (Brett Cannon) Date: Thu, 27 Aug 2009 15:48:28 -0700 Subject: [Python-Dev] 3to2 0.1 alpha 1 released In-Reply-To: <1afaf6160908271447o54bd7b67t3e50a6560a10a82a@mail.gmail.com> References: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> <1afaf6160908271308u588ece9bo3272eb1044b923e5@mail.gmail.com> <1afaf6160908271447o54bd7b67t3e50a6560a10a82a@mail.gmail.com> Message-ID: On Thu, Aug 27, 2009 at 14:47, Benjamin Peterson wrote: > 2009/8/27 Brett Cannon : >> What are the plans to merge this into Python's repository so we can >> all help out on this? > > None at the moment. I think the community needs to show its interest > in it and Joe his willingness to maintain it in the future in order > for it to qualify for addition to the stdlib. > As Jesse said, at the language summit there was serious interest. > I don't see why having it merged into Python's repo is a requirement > for contribution, since he's using Mecurial. :) Just because it's in Mercurial does not mean that Joe can't hold up patches to apply to the main branch. 
If we all start forking it to fix things and Joe ends up falling behind because of school, life, etc. this could kill the project. But I am willing to wait for now in hopes that doesn't happen. -Brett From drkjam at gmail.com Fri Aug 28 01:03:49 2009 From: drkjam at gmail.com (DrKJam) Date: Fri, 28 Aug 2009 00:03:49 +0100 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908271039i32e1a19at880fa662ded2eae2@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <87hbvue7gd.fsf@benfinney.id.au> <4A958718.4060802@v.loewis.de> <19093.43076.369750.56503@montanaro.dyndns.org> <538a660a0908261648g7d48740dnb174f916208d571b@mail.gmail.com> <8517e9350908270715w158b81f5l9cea345defe2ef39@mail.gmail.com> <033178A5-DE01-4C91-8591-E13517B4E67F@gmail.com> <8517e9350908271037x133e0db7he594fac44ddb3469@mail.gmail.com> <8517e9350908271039i32e1a19at880fa662ded2eae2@mail.gmail.com> Message-ID: <538a660a0908271603r22a01c88xa36054fae63a5b00@mail.gmail.com> I've posted several of issues to the ipaddr bug tracker for consideration. They shouldn't be major discussion topics so I'll leave them off the list. The following are a few feature requests that might possibly require further discussion here. If they are no-brainers that don't require any further mulling over, can we have a few votes -1/+1 to get a feel for the importance and I'llI convert them into tickets. 1) Additional is_ boolean properties. IPv6Address.is_ipv4_compat IPv6Network.is_ipv4_compat Returns True if this address/network is IPv4-compatible IPv6 e.g. ::192.0.2.0 or ::192.0.2.0/124, False otherwise. IPv6Address.is_ipv4_mapped IPv6Network.is_ipv4_mapped Returns True if this address/network is IPv4-compatible IPv6 e.g. ::ffff:192.0.2.0 or ::ffff:192.0.2.0/124, False otherwise. IPvXAddress.is_reserved IPvXNetwork.is_reserved Possible list of IPv4 networks and ranges to be used for this purpose :- # IANA Reserved but subject to allocation 39.0.0.0/8 128.0.0.0/16 191.255.0.0/16 192.0.0.0/24 223.255.255.0/24 # Reserved for Future Use 240.0.0.0/4 # Reserved multicast 234.0.0.0 - 238.255.255.255 225.0.0.0 - 231.255.255.255 Possible list of IPv6 networks to be used for this purpose :- FF00::/12 ::/8 0100::/8 0200::/7 0400::/6 0800::/5 1000::/4 4000::/3 6000::/3 8000::/3 A000::/3 C000::/3 E000::/4 F000::/5 F800::/6 FE00::/9 IPvXAddress.is_unicast IPvXNetwork.is_unicast True if addresses or networks are within unicast ranges. 2) An IPvXRange class (representing an arbitrary start and end IP address). This is in addition to the existing summarize_address_range() function. There are use cases were an actual object representing an arbitrary range rather than a basic tuple containing two IPvXAddress objects or a list of IPvXNetwork objects is just simple to handle, more appropriate and less hassle overall. Willing to expand on the interface for this. 3) Support for IPv4-mapped/compatible IPv6 conversion functions. It would be handy to have methods to convert backwards and forwards between actual IPv4 addresses and networks to their IPv6 mapped or compatible counterparts. 
Basic examples :- IPv4 -> IPv6 >>> IPv6Address('::ffff:192.0.2.0').ipv4() IPv4Address('192.0.2.0') >>> IPv6Address('::192.0.2.0').ipv4() IPv4Address('192.0.2.0') IPv4 -> IPv6 >>> IPv4Address('192.0.2.0').ipv6() IPv6Address(::ffff:192.0.2.0') Prefer IPv4-compatible as the default (RFC 4291) >>> IPv4Address('192.0.2.0').ipv6(ipv4_mapped=True) IPv6Address('::192.0.2.0') By the same token we should provide the same functionality for IP network classes (with the necessary CIDR prefix conversions) :- >>> IPv6Network('::ffff:192.0.2.0/120').ipv4() IPv4Network('192.0.2.0/24') >>> IPv4Network('192.0.2.0/24').ipv6() IPv6Network('::ffff:192.0.2.0/120') >>> IPv4Network('192.0.2.0/24').ipv6(ipv4_mapped=True) IPv6Network('::192.0.2.0/120') If address ranges overflow boundaries the necessary exceptions can be thrown. 4) IP set based classes. This is a big topic, so I'll save it for a subsequent post. What are the general feelings about having something like this in ipaddr? It might be deemed a little heavyweight but it is a really sweet option for the power user. Combined with the new speed of collapse_address_list() this could be handy and fast. That's all for now, Dave M. 2009/8/27 Peter Moody > On Thu, Aug 27, 2009 at 10:37 AM, Peter Moody wrote: > > On Thu, Aug 27, 2009 at 10:24 AM, David Moss wrote: > >> Peter, > >> > >> I would like to apologise if I have caused you any offense. > > > > Thanks. Accepted. > > > >> Please can we > >> put the animosity behind us and stick to pulling together the best IP > >> library possible as part of this PEP? > > > > pep-3144 should hopefully soon be updated on python.org/dev/peps with > > this past week's suggestions (including a discussion on the ipaddr > > class design). The updated ipaddr reference code should also still be > > available for 'svn co' at > > https://ipaddr-py.googlecode.com/svn/branches/2.0.x > > er, make that http://ipaddr-py.googlecode.com/svn/branches/2.0.x > > https seems to ask for a password. > > > Cheers, > > /peter > > > >> Regards, > >> > >> Dave M. > >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin at python.org Fri Aug 28 01:40:33 2009 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 27 Aug 2009 18:40:33 -0500 Subject: [Python-Dev] 3to2 0.1 alpha 1 released In-Reply-To: <4222a8490908271539g2dff2c71x39fa941904492a4d@mail.gmail.com> References: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> <1afaf6160908271308u588ece9bo3272eb1044b923e5@mail.gmail.com> <1afaf6160908271447o54bd7b67t3e50a6560a10a82a@mail.gmail.com> <4222a8490908271539g2dff2c71x39fa941904492a4d@mail.gmail.com> Message-ID: <1afaf6160908271640y6afe0aa8qf9eb221a26d0fb11@mail.gmail.com> 2009/8/27 Jesse Noller : > On Thu, Aug 27, 2009 at 5:47 PM, Benjamin Peterson wrote: >> 2009/8/27 Brett Cannon : >>> What are the plans to merge this into Python's repository so we can >>> all help out on this? >> >> None at the moment. I think the community needs to show its interest >> in it and Joe his willingness to maintain it in the future in order >> for it to qualify for addition to the stdlib. > > Is that how 2to3 got in? If I remember correctly, this was a huge > request from the language summit - and by huge, I mean really, really > big. 2to3 got in because it's part of our current recommended plan for porting to Python 3. Regardless of that, there's no hurry. 
There will be no major Python releases for at least 6 months, and in that time, 3to2 can enjoy the benefits of being separate from the core, frequent releases and testing. People who are interested should look now; being "in the core" isn't a requirement for that. And anyway, I don't see the point of importing new stuff into SVN if we're just going to move hg. > > In fact, I think you were the guy who was heading the SoC project, right? ;) No, just mentoring. :) -- Regards, Benjamin From martin at v.loewis.de Fri Aug 28 01:43:21 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 28 Aug 2009 01:43:21 +0200 Subject: [Python-Dev] 3to2 0.1 alpha 1 released In-Reply-To: <4222a8490908271539g2dff2c71x39fa941904492a4d@mail.gmail.com> References: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> <1afaf6160908271308u588ece9bo3272eb1044b923e5@mail.gmail.com> <1afaf6160908271447o54bd7b67t3e50a6560a10a82a@mail.gmail.com> <4222a8490908271539g2dff2c71x39fa941904492a4d@mail.gmail.com> Message-ID: <4A971A19.80302@v.loewis.de> >> None at the moment. I think the community needs to show its interest >> in it and Joe his willingness to maintain it in the future in order >> for it to qualify for addition to the stdlib. > > Is that how 2to3 got in? If I remember correctly, this was a huge > request from the language summit - and by huge, I mean really, really > big. Ok, so then it should be easy to generate some real interest out of it, right? E.g. a somebody actually running the tool, or perhaps even a bug report? Regards, Martin From brett at python.org Fri Aug 28 01:57:51 2009 From: brett at python.org (Brett Cannon) Date: Thu, 27 Aug 2009 16:57:51 -0700 Subject: [Python-Dev] quick PEP 387 comments Message-ID: Is the PEP considering all non-private APIs public even if they are not documented? If so we might want to be up front about that and say so to make sure we are all very careful about making all non-essential APIs private (assuming this PEP gets accepted). And we might want to say that all code in 'test' sub-packages are not subject to backwards compatibility unless documented. I have a ton of support code in importlib.test that I do not want to have to maintain for public consumption as they are meant solely for testing purposes by me. If you read the PEP it would suggest that all modules in test are subject to the PEP's compatibility policy which is obviously absurd. -Brett From arcriley at gmail.com Fri Aug 28 02:06:30 2009 From: arcriley at gmail.com (Arc Riley) Date: Thu, 27 Aug 2009 20:06:30 -0400 Subject: [Python-Dev] 3to2 0.1 alpha 1 released In-Reply-To: <4A971A19.80302@v.loewis.de> References: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> <1afaf6160908271308u588ece9bo3272eb1044b923e5@mail.gmail.com> <1afaf6160908271447o54bd7b67t3e50a6560a10a82a@mail.gmail.com> <4222a8490908271539g2dff2c71x39fa941904492a4d@mail.gmail.com> <4A971A19.80302@v.loewis.de> Message-ID: How about moving it to a new repository on hg.python.org? Give it more of an "official" feel without the burden of being in theb cpython tree? On Thu, Aug 27, 2009 at 7:43 PM, "Martin v. L?wis" wrote: > > Ok, so then it should be easy to generate some real interest out of > it, right? E.g. a somebody actually running the tool, or perhaps even > a bug report? > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From martin at v.loewis.de Fri Aug 28 02:17:54 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 28 Aug 2009 02:17:54 +0200 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <538a660a0908271603r22a01c88xa36054fae63a5b00@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <87hbvue7gd.fsf@benfinney.id.au> <4A958718.4060802@v.loewis.de> <19093.43076.369750.56503@montanaro.dyndns.org> <538a660a0908261648g7d48740dnb174f916208d571b@mail.gmail.com> <8517e9350908270715w158b81f5l9cea345defe2ef39@mail.gmail.com> <033178A5-DE01-4C91-8591-E13517B4E67F@gmail.com> <8517e9350908271037x133e0db7he594fac44ddb3469@mail.gmail.com> <8517e9350908271039i32e1a19at880fa662ded2eae2@mail.gmail.com> <538a660a0908271603r22a01c88xa36054fae63a5b00@mail.gmail.com> Message-ID: <4A972232.2090807@v.loewis.de> > IPv6Address.is_ipv4_compat > IPv6Network.is_ipv4_compat > > Returns True if this address/network is IPv4-compatible IPv6 e.g. > ::192.0.2.0 or ::192.0.2.0/124 , False otherwise. -1. These addresses are deprecated. > IPv6Address.is_ipv4_mapped > IPv6Network.is_ipv4_mapped > > Returns True if this address/network is IPv4-compatible IPv6 e.g. > ::ffff:192.0.2.0 or ::ffff:192.0.2.0/124 , False > otherwise. Perhaps there could be a v4_mapped function, that returned either None or a V4 address? > IPvXAddress.is_reserved > IPvXNetwork.is_reserved > > Possible list of IPv4 networks and ranges to be used for this purpose :- > > # IANA Reserved but subject to allocation > 39.0.0.0/8 -1. These are merely unallocated. > 128.0.0.0/16 -1. RFC 3330 says they are subject to allocation. > 191.255.0.0/16 -1. It's allocated to LACNIC, for further allocation. > 192.0.0.0/24 -1. It's allocated to ARIN; RFC 3330 says it's free for allocation. > 223.255.255.0/24 -1. Not sure what the status is here, but RFC 3330 also lists it as available for allocation; IANA lists 223.0.0.0/8 as UNALLOCATED. > # Reserved for Future Use > 240.0.0.0/4 +1. > # Reserved multicast > 234.0.0.0 - 238.255.255.255 > 225.0.0.0 - 231.255.255.255 -0. What makes them different from, say, 224.0.0.0/8? > Possible list of IPv6 networks to be used for this purpose :- > > FF00::/12 -1. These are multicast addresses, some in use. > ::/8 > 0100::/8 > 0200::/7 > 0400::/6 > 0800::/5 > 1000::/4 > 4000::/3 > 6000::/3 > 8000::/3 > A000::/3 > C000::/3 > E000::/4 > F000::/5 > F800::/6 > FE00::/9 +1. > IPvXAddress.is_unicast > IPvXNetwork.is_unicast > > True if addresses or networks are within unicast ranges. -0. What about anycast addresses? > 2) An IPvXRange class (representing an arbitrary start and end IP > address). This is in addition to the existing summarize_address_range() > function. > > There are use cases were an actual object representing an arbitrary > range rather than a basic tuple containing two IPvXAddress objects or a > list of IPvXNetwork objects is just simple to handle, more appropriate > and less hassle overall. Willing to expand on the interface for this. -0. What's use case where a tuple of two addresses wouldn't be just as good? > Basic examples :- > > IPv4 -> IPv6 > >>>> IPv6Address('::ffff:192.0.2.0').ipv4() > IPv4Address('192.0.2.0') > >>>> IPv6Address('::192.0.2.0').ipv4() > IPv4Address('192.0.2.0') -1 in this form. See above for compatible and mapped addresses. 
I could imaging a method trailing_ipv4, which would give an IPv4 address when all bits between 64 and 95 are 0; this would also cover cases where people put the IPv4 address into the IPv6 address using a regular prefix. Of course, this is guess-work, so the method name has to make it clear that it is guessing. > IPv4 -> IPv6 > >>>> IPv4Address('192.0.2.0').ipv6() > IPv6Address(::ffff:192.0.2.0') Prefer IPv4-compatible as the default > (RFC 4291) -1. These are deprecated. >>>> IPv4Address('192.0.2.0').ipv6(ipv4_mapped=True) > IPv6Address('::192.0.2.0') -0. Call it ipv4_mapped(). If there is to be an ipv6 method, it should take an IPv6Network, to allow constructing things like 2001:888:2000:d::82.94.164.162. > 4) IP set based classes. > > This is a big topic, so I'll save it for a subsequent post. What are the > general feelings about having something like this in ipaddr? It might be > deemed a little heavyweight but it is a really sweet option for the > power user. Combined with the new speed of collapse_address_list() this > could be handy and fast. So when talking about it, please provide use cases. Regards, Martin From martin at v.loewis.de Fri Aug 28 02:20:06 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Fri, 28 Aug 2009 02:20:06 +0200 Subject: [Python-Dev] 3to2 0.1 alpha 1 released In-Reply-To: References: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> <1afaf6160908271308u588ece9bo3272eb1044b923e5@mail.gmail.com> <1afaf6160908271447o54bd7b67t3e50a6560a10a82a@mail.gmail.com> <4222a8490908271539g2dff2c71x39fa941904492a4d@mail.gmail.com> <4A971A19.80302@v.loewis.de> Message-ID: <4A9722B6.7020701@v.loewis.de> > How about moving it to a new repository on hg.python.org > ? Give it more of an "official" feel without the > burden of being in theb cpython tree? Unfortunately, this is not yet set up (i.e. you can't push to it). Regards, Martin From benjamin at python.org Fri Aug 28 02:49:49 2009 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 27 Aug 2009 19:49:49 -0500 Subject: [Python-Dev] quick PEP 387 comments In-Reply-To: References: Message-ID: <1afaf6160908271749q10017313ra6826350b25a31e1@mail.gmail.com> I should probably mark that PEP as abandoned or deferred, since for various reasons, it seems like this is not what Python-dev feels is needed [1]. [1] http://mail.python.org/pipermail/python-dev/2009-June/090121.html 2009/8/27 Brett Cannon : > Is the PEP considering all non-private APIs public even if they are > not documented? If so we might want to be up front about that and say > so to make sure we are all very careful about making all non-essential > APIs private (assuming this PEP gets accepted). > > And we might want to say that all code in 'test' sub-packages are not > subject to backwards compatibility unless documented. I have a ton of > support code in importlib.test that I do not want to have to maintain > for public consumption as they are meant solely for testing purposes > by me. If you read the PEP it would suggest that all modules in test > are subject to the PEP's compatibility policy which is obviously > absurd. 
> > -Brett > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/benjamin%40python.org > -- Regards, Benjamin From jnoller at gmail.com Fri Aug 28 03:00:57 2009 From: jnoller at gmail.com (Jesse Noller) Date: Thu, 27 Aug 2009 21:00:57 -0400 Subject: [Python-Dev] 3to2 0.1 alpha 1 released In-Reply-To: <4A971A19.80302@v.loewis.de> References: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> <1afaf6160908271308u588ece9bo3272eb1044b923e5@mail.gmail.com> <1afaf6160908271447o54bd7b67t3e50a6560a10a82a@mail.gmail.com> <4222a8490908271539g2dff2c71x39fa941904492a4d@mail.gmail.com> <4A971A19.80302@v.loewis.de> Message-ID: <4222a8490908271800p393ca64ak5c76f3282cec520e@mail.gmail.com> On Thu, Aug 27, 2009 at 7:43 PM, "Martin v. L?wis" wrote: >>> None at the moment. I think the community needs to show its interest >>> in it and Joe his willingness to maintain it in the future in order >>> for it to qualify for addition to the stdlib. >> >> Is that how 2to3 got in? If I remember correctly, this was a huge >> request from the language summit - and by huge, I mean really, really >> big. > > Ok, so then it should be easy to generate some real interest out of > it, right? E.g. a somebody actually running the tool, or perhaps even > a bug report? > > Regards, > Martin > I'm sure we can get just as much interest in a 3 to 2 tool as we have for python 3 itself. Sorry, that was over-catty of me, but it's true. Right now 95% of the known world is living in 2.x, and I would argue that some of this is due to an unclear path of maintaining compatibility with 2.x code if they should jump to python 3. That was part of the driving force behind getting something like this done, at least at the language summit and last year's PyCon. Having an "official" - where "official" could mean linking to it in the Python 3 docs, announcing something on a python blog, something - anything to encourage/get more people to use it. I'm not disagreeing that just plopping it into core may not be the Right Thing To Do, but making it "semi-official" and "recommended" carries a lot of weight. I know the mercurial migration is happening Really Soon Now, but even hosting it in our svn/piggy backing our tracker and putting out a little python.org post pointing out 2to3 and 3to2 exist as migration tools could help. Maybe just a post about the GSOC project, what it does and where to get it, and "please try it out so we can smash the bugs" on the front page? i'd-like-to-be-able-to-use-python3-in-the-near-future-but-none-of-my-dependencies-have-portedly-yours jesse From exarkun at twistedmatrix.com Fri Aug 28 04:10:36 2009 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Fri, 28 Aug 2009 02:10:36 -0000 Subject: [Python-Dev] quick PEP 387 comments In-Reply-To: <1afaf6160908271749q10017313ra6826350b25a31e1@mail.gmail.com> References: <1afaf6160908271749q10017313ra6826350b25a31e1@mail.gmail.com> Message-ID: <20090828021036.7475.530690913.divmod.xquotient.62@localhost.localdomain> On 12:49 am, benjamin at python.org wrote: >I should probably mark that PEP as abandoned or deferred, since for >various reasons, it seems like this is not what Python-dev feels is >needed [1]. Re-reading that thread, I see some good discussion about how to improve the PEP, a little bit of misunderstanding about what the PEP is about, and not a lot of strong opposition. 
Maybe it's worth picking it up again? Jean-Paul From amentajo at msu.edu Fri Aug 28 06:32:40 2009 From: amentajo at msu.edu (Joe Amenta) Date: Fri, 28 Aug 2009 00:32:40 -0400 Subject: [Python-Dev] 3to2 0.1 alpha 1 released In-Reply-To: <1afaf6160908271308u588ece9bo3272eb1044b923e5@mail.gmail.com> References: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> <1afaf6160908271308u588ece9bo3272eb1044b923e5@mail.gmail.com> Message-ID: <4dc473a50908272132o237b2b4as1eb76cd355771835@mail.gmail.com> On Thu, Aug 27, 2009 at 4:08 PM, Benjamin Peterson wrote: > 2009/8/26 Joe Amenta : > > Hello all, > > > > I have released the first alpha version of 3to2 after finishing it for my > > Google Summer of Code 2009(tm) project. You can get the tarball for this > > release at > > http://bitbucket.org/amentajo/lib3to2/downloads/3to2_0.1-alpha1.tar.gz. > > This requires python 2.7, because it requires a newer version of 2to3 > than > > what comes with 2.6. > > Great work and congratulations on your first release! > Thank you, I couldn't have done it without you! > > Have you posted this to python-list and python-announce-list, too? I am going to post it to python-announce-list and python-list if I find that I can keep up with the amount of traffic I am getting from python-dev. > > -- > Regards, > Benjamin Thanks, --Joe -------------- next part -------------- An HTML attachment was scrubbed... URL: From amentajo at msu.edu Fri Aug 28 06:58:03 2009 From: amentajo at msu.edu (Joe Amenta) Date: Fri, 28 Aug 2009 00:58:03 -0400 Subject: [Python-Dev] 3to2 0.1 alpha 1 released In-Reply-To: References: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> <1afaf6160908271308u588ece9bo3272eb1044b923e5@mail.gmail.com> <1afaf6160908271447o54bd7b67t3e50a6560a10a82a@mail.gmail.com> Message-ID: <4dc473a50908272158g6aaa02dds2ce4a13ffc4fe8ba@mail.gmail.com> On Thu, Aug 27, 2009 at 6:48 PM, Brett Cannon wrote: > On Thu, Aug 27, 2009 at 14:47, Benjamin Peterson > wrote: > > 2009/8/27 Brett Cannon : > >> What are the plans to merge this into Python's repository so we can > >> all help out on this? > > > > None at the moment. I think the community needs to show its interest > > in it and Joe his willingness to maintain it in the future in order > > for it to qualify for addition to the stdlib. > > > > As Jesse said, at the language summit there was serious interest. > > > I don't see why having it merged into Python's repo is a requirement > > for contribution, since he's using Mecurial. :) > > Just because it's in Mercurial does not mean that Joe can't hold up > patches to apply to the main branch. If we all start forking it to fix > things and Joe ends up falling behind because of school, life, etc. > this could kill the project. But I am willing to wait for now in hopes > that doesn't happen. > > -Brett My intent is to be very responsive to the administrative end of maintaining the project, i.e., applying patches to the main branch, bug triage, etc., while contributing new code in my (limited) free time as well. It would kill me to see the project die due to any of my shortcomings, and I resolve that if I start falling behind on maintaining 3to2, I will call for someone else to come forward to replace me. I will always respond to e-mails sent to amentajo at msu.edu, so please contact me if there are any concerns, early if possible. --Joe -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From shashank.sunny.singh at gmail.com Fri Aug 28 10:08:23 2009 From: shashank.sunny.singh at gmail.com (Shashank Singh) Date: Fri, 28 Aug 2009 13:38:23 +0530 Subject: [Python-Dev] Adding a C Module to python source distribution Message-ID: <332972a20908280108h1875fe21g3b6b8c0195be7250@mail.gmail.com> Hi All, I am trying to add a module written in c to python source on Win32 using VC++ 9 Pro. I went through the available documentation but there doesn't seem to be any clear instruction on how to do that. Basically I opened pcbuild.sln in vc++, added the c file (xxx.c) to Modules/ directory. Building the solution after that works fine: xxx.c is compiled (no errors, no warnings) and the python executable gets created. But I am not able to import the module defined in xxx.c using that executable. Do I need to register this module some place else too (setup.py?) ? Any hints and pointers will be appreciated :) --shashank -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Aug 28 11:31:16 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 28 Aug 2009 19:31:16 +1000 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <8517e9350908270652w4846dfdal8278ce71e3d238b4@mail.gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908270652w4846dfdal8278ce71e3d238b4@mail.gmail.com> Message-ID: <4A97A3E4.1050206@gmail.com> Peter Moody wrote: > If there are any more suggestions on the PEP or the code, please let me know. I noticed the new paragraphs on the IPv4 vs IPv6 types not being comparable - is there a canonical ordering for mixed address lists defined anywhere (e.g. an RFC)? If there is, then it should be possible to implement that on BaseIP and BaseNet so that comparisons work as canonically defined. If there isn't, then that should be mentioned in the PEP as the reason why the PEP deliberately isn't trying to invent a convention. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From lists at cheimes.de Fri Aug 28 14:58:31 2009 From: lists at cheimes.de (Christian Heimes) Date: Fri, 28 Aug 2009 14:58:31 +0200 Subject: [Python-Dev] Adding a C Module to python source distribution In-Reply-To: <332972a20908280108h1875fe21g3b6b8c0195be7250@mail.gmail.com> References: <332972a20908280108h1875fe21g3b6b8c0195be7250@mail.gmail.com> Message-ID: <4A97D477.9000804@cheimes.de> Shashank Singh wrote: > Do I need to register this module some place else too (setup.py?) ? > > Any hints and pointers will be appreciated :) You have to add the module and its initializer to PC/config.c, too. Christian From status at bugs.python.org Fri Aug 28 18:07:26 2009 From: status at bugs.python.org (Python tracker) Date: Fri, 28 Aug 2009 18:07:26 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20090828160726.0775778627@psf.upfronthosting.co.za> ACTIVITY SUMMARY (08/21/09 - 08/28/09) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 2370 open (+23) / 16247 closed (+15) / 18617 total (+38) Open issues with patches: 935 Average duration of open issues: 657 days. Median duration of open issues: 410 days. 
Open Issues Breakdown open 2339 (+23) pending 30 ( +0) Issues Created Or Reopened (39) _______________________________ add conversion table to time module docs 08/22/09 http://bugs.python.org/issue5365 reopened cvrebert implement new setuid-related calls and a standard way to drop al 08/21/09 CLOSED http://bugs.python.org/issue6758 created solinym zipfile.ZipExtFile.read() is missing universal newline support 08/22/09 http://bugs.python.org/issue6759 created ryles patch patch to subprocess docs to better explain Popen's 'args' argume 08/22/09 http://bugs.python.org/issue6760 created cvrebert patch Class calling 08/22/09 http://bugs.python.org/issue6761 created onlyme strange string representation of xrange in print 08/22/09 CLOSED http://bugs.python.org/issue6762 created mintaka Crash on mac os x leopard in mimetypes.guess_type (or PyObject_M 08/22/09 http://bugs.python.org/issue6763 created santagada os.path.join should call os.path.normpath on result 08/22/09 CLOSED http://bugs.python.org/issue6764 created michael.foord easy math.log, log10 inconsistency 08/23/09 CLOSED http://bugs.python.org/issue6765 created steve21 Cannot modify dictionaries inside dictionaries using Managers fr 08/23/09 http://bugs.python.org/issue6766 created carlosdf Python as zip package 08/23/09 http://bugs.python.org/issue6767 created Joe asyncore file_wrapper leaking file descriptors? 08/23/09 http://bugs.python.org/issue6768 created keysers in xmlrpclib.py: NameError: global name 'HTTPSConnection' is not 08/23/09 http://bugs.python.org/issue6769 created ivank PDF download links of docs for 3.1.1 are broken 08/24/09 CLOSED http://bugs.python.org/issue6770 created Radiant documentation/implementation error 08/24/09 http://bugs.python.org/issue6771 created steve21 Missing alias utf-8 in Standard Encodings list. 
08/24/09 CLOSED http://bugs.python.org/issue6772 created mintaka subprocess issue on Win 7 x64 08/24/09 http://bugs.python.org/issue6773 created tesla socket.shudown documentation: on some platforms, closing one hal 08/24/09 http://bugs.python.org/issue6774 created nicdumz patch readme: correct python.org/community/lists url 08/24/09 CLOSED http://bugs.python.org/issue6775 created nicdumz patch A few tests are failing when zlib is not supported 08/24/09 CLOSED http://bugs.python.org/issue6776 created nicdumz Python 2.6 tutorial still recommends using Exception.message att 08/24/09 http://bugs.python.org/issue6777 created cito False positives given through bisect module (binary search) 08/24/09 CLOSED http://bugs.python.org/issue6778 created kaashif NotImplementedError on for statement 08/24/09 CLOSED http://bugs.python.org/issue6779 created sbq startswith error message is incomplete 08/24/09 http://bugs.python.org/issue6780 created srid even exponentiation of negative numbers 08/25/09 CLOSED http://bugs.python.org/issue6781 created benlbroussard Scoping of variables in closures 08/25/09 CLOSED http://bugs.python.org/issue6782 created bitfort Add a builtin method to 'int' for base/radix conversion 08/25/09 http://bugs.python.org/issue6783 created ubershmekel byte/unicode pickle incompatibilities between python2 and python 08/26/09 http://bugs.python.org/issue6784 created RonnyPfannschmidt IncompleteRead / BadStatus when parsing http://peakoil.mobi 08/26/09 http://bugs.python.org/issue6785 created kmoon readline and zero based indexing 08/26/09 http://bugs.python.org/issue6786 created purpleidea thread docs contain an incorrect link in a reference to the 'exi 08/27/09 CLOSED http://bugs.python.org/issue6787 created r.david.murray codecs.open on Win32 does not force binary mode 08/27/09 http://bugs.python.org/issue6788 created EnigmaCurry ftplib storelines does not honor strings returned in fp.readline 08/27/09 http://bugs.python.org/issue6789 created aymanhs httplib and array do not play together well 08/27/09 http://bugs.python.org/issue6790 created jakemcguire easy httplib read status memory usage 08/28/09 http://bugs.python.org/issue6791 created m.sucajtys patch Distutils-based installer does not detect 64bit versions of Pyth 08/28/09 http://bugs.python.org/issue6792 created erluk decimal.py: div_nearest ==> _div_nearest 08/28/09 CLOSED http://bugs.python.org/issue6793 created skrah 26backport decimal.py: incorrect results in NaN comparisons 08/28/09 CLOSED http://bugs.python.org/issue6794 created skrah decimal.py: minor issues && usability 08/28/09 http://bugs.python.org/issue6795 created skrah Issues Now Closed (26) ______________________ Extension module build fails for MinGW: missing vcvarsall.bat 483 days http://bugs.python.org/issue2698 hagen Allow buffering for HTTPResponse 6 days http://bugs.python.org/issue4879 gregory.p.smith patch, patch let unittest.assertRaises() return the exception object caught 75 days http://bugs.python.org/issue6275 michael.foord patch, patch, easy, needs review Add "path" to the xmrlpc dispatcher method 22 days http://bugs.python.org/issue6654 krisvale buffer c-api: memoryview object documentation 20 days http://bugs.python.org/issue6659 benjamin.peterson Desire python.org documentation link to user contribution wiki ( 18 days http://bugs.python.org/issue6660 keenethery Place the term "delete" within the documentation for os.remove() 14 days http://bugs.python.org/issue6677 mcow ValueError in 21.17.4.2. 
SocketServer.UDPServer Example 7 days http://bugs.python.org/issue6718 georg.brandl Inconsistency in Documentation: "Name Spaces" vs "Namespaces" 6 days http://bugs.python.org/issue6725 georg.brandl dict.fromkeys() should not cross reference mutable value by defa 6 days http://bugs.python.org/issue6730 georg.brandl implement new setuid-related calls and a standard way to drop al 1 days http://bugs.python.org/issue6758 exarkun strange string representation of xrange in print 0 days http://bugs.python.org/issue6762 rhettinger os.path.join should call os.path.normpath on result 0 days http://bugs.python.org/issue6764 michael.foord easy math.log, log10 inconsistency 1 days http://bugs.python.org/issue6765 tim_one PDF download links of docs for 3.1.1 are broken 1 days http://bugs.python.org/issue6770 benjamin.peterson Missing alias utf-8 in Standard Encodings list. 0 days http://bugs.python.org/issue6772 georg.brandl readme: correct python.org/community/lists url 0 days http://bugs.python.org/issue6775 georg.brandl patch A few tests are failing when zlib is not supported 0 days http://bugs.python.org/issue6776 ezio.melotti False positives given through bisect module (binary search) 0 days http://bugs.python.org/issue6778 r.david.murray NotImplementedError on for statement 0 days http://bugs.python.org/issue6779 r.david.murray even exponentiation of negative numbers 0 days http://bugs.python.org/issue6781 marketdickinson Scoping of variables in closures 0 days http://bugs.python.org/issue6782 benjamin.peterson thread docs contain an incorrect link in a reference to the 'exi 1 days http://bugs.python.org/issue6787 georg.brandl decimal.py: div_nearest ==> _div_nearest 0 days http://bugs.python.org/issue6793 marketdickinson 26backport decimal.py: incorrect results in NaN comparisons 0 days http://bugs.python.org/issue6794 marketdickinson Wide-character curses 2357 days http://bugs.python.org/issue700921 gpolo Top Issues Most Discussed (10) ______________________________ 9 byte/unicode pickle incompatibilities between python2 and pytho 2 days open http://bugs.python.org/issue6784 9 zipfile.ZipExtFile.read() is missing universal newline support 6 days open http://bugs.python.org/issue6759 9 implement new setuid-related calls and a standard way to drop a 1 days closed http://bugs.python.org/issue6758 7 Patch: new method get_wch for ncurses bindings: accept wide cha 7 days open http://bugs.python.org/issue6755 6 Support for encrypted zipfiles when interpreting zipfile as scr 7 days open http://bugs.python.org/issue6749 6 (curses) addstr() takes str in Python 3 8 days open http://bugs.python.org/issue6745 5 expose setresuid 42 days open http://bugs.python.org/issue6508 4 Add a builtin method to 'int' for base/radix conversion 3 days open http://bugs.python.org/issue6783 4 Python as zip package 5 days open http://bugs.python.org/issue6767 4 math.log, log10 inconsistency 1 days closed http://bugs.python.org/issue6765 From cgw at hep.uchicago.edu Fri Aug 28 19:40:07 2009 From: cgw at hep.uchicago.edu (Charles Waldman) Date: Fri, 28 Aug 2009 12:40:07 -0500 Subject: [Python-Dev] timed_command.py: consider for inclusion in std. library? Message-ID: <19096.5751.249049.259397@hep.uchicago.edu> Here's a module "timed_command" I wrote a while ago and is generally useful and might be a good addition to the standard library. It is like commands.getstatusoutput but lets you run a command with an optional timeout. Useful for systems programming where a sub-process might hang. 
Only works on POSIX, but could perhaps be modified to run on other platforms (I don't have the knowledge of Windows to do this). If you would like to add this to the library, I relinquish all rights to it. Here's a link to the source repository: http://repo.mwt2.org/viewvc/Python/timed_command.py?view=markup From brett at python.org Sat Aug 29 00:24:45 2009 From: brett at python.org (Brett Cannon) Date: Fri, 28 Aug 2009 15:24:45 -0700 Subject: [Python-Dev] deprecated methods on array objects In-Reply-To: <77c780b40908271415o91f368do451074337d4115b2@mail.gmail.com> References: <77c780b40908271415o91f368do451074337d4115b2@mail.gmail.com> Message-ID: On Thu, Aug 27, 2009 at 14:15, Jake McGuire wrote: > The python documentation says that the read() and write() methods on array > objects have been deprecated since 1.5.1. ?I assume this is because their > semantics are almost the exact opposite of read() and write() on a file-like > object; array.read() reads data from a file into the array and array.write() > writes data from the array to a file. > This causes fatal confusion in code that checks for the existence of read() > and write() to determine whether an object is file-like. ?Code such as > httplib. > What is the timeline for removing these methods from array? ?It has been 11 > years now. They are gone from Python 3.x, so they have been removed where it counts. Bothering with 2.x is not worth it at this point. From brett at python.org Sat Aug 29 00:27:14 2009 From: brett at python.org (Brett Cannon) Date: Fri, 28 Aug 2009 15:27:14 -0700 Subject: [Python-Dev] timed_command.py: consider for inclusion in std. library? In-Reply-To: <19096.5751.249049.259397@hep.uchicago.edu> References: <19096.5751.249049.259397@hep.uchicago.edu> Message-ID: To get a module included in the standard library you need to have it out for about a year, have the community consider it best-of-breed, and write a PEP passed through python-ideas. On Fri, Aug 28, 2009 at 10:40, Charles Waldman wrote: > > Here's a module "timed_command" I wrote a while ago and is generally > useful and might be a good addition to the standard library. ?It is > like commands.getstatusoutput but lets you run a command with an > optional timeout. ?Useful for systems programming where a sub-process > might hang. ?Only works on POSIX, but could perhaps be modified to run > on other platforms (I don't have the knowledge of Windows to do this). > If you would like to add this to the library, I relinquish all rights > to it. > > Here's a link to the source repository: > > ?http://repo.mwt2.org/viewvc/Python/timed_command.py?view=markup > > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/brett%40python.org > From scav at blueyonder.co.uk Fri Aug 28 21:27:27 2009 From: scav at blueyonder.co.uk (Peter Harris) Date: Fri, 28 Aug 2009 20:27:27 +0100 Subject: [Python-Dev] functools.compose to chain functions together In-Reply-To: References: Message-ID: <4A982F9F.6060609@blueyonder.co.uk> I am personally indifferent to this, even though I had in mind in PEP309, that compose would probably end up in there too. On the one hand, some people will keep on expecting it to be there. The ones that care about it will not be confused: they'll expect compose(f,g)(x) to be f(g(x)) as is proposed. It can't do any significant harm. 
On the other hand it's not likely to be used even as often as partial, which I always wanted mostly to make anonymous callables for Tkinter, not because of any ivory-tower functional programming bias. And the most common use case of compose() is covered by a one-liner that really doesn't need to be in the standard library. I'll say +0, with the + because if new Python programmers run across compose() in the docs, and aren't familiar with the idea, they can follow a link from there to Wikipedia, and maybe it will give them an idea we haven't thought of for something cool to do with it. Peter Harris From skippy.hammond at gmail.com Sun Aug 30 04:44:10 2009 From: skippy.hammond at gmail.com (Mark Hammond) Date: Sun, 30 Aug 2009 12:44:10 +1000 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: References: <4A8A6256.1030009@v.loewis.de> Message-ID: <4A99E77A.5080603@gmail.com> On 18/08/2009 6:20 PM, Dirkjan Ochtman wrote: > On Tue, Aug 18, 2009 at 10:12, "Martin v. L?wis" wrote: >> In this thread, I'd like to collect things that ought to be done >> but where Dirkjan has indicated that he would prefer if somebody else >> did it. > > I think the most important item here is currently the win32text stuff. > Mark Hammond said he would work on this; Mark, when do you have time > for this? Then I could set apart some time for it as well. > > Have stalled a bit on the fine-grained branch processing, hope to move > that forward tomorrow. I'm afraid I've also stalled on this task and I need some help to get things moving again. 1) I've stalled on the 'none:' patch I promised to resurrect. While doing this, I re-discovered that the tests for win32text appear to check win32 line endings are used by win32text on *all* platforms, not just Windows. I asked for advice from Dirkjan who referred me to the mercurual-devel list, but my request of slightly over a week ago remains unanswered (http://selenic.com/pipermail/mercurial-devel/2009-August/014873.html) - maybe I just need to be more patient... Further, Martin's comments in this thread indicate he believes a new extension will be necessary rather than 'fixing' win32text. If this is the direction we take, it may mean the none: patch, which targets the implementation of win32text, is no longer necessary anyway. 2) These same recent discussions about an entirely new extension and no clear indication of our expectations regarding what the tool actually enforces means I'm not sure how to make a start on the more general issue. I also fear that should I try to make a start on this, it will still wind up fruitless - eg, it seems any work targeting win32text specifically would have been wasted, so I'd really like to see a consensus on what needs to be done before attempting to start it. So in short, I'm still offering to work on this issue - I just don't know what that currently entails. Thanks, Mark From shashank.sunny.singh at gmail.com Sun Aug 30 07:45:20 2009 From: shashank.sunny.singh at gmail.com (Shashank Singh) Date: Sun, 30 Aug 2009 11:15:20 +0530 Subject: [Python-Dev] Fast Implementation for ZIP decryption Message-ID: <332972a20908292245j3dbdb309j94bc3ae5ea359723@mail.gmail.com> Hi All, I have implemented the simple zip decryption in C (yes, the much loathed weak password based classical PKWARE encryption, which incidentally is the only one currently supported in python) . It performs fast, as one would expect, as compared to the current all-python implementation. 
Does it sound worthy enough to create a patch for and integrate into python itself? -- Regards Shashank Singh Senior Undergraduate, Department of Computer Science and Engineering Indian Institute of Technology Bombay shashank.sunny.singh at gmail.com http://www.cse.iitb.ac.in/~shashanksingh -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at v.loewis.de Sun Aug 30 10:55:33 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 30 Aug 2009 10:55:33 +0200 Subject: [Python-Dev] Fast Implementation for ZIP decryption In-Reply-To: <332972a20908292245j3dbdb309j94bc3ae5ea359723@mail.gmail.com> References: <332972a20908292245j3dbdb309j94bc3ae5ea359723@mail.gmail.com> Message-ID: <4A9A3E85.4050007@v.loewis.de> > Does it sound worthy enough to create a patch for and integrate into > python itself? Probably not, given that people think that the algorithm itself is fairly useless. Regards, Martin From steve at pearwood.info Sun Aug 30 14:59:41 2009 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 30 Aug 2009 22:59:41 +1000 Subject: [Python-Dev] Fast Implementation for ZIP decryption In-Reply-To: <4A9A3E85.4050007@v.loewis.de> References: <332972a20908292245j3dbdb309j94bc3ae5ea359723@mail.gmail.com> <4A9A3E85.4050007@v.loewis.de> Message-ID: <200908302259.42816.steve@pearwood.info> On Sun, 30 Aug 2009 06:55:33 pm Martin v. L?wis wrote: > > Does it sound worthy enough to create a patch for and integrate > > into python itself? > > Probably not, given that people think that the algorithm itself is > fairly useless. I would think that for most people, the threat model isn't "the CIA is reading my files" but "my little brother or nosey co-worker is reading my files", and for that, zip encryption with a good password is probably perfectly adequate. E.g. OpenOffice uses it for password-protected documents. Given that Python already supports ZIP decryption (as it should), are there any reasons to prefer the current pure-Python implementation over a faster version? -- Steven D'Aprano From mg at lazybytes.net Sun Aug 30 13:37:56 2009 From: mg at lazybytes.net (Martin Geisler) Date: Sun, 30 Aug 2009 13:37:56 +0200 Subject: [Python-Dev] Mercurial migration: help needed References: <4A8A6256.1030009@v.loewis.de> <4A99E77A.5080603@gmail.com> Message-ID: <877hwlwmzv.fsf@hbox.dyndns.org> Mark Hammond writes: > 1) I've stalled on the 'none:' patch I promised to resurrect. While > doing this, I re-discovered that the tests for win32text appear to > check win32 line endings are used by win32text on *all* platforms, not > just Windows. I think it is only Patrick Mezard who knows how to run (parts of) the test suite on Windows. > I asked for advice from Dirkjan who referred me to the mercurual-devel > list, but my request of slightly over a week ago remains unanswered > (http://selenic.com/pipermail/mercurial-devel/2009-August/014873.html) > - > maybe I just need to be more patient... Oh no, that's usually the wrong tactic :-) I've been too busy for real Mercurial work the last couple of weeks, but you should not feel bad about poking us if you don't get a reply. Or come to the IRC channel (#mercurial on irc.freenode.net) where Dirkjan (djc) and myself (mg) hang out when it's daytime in Europe. > Further, Martin's comments in this thread indicate he believes a new > extension will be necessary rather than 'fixing' win32text. 
If this > is the direction we take, it may mean the none: patch, which targets > the implementation of win32text, is no longer necessary anyway. I suggested a new extension for two reasons: * I'm using Linux, and I mentally skip over all extensions that mention "win32"... I guess others do the same, and in this case it's really a shame since converting EOL markers is a cross-platform problem: if someone creates a repository on Windows, I might find it nice to translate the EOL markers into LF on my machine. As far as I know, all my tools works correctly with CRLF EOL markers, but I can see the usefulness of such an extension when adding new files (which would default to LF unless I take care). * A new extension will not have to deal with backwards compatibility issues. That would let us clean up the strange names: I think "cleverencode:" and "cleverdecode:" quite poor names that convey little meaning (and what's with the colon?). We could instead use the same names as Subversion: "native", "CRLF" and "LF". The new extension could be named 'convert-eol' or something like that. > 2) These same recent discussions about an entirely new extension and > no clear indication of our expectations regarding what the tool > actually enforces means I'm not sure how to make a start on the more > general issue. It would be a folly to require all files in all changesets to use the right EOL markers -- people will be making mistakes offline. The important thing is that they fix them before pushing to a public server. So the extension should do that: either abort commits with the wrong EOL markers or do as Subversion and automatically convert the file in the working copy. > I also fear that should I try to make a start on this, it will still > wind up fruitless - eg, it seems any work targeting win32text > specifically would have been wasted, so I'd really like to see a > consensus on what needs to be done before attempting to start it. As I understand it, what is lacking is that win32text will read the encode/decode settings from a versioned file called /.hgeol. This means that you can just enable the extension and be done with it, instead of configuring it in every clone. The /.hgeol file should contain two sections: [repository] native = LF [patterns] Windows.txt = CRLF Unix.txt = LF Tools/buildbot/** = CRLF **.txt = native **.py = native **.dsp = CRLF The [repository] setting controls what native is translated into upon commit. The [patterns] section can be translated into safe [decode] / [encode] settings by the extension: [encode] Windows.txt = to-crlf Unix.txt = to-lf Tools/buildbot/** = to-crlf **.txt = to-lf **.py = to-lf **.dsp = to-crlf [decode] Windows.txt = to-crlf Unix.txt = to-lf Tools/buildbot/** = to-crlf **.txt = to-native **.py = to-native **.dsp = to-crlf where to-crlf, to-lf, to-native are filters installed by the extension. I guess your 'none' encode/decode filter patch would be needed if the Unix.txt file were to be stored unchanged in the repository? Instead I imagine that the extension will convert a modified Unix.txt to LF EOL markers automatically (Subversion behaves like that, as far as I can tell from a bit of testing). That way the repository will contain most files in the format specified as native for it, but selected files are stored using whatever EOLs they like. The result is that someone who has not enabled the extension will get correct files from a checkout. 
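To make the [patterns] to [encode]/[decode] translation above concrete, the mapping I picture is roughly the following (an illustrative sketch only; the filter names and the dict-based input are assumptions about the new extension, not existing win32text behaviour):

    FILTERS = {'LF': 'to-lf', 'CRLF': 'to-crlf', 'native': 'to-native'}

    def expand_patterns(patterns, native='LF'):
        # patterns maps a glob such as '**.py' to 'LF', 'CRLF' or 'native'.
        encode, decode = {}, {}
        for pattern, style in patterns.items():
            # On commit ("encode") native files are normalised to the
            # repository-wide EOL; on checkout ("decode") they become
            # whatever the local platform uses.
            if style == 'native':
                encode[pattern] = FILTERS[native]
            else:
                encode[pattern] = FILTERS[style]
            decode[pattern] = FILTERS[style]
        return encode, decode
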
Had we stored the *.dsp files with LF EOLs in the repository (like Subversion does), then using the extension would be mandatory for everybody. -- Martin Geisler VIFF (Virtual Ideal Functionality Framework) brings easy and efficient SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available URL: From exarkun at twistedmatrix.com Sun Aug 30 15:05:22 2009 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Sun, 30 Aug 2009 13:05:22 -0000 Subject: [Python-Dev] Fast Implementation for ZIP decryption In-Reply-To: <200908302259.42816.steve@pearwood.info> References: <332972a20908292245j3dbdb309j94bc3ae5ea359723@mail.gmail.com> <4A9A3E85.4050007@v.loewis.de> <200908302259.42816.steve@pearwood.info> Message-ID: <20090830130522.11528.715943779.divmod.xquotient.4@localhost.localdomain> On 12:59 pm, steve at pearwood.info wrote: >On Sun, 30 Aug 2009 06:55:33 pm Martin v. L?wis wrote: >> > Does it sound worthy enough to create a patch for and integrate >> > into python itself? >> >>Probably not, given that people think that the algorithm itself is >>fairly useless. > >I would think that for most people, the threat model isn't "the CIA is >reading my files" but "my little brother or nosey co-worker is reading >my files", and for that, zip encryption with a good password is >probably perfectly adequate. E.g. OpenOffice uses it for >password-protected documents. > >Given that Python already supports ZIP decryption (as it should), are >there any reasons to prefer the current pure-Python implementation over >a faster version? Given that the use case is "protect my biology homework from my little brother", how fast does the implementation really need to be? Is speeding it up from 0.1 seconds to 0.001 seconds worth the potential new problems that come with more C code (more code to maintain, less portability to other runtimes, potential for interpreter crashes or even arbitrary code execution vulnerabilities from specially crafted files)? Jean-Paul From shashank.sunny.singh at gmail.com Sun Aug 30 16:34:36 2009 From: shashank.sunny.singh at gmail.com (Shashank Singh) Date: Sun, 30 Aug 2009 20:04:36 +0530 Subject: [Python-Dev] Fast Implementation for ZIP decryption In-Reply-To: <20090830130522.11528.715943779.divmod.xquotient.4@localhost.localdomain> References: <332972a20908292245j3dbdb309j94bc3ae5ea359723@mail.gmail.com> <4A9A3E85.4050007@v.loewis.de> <200908302259.42816.steve@pearwood.info> <20090830130522.11528.715943779.divmod.xquotient.4@localhost.localdomain> Message-ID: <332972a20908300734y9cadc02kd68298a65e86c415@mail.gmail.com> just to give you an idea of the speed up: a 3.3 mb zip file extracted using the current all-python implementation on my machine (win xp 1.67Ghz 1.5GB) takes approximately 38 seconds. the same file when extracted using c implementation takes 0.4 seconds. --shashank On Sun, Aug 30, 2009 at 6:35 PM, wrote: > On 12:59 pm, steve at pearwood.info wrote: > >> On Sun, 30 Aug 2009 06:55:33 pm Martin v. L?wis wrote: >> >>> > Does it sound worthy enough to create a patch for and integrate >>> > into python itself? >>> >>> Probably not, given that people think that the algorithm itself is >>> fairly useless. 
>>> >> >> I would think that for most people, the threat model isn't "the CIA is >> reading my files" but "my little brother or nosey co-worker is reading >> my files", and for that, zip encryption with a good password is >> probably perfectly adequate. E.g. OpenOffice uses it for >> password-protected documents. >> >> Given that Python already supports ZIP decryption (as it should), are >> there any reasons to prefer the current pure-Python implementation over >> a faster version? >> > > Given that the use case is "protect my biology homework from my little > brother", how fast does the implementation really need to be? Is speeding > it up from 0.1 seconds to 0.001 seconds worth the potential new problems > that come with more C code (more code to maintain, less portability to other > runtimes, potential for interpreter crashes or even arbitrary code execution > vulnerabilities from specially crafted files)? > > Jean-Paul > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/shashank.sunny.singh%40gmail.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ludvig at lericson.se Sun Aug 30 17:58:59 2009 From: ludvig at lericson.se (Ludvig Ericson) Date: Sun, 30 Aug 2009 17:58:59 +0200 Subject: [Python-Dev] Fast Implementation for ZIP decryption In-Reply-To: <332972a20908300734y9cadc02kd68298a65e86c415@mail.gmail.com> References: <332972a20908292245j3dbdb309j94bc3ae5ea359723@mail.gmail.com> <4A9A3E85.4050007@v.loewis.de> <200908302259.42816.steve@pearwood.info> <20090830130522.11528.715943779.divmod.xquotient.4@localhost.localdomain> <332972a20908300734y9cadc02kd68298a65e86c415@mail.gmail.com> Message-ID: <863FF1CD-0B4C-45B9-96F3-55C00842782C@lericson.se> On 30 aug 2009, at 16:34, Shashank Singh wrote: > just to give you an idea of the speed up: > > a 3.3 mb zip file extracted using the current all-python > implementation on my machine (win xp 1.67Ghz 1.5GB) > takes approximately 38 seconds. > > the same file when extracted using c implementation takes 0.4 seconds. If this matters to the users of the API, then likely they'd search for alternatives -- no need for it to go into the standard library just because it replaces functionality, or am I misunderstanding? - Ludvig Ericson From martin at v.loewis.de Sun Aug 30 19:59:26 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 30 Aug 2009 19:59:26 +0200 Subject: [Python-Dev] Mercurial migration: help needed In-Reply-To: <877hwlwmzv.fsf@hbox.dyndns.org> References: <4A8A6256.1030009@v.loewis.de> <4A99E77A.5080603@gmail.com> <877hwlwmzv.fsf@hbox.dyndns.org> Message-ID: <4A9ABDFE.3020802@v.loewis.de> > I suggested a new extension for two reasons: > > * I'm using Linux, and I mentally skip over all extensions that mention > "win32"... I guess others do the same, and in this case it's really a > shame since converting EOL markers is a cross-platform problem: if > someone creates a repository on Windows, I might find it nice to > translate the EOL markers into LF on my machine. > > As far as I know, all my tools works correctly with CRLF EOL markers, > but I can see the usefulness of such an extension when adding new > files (which would default to LF unless I take care). > > * A new extension will not have to deal with backwards compatibility > issues. 
That would let us clean up the strange names: I think > "cleverencode:" and "cleverdecode:" quite poor names that convey > little meaning (and what's with the colon?). We could instead use the > same names as Subversion: "native", "CRLF" and "LF". > > The new extension could be named 'convert-eol' or something like that. Thanks for the confirmation - this is also why I think a new extension would be best. FWIW, in Python, most files would be declared native, some CRLF, none LF. >> 2) These same recent discussions about an entirely new extension and >> no clear indication of our expectations regarding what the tool >> actually enforces means I'm not sure how to make a start on the more >> general issue. > > It would be a folly to require all files in all changesets to use the > right EOL markers -- people will be making mistakes offline. The > important thing is that they fix them before pushing to a public server. > > So the extension should do that: either abort commits with the wrong EOL > markers or do as Subversion and automatically convert the file in the > working copy. Maybe I misunderstand: when people use the extension, they cannot possibly make mistakes, right? Because the commit that gets aborted is already the local commit, right? Of course, it may still be that not all people use the extension. I think this is of concern to Mark (and he would like hg to refuse operation at all if the extension isn't used), but not to me: I would like this to be a feature of hg eventually, in which case I don't need to worry whether hg enforces presence of certain extensions. If people make commits that break the eol style, we could well refuse to accept them on the server, telling people that they should have used the extension (or that they should have been more careful if they don't use the extension). I think subversion's behavior wrt. incorrect eol-style is more subtle. In some cases, it will complain about inconsistencies, rather than fixing them automatically. Regards, Martin From mg at lazybytes.net Sun Aug 30 20:37:36 2009 From: mg at lazybytes.net (Martin Geisler) Date: Sun, 30 Aug 2009 20:37:36 +0200 Subject: [Python-Dev] Mercurial migration: help needed References: <4A8A6256.1030009@v.loewis.de> <4A99E77A.5080603@gmail.com> <877hwlwmzv.fsf@hbox.dyndns.org> <4A9ABDFE.3020802@v.loewis.de> Message-ID: <87my5hjggf.fsf@hbox.dyndns.org> "Martin v. L?wis" writes: >> So the extension should do that: either abort commits with the wrong >> EOL markers or do as Subversion and automatically convert the file in >> the working copy. > > Maybe I misunderstand: when people use the extension, they cannot > possibly make mistakes, right? Because the commit that gets aborted is > already the local commit, right? > > Of course, it may still be that not all people use the extension. Exactly, when people use the extension, they wont be able to make bad commits. > I think this is of concern to Mark (and he would like hg to refuse > operation at all if the extension isn't used), but not to me: I would > like this to be a feature of hg eventually, in which case I don't need > to worry whether hg enforces presence of certain extensions. Yes, that would be nice for the future. I don't know if the other Mercurial developers will see this as a big controversy -- Mercurial has so far made very sure to never mutate your files behind your back. Expansion of keywords (like $Id$) is also implemented as an extension. 
> If people make commits that break the eol style, we could well refuse > to accept them on the server, telling people that they should have > used the extension (or that they should have been more careful if they > don't use the extension). Indeed. Their work will not be lost -- one can always take the final file, convert the line-endings, copy it into a fresh clone and commit that. With more work one could even salvage the intermediate commits, but that is probably not necessary. > I think subversion's behavior wrt. incorrect eol-style is more subtle. > In some cases, it will complain about inconsistencies, rather than > fixing them automatically. Okay --- I don't have much experience with the svn:eol-style, except that I've read about it in the manual. -- Martin Geisler VIFF (Virtual Ideal Functionality Framework) brings easy and efficient SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available URL: From ncoghlan at gmail.com Sun Aug 30 23:24:44 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 31 Aug 2009 07:24:44 +1000 Subject: [Python-Dev] Fast Implementation for ZIP decryption In-Reply-To: <20090830130522.11528.715943779.divmod.xquotient.4@localhost.localdomain> References: <332972a20908292245j3dbdb309j94bc3ae5ea359723@mail.gmail.com> <4A9A3E85.4050007@v.loewis.de> <200908302259.42816.steve@pearwood.info> <20090830130522.11528.715943779.divmod.xquotient.4@localhost.localdomain> Message-ID: <4A9AEE1C.2060905@gmail.com> exarkun at twistedmatrix.com wrote: > Given that the use case is "protect my biology homework from my little > brother", how fast does the implementation really need to be? Is > speeding it up from 0.1 seconds to 0.001 seconds worth the potential new > problems that come with more C code (more code to maintain, less > portability to other runtimes, potential for interpreter crashes or even > arbitrary code execution vulnerabilities from specially crafted files)? Also, if the use case is just protecting stuff from a sibling or your childen, use an archiving program to zip/extract it :) So -1 here as well. Any added C code has a real cost for the reasons Jean-Paul listed, so it should only be used in cases where there's a major practical benefit to the speed-up. Faster execution of a problematic algorithm that is already well implemented by plenty of other applications doesn't qualify in my book (even if the speedup is by a couple of orders of magnitude). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From brett at python.org Mon Aug 31 01:28:25 2009 From: brett at python.org (Brett Cannon) Date: Sun, 30 Aug 2009 16:28:25 -0700 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? Message-ID: I am going through and running the entire test suite using importlib to ferret out incompatibilities. I have found a bunch, although all rather minor (raising a different exception typically; not even sure they are worth backporting as anyone reliant on the old exceptions might get a nasty surprise in the next micro release), and now I am down to my last failing test suite: test_import. 
Ignoring the execution bit problem (http://bugs.python.org/issue6526 but I have no clue why this is happening), I am bumping up against TestPycRewriting.test_incorrect_code_name. Turns out that import resets co_filename on a code object to __file__ before exec'ing it to create a module's namespace in order to ignore the file name passed into compile() for the filename argument. Now I can't change co_filename from Python as it's a read-only attribute and thus can't match this functionality in importlib w/o creating some custom code to allow me to specify the co_filename somewhere (marshal.loads() or some new function). My question is how important is this functionality? Do I really need to go through and add an argument to marshal.loads or some new function just to set co_filename to something that someone explicitly set in a .pyc file? Or I can let this go and have this be the one place where builtins.__import__ and importlib.__import__ differ and just not worry about it? -Brett From robertc at robertcollins.net Mon Aug 31 02:13:12 2009 From: robertc at robertcollins.net (Robert Collins) Date: Mon, 31 Aug 2009 10:13:12 +1000 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: References: Message-ID: <1251677592.1956.69.camel@lifeless-64> On Sun, 2009-08-30 at 16:28 -0700, Brett Cannon wrote: > > > My question is how important is this functionality? Do I really need > to go through and add an argument to marshal.loads or some new > function just to set co_filename to something that someone explicitly > set in a .pyc file? Or I can let this go and have this be the one > place where builtins.__import__ and importlib.__import__ differ and > just not worry about it? Just to be clear, this would show up if I: had a python tree built and run stuff from it symlinked to that tree from somewhere else ran stuff from that somewhere else - because the pyc is already on disk? Thats been an invaluable 'wtf' debugging tool at various times, because the odd provenance of the path in the pyc makes it extremely clear that what is being loaded isn't what one had thought was being loaded. OTOH, always showing the path that the pyc was *actually found at* would fix the weirdness that occurs when you mv a python tree from one place to another. -Rob -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part URL: From brett at python.org Mon Aug 31 02:23:32 2009 From: brett at python.org (Brett Cannon) Date: Sun, 30 Aug 2009 17:23:32 -0700 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: <1251677592.1956.69.camel@lifeless-64> References: <1251677592.1956.69.camel@lifeless-64> Message-ID: On Sun, Aug 30, 2009 at 17:13, Robert Collins wrote: > On Sun, 2009-08-30 at 16:28 -0700, Brett Cannon wrote: >> >> >> My question is how important is this functionality? Do I really need >> to go through and add an argument to marshal.loads or some new >> function just to set co_filename to something that someone explicitly >> set in a .pyc file? Or I can let this go and have this be the one >> place where builtins.__import__ and importlib.__import__ differ and >> just not worry about it? 
> > Just to be clear, this would show up if I: > had a python tree > built and run stuff from it > symlinked to that tree from somewhere else > ran stuff from that somewhere else Right; the code object would think it was loaded from the original location it was created at instead of where it actually is. Now why someone would want to move their .pyc files around instead of recompiling I don't know short of not wanting to send someone source. -Brett From guido at python.org Mon Aug 31 02:24:54 2009 From: guido at python.org (Guido van Rossum) Date: Sun, 30 Aug 2009 17:24:54 -0700 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: References: Message-ID: On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon wrote: > I am going through and running the entire test suite using importlib > to ferret out incompatibilities. I have found a bunch, although all > rather minor (raising a different exception typically; not even sure > they are worth backporting as anyone reliant on the old exceptions > might get a nasty surprise in the next micro release), and now I am > down to my last failing test suite: test_import. > > Ignoring the execution bit problem (http://bugs.python.org/issue6526 > but I have no clue why this is happening), I am bumping up against > TestPycRewriting.test_incorrect_code_name. Turns out that import > resets co_filename on a code object to __file__ before exec'ing it to > create a module's namespace in order to ignore the file name passed > into compile() for the filename argument. Now I can't change > co_filename from Python as it's a read-only attribute and thus can't > match this functionality in importlib w/o creating some custom code to > allow me to specify the co_filename somewhere (marshal.loads() or some > new function). > > My question is how important is this functionality? Do I really need > to go through and add an argument to marshal.loads or some new > function just to set co_filename to something that someone explicitly > set in a .pyc file? Or I can let this go and have this be the one > place where builtins.__import__ and importlib.__import__ differ and > just not worry about it? ISTR that Bill Janssen once mentioned a file replication mechanism whereby there were two names for each file: the "canonical" name on a replicated read-only filesystem, and the longer "writable" name on a unique master copy. He ended up with the filenames in the .pyc files being pretty bogus (since not everyone had access to the writable filesystem). So setting co_filename to match __file__ (i.e. the name under which the module is being imported) would be a nice service in this case. In general this would happen whenever you pre-compile a bunch of .py files to .pyc/.pyo and then copy the lot to a different location. Not a completely unlikely scenario. (I was going to comment on the execution bit issue but I realized I'm not even sure if you're talking about import.c or not. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Aug 31 02:26:08 2009 From: guido at python.org (Guido van Rossum) Date: Sun, 30 Aug 2009 17:26:08 -0700 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? 
In-Reply-To: References: <1251677592.1956.69.camel@lifeless-64> Message-ID: On Sun, Aug 30, 2009 at 5:23 PM, Brett Cannon wrote: > Right; the code object would think it was loaded from the original > location it was created at instead of where it actually is. Now why > someone would want to move their .pyc files around instead of > recompiling I don't know short of not wanting to send someone source. I already mentioned replication; it could also just be a matter of downloading a tarball with .py and .pyc files. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Mon Aug 31 02:34:06 2009 From: brett at python.org (Brett Cannon) Date: Sun, 30 Aug 2009 17:34:06 -0700 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: References: Message-ID: On Sun, Aug 30, 2009 at 17:24, Guido van Rossum wrote: > On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon wrote: >> I am going through and running the entire test suite using importlib >> to ferret out incompatibilities. I have found a bunch, although all >> rather minor (raising a different exception typically; not even sure >> they are worth backporting as anyone reliant on the old exceptions >> might get a nasty surprise in the next micro release), and now I am >> down to my last failing test suite: test_import. >> >> Ignoring the execution bit problem (http://bugs.python.org/issue6526 >> but I have no clue why this is happening), I am bumping up against >> TestPycRewriting.test_incorrect_code_name. Turns out that import >> resets co_filename on a code object to __file__ before exec'ing it to >> create a module's namespace in order to ignore the file name passed >> into compile() for the filename argument. Now I can't change >> co_filename from Python as it's a read-only attribute and thus can't >> match this functionality in importlib w/o creating some custom code to >> allow me to specify the co_filename somewhere (marshal.loads() or some >> new function). >> >> My question is how important is this functionality? Do I really need >> to go through and add an argument to marshal.loads or some new >> function just to set co_filename to something that someone explicitly >> set in a .pyc file? Or I can let this go and have this be the one >> place where builtins.__import__ and importlib.__import__ differ and >> just not worry about it? > > ISTR that Bill Janssen once mentioned a file replication mechanism > whereby there were two names for each file: the "canonical" name on a > replicated read-only filesystem, and the longer "writable" name on a > unique master copy. He ended up with the filenames in the .pyc files > being pretty bogus (since not everyone had access to the writable > filesystem). So setting co_filename to match __file__ (i.e. the name > under which the module is being imported) would be a nice service in > this case. > > In general this would happen whenever you pre-compile a bunch of .py > files to .pyc/.pyo and then copy the lot to a different location. Not > a completely unlikely scenario. > Well, to get this level of compatibility I am going to need to add some magical API somewhere then to overwrite a code object's "file" location. Blah. I will either add an argument to marshal.loads to specify an overriding file path or add an imp.exec that takes a file path argument to override the code object with. 
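For the record, the closest pure-Python workaround would be rebuilding every code object by hand, which is a good illustration of why a proper hook is preferable (sketch only; this uses the 2.x CodeType constructor order, and 3.x inserts co_kwonlyargcount after co_argcount):

    import types

    def rebuild_filename(code, new_filename):
        # co_filename is read-only, so each code object (including the ones
        # nested in co_consts) has to be rebuilt from scratch.
        consts = tuple(rebuild_filename(c, new_filename)
                       if isinstance(c, types.CodeType) else c
                       for c in code.co_consts)
        return types.CodeType(code.co_argcount, code.co_nlocals,
                              code.co_stacksize, code.co_flags, code.co_code,
                              consts, code.co_names, code.co_varnames,
                              new_filename, code.co_name, code.co_firstlineno,
                              code.co_lnotab, code.co_freevars, code.co_cellvars)
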
> (I was going to comment on the execution bit issue but I realized I'm > not even sure if you're talking about import.c or not. :-) So it turns out a bunch of execution/write bit stuff has come up in Python 2.7 and importlib has been ignoring it. =) Importlib has simply been opening up the bytecode files with 'wb' and writing out the file. But test_import tests that no execution bit get set or that a write bit gets added if the source file lacks it. I guess I can use posix.chmod and posix.stat to copy the source file's read and write bits and always mask out the execution bits. I hate this low-level file permission stuff. -Brett From guido at python.org Mon Aug 31 04:34:43 2009 From: guido at python.org (Guido van Rossum) Date: Sun, 30 Aug 2009 19:34:43 -0700 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: References: Message-ID: On Sun, Aug 30, 2009 at 5:34 PM, Brett Cannon wrote: > On Sun, Aug 30, 2009 at 17:24, Guido van Rossum wrote: >> On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon wrote: >>> I am going through and running the entire test suite using importlib >>> to ferret out incompatibilities. I have found a bunch, although all >>> rather minor (raising a different exception typically; not even sure >>> they are worth backporting as anyone reliant on the old exceptions >>> might get a nasty surprise in the next micro release), and now I am >>> down to my last failing test suite: test_import. >>> >>> Ignoring the execution bit problem (http://bugs.python.org/issue6526 >>> but I have no clue why this is happening), I am bumping up against >>> TestPycRewriting.test_incorrect_code_name. Turns out that import >>> resets co_filename on a code object to __file__ before exec'ing it to >>> create a module's namespace in order to ignore the file name passed >>> into compile() for the filename argument. Now I can't change >>> co_filename from Python as it's a read-only attribute and thus can't >>> match this functionality in importlib w/o creating some custom code to >>> allow me to specify the co_filename somewhere (marshal.loads() or some >>> new function). >>> >>> My question is how important is this functionality? Do I really need >>> to go through and add an argument to marshal.loads or some new >>> function just to set co_filename to something that someone explicitly >>> set in a .pyc file? Or I can let this go and have this be the one >>> place where builtins.__import__ and importlib.__import__ differ and >>> just not worry about it? >> >> ISTR that Bill Janssen once mentioned a file replication mechanism >> whereby there were two names for each file: the "canonical" name on a >> replicated read-only filesystem, and the longer "writable" name on a >> unique master copy. He ended up with the filenames in the .pyc files >> being pretty bogus (since not everyone had access to the writable >> filesystem). So setting co_filename to match __file__ (i.e. the name >> under which the module is being imported) would be a nice service in >> this case. >> >> In general this would happen whenever you pre-compile a bunch of .py >> files to .pyc/.pyo and then copy the lot to a different location. Not >> a completely unlikely scenario. > Well, to get this level of compatibility I am going to need to add > some magical API somewhere then to overwrite a code object's "file" > location. Blah. Agreed, no fun. Unfortunately for core Python it really pays to go the extra mile... 
> I will either add an argument to marshal.loads to specify an > overriding file path or add an imp.exec that takes a file path > argument to override the code object with. Remember, there are many code objects created from one pyc file. Adding it to marshal.load*() makes sense because then it's usable for other purposes too, and that attacks the issue from the root. (in import.c it's done by update_compiled_module() right after read_compiled_module(), which is a thin wrapper around marshal.load()) I'm not sure how imp.exec would make sure that introspection of the loaded code objects always gets the right thing. >> (I was going to comment on the execution bit issue but I realized I'm >> not even sure if you're talking about import.c or not. :-) > > So it turns out a bunch of execution/write bit stuff has come up in > Python 2.7 and importlib has been ignoring it. =) Importlib has simply > been opening up the bytecode files with 'wb' and writing out the file. > But test_import tests that no execution bit get set or that a write > bit gets added if the source file lacks it. I guess I can use > posix.chmod and posix.stat to copy the source file's read and write > bits and always mask out the execution bits. I hate this low-level > file permission stuff. It's no fun -- see the layers of #ifdefs in open_exclusive() in import.c. (Though I think you won't need to worry about VMS. :-) But it's somewhat important to get it right from a security POV. I would use os.open() and wrap an io.BufferedWriter around it. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Mon Aug 31 04:43:48 2009 From: brett at python.org (Brett Cannon) Date: Sun, 30 Aug 2009 19:43:48 -0700 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: References: Message-ID: On Sun, Aug 30, 2009 at 19:34, Guido van Rossum wrote: > On Sun, Aug 30, 2009 at 5:34 PM, Brett Cannon wrote: >> On Sun, Aug 30, 2009 at 17:24, Guido van Rossum wrote: >>> On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon wrote: >>>> I am going through and running the entire test suite using importlib >>>> to ferret out incompatibilities. I have found a bunch, although all >>>> rather minor (raising a different exception typically; not even sure >>>> they are worth backporting as anyone reliant on the old exceptions >>>> might get a nasty surprise in the next micro release), and now I am >>>> down to my last failing test suite: test_import. >>>> >>>> Ignoring the execution bit problem (http://bugs.python.org/issue6526 >>>> but I have no clue why this is happening), I am bumping up against >>>> TestPycRewriting.test_incorrect_code_name. Turns out that import >>>> resets co_filename on a code object to __file__ before exec'ing it to >>>> create a module's namespace in order to ignore the file name passed >>>> into compile() for the filename argument. Now I can't change >>>> co_filename from Python as it's a read-only attribute and thus can't >>>> match this functionality in importlib w/o creating some custom code to >>>> allow me to specify the co_filename somewhere (marshal.loads() or some >>>> new function). >>>> >>>> My question is how important is this functionality? Do I really need >>>> to go through and add an argument to marshal.loads or some new >>>> function just to set co_filename to something that someone explicitly >>>> set in a .pyc file? 
Or I can let this go and have this be the one >>>> place where builtins.__import__ and importlib.__import__ differ and >>>> just not worry about it? >>> >>> ISTR that Bill Janssen once mentioned a file replication mechanism >>> whereby there were two names for each file: the "canonical" name on a >>> replicated read-only filesystem, and the longer "writable" name on a >>> unique master copy. He ended up with the filenames in the .pyc files >>> being pretty bogus (since not everyone had access to the writable >>> filesystem). So setting co_filename to match __file__ (i.e. the name >>> under which the module is being imported) would be a nice service in >>> this case. >>> >>> In general this would happen whenever you pre-compile a bunch of .py >>> files to .pyc/.pyo and then copy the lot to a different location. Not >>> a completely unlikely scenario. > >> Well, to get this level of compatibility I am going to need to add >> some magical API somewhere then to overwrite a code object's "file" >> location. Blah. > > Agreed, no fun. Unfortunately for core Python it really pays to go the > extra mile... > Definitely, which is why I will do it, just not tonight as I am tired of compatibility fixing for now. =) >> I will either add an argument to marshal.loads to specify an >> overriding file path or add an imp.exec that takes a file path >> argument to override the code object with. > > Remember, there are many code objects created from one pyc file. > Adding it to marshal.load*() makes sense because then it's usable for > other purposes too, and that attacks the issue from the root. That was my thinking. > (in > import.c it's done by update_compiled_module() right after > read_compiled_module(), which is a thin wrapper around marshal.load()) > I'm not sure how imp.exec would make sure that introspection of the > loaded code objects always gets the right thing. > Basically it would be imp.exec(module, code, path) and it would tweak the code object before execution based on introspecting what the module had set for __file__. But might as well add the support to marshal. >>> (I was going to comment on the execution bit issue but I realized I'm >>> not even sure if you're talking about import.c or not. :-) >> >> So it turns out a bunch of execution/write bit stuff has come up in >> Python 2.7 and importlib has been ignoring it. =) Importlib has simply >> been opening up the bytecode files with 'wb' and writing out the file. >> But test_import tests that no execution bit get set or that a write >> bit gets added if the source file lacks it. I guess I can use >> posix.chmod and posix.stat to copy the source file's read and write >> bits and always mask out the execution bits. I hate this low-level >> file permission stuff. > > It's no fun -- see the layers of #ifdefs in open_exclusive() in > import.c. (Though I think you won't need to worry about VMS. :-) But > it's somewhat important to get it right from a security POV. I would > use os.open() and wrap an io.BufferedWriter around it. I will have to see what of that is implemented in C or in Python. I have always tried to keep all pure Python code out of importlib for bootstrapping reasons in order to keep the possibility of using importlib as the implementation of import. But maybe I should not be worrying about that right at the moment and instead do what keeps the code simple. 
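Concretely, the write side I have in mind is something along these lines (a rough sketch of the approach, not code that has been run against importlib):

    import io
    import os

    def write_bytecode(bytecode_path, data, source_path):
        # Mirror the source file's read/write bits, never its execute bits.
        # Note that os.open() only applies the mode when it creates the file.
        mode = os.stat(source_path).st_mode & 0o666
        fd = os.open(bytecode_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, mode)
        with io.open(fd, 'wb') as bytecode_file:
            bytecode_file.write(data)
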
-Brett From benjamin at python.org Mon Aug 31 04:51:00 2009 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 30 Aug 2009 21:51:00 -0500 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: References: Message-ID: <1afaf6160908301951p3a969948p177ebb8037349b38@mail.gmail.com> 2009/8/30 Brett Cannon : > On Sun, Aug 30, 2009 at 19:34, Guido van Rossum wrote: >> On Sun, Aug 30, 2009 at 5:34 PM, Brett Cannon wrote: >>> On Sun, Aug 30, 2009 at 17:24, Guido van Rossum wrote: >>>> (I was going to comment on the execution bit issue but I realized I'm >>>> not even sure if you're talking about import.c or not. :-) >>> >>> So it turns out a bunch of execution/write bit stuff has come up in >>> Python 2.7 and importlib has been ignoring it. =) Importlib has simply >>> been opening up the bytecode files with 'wb' and writing out the file. >>> But test_import tests that no execution bit get set or that a write >>> bit gets added if the source file lacks it. I guess I can use >>> posix.chmod and posix.stat to copy the source file's read and write >>> bits and always mask out the execution bits. I hate this low-level >>> file permission stuff. >> >> It's no fun -- see the layers of #ifdefs in open_exclusive() in >> import.c. (Though I think you won't need to worry about VMS. :-) But >> it's somewhat important to get it right from a security POV. I would >> use os.open() and wrap an io.BufferedWriter around it. > > I will have to see what of that is implemented in C or in Python. I > have always tried to keep all pure Python code out of importlib for > bootstrapping reasons in order to keep the possibility of using > importlib as the implementation of import. But maybe I should not be > worrying about that right at the moment and instead do what keeps the > code simple. You can use the C implementation of io, _io, which has a full buffering implementation. Of course, that also makes it a better harder for other implementations which may wish to use importlib because the io library would have to be completely implemented... -- Regards, Benjamin From glyph at twistedmatrix.com Mon Aug 31 05:13:49 2009 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Sun, 30 Aug 2009 23:13:49 -0400 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: References: <1251677592.1956.69.camel@lifeless-64> Message-ID: On Sun, Aug 30, 2009 at 8:26 PM, Guido van Rossum wrote: > On Sun, Aug 30, 2009 at 5:23 PM, Brett Cannon wrote: > > Right; the code object would think it was loaded from the original > > location it was created at instead of where it actually is. Now why > > someone would want to move their .pyc files around instead of > > recompiling I don't know short of not wanting to send someone source. > > I already mentioned replication; it could also just be a matter of > downloading a tarball with .py and .pyc files. Also, if you're using Python in an embedded context, bytecode compilation (or even filesystem access!) can be prohibitively slow, so an uncompressed .zip file full of compiled .pyc files is really the way to go. I did this a long time ago on an XScale machine, but recent inspection of the Android Python scripting stuff shows a similar style of deployment (c.f. /data/data/com.google.ase/python/lib/python26.zip). -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From collinw at gmail.com Mon Aug 31 06:28:49 2009 From: collinw at gmail.com (Collin Winter) Date: Sun, 30 Aug 2009 21:28:49 -0700 Subject: [Python-Dev] Fast Implementation for ZIP decryption In-Reply-To: <332972a20908300734y9cadc02kd68298a65e86c415@mail.gmail.com> References: <332972a20908292245j3dbdb309j94bc3ae5ea359723@mail.gmail.com> <4A9A3E85.4050007@v.loewis.de> <200908302259.42816.steve@pearwood.info> <20090830130522.11528.715943779.divmod.xquotient.4@localhost.localdomain> <332972a20908300734y9cadc02kd68298a65e86c415@mail.gmail.com> Message-ID: <43aa6ff70908302128n3a3fd6c1w58aaff2db0281be2@mail.gmail.com> On Sun, Aug 30, 2009 at 7:34 AM, Shashank Singh wrote: > just to give you an idea of the speed up: > > a 3.3 mb zip file extracted using the current all-python implementation on > my machine (win xp 1.67Ghz 1.5GB) > takes approximately 38 seconds. > > the same file when extracted using c implementation takes 0.4 seconds. Are there any applications/frameworks which have zip files on their critical path, where this kind of (admittedly impressive) speedup would be beneficial? What was the motivation for writing the C version? Collin Winter > On Sun, Aug 30, 2009 at 6:35 PM, wrote: >> >> On 12:59 pm, steve at pearwood.info wrote: >>> >>> On Sun, 30 Aug 2009 06:55:33 pm Martin v. L?wis wrote: >>>> >>>> > Does it sound worthy enough to create a patch for and integrate >>>> > into python itself? >>>> >>>> Probably not, given that people think that the algorithm itself is >>>> fairly useless. >>> >>> I would think that for most people, the threat model isn't "the CIA is >>> reading my files" but "my little brother or nosey co-worker is reading >>> my files", and for that, zip encryption with a good password is >>> probably perfectly adequate. E.g. OpenOffice uses it for >>> password-protected documents. >>> >>> Given that Python already supports ZIP decryption (as it should), are >>> there any reasons to prefer the current pure-Python implementation over >>> a faster version? >> >> Given that the use case is "protect my biology homework from my little >> brother", how fast does the implementation really need to be? ?Is speeding >> it up from 0.1 seconds to 0.001 seconds worth the potential new problems >> that come with more C code (more code to maintain, less portability to other >> runtimes, potential for interpreter crashes or even arbitrary code execution >> vulnerabilities from specially crafted files)? 
>> >> Jean-Paul >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> http://mail.python.org/mailman/options/python-dev/shashank.sunny.singh%40gmail.com >> > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/collinw%40gmail.com > > From asmodai at in-nomine.org Mon Aug 31 07:40:21 2009 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Mon, 31 Aug 2009 07:40:21 +0200 Subject: [Python-Dev] Fast Implementation for ZIP decryption In-Reply-To: <43aa6ff70908302128n3a3fd6c1w58aaff2db0281be2@mail.gmail.com> References: <332972a20908292245j3dbdb309j94bc3ae5ea359723@mail.gmail.com> <4A9A3E85.4050007@v.loewis.de> <200908302259.42816.steve@pearwood.info> <20090830130522.11528.715943779.divmod.xquotient.4@localhost.localdomain> <332972a20908300734y9cadc02kd68298a65e86c415@mail.gmail.com> <43aa6ff70908302128n3a3fd6c1w58aaff2db0281be2@mail.gmail.com> Message-ID: <20090831054021.GN19190@nexus.in-nomine.org> -On [20090831 06:29], Collin Winter (collinw at gmail.com) wrote: >Are there any applications/frameworks which have zip files on their >critical path, where this kind of (admittedly impressive) speedup >would be beneficial? What was the motivation for writing the C >version? Would zipped eggs count? For example, SQLAlchemy runs in the 5 MB range. -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B All for one, one for all... From brett at python.org Mon Aug 31 07:43:07 2009 From: brett at python.org (Brett Cannon) Date: Sun, 30 Aug 2009 22:43:07 -0700 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: <1afaf6160908301951p3a969948p177ebb8037349b38@mail.gmail.com> References: <1afaf6160908301951p3a969948p177ebb8037349b38@mail.gmail.com> Message-ID: On Sun, Aug 30, 2009 at 19:51, Benjamin Peterson wrote: > 2009/8/30 Brett Cannon : >> On Sun, Aug 30, 2009 at 19:34, Guido van Rossum wrote: >>> On Sun, Aug 30, 2009 at 5:34 PM, Brett Cannon wrote: >>>> On Sun, Aug 30, 2009 at 17:24, Guido van Rossum wrote: >>>>> (I was going to comment on the execution bit issue but I realized I'm >>>>> not even sure if you're talking about import.c or not. :-) >>>> >>>> So it turns out a bunch of execution/write bit stuff has come up in >>>> Python 2.7 and importlib has been ignoring it. =) Importlib has simply >>>> been opening up the bytecode files with 'wb' and writing out the file. >>>> But test_import tests that no execution bit get set or that a write >>>> bit gets added if the source file lacks it. I guess I can use >>>> posix.chmod and posix.stat to copy the source file's read and write >>>> bits and always mask out the execution bits. I hate this low-level >>>> file permission stuff. >>> >>> It's no fun -- see the layers of #ifdefs in open_exclusive() in >>> import.c. (Though I think you won't need to worry about VMS. :-) But >>> it's somewhat important to get it right from a security POV. I would >>> use os.open() and wrap an io.BufferedWriter around it. >> >> I will have to see what of that is implemented in C or in Python. 
I >> have always tried to keep all pure Python code out of importlib for >> bootstrapping reasons in order to keep the possibility of using >> importlib as the implementation of import. But maybe I should not be >> worrying about that right at the moment and instead do what keeps the >> code simple. > > You can use the C implementation of io, _io, which has a full > buffering implementation. Of course, that also makes it a better > harder for other implementations which may wish to use importlib > because the io library would have to be completely implemented... True. I guess it's a question of whether making importlib easier to maintain and as minimally reliant on C-specific modules is more/less important than trying to bootstrap it in for CPython for __import__ at some point. -Brett From greg at krypto.org Mon Aug 31 08:38:32 2009 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 30 Aug 2009 23:38:32 -0700 Subject: [Python-Dev] Fast Implementation for ZIP decryption In-Reply-To: <20090831054021.GN19190@nexus.in-nomine.org> References: <332972a20908292245j3dbdb309j94bc3ae5ea359723@mail.gmail.com> <4A9A3E85.4050007@v.loewis.de> <200908302259.42816.steve@pearwood.info> <20090830130522.11528.715943779.divmod.xquotient.4@localhost.localdomain> <332972a20908300734y9cadc02kd68298a65e86c415@mail.gmail.com> <43aa6ff70908302128n3a3fd6c1w58aaff2db0281be2@mail.gmail.com> <20090831054021.GN19190@nexus.in-nomine.org> Message-ID: <52dc1c820908302338v63d06763ja8d3698fd20000d8@mail.gmail.com> On Sun, Aug 30, 2009 at 10:40 PM, Jeroen Ruigrok van der Werven < asmodai at in-nomine.org> wrote: > -On [20090831 06:29], Collin Winter (collinw at gmail.com) wrote: > >Are there any applications/frameworks which have zip files on their > >critical path, where this kind of (admittedly impressive) speedup > >would be beneficial? What was the motivation for writing the C > >version? > > Would zipped eggs count? For example, SQLAlchemy runs in the 5 MB range. > Unless someone's also pushing for being able to import and execute code from scrambled zip files, no that doesn't matter. The C code for this should be trivially tiny. See the zipfile._ZipDecryptor class, its got ~25 lines of actual code in it. It is not worth arguing about. I'll commit this if you post it as a patch in a tracker issue. Please make sure your patch includes the following: * A unittest that compares the C version of the descrambler to the python version of the descrambler using a variety of inputs and outputs that exercise any boundary condition. * Conditional import code in the zipfile module itself so that the module works even if the C module isn't available. -Greg -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at krypto.org Mon Aug 31 09:01:29 2009 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 31 Aug 2009 00:01:29 -0700 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: References: Message-ID: <52dc1c820908310001tb1ee039t9c912fa10e01407a@mail.gmail.com> On Sun, Aug 30, 2009 at 5:24 PM, Guido van Rossum wrote: > On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon wrote: > > I am going through and running the entire test suite using importlib > > to ferret out incompatibilities. 
I have found a bunch, although all > > rather minor (raising a different exception typically; not even sure > > they are worth backporting as anyone reliant on the old exceptions > > might get a nasty surprise in the next micro release), and now I am > > down to my last failing test suite: test_import. > > > > Ignoring the execution bit problem (http://bugs.python.org/issue6526 > > but I have no clue why this is happening), I am bumping up against > > TestPycRewriting.test_incorrect_code_name. Turns out that import > > resets co_filename on a code object to __file__ before exec'ing it to > > create a module's namespace in order to ignore the file name passed > > into compile() for the filename argument. Now I can't change > > co_filename from Python as it's a read-only attribute and thus can't > > match this functionality in importlib w/o creating some custom code to > > allow me to specify the co_filename somewhere (marshal.loads() or some > > new function). > > > > My question is how important is this functionality? Do I really need > > to go through and add an argument to marshal.loads or some new > > function just to set co_filename to something that someone explicitly > > set in a .pyc file? Or I can let this go and have this be the one > > place where builtins.__import__ and importlib.__import__ differ and > > just not worry about it? > > ISTR that Bill Janssen once mentioned a file replication mechanism > whereby there were two names for each file: the "canonical" name on a > replicated read-only filesystem, and the longer "writable" name on a > unique master copy. He ended up with the filenames in the .pyc files > being pretty bogus (since not everyone had access to the writable > filesystem). So setting co_filename to match __file__ (i.e. the name > under which the module is being imported) would be a nice service in > this case. > > In general this would happen whenever you pre-compile a bunch of .py > files to .pyc/.pyo and then copy the lot to a different location. Not > a completely unlikely scenario. 8-9 years ago while using py2exe on windows to create stand along binaries out of Python programs for distribution we ran into this issue... The compiled .pyc's that py2exe bundles up contained the pathname to the source code on the development build system. When you get a stacktrace python would look for the source code based on those.... Really horrible if your build system used a windows drive letter other than C such as D: as it could cause windows to pop up a dialog asking the user to insert a CD or spin up a spun down optical disc or ask for a floppy, etc. ;) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From shashank.sunny.singh at gmail.com Mon Aug 31 10:10:45 2009 From: shashank.sunny.singh at gmail.com (Shashank Singh) Date: Mon, 31 Aug 2009 13:40:45 +0530 Subject: [Python-Dev] Fast Implementation for ZIP decryption In-Reply-To: <52dc1c820908302338v63d06763ja8d3698fd20000d8@mail.gmail.com> References: <332972a20908292245j3dbdb309j94bc3ae5ea359723@mail.gmail.com> <4A9A3E85.4050007@v.loewis.de> <200908302259.42816.steve@pearwood.info> <20090830130522.11528.715943779.divmod.xquotient.4@localhost.localdomain> <332972a20908300734y9cadc02kd68298a65e86c415@mail.gmail.com> <43aa6ff70908302128n3a3fd6c1w58aaff2db0281be2@mail.gmail.com> <20090831054021.GN19190@nexus.in-nomine.org> <52dc1c820908302338v63d06763ja8d3698fd20000d8@mail.gmail.com> Message-ID: <332972a20908310110o3deeb638rc0a17089ba5176fe@mail.gmail.com> On Mon, Aug 31, 2009 at 12:08 PM, Gregory P. Smith wrote: > > > On Sun, Aug 30, 2009 at 10:40 PM, Jeroen Ruigrok van der Werven < > asmodai at in-nomine.org> wrote: > >> -On [20090831 06:29], Collin Winter (collinw at gmail.com) wrote: >> >Are there any applications/frameworks which have zip files on their >> >critical path, where this kind of (admittedly impressive) speedup >> >would be beneficial? What was the motivation for writing the C >> >version? >> >> Would zipped eggs count? For example, SQLAlchemy runs in the 5 MB range. >> > > Unless someone's also pushing for being able to import and execute code > from scrambled zip files, no that doesn't matter > For those who have not seen it : http://bugs.python.org/issue6749 asks for such an ability (there was a good deal of discussion about it on python-dev too and I think Greg you were a -1 on it :). > . > > The C code for this should be trivially tiny. See the > zipfile._ZipDecryptor class, its got ~25 lines of actual code in it. > right you are. It is just a simple translation of the (~25 lines) of code into C. It is not worth arguing about. I'll commit this if you post it as a patch > in a tracker issue. Please make sure your patch includes the following: > > * A unittest that compares the C version of the descrambler to the python > version of the descrambler using a variety of inputs and outputs that > exercise any boundary condition. > > * Conditional import code in the zipfile module itself so that the module > works even if the C module isn't available. > I sure can do that. What boundary conditions do you have on mind? While we are at it (and forgive my obsession with the zip module :), is there enough need for supporting the Strong Encryption Specification in the zip module? At least one immediate benefit I can see is that the OP of the link I posted above will be happy :) The main reason the idea of supporting import of encrypted module was shot down is that the simple encryption scheme is too weak to bother about. Supporting Strong Encryption might do away with that problem beside, possibly, adding a whole new way of distributing python modules. Are there any (more?) use cases or am I missing something very trivial why Strong Encryption was never supported in the zip module? -- Shashank -- Regards Shashank Singh Senior Undergraduate, Department of Computer Science and Engineering Indian Institute of Technology Bombay shashank.sunny.singh at gmail.com http://www.cse.iitb.ac.in/~shashanksingh -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Mon Aug 31 11:24:39 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 31 Aug 2009 09:24:39 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?how_important_is_setting_co=5Ffilename_for?= =?utf-8?q?_a_module=09being_imported_to_what_=5F=5Ffile=5F=5F_is_s?= =?utf-8?q?et_to=3F?= References: Message-ID: Brett Cannon python.org> writes: > > Now I can't change > co_filename from Python as it's a read-only attribute and thus can't > match this functionality in importlib w/o creating some custom code to > allow me to specify the co_filename somewhere Why can't we simply make co_filename a writable attribute instead of inventing some complicated API? From chris at simplistix.co.uk Mon Aug 31 13:59:29 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Mon, 31 Aug 2009 12:59:29 +0100 Subject: [Python-Dev] runpy.py In-Reply-To: <4A951C43.8050707@gmail.com> References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> <4A92B9C3.5040701@simplistix.co.uk> <1afaf6160908240907y2bb8fabbud044171de72ae052@mail.gmail.com> <4A939F4F.9060306@simplistix.co.uk> <1afaf6160908250125h4cc16ff7h2c33f1162cd25cca@mail.gmail.com> <4A93A85D.3010108@gmail.com> <4A951C43.8050707@gmail.com> Message-ID: <4A9BBB21.4040504@simplistix.co.uk> Nick Coghlan wrote: > The PEPs don't go into the process of how we actually hook the command > line up to the runpy module though - that's something you need to dig > into the main.c code to really understand. Yeah, main.c does quite a lot... ;-) This all spawned from a suggestion by Jim Fulton over on the distutils-sig that it would be nice if there was a python module that did all of the various types of launching found in main.c. His use case is so that buildout scripts can easily use the same functionality that the interpreter startup uses. I didn't spot any, but does anyone know of code in that mix that couldn't be moved to a pure python module like runpy? If not, how would people feel about the various types of launching all moving to runpy rather than just the -m stuff being there? cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Mon Aug 31 14:17:12 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Mon, 31 Aug 2009 13:17:12 +0100 Subject: [Python-Dev] Excluding the current path from module search path? In-Reply-To: <4A9653D5.209@gmail.com> References: <4A940A70.9070008@simplistix.co.uk> <4A95200C.2050907@gmail.com> <4A95C9D8.7090104@simplistix.co.uk> <4A9653D5.209@gmail.com> Message-ID: <4A9BBF48.8030808@simplistix.co.uk> Nick Coghlan wrote: > Ah, OK - I see the problem now. However, I think the current behaviour > is correct, it just needs to be documented better (probably noted in > both the command line doco Not sure what you mean by this? > regarding sys.path manipulation and in the > doco for site.py). Agreed :-) > The reason I think the current behaviour is correct is that site.py and > sitecustomize.py are meant to be about customising the *site* (i.e. the > installation of Python that is being executed) rather than about > customizing a particular application. Unless you use virtualenv as Guido suggested in the other thread ;-) > Importing them before the script > specific directories are prepended to sys.path goes a long way towards > achieving that. If sitecustomize.py had more uses that the setdefaultencoding hack, I'd argue more about this... 
If it does have other uses, my argument would be that "site" wide is a very subjective term tht many people, myself included, would like to be able to mean "per project, I don't *ever* want to screw with my actual Python install, it should stay pristine"... > Also, as was pointed out on the tracker item, having a script that can > automatically be executed when running an arbitrary Python script > without any request from or notification to the user is not a good idea > from a security standpoint. Agreed, but I think that's only an issue when you're starting up an interpreter. If you're running a script from a file or module, I'd say it's more akin to what's specified in PYTHONSTARTUP being executed that than a random script being silently executed without your permission. cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Mon Aug 31 14:23:16 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Mon, 31 Aug 2009 13:23:16 +0100 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: References: <4A940C65.9010200@simplistix.co.uk> <20090825162305.7657.1157242584.divmod.xquotient.112@localhost.localdomain> <4A95CA97.1000505@simplistix.co.uk> <20090827130857.7475.1053558531.divmod.xquotient.8@localhost.localdomain> Message-ID: <4A9BC0B4.2090103@simplistix.co.uk> Guido van Rossum wrote: > Being adults about it also means when to give up. Chris, please stop > arguing about this. Sure. Even if people had agreed to this change, it wouldn't end up in a python release I could use for this project. > There are plenty of techniques you can use to get > what you want without changing Python, for example virtualenv, which > allows you to create a custom Python environment for each project. Yep, I'll resort to wrapping the buildout in a virtualenv iff the reload(sys) hack ends up causing problems... > Or > you could switch to Python 3.1, I would love to, once Python 3 has a viable web app story... cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From ncoghlan at gmail.com Mon Aug 31 15:13:55 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 31 Aug 2009 23:13:55 +1000 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: References: <1afaf6160908301951p3a969948p177ebb8037349b38@mail.gmail.com> Message-ID: <4A9BCC93.4010600@gmail.com> Brett Cannon wrote: >> You can use the C implementation of io, _io, which has a full >> buffering implementation. Of course, that also makes it a better >> harder for other implementations which may wish to use importlib >> because the io library would have to be completely implemented... > > True. I guess it's a question of whether making importlib easier to > maintain and as minimally reliant on C-specific modules is more/less > important than trying to bootstrap it in for CPython for __import__ at > some point. I'd suggest preferring _io, but falling back to the Python io module if the accelerated version doesn't exist. You should get the best of both worlds that way (no bootstrap issues in CPython and other implementations with an _io module, but a still functional importlib in other implementations). Cheers, Nick. 
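In ordinary module code the fallback Nick suggests is the usual accelerator idiom, roughly as follows (a sketch; importlib's bootstrap has to wire its dependencies differently, as discussed later in the thread):

    try:
        import _io as io    # accelerated C implementation, when the VM provides one
    except ImportError:
        import io           # pure-Python implementation as the fallback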
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Mon Aug 31 15:19:29 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 31 Aug 2009 23:19:29 +1000 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: References: Message-ID: <4A9BCDE1.9080906@gmail.com> Antoine Pitrou wrote: > Brett Cannon python.org> writes: >> Now I can't change >> co_filename from Python as it's a read-only attribute and thus can't >> match this functionality in importlib w/o creating some custom code to >> allow me to specify the co_filename somewhere > > Why can't we simply make co_filename a writable attribute instead of inventing > some complicated API? I thought of that question as well, but the later exchange between Guido and Brett made me realise that a lot more than the top level module code object is affected here - the adjustment also needs to be propagated to the code objects created by the module for functions and generators and so forth. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From fuzzyman at voidspace.org.uk Mon Aug 31 15:27:25 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 31 Aug 2009 14:27:25 +0100 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: <4A9BCDE1.9080906@gmail.com> References: <4A9BCDE1.9080906@gmail.com> Message-ID: <4A9BCFBD.2050603@voidspace.org.uk> Nick Coghlan wrote: > Antoine Pitrou wrote: > >> Brett Cannon python.org> writes: >> >>> Now I can't change >>> co_filename from Python as it's a read-only attribute and thus can't >>> match this functionality in importlib w/o creating some custom code to >>> allow me to specify the co_filename somewhere >>> >> Why can't we simply make co_filename a writable attribute instead of inventing >> some complicated API? >> > > I thought of that question as well, but the later exchange between Guido > and Brett made me realise that a lot more than the top level module code > object is affected here - the adjustment also needs to be propagated to > the code objects created by the module for functions and generators and > so forth. > > Even if it is not necessary or sufficient it still sounds like a useful change. When writing tools that generate modules or manipulate code objects these read-only attributes are a great nuisance. Michael > Cheers, > Nick. > > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From solipsis at pitrou.net Mon Aug 31 15:32:53 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 31 Aug 2009 13:32:53 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?how_important_is_setting_co=5Ffilename_for?= =?utf-8?q?_a_module_being_imported_to_what_=5F=5Ffile=5F=5F_is_set?= =?utf-8?q?_to=3F?= References: <4A9BCDE1.9080906@gmail.com> Message-ID: Nick Coghlan gmail.com> writes: > > I thought of that question as well, but the later exchange between Guido > and Brett made me realise that a lot more than the top level module code > object is affected here - the adjustment also needs to be propagated to > the code objects created by the module for functions and generators and > so forth. I'm not sure I understand. There's a single type of "code object" and it's PyCodeObject. 
Making the attribute writable (from Python code) at that level should be sufficient. (then of course the recursive machinery needed to mutate all code objects created in a module may be slightly inefficient if written in Python, but at least it's possible to write it) Regards Antoine. From ncoghlan at gmail.com Mon Aug 31 15:36:42 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 31 Aug 2009 23:36:42 +1000 Subject: [Python-Dev] runpy.py In-Reply-To: <4A9BBB21.4040504@simplistix.co.uk> References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> <4A92B9C3.5040701@simplistix.co.uk> <1afaf6160908240907y2bb8fabbud044171de72ae052@mail.gmail.com> <4A939F4F.9060306@simplistix.co.uk> <1afaf6160908250125h4cc16ff7h2c33f1162cd25cca@mail.gmail.com> <4A93A85D.3010108@gmail.com> <4A951C43.8050707@gmail.com> <4A9BBB21.4040504@simplistix.co.uk> Message-ID: <4A9BD1EA.9020204@gmail.com> Chris Withers wrote: > Nick Coghlan wrote: >> The PEPs don't go into the process of how we actually hook the command >> line up to the runpy module though - that's something you need to dig >> into the main.c code to really understand. > > Yeah, main.c does quite a lot... ;-) > > This all spawned from a suggestion by Jim Fulton over on the > distutils-sig that it would be nice if there was a python module that > did all of the various types of launching found in main.c. His use case > is so that buildout scripts can easily use the same functionality that > the interpreter startup uses. > > I didn't spot any, but does anyone know of code in that mix that > couldn't be moved to a pure python module like runpy? > > If not, how would people feel about the various types of launching all > moving to runpy rather than just the -m stuff being there? I haven't timed it, but I believe runpy is a fair bit slower than the native C functions in main. (That first part of the comment means I could easily be wrong though - it's definitely possible that overall interpreter startup time will dwarf any difference between the two launch mechanisms). That said, while actually ditching the C code might cause an argument, expanding runpy with Python equivalents of the C level functionality (i.e. run script by name, run directory/zipfile by name, '-c' switch, and other odds and ends that I'm probably forgetting right now, with all associated modifications to sys.argv and the __main__ module attributes) should be far less controversial. For example, _run_module_as_main() has survived long enough now without anyone poking holes in it (unlike the holes in the original run_module() that PJE drove a truck through!) that I could probably be talked into removing the comment I put on it and making it public :) As you say, making all of that functionality accessible from Python would allow launch scripts to be far more flexible in handling arguments as if they were the normal interpreter. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From benjamin at python.org Mon Aug 31 16:55:56 2009 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 31 Aug 2009 09:55:56 -0500 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? 
In-Reply-To: References: Message-ID: <1afaf6160908310755vc5d3872i2f7be0ce33c09040@mail.gmail.com> 2009/8/31 Antoine Pitrou : > Brett Cannon python.org> writes: >> >> Now I can't change >> co_filename from Python as it's a read-only attribute and thus can't >> match this functionality in importlib w/o creating some custom code to >> allow me to specify the co_filename somewhere > > Why can't we simply make co_filename a writable attribute instead of inventing > some complicated API? Because code objects are supposed to be a immutable hashable object? -- Regards, Benjamin From fumanchu at aminus.org Mon Aug 31 16:49:30 2009 From: fumanchu at aminus.org (Robert Brewer) Date: Mon, 31 Aug 2009 07:49:30 -0700 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: <4A9BC0B4.2090103@simplistix.co.uk> References: <4A940C65.9010200@simplistix.co.uk> <20090825162305.7657.1157242584.divmod.xquotient.112@localhost.localdomain> <4A95CA97.1000505@simplistix.co.uk> <20090827130857.7475.1053558531.divmod.xquotient.8@localhost.localdomain> <4A9BC0B4.2090103@simplistix.co.uk> Message-ID: Chris Withers wrote: > Guido van Rossum wrote: > > Being adults about it also means when to give up. Chris, please stop > > arguing about this. > > Sure. Even if people had agreed to this change, it wouldn't end up in a > python release I could use for this project. > > > There are plenty of techniques you can use to get > > what you want without changing Python, for example virtualenv, which > > allows you to create a custom Python environment for each project. > > Yep, I'll resort to wrapping the buildout in a virtualenv iff the > reload(sys) hack ends up causing problems... > > > Or > > you could switch to Python 3.1, > > I would love to, once Python 3 has a viable web app story... CherryPy 3.2 is now in beta, and mod_wsgi is nearly ready as well. Both support Python 3. :) Robert Brewer fumanchu at aminus.org From chris at simplistix.co.uk Mon Aug 31 17:00:59 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Mon, 31 Aug 2009 16:00:59 +0100 Subject: [Python-Dev] web apps in python 3 In-Reply-To: References: <4A940C65.9010200@simplistix.co.uk> <20090825162305.7657.1157242584.divmod.xquotient.112@localhost.localdomain> <4A95CA97.1000505@simplistix.co.uk> <20090827130857.7475.1053558531.divmod.xquotient.8@localhost.localdomain> <4A9BC0B4.2090103@simplistix.co.uk> Message-ID: <4A9BE5AB.2050004@simplistix.co.uk> Robert Brewer wrote: >>> you could switch to Python 3.1, >> I would love to, once Python 3 has a viable web app story... > > CherryPy 3.2 is now in beta, and mod_wsgi is nearly ready as well. Both > support Python 3. :) My understanding was that the wsgi spec for Python 3 wasn't finished... Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From solipsis at pitrou.net Mon Aug 31 17:10:34 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 31 Aug 2009 15:10:34 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?how_important_is_setting_co=5Ffilename_for?= =?utf-8?q?_a_module=09being_imported_to_what_=5F=5Ffile=5F=5F_is_s?= =?utf-8?q?et_to=3F?= References: <1afaf6160908310755vc5d3872i2f7be0ce33c09040@mail.gmail.com> Message-ID: Benjamin Peterson python.org> writes: > > > Why can't we simply make co_filename a writable attribute instead of inventing > > some complicated API? > > Because code objects are supposed to be a immutable hashable object? Right, but co_filename is used neither in tp_hash nor in tp_richcompare. 
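A quick way to see this in CPython (a sketch, not from the thread): two code objects that differ only in filename still compare equal and hash identically.

    c1 = compile("x = 1", "a.py", "exec")
    c2 = compile("x = 1", "b.py", "exec")
    # co_filename differs, yet neither equality nor hashing notices:
    assert c1 == c2 and hash(c1) == hash(c2)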
Regards Antoine. From fumanchu at aminus.org Mon Aug 31 17:13:32 2009 From: fumanchu at aminus.org (Robert Brewer) Date: Mon, 31 Aug 2009 08:13:32 -0700 Subject: [Python-Dev] web apps in python 3 References: <4A940C65.9010200@simplistix.co.uk> <20090825162305.7657.1157242584.divmod.xquotient.112@localhost.localdomain> <4A95CA97.1000505@simplistix.co.uk> <20090827130857.7475.1053558531.divmod.xquotient.8@localhost.localdomain> <4A9BC0B4.2090103@simplistix.co.uk> <4A9BE5AB.2050004@simplistix.co.uk> Message-ID: Chris Withers wrote: > Robert Brewer wrote: >>>> you could switch to Python 3.1, >>> I would love to, once Python 3 has a viable web app story... >> >> CherryPy 3.2 is now in beta, and mod_wsgi is nearly ready as well. Both >> support Python 3. :) > > My understanding was that the wsgi spec for Python 3 wasn't finished... The WSGI 1.0 spec has always included Python 3 using unicode strings in the environ (decoded via ISO-8859-1, and limited to \x00-\xFF). In addition, the CherryPy and mod_wsgi teams are working to interoperably support a modified version of WSGI, in which the environ is true unicode for both Python 2 and 3. We hope these implementations become references from which a WSGI 1.1 spec can be written; since web-sig has not yet reached consensus on certain specification details, we are proceeding together with tools that allow users to get work done now. Robert Brewer fumanchu at aminus.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Aug 31 18:25:06 2009 From: guido at python.org (Guido van Rossum) Date: Mon, 31 Aug 2009 09:25:06 -0700 Subject: [Python-Dev] deleting setdefaultencoding iin site.py is evil In-Reply-To: References: <4A940C65.9010200@simplistix.co.uk> <20090825162305.7657.1157242584.divmod.xquotient.112@localhost.localdomain> <4A95CA97.1000505@simplistix.co.uk> <20090827130857.7475.1053558531.divmod.xquotient.8@localhost.localdomain> <4A9BC0B4.2090103@simplistix.co.uk> Message-ID: On Mon, Aug 31, 2009 at 7:49 AM, Robert Brewer wrote: > CherryPy 3.2 is now in beta, and mod_wsgi is nearly ready as well. Both > support Python 3. :) Excellent news! I just saw that PyYAML also suports 3.1. Slowly but surely, 3.1 is gaining traction... -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Mon Aug 31 18:27:49 2009 From: brett at python.org (Brett Cannon) Date: Mon, 31 Aug 2009 09:27:49 -0700 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: References: <1afaf6160908310755vc5d3872i2f7be0ce33c09040@mail.gmail.com> Message-ID: On Mon, Aug 31, 2009 at 08:10, Antoine Pitrou wrote: > Benjamin Peterson python.org> writes: >> >> > Why can't we simply make co_filename a writable attribute instead of > inventing >> > some complicated API? >> >> Because code objects are supposed to be a immutable hashable object? > > Right, but co_filename is used neither in tp_hash nor in tp_richcompare. I didn't suggest this since I assumed co_filename was made read-only for a reason back when the design decision was made. But if the original safety concerns are not there then I am happy to simply change the attribute to writable. -Brett From guido at python.org Mon Aug 31 18:33:12 2009 From: guido at python.org (Guido van Rossum) Date: Mon, 31 Aug 2009 09:33:12 -0700 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? 
In-Reply-To: References: <1afaf6160908310755vc5d3872i2f7be0ce33c09040@mail.gmail.com> Message-ID: On Mon, Aug 31, 2009 at 9:27 AM, Brett Cannon wrote: > On Mon, Aug 31, 2009 at 08:10, Antoine Pitrou wrote: >> Benjamin Peterson python.org> writes: >>> >>> > Why can't we simply make co_filename a writable attribute instead of >> inventing >>> > some complicated API? >>> >>> Because code objects are supposed to be a immutable hashable object? >> >> Right, but co_filename is used neither in tp_hash nor in tp_richcompare. > > I didn't suggest this since I assumed co_filename was made read-only > for a reason back when the design decision was made. But if the > original safety concerns are not there then I am happy to simply > change the attribute to writable. Hm... I still wonder if there would be bad side effects of making co_filename writable, but I can't think of any, so maybe you can make this work... The next step would be to not write it out when marshalling a code object -- this might save a bit of space in pyc files too! (I guess for compatibility you might want to write it as an empty string.) Of course, tracking down all the code objects in the return value of marshal.load*() might be a bit tricky -- API-wise I still think that making it an argument to marshal.load*() might be simpler. Also it would preserve the purity of code objects. (Michael: it would be fine if *other* implementations of Python made co_filename writable, as long as you can't think of security issues with this.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Mon Aug 31 18:34:41 2009 From: brett at python.org (Brett Cannon) Date: Mon, 31 Aug 2009 09:34:41 -0700 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: <4A9BCC93.4010600@gmail.com> References: <1afaf6160908301951p3a969948p177ebb8037349b38@mail.gmail.com> <4A9BCC93.4010600@gmail.com> Message-ID: On Mon, Aug 31, 2009 at 06:13, Nick Coghlan wrote: > Brett Cannon wrote: >>> You can use the C implementation of io, _io, which has a full >>> buffering implementation. Of course, that also makes it a better >>> harder for other implementations which may wish to use importlib >>> because the io library would have to be completely implemented... >> >> True. I guess it's a question of whether making importlib easier to >> maintain and as minimally reliant on C-specific modules is more/less >> important than trying to bootstrap it in for CPython for __import__ at >> some point. > > I'd suggest preferring _io, but falling back to the Python io module if > the accelerated version doesn't exist. You should get the best of both > worlds that way (no bootstrap issues in CPython and other > implementations with an _io module, but a still functional importlib in > other implementations). Well, all important code is in importlib._bootstrap which lacks a single import statement; all dependent modules are injected externally in importlib.__init__. That allows for the possibility of C code to import importlib/_bootstrap.py along with the buit-in modules required, and then inject those built-in modules into importlib's global namespace. This is why I have functions in there that are duplications of things found elsewhere. That means that while I have named the module _io and I use _io.FileIO, you could also easily inject io with the name _io and have everything just work if you were trying to bootstrap. 
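Concretely, the injection amounts to something like this sketch (illustrative only; the real importlib.__init__ decides which modules get handed over and under which names):

    import io
    from importlib import _bootstrap

    # _bootstrap deliberately contains no import statements; its dependencies are
    # attached from outside, so the pure-Python io can stand in for _io here.
    _bootstrap._io = io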
The deal is that if I want to keep up the bootstrap goal I need to continue to restrict myself to either built-in modules or thing I know I can choose to expose later in built-in modules. This is why when Guido suggested os.open I said I will have to see how it is implemented because if it doesn't come from posix or is easy to duplicate I won't be able to use it. -Brett From brett at python.org Mon Aug 31 18:57:13 2009 From: brett at python.org (Brett Cannon) Date: Mon, 31 Aug 2009 09:57:13 -0700 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: References: <1afaf6160908310755vc5d3872i2f7be0ce33c09040@mail.gmail.com> Message-ID: On Mon, Aug 31, 2009 at 09:33, Guido van Rossum wrote: > On Mon, Aug 31, 2009 at 9:27 AM, Brett Cannon wrote: >> On Mon, Aug 31, 2009 at 08:10, Antoine Pitrou wrote: >>> Benjamin Peterson python.org> writes: >>>> >>>> > Why can't we simply make co_filename a writable attribute instead of >>> inventing >>>> > some complicated API? >>>> >>>> Because code objects are supposed to be a immutable hashable object? >>> >>> Right, but co_filename is used neither in tp_hash nor in tp_richcompare. >> >> I didn't suggest this since I assumed co_filename was made read-only >> for a reason back when the design decision was made. But if the >> original safety concerns are not there then I am happy to simply >> change the attribute to writable. > > Hm... I still wonder if there would be bad side effects of making > co_filename writable, but I can't think of any, so maybe you can make > this work... The next step would be to not write it out when > marshalling a code object -- this might save a bit of space in pyc > files too! (I guess for compatibility you might want to write it as an > empty string.) I would only want to consider stripping out the filename from the marshal format if a filename argument to marshal.load* was required to guarantee that code objects always in some sensible state. Otherwise everyone would end up with tracebacks that made no sense by default. But adding a required argument to marshal.load* would be quite the pain for compatibility. > > Of course, tracking down all the code objects in the return value of > marshal.load*() might be a bit tricky -- API-wise I still think that > making it an argument to marshal.load*() might be simpler. Also it > would preserve the purity of code objects. > > (Michael: it would be fine if *other* implementations of Python made > co_filename writable, as long as you can't think of security issues > with this.) OK, so what does co_filename get used for? I think it is referenced to open files for use in printing out the traceback. Python won't be able to open files that you can't as a user, so that shouldn't be a security risk. All places where co_filename is referenced would need to gain a check or start using some new C function/macro which verified that co_filename was a string and not some number or something else which wouldn't get null-terminated and thus lead to buffer overflow. A quick grep for co_filename turns up 17 uses in C code, although having to add some check would ruin the purity Guido is talking about and make a single attribute on code objects something people have to be careful about instead of having a guarantee that all attributes have some specific type of value. I'm with Guido; I would rather add an optional argument to marshal.load*. 
It must be a string and, if present, is used to override co_filename in the resulting code object. Once we have had the argument around we can then potentially make it a required argument and have file paths in the marshal data go away (or decide to default to some string constant when people don't specify the path argument). -Brett From guido at python.org Mon Aug 31 19:02:24 2009 From: guido at python.org (Guido van Rossum) Date: Mon, 31 Aug 2009 10:02:24 -0700 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: References: <1afaf6160908310755vc5d3872i2f7be0ce33c09040@mail.gmail.com> Message-ID: On Mon, Aug 31, 2009 at 9:57 AM, Brett Cannon wrote: > On Mon, Aug 31, 2009 at 09:33, Guido van Rossum wrote: >> Hm... I still wonder if there would be bad side effects of making >> co_filename writable, but I can't think of any, so maybe you can make >> this work... The next step would be to not write it out when >> marshalling a code object -- this might save a bit of space in pyc >> files too! (I guess for compatibility you might want to write it as an >> empty string.) > > I would only want to consider stripping out the filename from the > marshal format if a filename argument to marshal.load* was required to > guarantee that code objects always in some sensible state. Otherwise > everyone would end up with tracebacks that made no sense by default. > But adding a required argument to marshal.load* would be quite the > pain for compatibility. Well... It would be, but consider this: marshal.load() already takes a file argument; in most cases you can extract the name from the file easily. And for marshal.loads(), I'm not sure that the filename baked into the data is all that reliable anyways. >> Of course, tracking down all the code objects in the return value of >> marshal.load*() might be a bit tricky -- API-wise I still think that >> making it an argument to marshal.load*() might be simpler. Also it >> would preserve the purity of code objects. >> >> (Michael: it would be fine if *other* implementations of Python made >> co_filename writable, as long as you can't think of security issues >> with this.) > > OK, so what does co_filename get used for? I think it is referenced to > open files for use in printing out the traceback. Python won't be able > to open files that you can't as a user, so that shouldn't be a > security risk. All places where co_filename is referenced would need > to gain a check or start using some new C function/macro which > verified that co_filename was a string and not some number or > something else which wouldn't get null-terminated and thus lead to > buffer overflow. You could also do the validation on assignment. > A quick grep for co_filename turns up 17 uses in C > code, although having to add some check would ruin the purity Guido is > talking about and make a single attribute on code objects something > people have to be careful about instead of having a guarantee that all > attributes have some specific type of value. > > I'm with Guido; I would rather add an optional argument to > marshal.load*. It must be a string and, if present, is used to > override co_filename in the resulting code object. Once we have had > the argument around we can then potentially make it a required > argument and have file paths in the marshal data go away (or decide to > default to some string constant when people don't specify the path > argument). Actually that sounds like a fine transitional argument. 
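Put as a sketch, the transitional API would be used roughly like this (the second argument is the proposal under discussion, not something marshal currently accepts; the paths are placeholders):

    import marshal

    def load_code(pyc_path, source_path):
        """Load the code object from a .pyc, relabelling it with source_path."""
        with open(pyc_path, "rb") as f:
            f.read(8)   # skip the 4-byte magic number and 4-byte mtime of a 2.x/3.1-era .pyc
            return marshal.load(f, source_path)   # proposed optional co_filename override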
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Mon Aug 31 21:00:48 2009 From: brett at python.org (Brett Cannon) Date: Mon, 31 Aug 2009 12:00:48 -0700 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: References: <1afaf6160908310755vc5d3872i2f7be0ce33c09040@mail.gmail.com> Message-ID: On Mon, Aug 31, 2009 at 10:02, Guido van Rossum wrote: > On Mon, Aug 31, 2009 at 9:57 AM, Brett Cannon wrote: >> On Mon, Aug 31, 2009 at 09:33, Guido van Rossum wrote: >>> Hm... I still wonder if there would be bad side effects of making >>> co_filename writable, but I can't think of any, so maybe you can make >>> this work... The next step would be to not write it out when >>> marshalling a code object -- this might save a bit of space in pyc >>> files too! (I guess for compatibility you might want to write it as an >>> empty string.) >> >> I would only want to consider stripping out the filename from the >> marshal format if a filename argument to marshal.load* was required to >> guarantee that code objects always in some sensible state. Otherwise >> everyone would end up with tracebacks that made no sense by default. >> But adding a required argument to marshal.load* would be quite the >> pain for compatibility. > > Well... It would be, but consider this: marshal.load() already takes a > file argument; in most cases you can extract the name from the file > easily. And for marshal.loads(), I'm not sure that the filename baked > into the data is all that reliable anyways. > >>> Of course, tracking down all the code objects in the return value of >>> marshal.load*() might be a bit tricky -- API-wise I still think that >>> making it an argument to marshal.load*() might be simpler. Also it >>> would preserve the purity of code objects. >>> >>> (Michael: it would be fine if *other* implementations of Python made >>> co_filename writable, as long as you can't think of security issues >>> with this.) >> >> OK, so what does co_filename get used for? I think it is referenced to >> open files for use in printing out the traceback. Python won't be able >> to open files that you can't as a user, so that shouldn't be a >> security risk. All places where co_filename is referenced would need >> to gain a check or start using some new C function/macro which >> verified that co_filename was a string and not some number or >> something else which wouldn't get null-terminated and thus lead to >> buffer overflow. > > You could also do the validation on assignment. > >> A quick grep for co_filename turns up 17 uses in C >> code, although having to add some check would ruin the purity Guido is >> talking about and make a single attribute on code objects something >> people have to be careful about instead of having a guarantee that all >> attributes have some specific type of value. >> >> I'm with Guido; I would rather add an optional argument to >> marshal.load*. It must be a string and, if present, is used to >> override co_filename in the resulting code object. Once we have had >> the argument around we can then potentially make it a required >> argument and have file paths in the marshal data go away (or decide to >> default to some string constant when people don't specify the path >> argument). > > Actually that sounds like a fine transitional argument. I will plan to take this approach then; http://bugs.python.org/issue6811 will track all of this. 
Since this is a 3.2 thing I am not going to rush to implement this. -Brett From pje at telecommunity.com Mon Aug 31 21:08:51 2009 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 31 Aug 2009 15:08:51 -0400 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: References: <1afaf6160908310755vc5d3872i2f7be0ce33c09040@mail.gmail.com> Message-ID: <20090831190931.9F5133A4093@sparrow.telecommunity.com> At 09:33 AM 8/31/2009 -0700, Guido van Rossum wrote: >Of course, tracking down all the code objects in the return value of >marshal.load*() might be a bit tricky -- API-wise I still think that >making it an argument to marshal.load*() might be simpler. Also it >would preserve the purity of code objects. Or maybe we could just do something like this: from new import code def with_changed_filename(code_ob, filename): def remap(ob): if not isinstance(ob, code): return ob return code( ob.co_argcount, ob.co_nlocals, ob.co_stacksize, ob.co_flags, ob.co_code, map(remap, ob.co_consts), ob.co_names, ob.co_varnames, filename, ob.co_name, ob.co_firstlineno, ob.co_lnotab, ob.co_freevars, ob.co_cellvars ) return remap(code_ob) Granted, this takes a bit more memory than an in-place modification, but it's immediately usable and at least works wherever new.code is available. (I've not tested the above, so it may not work. I seem to recall the last time I wrote something like this there was something tricky about handling co_freevars and co_cellvars; I think you may need to omit them if empty, or convert them to None, or from None to an empty tuple or some such rigamarole. And a 3.x version is left as an exercise for the reader. ;-) ) From SridharR at activestate.com Mon Aug 31 21:18:18 2009 From: SridharR at activestate.com (Sridhar Ratnakumar) Date: Mon, 31 Aug 2009 12:18:18 -0700 Subject: [Python-Dev] 3to2 0.1 alpha 1 released In-Reply-To: <4dc473a50908261555o2bc4cdbexff1b2dbabb4bab0@mail.gmail.com> References: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> <4A959A89.4000000@v.loewis.de> <4dc473a50908261555o2bc4cdbexff1b2dbabb4bab0@mail.gmail.com> Message-ID: On Wed, 26 Aug 2009 15:55:54 -0700, Joe Amenta wrote: > -- 3to2 is now registered with PyPI. Did I do it right? > http://pypi.python.org/pypi/3to2/0.1%20alpha%201 Please fix the version number to not contain any whitespace characters. Also set the `version` argument in setup(..) in your setup.py. And then you may want to use the `upload` command to upload the new tarball to PyPI. See http://wiki.python.org/moin/Distutils/Tutorial for more details. -srid From brett at python.org Mon Aug 31 21:20:51 2009 From: brett at python.org (Brett Cannon) Date: Mon, 31 Aug 2009 12:20:51 -0700 Subject: [Python-Dev] runpy.py In-Reply-To: <4A9BD1EA.9020204@gmail.com> References: <332972a20908230909q67517e53y1e65b9155a9fc0b1@mail.gmail.com> <4A92B9C3.5040701@simplistix.co.uk> <1afaf6160908240907y2bb8fabbud044171de72ae052@mail.gmail.com> <4A939F4F.9060306@simplistix.co.uk> <1afaf6160908250125h4cc16ff7h2c33f1162cd25cca@mail.gmail.com> <4A93A85D.3010108@gmail.com> <4A951C43.8050707@gmail.com> <4A9BBB21.4040504@simplistix.co.uk> <4A9BD1EA.9020204@gmail.com> Message-ID: On Mon, Aug 31, 2009 at 06:36, Nick Coghlan wrote: > Chris Withers wrote: >> Nick Coghlan wrote: >>> The PEPs don't go into the process of how we actually hook the command >>> line up to the runpy module though - that's something you need to dig >>> into the main.c code to really understand. 
>> >> Yeah, main.c does quite a lot... ;-) >> >> This all spawned from a suggestion by Jim Fulton over on the >> distutils-sig that it would be nice if there was a python module that >> did all of the various types of launching found in main.c. His use case >> is so that buildout scripts can easily use the same functionality that >> the interpreter startup uses. >> >> I didn't spot any, but does anyone know of code in that mix that >> couldn't be moved to a pure python module like runpy? >> >> If not, how would people feel about the various types of launching all >> moving to runpy rather than just the -m stuff being there? > > I haven't timed it, but I believe runpy is a fair bit slower than the > native C functions in main. (That first part of the comment means I > could easily be wrong though - it's definitely possible that overall > interpreter startup time will dwarf any difference between the two > launch mechanisms). > That's quite possible. If you benchmark it you might be able to convince people. > That said, while actually ditching the C code might cause an argument, > expanding runpy with Python equivalents of the C level functionality > (i.e. run script by name, run directory/zipfile by name, '-c' switch, > and other odds and ends that I'm probably forgetting right now, with all > associated modifications to sys.argv and the __main__ module attributes) > should be far less controversial. > It also has the perk of letting alternative VMs not have to implement all of that stuff themselves, potentially helping to unify even the command-line interfaces for all the VMs. -Brett From brett at python.org Mon Aug 31 21:25:50 2009 From: brett at python.org (Brett Cannon) Date: Mon, 31 Aug 2009 12:25:50 -0700 Subject: [Python-Dev] 3to2 0.1 alpha 1 released In-Reply-To: References: <4dc473a50908261229i3d0f600dq38286edafc076e48@mail.gmail.com> <4A959A89.4000000@v.loewis.de> <4dc473a50908261555o2bc4cdbexff1b2dbabb4bab0@mail.gmail.com> Message-ID: On Mon, Aug 31, 2009 at 12:18, Sridhar Ratnakumar wrote: > On Wed, 26 Aug 2009 15:55:54 -0700, Joe Amenta wrote: > >> -- 3to2 is now registered with PyPI. ?Did I do it right? > >> http://pypi.python.org/pypi/3to2/0.1%20alpha%201 > > Please fix the version number to not contain any whitespace characters. Also > set the `version` argument in setup(..) in your setup.py. And then you may > want to use the `upload` command to upload the new tarball to PyPI. See > http://wiki.python.org/moin/Distutils/Tutorial for more details. See PEP 386 (http://www.python.org/dev/peps/pep-0386/) for what the current thinking on version numbers is. -Brett From solipsis at pitrou.net Mon Aug 31 21:27:00 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 31 Aug 2009 19:27:00 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?how_important_is_setting_co=5Ffilename_for?= =?utf-8?q?_a_module=09being_imported_to_what_=5F=5Ffile=5F=5F_is_s?= =?utf-8?q?et_to=3F?= References: <1afaf6160908310755vc5d3872i2f7be0ce33c09040@mail.gmail.com> Message-ID: Brett Cannon python.org> writes: > > I will plan to take this approach then; > http://bugs.python.org/issue6811 will track all of this. Since this is > a 3.2 thing I am not going to rush to implement this. I still don't understand what the point is of this complicated approach (adding an argument to marshal.load()) compared to the simple and obvious approach (making co_filename mutable). 
Besides, the latter would let you code the recursive renaming algorithm in Python, which is the whole point of importlib (rewriting most code in Python), isn't it? Regards Antoine. From brett at python.org Mon Aug 31 21:59:46 2009 From: brett at python.org (Brett Cannon) Date: Mon, 31 Aug 2009 12:59:46 -0700 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: References: <1afaf6160908310755vc5d3872i2f7be0ce33c09040@mail.gmail.com> Message-ID: On Mon, Aug 31, 2009 at 12:27, Antoine Pitrou wrote: > Brett Cannon python.org> writes: >> >> I will plan to take this approach then; >> http://bugs.python.org/issue6811 will track all of this. Since this is >> a 3.2 thing I am not going to rush to implement this. > > I still don't understand what the point is of this complicated approach (adding > an argument to marshal.load()) compared to the simple and obvious approach > (making co_filename mutable). If we add the argument to marshal.load* we can eventually drop the file location string from marshal data entirely by requiring people to specify the filename to use when the code object is created. Making co_filename mutable simply doesn't allow for this case unless we decide a default value should be used instead. > Besides, the latter would let you code the recursive renaming algorithm in > Python, which is the whole point of importlib (rewriting most code in Python), > isn't it? Sure, but I am not about to re-implement marshal in pure Python just because importlib uses it. -Brett From brett at python.org Mon Aug 31 22:39:04 2009 From: brett at python.org (Brett Cannon) Date: Mon, 31 Aug 2009 13:39:04 -0700 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: References: <1afaf6160908310755vc5d3872i2f7be0ce33c09040@mail.gmail.com> Message-ID: On Mon, Aug 31, 2009 at 12:59, Brett Cannon wrote: > On Mon, Aug 31, 2009 at 12:27, Antoine Pitrou wrote: >> Brett Cannon python.org> writes: >>> >>> I will plan to take this approach then; >>> http://bugs.python.org/issue6811 will track all of this. Since this is >>> a 3.2 thing I am not going to rush to implement this. >> >> I still don't understand what the point is of this complicated approach (adding >> an argument to marshal.load()) compared to the simple and obvious approach >> (making co_filename mutable). > > If we add the argument to marshal.load* we can eventually drop the > file location string from marshal data entirely by requiring people to > specify the filename to use when the code object is created. Making > co_filename mutable simply doesn't allow for this case unless we > decide a default value should be used instead. > I should also mention that I am +0 on the marshal.load* change. I could be convinced to try to pursue a mutable co_filenme direction, but considering the BDFL likes the marshal.load* approach and it opens up the possibility of compacting the marshal format I am leaning towards sticking with this initial direction. 
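For comparison, the recursive renaming Antoine has in mind stays short if co_filename ever becomes writable; a sketch (it will not run today precisely because the attribute is read-only):

    import types

    def retarget(code, filename):
        """Point code and every nested code object at filename, in place."""
        code.co_filename = filename          # requires the attribute to be writable
        for const in code.co_consts:
            if isinstance(const, types.CodeType):
                retarget(const, filename)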
-Brett From solipsis at pitrou.net Mon Aug 31 22:47:34 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 31 Aug 2009 20:47:34 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?how_important_is_setting_co=5Ffilename_for?= =?utf-8?q?_a_module=09being_imported_to_what_=5F=5Ffile=5F=5F_is_s?= =?utf-8?q?et_to=3F?= References: <1afaf6160908310755vc5d3872i2f7be0ce33c09040@mail.gmail.com> Message-ID: Brett Cannon python.org> writes: > > I should also mention that I am +0 on the marshal.load* change. I > could be convinced to try to pursue a mutable co_filenme direction, > but considering the BDFL likes the marshal.load* approach and it opens > up the possibility of compacting the marshal format I am leaning > towards sticking with this initial direction. I am really not opinionated on this one. I was just pointing out that choosing a non-obvious solution generally requires good reasons to do so. The marshal format compaction sounds like premature optimization, since nobody seems to have formulated such a request. Regards Antoine. From greg at krypto.org Mon Aug 31 23:12:14 2009 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 31 Aug 2009 14:12:14 -0700 Subject: [Python-Dev] default of returning None hurts performance? Message-ID: <52dc1c820908311412q55efc3cejc62b3f1778c4c92a@mail.gmail.com> food for thought as noticed by a coworker who has been profiling some hot code to optimize a library... If a function does not have a return statement we return None. Ironically this makes the foo2 function below faster than the bar2 function at least as measured using bytecode size: Python 2.6.2 (r262:71600, Jul 24 2009, 17:29:21) [GCC 4.2.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import dis >>> def foo(x): ... y = x() ... return y ... >>> def foo2(x): ... return x() ... >>> def bar(x): ... y = x() ... >>> def bar2(x): ... x() ... >>> dis.dis(foo) 2 0 LOAD_FAST 0 (x) 3 CALL_FUNCTION 0 6 STORE_FAST 1 (y) 3 9 LOAD_FAST 1 (y) 12 RETURN_VALUE >>> dis.dis(foo2) 2 0 LOAD_FAST 0 (x) 3 CALL_FUNCTION 0 6 RETURN_VALUE >>> dis.dis(bar) 2 0 LOAD_FAST 0 (x) 3 CALL_FUNCTION 0 6 STORE_FAST 1 (y) 9 LOAD_CONST 0 (None) 12 RETURN_VALUE >>> dis.dis(bar2) 2 0 LOAD_FAST 0 (x) 3 CALL_FUNCTION 0 6 POP_TOP 7 LOAD_CONST 0 (None) 10 RETURN_VALUE -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Mon Aug 31 23:20:34 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 31 Aug 2009 21:20:34 +0000 (UTC) Subject: [Python-Dev] default of returning None hurts performance? References: <52dc1c820908311412q55efc3cejc62b3f1778c4c92a@mail.gmail.com> Message-ID: Gregory P. Smith krypto.org> writes: > > food for thought as noticed by a coworker who has been profiling some hot code to optimize a library...If a function does not have a return statement we return None.? Ironically this makes the foo2 function below faster than the bar2 function at least as measured using bytecode size I would be surprised if this "bytecode size" difference made a significant difference in runtimes, given that function call cost should dwarf the cumulated cost of POP_TOP and LOAD_CONST (two of the simplest opcodes you could find). Did your coworker run any timings instead of basing his assumptions on bytecode size? Regards Antoine. 
From peter at hda3.com Mon Aug 31 23:43:33 2009 From: peter at hda3.com (Peter Moody) Date: Mon, 31 Aug 2009 14:43:33 -0700 Subject: [Python-Dev] PEP 3144: IP Address Manipulation Library for the Python Standard Library In-Reply-To: <4A97A3E4.1050206@gmail.com> References: <8517e9350908181300v6b61c942p9ae85a3e5bc9709e@mail.gmail.com> <8517e9350908270652w4846dfdal8278ce71e3d238b4@mail.gmail.com> <4A97A3E4.1050206@gmail.com> Message-ID: <8517e9350908311443q41b55cb7s659c30dd357433ae@mail.gmail.com> On Fri, Aug 28, 2009 at 2:31 AM, Nick Coghlan wrote: > Peter Moody wrote: >> If there are any more suggestions on the PEP or the code, please let me know. > > I noticed the new paragraphs on the IPv4 vs IPv6 types not being > comparable - is there a canonical ordering for mixed address lists > defined anywhere (e.g. an RFC)? > > If there is, then it should be possible to implement that on BaseIP and > BaseNet so that comparisons work as canonically defined. If there isn't, > then that should be mentioned in the PEP as the reason why the PEP > deliberately isn't trying to invent a convention. updated the pep with more information about this. working through changes/issues brought up by David and Martin. Cheers, /peter > Cheers, > Nick. > > -- > Nick Coghlan ? | ? ncoghlan at gmail.com ? | ? Brisbane, Australia > --------------------------------------------------------------- > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/python-dev%40hda3.com > From pje at telecommunity.com Mon Aug 31 23:52:08 2009 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 31 Aug 2009 17:52:08 -0400 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: References: <1afaf6160908310755vc5d3872i2f7be0ce33c09040@mail.gmail.com> Message-ID: <20090831215249.A11393A4093@sparrow.telecommunity.com> At 01:39 PM 8/31/2009 -0700, Brett Cannon wrote: >On Mon, Aug 31, 2009 at 12:59, Brett Cannon wrote: > > On Mon, Aug 31, 2009 at 12:27, Antoine Pitrou wrote: > >> Brett Cannon python.org> writes: > >>> > >>> I will plan to take this approach then; > >>> http://bugs.python.org/issue6811 will track all of this. Since this is > >>> a 3.2 thing I am not going to rush to implement this. > >> > >> I still don't understand what the point is of this complicated > approach (adding > >> an argument to marshal.load()) compared to the simple and obvious approach > >> (making co_filename mutable). > > > > If we add the argument to marshal.load* we can eventually drop the > > file location string from marshal data entirely by requiring people to > > specify the filename to use when the code object is created. Making > > co_filename mutable simply doesn't allow for this case unless we > > decide a default value should be used instead. > > > >I should also mention that I am +0 on the marshal.load* change. I >could be convinced to try to pursue a mutable co_filenme direction, >but considering the BDFL likes the marshal.load* approach and it opens >up the possibility of compacting the marshal format I am leaning >towards sticking with this initial direction. Why not just try the code I posted earlier, that doesn't need a mutable attribute OR an API change? 
From brett at python.org Mon Aug 31 23:57:49 2009 From: brett at python.org (Brett Cannon) Date: Mon, 31 Aug 2009 14:57:49 -0700 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: <20090831215249.A11393A4093@sparrow.telecommunity.com> References: <20090831215249.A11393A4093@sparrow.telecommunity.com> Message-ID: On Mon, Aug 31, 2009 at 14:52, P.J. Eby wrote: > At 01:39 PM 8/31/2009 -0700, Brett Cannon wrote: >> >> On Mon, Aug 31, 2009 at 12:59, Brett Cannon wrote: >> > On Mon, Aug 31, 2009 at 12:27, Antoine Pitrou >> > wrote: >> >> Brett Cannon python.org> writes: >> >>> >> >>> I will plan to take this approach then; >> >>> http://bugs.python.org/issue6811 will track all of this. Since this is >> >>> a 3.2 thing I am not going to rush to implement this. >> >> >> >> I still don't understand what the point is of this complicated approach >> >> (adding >> >> an argument to marshal.load()) compared to the simple and obvious >> >> approach >> >> (making co_filename mutable). >> > >> > If we add the argument to marshal.load* we can eventually drop the >> > file location string from marshal data entirely by requiring people to >> > specify the filename to use when the code object is created. Making >> > co_filename mutable simply doesn't allow for this case unless we >> > decide a default value should be used instead. >> > >> >> I should also mention that I am +0 on the marshal.load* change. I >> could be convinced to try to pursue a mutable co_filenme direction, >> but considering the BDFL likes the marshal.load* approach and it opens >> up the possibility of compacting the marshal format I am leaning >> towards sticking with this initial direction. > > Why not just try the code I posted earlier, that doesn't need a mutable > attribute OR an API change? Ignoring that 'new' is not in Python 3.x (luckily 'types' is), I want a proper solution that doesn't require reconstructing every code object that I happen to import. -Brett From fdrake at acm.org Mon Aug 31 23:04:25 2009 From: fdrake at acm.org (Fred Drake) Date: Mon, 31 Aug 2009 17:04:25 -0400 Subject: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to? In-Reply-To: References: <1afaf6160908310755vc5d3872i2f7be0ce33c09040@mail.gmail.com> Message-ID: <4E5EF4D6-7544-4911-9E6E-607D3525ADD9@acm.org> On Aug 31, 2009, at 4:47 PM, Antoine Pitrou wrote: > I am really not opinionated on this one. I was just pointing out > that choosing a > non-obvious solution generally requires good reasons to do so. The > marshal > format compaction sounds like premature optimization, since nobody > seems to have > formulated such a request. Every time I've been bitten by the wrong co_filename values (usually from tracebacks), changing the way marshal creates code objects to use a values passed in has been the thing that made the most sense to me. The feature request that's involved here, getting correct co_filename values, can be implemented in different ways, sure. This particular change produces the least impact in the because it *doesn't* change the mutability of code objects. I for one appreciate that, mostly because I'm simply wary of making code objects mutable in this way having unexpected side effects in some library. -Fred -- Fred Drake