From sjoerdjob at sjoerdjob.com Thu Dec 1 03:19:34 2016 From: sjoerdjob at sjoerdjob.com (Sjoerd Job Postmus) Date: Thu, 1 Dec 2016 09:19:34 +0100 Subject: [Python-ideas] Allow random.choice, random.sample to work on iterators In-Reply-To: References: <1480533959.3740598.804127281.6028420C@webmail.messagingengine.com> <1406db4f-8b71-bbd2-de81-4b8328f4b143@gmail.com> <73afd24e-a5c1-5646-2431-04a79a4937b9@gmail.com> Message-ID: <20161201081934.GF683@sjoerdjob.com> On Wed, Nov 30, 2016 at 02:32:54PM -0600, Nick Timkovich wrote: > a generator with known length that's not indexable (a rare beast?). Not as rare as you might think: >>> k = set(range(10)) >>> len(k) 10 >>> k[3] Traceback (most recent call last): File "", line 1, in TypeError: 'set' object does not support indexing From jelle.zijlstra at gmail.com Fri Dec 2 01:14:29 2016 From: jelle.zijlstra at gmail.com (Jelle Zijlstra) Date: Thu, 1 Dec 2016 22:14:29 -0800 Subject: [Python-ideas] Add optional defaults to namedtuple In-Reply-To: References: <583EEBBC.2050206@stoneleaf.us> Message-ID: 2016-11-30 8:11 GMT-08:00 Guido van Rossum : > On Wed, Nov 30, 2016 at 7:09 AM, Ethan Furman wrote: > >> On 11/30/2016 02:32 AM, Jelte Fennema wrote: >> >> It would be nice to have a supported way to add defaults to namedtuple, >>> so the slightly hacky solution here does not have to be used: >>> http://stackoverflow.com/a/18348004/2570866 >>> >> >> Actually, the solution right below it is better [1]: >> >> --> from collections import namedtuple >> --> class Node(namedtuple('Node', ['value', 'left', 'right'])): >> --> __slots__ = () >> --> def __new__(cls, value, left=None, right=None): >> --> return super(Node, cls).__new__(cls, value, left, right) >> >> But even more readable than that is using the NamedTuple class from my >> aenum [3] library (and on SO as [3]): >> >> --> from aenum import NamedTuple >> --> class Node(NamedTuple): >> --> val = 0 >> --> left = 1, 'previous Node', None >> --> right = 2, 'next Node', None >> >> shamelessly-plugging-my-own-solutions'ly yrs, >> > > Ditto: with PEP 526 and the latest typing.py (in 3.6) you will be able to > do this: > > class Employee(NamedTuple): > name: str > id: int > > We should make it so that the initial value in the class is used as the > default value, too. (Sorry, this syntax still has no room for a docstring > per attribute.) > > Implemented this in https://github.com/python/typing/pull/338 > -- > --Guido van Rossum (python.org/~guido ) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From torsava at redhat.com Fri Dec 2 11:56:29 2016 From: torsava at redhat.com (Tomas Orsava) Date: Fri, 2 Dec 2016 17:56:29 +0100 Subject: [Python-ideas] PEP: Distributing a Subset of the Standard Library In-Reply-To: References: <6e27a05d-6a02-44f0-fa3f-4c14b9e1befc@redhat.com> Message-ID: On 11/30/2016 03:56 AM, Nick Coghlan wrote: > Really, I think the ideal solution from a distro perspective would be > to enable something closer to what bash and other shells support for > failed CLI calls: > > $ blender > bash: blender: command not found... > Install package 'blender' to provide command 'blender'? 
[N/y] n
>
> This would allow redistributors to point folks towards platform
> packages (via apt/yum/dnf/PyPM/conda/Canopy/etc) for the components
> they provide, and towards pip/PyPI for everything else (and while we
> don't have a dist-lookup-by-module-name service for PyPI *today*, it's
> something I hope we'll find a way to provide sometime in the next few
> years).
>
> I didn't suggest that during the Fedora-level discussions of this PEP
> because it didn't occur to me - the elegant simplicity of the new
> import suffix as a tactical solution to the immediate "splitting the
> standard library" problem [1] meant I missed that it was really a
> special case of the general "provide guidance on obtaining missing
> modules from the system package manager" concept.
>
> The problem with that idea however is that while it provides the best
> possible interactive user experience, it's potentially really slow,
> and hence too expensive to do for every import error - we would
> instead need to find a way to run with Wolfgang Maier's suggestion of
> only doing this for *unhandled* import errors.
>
> Fortunately, we do have the appropriate mechanisms in place to support
> that approach:
>
> 1. For interactive use, we have sys.excepthook
> 2. For non-interactive use, we have the atexit module
>
> As a simple example of the former:
>
> >>> def module_missing(modname):
> ...     return f"Module not found: {modname}"
> >>> def my_except_hook(exc_type, exc_value, exc_tb):
> ...     if isinstance(exc_value, ModuleNotFoundError):
> ...         print(module_missing(exc_value.name))
> ...
> >>> sys.excepthook = my_except_hook
> >>> import foo
> Module not found: foo
> >>> import foo.bar
> Module not found: foo
> >>> import sys.bar
> Module not found: sys.bar
>
> For the atexit handler, that could be installed by the `site` module,
> so the existing mechanisms for disabling site module processing would
> also disable any default exception reporting hooks. Folks could also
> register their own handlers via either `sitecustomize.py` or
> `usercustomize.py`.

Is there some reason not to use sys.excepthook for both interactive and
non-interactive use? From the docs:

"When an exception is raised and uncaught, the interpreter calls
sys.excepthook with three arguments, the exception class, exception
instance, and a traceback object. In an interactive session this
happens just before control is returned to the prompt; in a Python
program this happens just before the program exits. The handling of
such top-level exceptions can be customized by assigning another
three-argument function to sys.excepthook."

Though I believe the default sys.excepthook function is currently
written in C, so it wouldn't be very easy for distributors to
customize it. Maybe it could be made to read module=error_message
pairs from some external file, which would be easier to modify?

Yours aye,
Tomas

> And at that point the problem starts looking less like "Customise the
> handling of missing modules" and more like "Customise the rendering
> and reporting of particular types of unhandled exceptions". For
> example, a custom handler for subprocess.CalledProcessError could
> introspect the original command and use `shutil.which` to see if the
> requested command was even visible from the current process (and, in a
> redistributor provided Python, indicate which system packages to
> install to obtain the requested command).
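As a concrete sketch of the CalledProcessError handler idea in the
quoted paragraph above (illustrative only - the command-to-package
map here is invented, and a real hook would be supplied by the
redistributor):

    import shutil
    import subprocess
    import sys
    import traceback

    # Hypothetical mapping from command names to distro packages.
    PACKAGE_MAP = {"blender": "blender"}

    def redistributor_excepthook(exc_type, exc_value, exc_tb):
        # Show the normal traceback first, then add the hint.
        traceback.print_exception(exc_type, exc_value, exc_tb)
        if isinstance(exc_value, subprocess.CalledProcessError):
            cmd = exc_value.cmd
            if isinstance(cmd, (list, tuple)):
                cmd = cmd[0]
            # Only offer a hint if the command really isn't visible.
            if shutil.which(cmd) is None and cmd in PACKAGE_MAP:
                print("Install package {!r} to provide command {!r}".format(
                    PACKAGE_MAP[cmd], cmd), file=sys.stderr)

    sys.excepthook = redistributor_excepthook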
>
>> My personal vote is a callback called at
>> https://github.com/python/cpython/blob/master/Lib/importlib/_bootstrap.py#L948
>> with a default implementation that raises ModuleNotFoundError just like the
>> current line does.
>
> Ethan's observation about try/except import chains has got me thinking
> that limiting this to handling errors within the context of a single
> import statement will be problematic, especially given that folks can
> already write their own metapath hook for that case if they really
> want to.
>
> Cheers,
> Nick.
>
> [1] For folks wondering "This problem has existed for years, why
> suddenly worry about it now?", Fedora's in the process of splitting
> out an even more restricted subset of the standard library for system
> tools to use: https://fedoraproject.org/wiki/Changes/System_Python
>
> That means "You're relying on a missing stdlib module" is going to
> come up more often for system tools developers trying to stick within
> that restricted subset.

From tjreedy at udel.edu  Fri Dec 2 20:58:21 2016
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 2 Dec 2016 20:58:21 -0500
Subject: [Python-ideas] Allow random.choice, random.sample to work on
 iterators
In-Reply-To: <20161201081934.GF683@sjoerdjob.com>
References: <1480533959.3740598.804127281.6028420C@webmail.messagingengine.com>
 <1406db4f-8b71-bbd2-de81-4b8328f4b143@gmail.com>
 <73afd24e-a5c1-5646-2431-04a79a4937b9@gmail.com>
 <20161201081934.GF683@sjoerdjob.com>
Message-ID:

On 12/1/2016 3:19 AM, Sjoerd Job Postmus wrote:
> On Wed, Nov 30, 2016 at 02:32:54PM -0600, Nick Timkovich wrote:
>> a generator with known length that's not indexable (a rare beast?).

I don't believe a generator is ever indexable.

> Not as rare as you might think:
>
>>>> k = set(range(10))
>>>> len(k)
> 10
>>>> k[3]
> Traceback (most recent call last):
>   File "", line 1, in
> TypeError: 'set' object does not support indexing

It is also not a generator. (It is an iterable.) If an *arbitrary*
choice (without replacement) from a set is sufficient, set.pop() works.
Otherwise, make a list. If we wanted selection from sets to be easy,
without making a list, we should add a method that accesses the
internal indexable array.

--
Terry Jan Reedy

From ncoghlan at gmail.com  Fri Dec 2 23:08:35 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 3 Dec 2016 14:08:35 +1000
Subject: [Python-ideas] PEP: Distributing a Subset of the Standard Library
In-Reply-To:
References: <6e27a05d-6a02-44f0-fa3f-4c14b9e1befc@redhat.com>
Message-ID:

On 3 December 2016 at 02:56, Tomas Orsava wrote:
> Is there some reason not to use sys.excepthook for both interactive and
> non-interactive use? From the docs:
>
> "When an exception is raised and uncaught, the interpreter calls
> sys.excepthook with three arguments, the exception class, exception
> instance, and a traceback object. In an interactive session this happens
> just before control is returned to the prompt; in a Python program this
> happens just before the program exits. The handling of such top-level
> exceptions can be customized by assigning another three-argument function to
> sys.excepthook."

No, that was just me forgetting that sys.excepthook was also called
for unhandled exceptions in non-interactive mode. It further
strengthens the argument for seeing how far we can get with just the
flexibility CPython already provides, though.
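To make that concrete, the kind of external-file lookup Tomas suggests
in the quote that follows could be implemented along these lines (a
sketch only - the hint file's location and its "module=message" format
are invented for illustration, not an agreed interface):

    import sys
    import traceback

    HINT_FILE = "/etc/python3/missing-module-hints"  # hypothetical path

    def load_hints():
        # One "module_name=error message" pair per line; blank lines
        # and "#" comments are ignored.
        hints = {}
        try:
            with open(HINT_FILE) as f:
                for line in f:
                    line = line.strip()
                    if line and not line.startswith("#"):
                        name, sep, message = line.partition("=")
                        if sep:
                            hints[name.strip()] = message.strip()
        except OSError:
            pass
        return hints

    def hinting_excepthook(exc_type, exc_value, exc_tb):
        traceback.print_exception(exc_type, exc_value, exc_tb)
        if isinstance(exc_value, ModuleNotFoundError):
            hint = load_hints().get(exc_value.name)
            if hint:
                print(hint, file=sys.stderr)

    sys.excepthook = hinting_excepthook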
> Though I believe the default sys.excepthook function is currently written in
> C, so it wouldn't be very easy for distributors to customize it. Maybe it
> could be made to read module=error_message pairs from some external file,
> which would be easier to modify?

The default implementation is written in C, but distributors could
patch site.py to replace it with a custom one written in Python. For
example, publish a "fedora-hooks" module to PyPI (so non-system Python
installations or applications regularly run without the site module
can readily use the same hooks if they choose to do so), and then
patch site.py in the system Python to do:

    import fedora_hooks
    fedora_hooks.install_excepthook()

The nice thing about that approach is it wouldn't need a new switch to
turn it off - it would get turned off with all the other site-specific
customisations when -S or -I is used. It would also better open things
up to redistributor experimentation in existing releases (2.7, 3.5,
etc) before we commit to a specific approach in the reference
interpreter (such as adding an optional 'platform.hooks' submodule
that vendors may provide, and relevant stdlib APIs will then call
automatically to override the default upstream provided processing).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From askvictor at gmail.com  Sun Dec 4 18:15:04 2016
From: askvictor at gmail.com (victor rajewski)
Date: Sun, 04 Dec 2016 23:15:04 +0000
Subject: [Python-ideas] Better error messages [was: (no subject)]
In-Reply-To:
References: <22590.13856.162202.818428@turnbull.sk.tsukuba.ac.jp>
Message-ID:

Thanks for all of the thoughtful replies (and for moving to a more
useful subject line).

There is currently a big push towards teaching coding and computational
thinking to school students, but a lack of skilled teachers to actually
be able to support this, and I don't see any initiatives that will
address this in a long-term, large-scale fashion (I'm speaking
primarily from an Australian perspective, and might be misreading the
situation in other countries). It's worth considering a classroom where
the teacher has minimal experience in programming, and a portion of the
students have low confidence in computing matters. Anything that will
empower either the teacher or the students to get past a block will be
useful here; and error messages are, in my experience as a teacher, one
of the more threatening parts of Python for the beginner.

A few clarifications and thoughts arising from the discussion:

- I personally find the current error messages quite useful, and they
have the advantage of being machine-parseable, so that IDEs such as
PyCharm can add value to them. However, the audience of this idea is
not me, and probably not you. It is students who are learning Python,
and probably haven't done any programming at all. But it might also be
casual programmers who never really look at error messages as they are
too computer-y.
- Learning how to parse an error message is a very valuable skill for a
programmer to learn. However, I believe that should come later on in
their journey. A technical error message when a student is starting out
can be a bit overwhelming to some learners, who are already taking in a
lot of information.
- I'm not suggesting this should become part of the normal operation of
Python, particularly if that breaks compatibility or impacts
performance. A switch, or a separate executable would probably work.
I'd lean against the idea of tying this to a particular IDE/environment, but if that's the way this can progress, then let's do that to get it moving. However, it has to be dead simple to get it running. - I think this is necessary for scripts as well as the REPL (also other envs like Jupyter notebooks). - It will be almost impossible to deal with all cases, but that isn't the point here. The trick would be to find the most common errors that a beginning programmer will make, find the most common fixes, and provide them as hints, or suggestions. - The examples listed in my original email are simply ideas, without much thought about how feasible (or useful) they are to implement. Going forward, we would identify common errors that beginners make, and what would help them fix these errors. -- Victor Rajewski -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sun Dec 4 18:40:21 2016 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 5 Dec 2016 10:40:21 +1100 Subject: [Python-ideas] Better error messages [was: (no subject)] In-Reply-To: References: <22590.13856.162202.818428@turnbull.sk.tsukuba.ac.jp> Message-ID: On Mon, Dec 5, 2016 at 10:15 AM, victor rajewski wrote: > There is currently a big push towards teaching coding and computational > thinking to school students, but a lack of skilled teachers to actually be > able to support this, and I don't see any initiatives that will address this > in a long-term, large-scale fashion (I'm speaking primarily from an > Australian perspective, and might be misreading the situation in other > countries). It's worth considering a classroom where the teacher has minimal > experience in programming, and a portion of the students have low confidence > in computing matters. Anything that will empower either the teacher or the > students to get past a block will be useful here; and error messages are, in > my experience as a teacher, one of more threatening parts of Python for the > beginner. While I fully support enhancements to error messages (and the possibility of a "programming student" mode that assumes a novice and tweaks the messages accordingly), I don't think it's right to aim at a classroom where *the teacher* doesn't have sufficient programming skills. Would you build a pocket calculator so it can be used in a classroom where even the teacher doesn't know about division by zero? Would you design a violin so a non-musician can teach its use? IMO the right way to teach computer programming is for it to be the day job for people who do all their programming in open source and/or personal projects. There are plenty of people competent enough to teach programming and would benefit from a day job. Design the error messages to minimize the load on the room's sole expert, but assume that there'll always be someone around who can deal with the edge cases. In other words, aim for the 90% or 95%, rather than trying to explain 100% of situations. ChrisA From turnbull.stephen.fw at u.tsukuba.ac.jp Sun Dec 4 18:57:28 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. 
Turnbull) Date: Mon, 5 Dec 2016 08:57:28 +0900 Subject: [Python-ideas] Better error messages [was: (no subject)] In-Reply-To: References: <22590.13856.162202.818428@turnbull.sk.tsukuba.ac.jp> Message-ID: <22596.44392.901811.945311@turnbull.sk.tsukuba.ac.jp> victor rajewski writes: > - I personally find the current error messages quite useful, and > they have the advantage of being machine-parseable, so that IDEs > such as PyCharm can add value to them. However, the audience of > this idea is not me, and probably not you. It is students who > are learning Python, and probably haven't done any programming > at all. But it might also be casual programmers who never really > look at error message as they are too computer-y. That's a misconception. You have not yet given up on a change to the Python interpreter, so the audience is *every* user of the Python interpreter (including other programs), and that's why you're getting pushback. The Python interpreter's main job is to execute code. A secondary job is provide *accurate* diagnostics of errors in execution. Interpreting those diagnostics is somebody else's job, typically the programmer's. For experienced programmers, that's usually what you want, because (1) the interpretation is frequently data-dependent and (2) the "obvious" suggestion may be wrong. FYI, a *lot* of effort has gone into making error messages more precise, more accurate, and more informative, eg, by improving stack traces. OTOH, if the diagnostics are accurate and machine-parsable, then the amount of annoying detail that needs to be dealt with in providing a "tutorial" front-end for those messages is small. That suggests to me that the problem really is that interpreting errors, even in "student" programs, is *hard* and rules of thumb are frequently mistaken. That's an excellent tradeoff if there's a teacher looking over the (student) programmer's shoulder. Not a good idea for the interpreter. > - I'm not suggesting this should become part of the normal > operation of Python, particularly if that breaks compatibility > or impacts performance. A switch, or a seperate executable would > probably work. I'd lean against the idea of tying this to a > particular IDE/environment, but if that's the way this can > progress, then let's do that to get it moving. It really should be a separate executable. There are multiple implementations of Python, and even restricted to CPython, with even a small amount of uptake this project will move a *lot* faster than CPython does. Every tiny change to the "better living through better errors" database makes a difference to all the students out there, so its release cycle should probably be really fast. > - The examples listed in my original email are simply ideas, > without much thought about how feasible (or useful) they are to > implement. Going forward, we would identify common errors that > beginners make, and what would help them fix these errors. In other words, you envision a long-term project with an ongoing level of effort. I think that it's worth doing. But I also think it's quite feasible to put it in a separate project, with cooperation from Python-Dev in the matter of ensuring that diagnostics are machine- parseable. Eg, this means that Python-Dev should not randomly change messages that are necessary to interpret an Exception, and in some cases it may be useful to add Exception/Error subtypes to make interpretation more precise (though this will often get pushback). 
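A rough sketch of the shape such a separate front-end executable could
take (the hint texts here are invented placeholders, not proposals for
actual wording, and a real tool would key on far more than the
exception type):

    import runpy
    import sys
    import traceback

    # Invented example entries; a real project would maintain a much
    # richer database keyed on exception type and message text.
    HINTS = [
        (SyntaxError, "Check the line shown above for a missing colon, "
                      "bracket or quote."),
        (NameError, "Check the spelling of the name, and make sure it "
                    "is assigned a value before this line runs."),
        (TypeError, "Check that the values being combined or called "
                    "here have the types you expect."),
    ]

    def main():
        try:
            runpy.run_path(sys.argv[1], run_name="__main__")
        except Exception as exc:
            traceback.print_exc()
            for exc_type, hint in HINTS:
                if isinstance(exc, exc_type):
                    print("Hint:", hint, file=sys.stderr)
                    break

    if __name__ == "__main__":
        main()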
From turnbull.stephen.fw at u.tsukuba.ac.jp Sun Dec 4 20:40:47 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Mon, 5 Dec 2016 10:40:47 +0900 Subject: [Python-ideas] Better error messages [was: (no subject)] In-Reply-To: References: <22590.13856.162202.818428@turnbull.sk.tsukuba.ac.jp> Message-ID: <22596.50591.129903.980234@turnbull.sk.tsukuba.ac.jp> Chris Angelico writes: > On Mon, Dec 5, 2016 at 10:15 AM, victor rajewski wrote: > > There is currently a big push towards teaching coding and > > computational thinking to school students, but a lack of skilled > > teachers to actually be able to support this, and I don't see any > > initiatives that will address this in a long-term, large-scale > > fashion (I'm speaking primarily from an Australian perspective, > > and might be misreading the situation in other countries). It's > > worth considering a classroom where the teacher has minimal > > experience in programming, and a portion of the students have low > > confidence in computing matters. Anything that will empower > > either the teacher or the students to get past a block will be > > useful here; and error messages are, in my experience as a > > teacher, one of more threatening parts of Python for the > > beginner. > > While I fully support enhancements to error messages (and the > possibility of a "programming student" mode that assumes a novice and > tweaks the messages accordingly), I don't think it's right to aim at a > classroom where *the teacher* doesn't have sufficient programming > skills. That's not exactly what he said. High school teachers are likely to be the product of education schools, and may be highly skilled in building PowerPoint presentations, and have some experience in programming, but not as a professional. So I can easily imagine a teacher responsible for several classes of 40 students for 2 hour-long sessions a week per class, and not being able to "interpret at a glance" many error messages produced by the Python interpreter. This is basically the "aim for 90%" approach you describe, and he admits that's the best we can do. > IMO the right way to teach computer programming is for it to be the > day job for people who do all their programming in open source and/or > personal projects. There are plenty of people competent enough to > teach programming and would benefit from a day job. I don't know where you live, but in both of my countries there is a teacher's union to ensure that nobody without an Ed degree gets near a classroom. More precisely, volunteers under the supervision of somebody with professional teaching credentials, yes, day job, not in this century. And "teaching credentials" == degree from a state- certified 4-year Ed program, not something you can get at a community college in an adult ed program. > Design the error messages to minimize the load on the room's sole > expert, but assume that there'll always be someone around who can > deal with the edge cases. In other words, aim for the 90% or 95%, > rather than trying to explain 100% of situations. I think we all agree on that. From rosuav at gmail.com Sun Dec 4 21:35:10 2016 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 5 Dec 2016 13:35:10 +1100 Subject: [Python-ideas] Better error messages [was: (no subject)] In-Reply-To: <22596.50591.129903.980234@turnbull.sk.tsukuba.ac.jp> References: <22590.13856.162202.818428@turnbull.sk.tsukuba.ac.jp> <22596.50591.129903.980234@turnbull.sk.tsukuba.ac.jp> Message-ID: On Mon, Dec 5, 2016 at 12:40 PM, Stephen J. 
Turnbull wrote: > That's not exactly what he said. High school teachers are likely to > be the product of education schools, and may be highly skilled in > building PowerPoint presentations, and have some experience in > programming, but not as a professional. So I can easily imagine a > teacher responsible for several classes of 40 students for 2 hour-long > sessions a week per class, and not being able to "interpret at a > glance" many error messages produced by the Python interpreter. This > is basically the "aim for 90%" approach you describe, and he admits > that's the best we can do. Okay, then I misinterpreted. Seems we are indeed in agreement. Sounds good! > > IMO the right way to teach computer programming is for it to be the > > day job for people who do all their programming in open source and/or > > personal projects. There are plenty of people competent enough to > > teach programming and would benefit from a day job. > > I don't know where you live, but in both of my countries there is a > teacher's union to ensure that nobody without an Ed degree gets near a > classroom. More precisely, volunteers under the supervision of > somebody with professional teaching credentials, yes, day job, not in > this century. And "teaching credentials" == degree from a state- > certified 4-year Ed program, not something you can get at a community > college in an adult ed program. Sadly, that's probably true here in Australia too, but I don't know for sure. I have no specific qualifications, but I teach online; it's high time the unions got broken IMO... but that's outside the scope of this. If it takes a credentialed teacher to get a job in a school, so be it - but at least make sure it's someone who knows how to interpret the error messages, so that any student who runs into trouble can ask the prof. ChrisA From ncoghlan at gmail.com Sun Dec 4 21:40:14 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 5 Dec 2016 12:40:14 +1000 Subject: [Python-ideas] Better error messages [was: (no subject)] In-Reply-To: References: <22590.13856.162202.818428@turnbull.sk.tsukuba.ac.jp> Message-ID: On 5 December 2016 at 09:15, victor rajewski wrote: > > There is currently a big push towards teaching coding and computational > thinking to school students, but a lack of skilled teachers to actually be > able to support this, and I don't see any initiatives that will address this > in a long-term, large-scale fashion (I'm speaking primarily from an > Australian perspective, and might be misreading the situation in other > countries). It's worth considering a classroom where the teacher has minimal > experience in programming, and a portion of the students have low confidence > in computing matters. Anything that will empower either the teacher or the > students to get past a block will be useful here; and error messages are, in > my experience as a teacher, one of more threatening parts of Python for the > beginner. > Hi Victor, I'm one of the co-coordinators of the PyCon Australia Education Seminar, and agree entirely with what you say here. However, it isn't a problem that *python-dev* is well-positioned to tackle. Rather, it requires ongoing attention from vendors, volunteers and non-profit organisations that are specifically focused on meeting the needs of the educational sector. So your goal is valid, it's only your current choice of audience that is slightly mistargeted. 
Within Australia specifically, the two main drivers of the improvements in Python's suitability for teachers are Grok Learning (who provide a subscription-based online learning environment directly to schools based on a service originally developed for the annual National Computer Science School) and Code Club Australia (the Australian arm of a UK-based non-profit aimed at providing support for after-school code clubs around Australia, as well as professional development opportunities for teachers needing to cope with the incoming Digital Technologies curriculum). > I'm not suggesting this should become part of the normal operation of > Python, particularly if that breaks compatibility or impacts performance. A > switch, or a seperate executable would probably work. I'd lean against the > idea of tying this to a particular IDE/environment, but if that's the way > this can progress, then let's do that to get it moving. However, it has to > be dead simple to get it running. The model adopted by Grok Learning and many other education focused service providers (codesters.com, etc) is to provide the learning environment entirely through the browser, as that copes with entirely locked down client devices, and only requires whitelisting of the vendor's site in the school's firewall settings. The only context where it doesn't work is when the school doesn't have reliable internet connectivity at all, in which case the cheap-dedicated-device model driven by the UK's Raspberry Pi Foundation may be a more suitable option. > It will be almost impossible to deal with all cases, but that isn't the > point here. The trick would be to find the most common errors that a > beginning programmer will make, find the most common fixes, and provide them > as hints, or suggestions. > The examples listed in my original email are simply ideas, without much > thought about how feasible (or useful) they are to implement. Going forward, > we would identify common errors that beginners make, and what would help > them fix these errors. Right, and the folks best positioned to identify those errors empirically, and also to make data-driven improvements based on the typical number of iterations needed for beginners to fix their own mistakes, are the educational service providers. Some of the more sophisticated providers (like Knewton in the US) are even able to adapt their curricula on the fly, offer learners additional problems in areas they seem to be struggling with. Don't get me wrong, there are definitely lots of areas where we can make the default error messages more beginner friendly just by providing relevant information that the interpreter has available, and this is important for helping out the teachers that *don't* have institutional mandates backing them up. But for cases like the Australian Digital Curriculum, it makes sense for schools to look into the local service providers rather than asking teachers to make do with what they can download from the internet (while the latter option is viable in some cases, it really does require a high level of technical skill on the teacher's part) Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Dec 4 22:08:56 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 5 Dec 2016 13:08:56 +1000 Subject: [Python-ideas] Better error messages [was: (no subject)] In-Reply-To: References: <22590.13856.162202.818428@turnbull.sk.tsukuba.ac.jp> <22596.50591.129903.980234@turnbull.sk.tsukuba.ac.jp> Message-ID: On 5 December 2016 at 12:35, Chris Angelico wrote: > On Mon, Dec 5, 2016 at 12:40 PM, Stephen J. Turnbull > wrote: >> I don't know where you live, but in both of my countries there is a >> teacher's union to ensure that nobody without an Ed degree gets near a >> classroom. More precisely, volunteers under the supervision of >> somebody with professional teaching credentials, yes, day job, not in >> this century. And "teaching credentials" == degree from a state- >> certified 4-year Ed program, not something you can get at a community >> college in an adult ed program. > > Sadly, that's probably true here in Australia too, but I don't know > for sure. I have no specific qualifications, but I teach online; it's > high time the unions got broken IMO... but that's outside the scope of > this. If it takes a credentialed teacher to get a job in a school, so > be it - but at least make sure it's someone who knows how to interpret > the error messages, so that any student who runs into trouble can ask > the prof. Graduate diplomas in Education in Australia are one- or two-year certificate programs, and some state level industry-to-education programs aim to get folks into the classroom early by offering pre-approvals for teaching subjects specifically related to their area of expertise. However, the main problem isn't the credentials, and it's definitely not unions, it's the fact that professional software developers have a lot of options open to them both locally and globally, and "empower the next generation to be the managers of digital systems rather than their servants" has a lot of downsides compared to the alternatives (most notably: you'll get paid a lot more in industry than you will as a teacher, so opting for teaching as a change in career direction here will necessarily be a lifestyle choice based on the non-monetary factors. That's not going to change as long as people assume that teaching is easy and/or not important). That means that we're not at a point in history where we can assume that teachers are going to be more computationally literate than their students - instead, we need to assume that many of the teachers involved will themselves be new to the concepts being taught and work on empowering them *anyway*. I just don't personally think that's feasible on a volunteer basis - you need professional service providers that are familiar not only with the specific concepts and technologies being taught, but also with the bureaucratic context that the particular schools and teachers they serve have to work within. Regards, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From torsava at redhat.com Mon Dec 5 04:56:58 2016 From: torsava at redhat.com (Tomas Orsava) Date: Mon, 5 Dec 2016 10:56:58 +0100 Subject: [Python-ideas] PEP: Distributing a Subset of the Standard Library In-Reply-To: References: <6e27a05d-6a02-44f0-fa3f-4c14b9e1befc@redhat.com> Message-ID: <176c5504-78a8-401d-9631-6b7126ac5af9@redhat.com> On 12/03/2016 05:08 AM, Nick Coghlan wrote: >> Though I believe the default sys.excepthook function is currently written in >> C, so it wouldn't be very easy for distributors to customize it. Maybe it >> could be made to read module=error_message pairs from some external file, >> which would be easier to modify? > The default implementation is written in C, but distributors could > patch site.py to replace it with a custom one written in Python. For > example, publish a "fedora-hooks" module to PyPI (so non-system Python > installations or applications regularly run without the site module > can readily use the same hooks if they choose to do so), and then > patch site.py in the system Python to do: > > import fedora_hooks > fedora_hooks.install_excepthook() > > The nice thing about that approach is it wouldn't need a new switch to > turn it off - it would get turned off with all the other site-specific > customisations when -S or -I is used. It would also better open things > up to redistributor experimentation in existing releases (2.7, 3.5, > etc) before we commit to a specific approach in the reference > interpreter (such as adding an optional 'platform.hooks' submodule > that vendors may provide, and relevant stdlib APIs will then call > automatically to override the default upstream provided processing). Ah, but of course! That leaves us with only one part of the PEP unresolved: When the build process is unable to compile some modules when building Python from source (such as _sqlite3 due to missing sqlite headers), it would be great to provide a custom message when one then tries to import such module when using the compiled Python. Do you see a 'pretty' solution for that within this framework? Yours aye, Tomas From torsava at redhat.com Mon Dec 5 07:53:02 2016 From: torsava at redhat.com (Tomas Orsava) Date: Mon, 5 Dec 2016 13:53:02 +0100 Subject: [Python-ideas] PEP: Distributing a Subset of the Standard Library In-Reply-To: References: <6e27a05d-6a02-44f0-fa3f-4c14b9e1befc@redhat.com> <176c5504-78a8-401d-9631-6b7126ac5af9@redhat.com> Message-ID: <6062460d-2cbe-63cf-8937-a2051cfbfa8a@redhat.com> On 12/05/2016 01:42 PM, Nick Coghlan wrote: > On 5 December 2016 at 19:56, Tomas Orsava wrote: >> On 12/03/2016 05:08 AM, Nick Coghlan wrote: >>>> Though I believe the default sys.excepthook function is currently written >>>> in >>>> C, so it wouldn't be very easy for distributors to customize it. Maybe it >>>> could be made to read module=error_message pairs from some external file, >>>> which would be easier to modify? >>> The default implementation is written in C, but distributors could >>> patch site.py to replace it with a custom one written in Python. 
For >>> example, publish a "fedora-hooks" module to PyPI (so non-system Python >>> installations or applications regularly run without the site module >>> can readily use the same hooks if they choose to do so), and then >>> patch site.py in the system Python to do: >>> >>> import fedora_hooks >>> fedora_hooks.install_excepthook() >>> >>> The nice thing about that approach is it wouldn't need a new switch to >>> turn it off - it would get turned off with all the other site-specific >>> customisations when -S or -I is used. It would also better open things >>> up to redistributor experimentation in existing releases (2.7, 3.5, >>> etc) before we commit to a specific approach in the reference >>> interpreter (such as adding an optional 'platform.hooks' submodule >>> that vendors may provide, and relevant stdlib APIs will then call >>> automatically to override the default upstream provided processing). >> Ah, but of course! That leaves us with only one part of the PEP unresolved: >> When the build process is unable to compile some modules when building >> Python from source (such as _sqlite3 due to missing sqlite headers), it >> would be great to provide a custom message when one then tries to import >> such module when using the compiled Python. >> >> Do you see a 'pretty' solution for that within this framework? > I'm not sure it qualifies as 'pretty', but one approach would be to > have a './Modules/missing/' directory that gets pre-populated with > checked in ".py" files for extension modules that aren't always > built. When getpath.c detects it's running from a development > checkout, it would add that directory to sys.path (just before > site-packages), while 'make install' and 'make altinstall' would only > copy files from that directory into the installation target if the > corresponding extension modules were missing. > > Essentially, that would be the "name.missing.py" part of the draft > proposal for optional standard library modules, just with a regular > "name.py" module name and a tweak to getpath.c. To my eye that looks like a complicated mechanism necessitating changes to several parts of the codebase. Have you considered modifying the default sys.excepthook implementation to read a list of modules and error messages from a file that was generated during the build process? To me that seems simpler, and the implementation will be only in one place. In addition, distributors could just populate that file with their data, thus we would have one mechanism for both use cases. Tomas From ncoghlan at gmail.com Mon Dec 5 07:42:04 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 5 Dec 2016 22:42:04 +1000 Subject: [Python-ideas] PEP: Distributing a Subset of the Standard Library In-Reply-To: <176c5504-78a8-401d-9631-6b7126ac5af9@redhat.com> References: <6e27a05d-6a02-44f0-fa3f-4c14b9e1befc@redhat.com> <176c5504-78a8-401d-9631-6b7126ac5af9@redhat.com> Message-ID: On 5 December 2016 at 19:56, Tomas Orsava wrote: > On 12/03/2016 05:08 AM, Nick Coghlan wrote: >>> >>> Though I believe the default sys.excepthook function is currently written >>> in >>> C, so it wouldn't be very easy for distributors to customize it. Maybe it >>> could be made to read module=error_message pairs from some external file, >>> which would be easier to modify? >> >> The default implementation is written in C, but distributors could >> patch site.py to replace it with a custom one written in Python. 
For >> example, publish a "fedora-hooks" module to PyPI (so non-system Python >> installations or applications regularly run without the site module >> can readily use the same hooks if they choose to do so), and then >> patch site.py in the system Python to do: >> >> import fedora_hooks >> fedora_hooks.install_excepthook() >> >> The nice thing about that approach is it wouldn't need a new switch to >> turn it off - it would get turned off with all the other site-specific >> customisations when -S or -I is used. It would also better open things >> up to redistributor experimentation in existing releases (2.7, 3.5, >> etc) before we commit to a specific approach in the reference >> interpreter (such as adding an optional 'platform.hooks' submodule >> that vendors may provide, and relevant stdlib APIs will then call >> automatically to override the default upstream provided processing). > > Ah, but of course! That leaves us with only one part of the PEP unresolved: > When the build process is unable to compile some modules when building > Python from source (such as _sqlite3 due to missing sqlite headers), it > would be great to provide a custom message when one then tries to import > such module when using the compiled Python. > > Do you see a 'pretty' solution for that within this framework? I'm not sure it qualifies as 'pretty', but one approach would be to have a './Modules/missing/' directory that gets pre-populated with checked in ".py" files for extension modules that aren't always built. When getpath.c detects it's running from a development checkout, it would add that directory to sys.path (just before site-packages), while 'make install' and 'make altinstall' would only copy files from that directory into the installation target if the corresponding extension modules were missing. Essentially, that would be the "name.missing.py" part of the draft proposal for optional standard library modules, just with a regular "name.py" module name and a tweak to getpath.c. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Dec 5 21:27:51 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 6 Dec 2016 12:27:51 +1000 Subject: [Python-ideas] PEP: Distributing a Subset of the Standard Library In-Reply-To: <6062460d-2cbe-63cf-8937-a2051cfbfa8a@redhat.com> References: <6e27a05d-6a02-44f0-fa3f-4c14b9e1befc@redhat.com> <176c5504-78a8-401d-9631-6b7126ac5af9@redhat.com> <6062460d-2cbe-63cf-8937-a2051cfbfa8a@redhat.com> Message-ID: On 5 December 2016 at 22:53, Tomas Orsava wrote: > On 12/05/2016 01:42 PM, Nick Coghlan wrote: >> Essentially, that would be the "name.missing.py" part of the draft >> proposal for optional standard library modules, just with a regular >> "name.py" module name and a tweak to getpath.c. > > To my eye that looks like a complicated mechanism necessitating changes to > several parts of the codebase. Have you considered modifying the default > sys.excepthook implementation to read a list of modules and error messages > from a file that was generated during the build process? To me that seems > simpler, and the implementation will be only in one place. > > In addition, distributors could just populate that file with their data, > thus we would have one mechanism for both use cases. That's certainly another possibility, and one that initially appears to confine most of the complexity to sys.excepthook(). 
However, the problem you run into in that case is that CPython, by default, doesn't have any configuration files other than site.py, sitecustomize.py, usercustomize.py and whatever PYTHONSTARTUP points to for interactive use. The only non-executable one that is currently defined is the recommendation to redistributors in PEP 493 for file-based configuration of HTTPS-verification-by-default backports to earlier 2.7.x versions. Probably the closest analogy I can think of is the way we currently generate _sysconfigdata-.py in order to capture the build time settings such that sysconfig.get_config_vars() can report them at runtime. So using _sysconfigdata as inspiration, it would likely be possible to provide a "sysconfig.get_missing_modules()" API that the default sys.excepthook() could use to report that a particular import didn't work because an optional standard library module hadn't been built. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From torsava at redhat.com Tue Dec 6 11:50:56 2016 From: torsava at redhat.com (Tomas Orsava) Date: Tue, 6 Dec 2016 17:50:56 +0100 Subject: [Python-ideas] PEP: Distributing a Subset of the Standard Library In-Reply-To: References: <6e27a05d-6a02-44f0-fa3f-4c14b9e1befc@redhat.com> <176c5504-78a8-401d-9631-6b7126ac5af9@redhat.com> <6062460d-2cbe-63cf-8937-a2051cfbfa8a@redhat.com> Message-ID: <53acb4a9-052e-fad8-888e-897cac0d0356@redhat.com> On 12/06/2016 03:27 AM, Nick Coghlan wrote: > On 5 December 2016 at 22:53, Tomas Orsava wrote: >> On 12/05/2016 01:42 PM, Nick Coghlan wrote: >>> Essentially, that would be the "name.missing.py" part of the draft >>> proposal for optional standard library modules, just with a regular >>> "name.py" module name and a tweak to getpath.c. >> To my eye that looks like a complicated mechanism necessitating >> changes to >> several parts of the codebase. Have you considered modifying the default >> sys.excepthook implementation to read a list of modules and error >> messages >> from a file that was generated during the build process? To me that >> seems >> simpler, and the implementation will be only in one place. >> >> In addition, distributors could just populate that file with their data, >> thus we would have one mechanism for both use cases. > That's certainly another possibility, and one that initially appears > to confine most of the complexity to sys.excepthook(). However, the > problem you run into in that case is that CPython, by default, doesn't > have any configuration files other than site.py, sitecustomize.py, > usercustomize.py and whatever PYTHONSTARTUP points to for interactive > use. The only non-executable one that is currently defined is the > recommendation to redistributors in PEP 493 for file-based > configuration of HTTPS-verification-by-default backports to earlier > 2.7.x versions. > > Probably the closest analogy I can think of is the way we currently > generate _sysconfigdata-.py in order to > capture the build time settings such that sysconfig.get_config_vars() > can report them at runtime. > > So using _sysconfigdata as inspiration, it would likely be possible to > provide a "sysconfig.get_missing_modules()" API that the default > sys.excepthook() could use to report that a particular import didn't > work because an optional standard library module hadn't been built. Quite interesting. 
And sysconfig.get_missing_modules() wouldn't even have to be generated during the build process, because it would be called only when the import has failed, at which point it is obvious Python was built without said component (like _sqlite3). So do you see that as an acceptable solution? Do you prefer the one you suggested previously? Alternatively, can the contents of site.py be generated during the build process? Because if some modules couldn't be built, a custom implementation of sys.excepthook might be generated there with the data for the modules that failed to be built. Regards, Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Tue Dec 6 16:01:24 2016 From: random832 at fastmail.com (Random832) Date: Tue, 06 Dec 2016 16:01:24 -0500 Subject: [Python-ideas] Proposal: Tuple of str with w'list of words' In-Reply-To: <20161112180556.GP3365@ando.pearwood.info> References: <20161112180556.GP3365@ando.pearwood.info> Message-ID: <1481058084.3493918.810576489.342344B4@webmail.messagingengine.com> On Sat, Nov 12, 2016, at 13:05, Steven D'Aprano wrote: > I'm rather luke-warm on this proposal, although I might be convinced to > support it if: > > - w'...' unconditionally split on any whitespace (possibly > excluding NBSP); > > - and normal escapes worked. Is there any particular objection to allowing the backslash-space escape (and for escapes that mean whitespace characters, such as \t, \x20, to not split, if you meant to imply that they do)? That would provide the extra push to this being beneficial over split(). I also have an alternate idea: sl{word1 word2 'string 3' "string 4"} From turnbull.stephen.fw at u.tsukuba.ac.jp Tue Dec 6 19:51:25 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Wed, 7 Dec 2016 09:51:25 +0900 Subject: [Python-ideas] Proposal: Tuple of str with w'list of words' In-Reply-To: <1481058084.3493918.810576489.342344B4@webmail.messagingengine.com> References: <20161112180556.GP3365@ando.pearwood.info> <1481058084.3493918.810576489.342344B4@webmail.messagingengine.com> Message-ID: <22599.23821.553471.816507@turnbull.sk.tsukuba.ac.jp> Random832 writes: > Is there any particular objection to allowing the backslash-space escape > (and for escapes that mean whitespace characters, such as \t, \x20, to > not split, if you meant to imply that they do)? That would provide the > extra push to this being beneficial over split(). You're suggesting that (1) most escapes would be processed after splitting while (2) backslash-space (what about backslash-tab?) would be treated as an escape during splitting? > I also have an alternate idea: sl{word1 word2 'string 3' "string 4"} word1 and word2 are what perl would term "barewords"? Ie treated as strings? -1 to w"", -1 to inconsistent interpretation of escapes, and -1 to a completely new syntax. " ", "\x20", "\u0020", and "\U00000020" currently are different representations of the same string, so it would be confusing if the same notations meant different things in this context. Another syntax plus overloading standard string notation with yet another semantics (strings, rawstrings) doesn't seem like a win to me. As I accept the usual Pythonic aversion to mere abbreviations, I don't see any benefit to these notations, except for the case where a list just won't do, so you can avoid a call to tuple. 
We already have three good ways to do this:

    wordlist = ["word1", "word2", "string 3", "string 4"]
    wordlist = "word1,word2,string 3,string 4".split(",")
    wordlist = open(word_per_line_file).readlines()

and for maximum Unicode-conforming generality with compact notation:

    wordlist = "word1\uffffword2\uffffstring 3\uffffstring 4".split("\uffff")

More seriously, in most use cases there will be ASCII control
characters that you could use, which most editors can enter (though
they might be visually unattractive in many editors, eg, \x0C).

Steve

From steve at pearwood.info  Tue Dec 6 20:03:46 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 7 Dec 2016 12:03:46 +1100
Subject: [Python-ideas] Proposal: Tuple of str with w'list of words'
In-Reply-To: <1481058084.3493918.810576489.342344B4@webmail.messagingengine.com>
References: <20161112180556.GP3365@ando.pearwood.info>
 <1481058084.3493918.810576489.342344B4@webmail.messagingengine.com>
Message-ID: <20161207010345.GW3365@ando.pearwood.info>

On Tue, Dec 06, 2016 at 04:01:24PM -0500, Random832 wrote:
> On Sat, Nov 12, 2016, at 13:05, Steven D'Aprano wrote:
> > I'm rather luke-warm on this proposal, although I might be convinced to
> > support it if:
> >
> > - w'...' unconditionally split on any whitespace (possibly
> >   excluding NBSP);
> >
> > - and normal escapes worked.
>
> Is there any particular objection to allowing the backslash-space escape
> (and for escapes that mean whitespace characters, such as \t, \x20, to
> not split, if you meant to imply that they do)?

I hadn't actually considered the question of whether w-strings should
split before, or after, applying the escapes. (Or if I had, it was so
long ago that I forgot what I decided.)

I suppose there's no good reason for them to apply before splitting. I
cannot think of any reason why you would write:

    w"Nobody expects the Spanish\x20Inquisition!"

expecting to split "Spanish" and "Inquisition!". It's easier to just
press the spacebar. So let's suppose that escapes are processed after
the string is split, so that the w-string above becomes:

    ['Nobody', 'expects', 'the', 'Spanish Inquisition!']

Do we still need a new "\ " escape for a literal space? We clearly
don't *need* it, since the user can write \x20 or \040 or even
'\N{SPACE}'. I'm *moderately* against it, since it's hard to spot
escaped spaces in a forest of unescaped ones, or vice versa:

    # example from the OP
    songs = w'My\ Bloody\ Valentine Blue\ Suede\ Shoes'

I think that escaping spaces like that will be an attractive nuisance.
I had to read the OP's example three times before I noticed that the
space between Valentine and Blue was not escaped.

What about ordinary strings? What is 'spam\ eggs'? It could be:

- allow the escape and return 'spam eggs', even though it is pointless;

- disallow the escape, and raise an exception, even though that's
  inconsistent with w-strings.

I'm not really happy with either of those solutions (although I'm
slightly less unhappy with the first).

So in order of preference, from worst to best:

strong opposition -1 to the original proposal of w-strings with no
escapes except for \space;

weak opposition -0.25 for w-strings where \space behaves differently
(raises an exception) in regular strings;

mildly negative indifference -0 for w-strings with \space allowed in
regular strings as well;

mildly positive approval +0 for w-strings without bothering to allow
\space at all (the user can use \x20 or equivalent).

For the avoidance of doubt, by \space I mean a backslash followed by a
literal space character.
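For comparison, note that the stdlib's shlex module already provides
quote- and escape-aware splitting at run time, close to both of the
proposals being discussed:

>>> import shlex
>>> shlex.split('My\\ Bloody\\ Valentine Blue\\ Suede\\ Shoes')
['My Bloody Valentine', 'Blue Suede Shoes']
>>> shlex.split('''word1 word2 'string 3' "string 4"''')
['word1', 'word2', 'string 3', 'string 4']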
> That would provide the extra push to this being beneficial over split(). True, but it's not a lot of extra value over split(). If Python had this feature, I'd probably use it, but since it doesn't, I cannot in fairness ask somebody else to do the work on the basis that it is needed. I still think the existing solutions are Good Enough: - use split when you don't have space in any term: "fe fi fo fum".split() - use a list of manually split terms when you care about spaces: ['spam and eggs', 'cheese', 'tomato'] > I also have an alternate idea: sl{word1 word2 'string 3' "string 4"} Why "sl"? That looks like a set or a dict. Its bad enough that w-strings return a list, but to have "sl-sets" return a list is just weird :-) -- Steve From ncoghlan at gmail.com Wed Dec 7 00:24:20 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 7 Dec 2016 15:24:20 +1000 Subject: [Python-ideas] PEP: Distributing a Subset of the Standard Library In-Reply-To: <53acb4a9-052e-fad8-888e-897cac0d0356@redhat.com> References: <6e27a05d-6a02-44f0-fa3f-4c14b9e1befc@redhat.com> <176c5504-78a8-401d-9631-6b7126ac5af9@redhat.com> <6062460d-2cbe-63cf-8937-a2051cfbfa8a@redhat.com> <53acb4a9-052e-fad8-888e-897cac0d0356@redhat.com> Message-ID: On 7 December 2016 at 02:50, Tomas Orsava wrote: > So using _sysconfigdata as inspiration, it would likely be possible to > provide a "sysconfig.get_missing_modules()" API that the default > sys.excepthook() could use to report that a particular import didn't > work because an optional standard library module hadn't been built. > > Quite interesting. And sysconfig.get_missing_modules() wouldn't even have to > be generated during the build process, because it would be called only when > the import has failed, at which point it is obvious Python was built without > said component (like _sqlite3). So do you see that as an acceptable > solution? Oh, I'd missed that - yes, the sysconfig API could potentially be something like `sysconfig.get_stdlib_modules()` and `sysconfig.get_optional_modules()` instead of specifically reporting which ones were missed by the build process. There'd still be some work around generating the manifests backing those APIs at build time (including getting them right for Windows as well), but it would make some other questions that are currently annoying to answer relatively straightforward (see http://stackoverflow.com/questions/6463918/how-can-i-get-a-list-of-all-the-python-standard-library-modules for more on that) > Do you prefer the one you suggested previously? The only strong preference I have around how this is implemented is that I don't want to add complex single-purpose runtime infrastructure for the task. For all of the other specifics, I think it makes sense to err on the side of "What will be easiest to maintain over time?" > Alternatively, can the contents of site.py be generated during the build > process? Because if some modules couldn't be built, a custom implementation > of sys.excepthook might be generated there with the data for the modules > that failed to be built. We don't really want site.py itself to be auto-generated (although it could be updated to use Argument Clinic selectively if we deemed that to be an appropriate thing to do), but there's no problem with generating either data modules or normal importable modules that get accessed from site.py. Cheers, Nick. 
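P.S. To make the sysconfig idea above concrete, here is a sketch of a helper that a reporting hook could call. Note that get_stdlib_modules() and get_optional_modules() are the hypothetical names floated in this thread -- they don't exist yet:

    import sysconfig  # real module; the two get_* calls below are not real APIs

    def explain_failed_import(name):
        # Only called once 'import name' has already failed.
        if name in sysconfig.get_optional_modules():
            return f"{name} is an optional stdlib module this build left out"
        if name in sysconfig.get_stdlib_modules():
            return f"{name} belongs to the stdlib; this installation is incomplete"
        return f"{name} is not part of the stdlib; try your package manager or PyPI"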
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From random832 at fastmail.com Wed Dec 7 01:34:06 2016 From: random832 at fastmail.com (Random832) Date: Wed, 07 Dec 2016 01:34:06 -0500 Subject: [Python-ideas] Proposal: Tuple of str with w'list of words' In-Reply-To: <22599.23821.553471.816507@turnbull.sk.tsukuba.ac.jp> References: <20161112180556.GP3365@ando.pearwood.info> <1481058084.3493918.810576489.342344B4@webmail.messagingengine.com> <22599.23821.553471.816507@turnbull.sk.tsukuba.ac.jp> Message-ID: <1481092446.3828676.811009777.3EAA483D@webmail.messagingengine.com> On Tue, Dec 6, 2016, at 19:51, Stephen J. Turnbull wrote: > Random832 writes: > > > Is there any particular objection to allowing the backslash-space escape > > (and for escapes that mean whitespace characters, such as \t, \x20, to > > not split, if you meant to imply that they do)? That would provide the > > extra push to this being beneficial over split(). > > You're suggesting that (1) most escapes would be processed after > splitting while (2) backslash-space (what about backslash-tab?) would > be treated as an escape during splitting? I don't understand what this "after splitting" you're talking about is. It would be a single pass through the characters of the token, with space alone meaning "eat all whitespace, next string" and space in backslash state meaning "next character of current string is space", just as "t" alone means "next character of current string is letter t" and t in backslash state means "next character of current string is space". I mean, even the idea that there would be a separate "splitting step" at all makes no sense to me, this implies building an "un-split string" as if the w weren't present, processing escapes as part of that, and then parsing the resulting string in a second pass, which is something we don't do for r"..." and *shouldn't* do for f"..." If you insist on consistency, backslash-space can mean space *everywhere* [once we've gotten through the deprecation cycle of backslash-unknown inserting a literal backslash], just like "\'" works fine despite double quotes not requiring it. As for backslash-tab, we already have \t. Maybe you'd like \s better for space. > > I also have an alternate idea: sl{word1 word2 'string 3' "string 4"} > > word1 and word2 are what perl would term "barewords"? Ie treated as > strings? The name "sl" was meant to evoke shlex (the syntax itself was also inspired by perl's qw{...} though perl doesn't provide any way of escaping whitespace). And I also meant this as a launching-off point for a general suggestion of word{ ... } as a readable syntax that doesn't collide with any currently valid constructs, for new kinds of literals (e.g. frozenset{a, b, c} and so on) So the result would be, more or less, the sequence that shlex.split('''word1 word2 'string 3' "string 4"''') gives. > -1 to w"", -1 to inconsistent interpretation of escapes, and -1 to a > completely new syntax. > > " ", "\x20", "\u0020", and "\U00000020" currently are different > representations of the same string, so it would be confusing if the > same notations meant different things in this context. "'" and "\x39" (etc) are representations of the same string, but '...\x39 doesn't act as an end quote. Unescaped whitespace within a w"" literal would be *syntax*, not *content*. (Whereas in a regular literal backslash is syntax but in a r'...' 
literal it's content) > Another syntax > plus overloading standard string notation with yet another semantics > (strings, rawstrings) doesn't seem like a win to me. > > As I accept the usual Pythonic aversion to mere abbreviations, I don't > see any benefit to these notations, except for the case where a list > just won't do, so you can avoid a call to tuple. We already have > three good ways to do this: > > wordlist = ["word1", "word2", "string 3", "string 4"] > wordlist = "word1,word2,string 3,string 4".split(",") > wordlist = open(word_per_line_file).readlines() > > and for maximum Unicode-conforming generality with compact notation: > > wordlist = "word1\UFFFFword2\UFFFFstring 3\UFFFFstring > 4".split("\UFFFF") You and I have very different definitions of the word "compact". In fact, this is *so obviously* non-compact that I find it hard to believe that you're being serious, but I don't think the joke's very funny if it's intended as one. > More seriously, in most use cases there will be ASCII control > characters that you could use, which most editors can enter (though > they might be visually unattractive in many editors, eg, \x0C). The point of using space is readability. (The point of returning a tuple is to avoid the disadvantage that the list returned by split must be built at runtime and can't be loaded as a constant, or perhaps turned into a frozenset constant by the optimizer in cases like "if x in w'foo bar baz':". From random832 at fastmail.com Wed Dec 7 01:44:29 2016 From: random832 at fastmail.com (Random832) Date: Wed, 07 Dec 2016 01:44:29 -0500 Subject: [Python-ideas] Proposal: Tuple of str with w'list of words' In-Reply-To: <20161207010345.GW3365@ando.pearwood.info> References: <20161112180556.GP3365@ando.pearwood.info> <1481058084.3493918.810576489.342344B4@webmail.messagingengine.com> <20161207010345.GW3365@ando.pearwood.info> Message-ID: <1481093069.3830650.811027345.6DCB6489@webmail.messagingengine.com> On Tue, Dec 6, 2016, at 20:03, Steven D'Aprano wrote: > > I also have an alternate idea: sl{word1 word2 'string 3' "string 4"} > > Why "sl"? Well, shlex was one of the inspirations. > That looks like a set or a dict. Its bad enough that w-strings return a > list, but to have "sl-sets" return a list is just weird :-) My idea was to have word{...} as a grand unifying solution for "we want a new kind of literal but can't think of a syntax for it that doesn't either look like grit on the screen or already means something", with this as one of the first examples. I think it's better than using word"..." for things that aren't strings. From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Dec 7 02:49:27 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Wed, 7 Dec 2016 16:49:27 +0900 Subject: [Python-ideas] Proposal: Tuple of str with w'list of words' In-Reply-To: <1481092446.3828676.811009777.3EAA483D@webmail.messagingengine.com> References: <20161112180556.GP3365@ando.pearwood.info> <1481058084.3493918.810576489.342344B4@webmail.messagingengine.com> <22599.23821.553471.816507@turnbull.sk.tsukuba.ac.jp> <1481092446.3828676.811009777.3EAA483D@webmail.messagingengine.com> Message-ID: <22599.48903.87287.318504@turnbull.sk.tsukuba.ac.jp> Random832 writes: > I don't understand what this "after splitting" you're talking about > is. It would be a single pass through the characters of the token, Which may as well be thought of as a string (not a str). 
Although you can implement this process in one pass, you can also think of it in terms of two passes that give the same result. I suspect many people will think in terms of two passes, and I certainly do. Steven d'Aprano appears to, as well (he also used the "before splitting" terminology). Of course, he may find "the implementation will be single pass" persuasive, even though I don't. > You and I have very different definitions of the word "compact". In > fact, this is *so obviously* non-compact I used \u notation to ensure that people would understand that the separator is a non-character. (Emacs allows me to enter it, and with my current font it displays an empty box. I could fiddle with my PYTHONIOENCODING to use some sort of escape error handler to make it convenient, but I won't use w"" anyway so the point is sort of moot.) > (The point of returning a tuple is to avoid the disadvantage that > the list returned by split must be built at runtime and can't be > loaded as a constant, or perhaps turned into a frozenset constant > by the optimizer in cases like "if x in w'foo bar baz':". That's true, but where's the use case where that optimization matters? From mal at egenix.com Wed Dec 7 03:33:00 2016 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 7 Dec 2016 09:33:00 +0100 Subject: [Python-ideas] PEP: Distributing a Subset of the Standard Library In-Reply-To: References: <6e27a05d-6a02-44f0-fa3f-4c14b9e1befc@redhat.com> <176c5504-78a8-401d-9631-6b7126ac5af9@redhat.com> <6062460d-2cbe-63cf-8937-a2051cfbfa8a@redhat.com> <53acb4a9-052e-fad8-888e-897cac0d0356@redhat.com> Message-ID: <5f1eea8d-dd17-9972-1865-f5c6d71d944a@egenix.com> I know that you started this thread focusing on the stdlib, but for the purpose of distributors, the scope goes far beyond just the stdlib. Basically any Python module or package which the distribution can provide should be usable as basis for a nice error message pointing to the package to install. Now, it's the distribution which knows which modules/packages are available, so we don't need a list of stdlib modules in Python to help with this. The helper function (whether called via sys.excepthook() or perhaps a new sys.importerrorhook()) would then check the imported module name against this list and write out the message pointing the user to the missing package. A list of stdlib modules may still be useful, but it comes with it's own set of problems, which should be irrelevant for this use case: some stdlib modules are optional and only available if the system provides (and Python can find) certain libs (or header files during compilation). For a distribution there are no optional stdlib modules, since the distributor will know the complete list of available modules in the distribution, including their external dependencies. In other words: Python already provides all the necessary logic to enable implementing the suggested use case. On 07.12.2016 06:24, Nick Coghlan wrote: > On 7 December 2016 at 02:50, Tomas Orsava wrote: >> So using _sysconfigdata as inspiration, it would likely be possible to >> provide a "sysconfig.get_missing_modules()" API that the default >> sys.excepthook() could use to report that a particular import didn't >> work because an optional standard library module hadn't been built. >> >> Quite interesting. 
And sysconfig.get_missing_modules() wouldn't even have to >> be generated during the build process, because it would be called only when >> the import has failed, at which point it is obvious Python was built without >> said component (like _sqlite3). So do you see that as an acceptable >> solution? > > Oh, I'd missed that - yes, the sysconfig API could potentially be > something like `sysconfig.get_stdlib_modules()` and > `sysconfig.get_optional_modules()` instead of specifically reporting > which ones were missed by the build process. There'd still be some > work around generating the manifests backing those APIs at build time > (including getting them right for Windows as well), but it would make > some other questions that are currently annoying to answer relatively > straightforward (see > http://stackoverflow.com/questions/6463918/how-can-i-get-a-list-of-all-the-python-standard-library-modules > for more on that) > >> Do you prefer the one you suggested previously? > > The only strong preference I have around how this is implemented is > that I don't want to add complex single-purpose runtime infrastructure > for the task. For all of the other specifics, I think it makes sense > to err on the side of "What will be easiest to maintain over time?" > >> Alternatively, can the contents of site.py be generated during the build >> process? Because if some modules couldn't be built, a custom implementation >> of sys.excepthook might be generated there with the data for the modules >> that failed to be built. > > We don't really want site.py itself to be auto-generated (although it > could be updated to use Argument Clinic selectively if we deemed that > to be an appropriate thing to do), but there's no problem with > generating either data modules or normal importable modules that get > accessed from site.py. > > Cheers, > Nick. > -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Dec 07 2016) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From ncoghlan at gmail.com Wed Dec 7 07:57:41 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 7 Dec 2016 22:57:41 +1000 Subject: [Python-ideas] PEP: Distributing a Subset of the Standard Library In-Reply-To: <5f1eea8d-dd17-9972-1865-f5c6d71d944a@egenix.com> References: <6e27a05d-6a02-44f0-fa3f-4c14b9e1befc@redhat.com> <176c5504-78a8-401d-9631-6b7126ac5af9@redhat.com> <6062460d-2cbe-63cf-8937-a2051cfbfa8a@redhat.com> <53acb4a9-052e-fad8-888e-897cac0d0356@redhat.com> <5f1eea8d-dd17-9972-1865-f5c6d71d944a@egenix.com> Message-ID: On 7 December 2016 at 18:33, M.-A. Lemburg wrote: > I know that you started this thread focusing on the stdlib, > but for the purpose of distributors, the scope goes far > beyond just the stdlib. > > Basically any Python module or package which the distribution can > provide should be usable as basis for a nice error message pointing to > the package to install. 
The PEP draft covered two questions: - experienced redistributors breaking the standard library up into pieces - optional modules for folks building their own Python (even if they're new to that) > Now, it's the distribution which knows which modules/packages > are available, so we don't need a list of stdlib modules > in Python to help with this. Right, that's the case that we realised can be covered entirely by the suggestion "patch site.py to install a different default sys.excepthook()" > A list of stdlib modules may still be useful, but it comes > with it's own set of problems, which should be irrelevant > for this use case: some stdlib modules are optional and > only available if the system provides (and Python can find) > certain libs (or header files during compilation). While upstream changes turned out not to be necessary for the "distributor breaking up the standard library" use case, they may still prove worthwhile in making import errors more informative in the case of "I just built my own Python from upstream sources and didn't notice (or didn't read) the build message indicating that some modules weren't built". Given the precedent of the sysconfig metadata generation, providing some form of machine-readable build-time-generated module manifest should be pretty feasible if someone was motivated to implement it, and we already have the logic to track which optional modules weren't built in order to generate the message at the end of the build process. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Dec 7 13:22:08 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Thu, 8 Dec 2016 03:22:08 +0900 Subject: [Python-ideas] PEP: Distributing a Subset of the Standard Library In-Reply-To: References: <6e27a05d-6a02-44f0-fa3f-4c14b9e1befc@redhat.com> <176c5504-78a8-401d-9631-6b7126ac5af9@redhat.com> <6062460d-2cbe-63cf-8937-a2051cfbfa8a@redhat.com> <53acb4a9-052e-fad8-888e-897cac0d0356@redhat.com> <5f1eea8d-dd17-9972-1865-f5c6d71d944a@egenix.com> Message-ID: <22600.21328.501800.953514@turnbull.sk.tsukuba.ac.jp> Nick Coghlan writes: > While upstream changes turned out not to be necessary for the > "distributor breaking up the standard library" use case, they may > still prove worthwhile in making import errors more informative in the > case of "I just built my own Python from upstream sources and didn't > notice (or didn't read) the build message indicating that some modules > weren't built". This case-by-case line of argument gives me a really bad feeling. Do we have to play whack-a-mole with every obscure message that pops up that somebody might not be reading? OK, this is a pretty common and confusing case, but surely there's something more systematic (and flexible vs. turning every error message into a complete usage manual ... which tl;dr) we can do. One way to play would be an interactive checklist-based diagnostic module (ie, a "rule-based expert system") that could be plugged into IDEs or even into sys.excepthook. Given Python's excellent introspective facilities, with a little care the rule interpreter could be designed with access to namespaces to provide additional detail or tweak rule priority. We could even build in a learning engine to give priority to users' habitual bugs (including typical mistaken diagnoses). That said, I don't have time to work on it :-(, so feel free to ignore me. 
And I grant that since AFAIK we have zero existing code for the engine and rule database, it might be a good idea to do something for some particular obscure errors in the 3.7 timeframe. From mal at egenix.com Wed Dec 7 15:04:15 2016 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 7 Dec 2016 21:04:15 +0100 Subject: [Python-ideas] PEP: Distributing a Subset of the Standard Library In-Reply-To: References: <176c5504-78a8-401d-9631-6b7126ac5af9@redhat.com> <6062460d-2cbe-63cf-8937-a2051cfbfa8a@redhat.com> <53acb4a9-052e-fad8-888e-897cac0d0356@redhat.com> <5f1eea8d-dd17-9972-1865-f5c6d71d944a@egenix.com> Message-ID: <2a1043f9-7c21-09e2-0990-0109a806f8d7@egenix.com> On 07.12.2016 13:57, Nick Coghlan wrote: > On 7 December 2016 at 18:33, M.-A. Lemburg wrote: >> I know that you started this thread focusing on the stdlib, >> but for the purpose of distributors, the scope goes far >> beyond just the stdlib. >> >> Basically any Python module or package which the distribution can >> provide should be usable as basis for a nice error message pointing to >> the package to install. > > The PEP draft covered two questions: > > - experienced redistributors breaking the standard library up into pieces > - optional modules for folks building their own Python (even if > they're new to that) > >> Now, it's the distribution which knows which modules/packages >> are available, so we don't need a list of stdlib modules >> in Python to help with this. > > Right, that's the case that we realised can be covered entirely by the > suggestion "patch site.py to install a different default > sys.excepthook()" > >> A list of stdlib modules may still be useful, but it comes >> with it's own set of problems, which should be irrelevant >> for this use case: some stdlib modules are optional and >> only available if the system provides (and Python can find) >> certain libs (or header files during compilation). > > While upstream changes turned out not to be necessary for the > "distributor breaking up the standard library" use case, they may > still prove worthwhile in making import errors more informative in the > case of "I just built my own Python from upstream sources and didn't > notice (or didn't read) the build message indicating that some modules > weren't built". > > Given the precedent of the sysconfig metadata generation, providing > some form of machine-readable build-time-generated module manifest > should be pretty feasible if someone was motivated to implement it, > and we already have the logic to track which optional modules weren't > built in order to generate the message at the end of the build > process. True, but the build process only covers C extensions. Writing the information somewhere for Python to pick up would be easy, though (just dump the .failed* lists somewhere). For pure Python modules, I suppose the install process could record all installed modules. Put all this info into a generated "_sysconfigstdlib" module, import this into sysconfig and you're set. Still, in all the years I've been using Python I never ran into a situation where I was interested in such information. For cases where a module is optional, you usually write a try...except and handle this on a case-by-case basis. That's safer than relying on some build time generated list, since the Python binary may well have been built on a different machine than the one the application is currently running on and so, even if an optional module is listed as built successfully, it may still fail to import. 
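For example, the usual guard is just a few lines (a minimal sketch; sqlite3 is picked only because it came up earlier in this thread):

    try:
        import sqlite3
    except ImportError:
        sqlite3 = None  # optional feature; callers check for None at the point of use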
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Dec 07 2016)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> Python Database Interfaces ...           http://products.egenix.com/
>>> Plone/Zope Database Interfaces ...           http://zope.egenix.com/
________________________________________________________________________

::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/
                      http://www.malemburg.com/

From mikhailwas at gmail.com  Wed Dec  7 18:52:56 2016
From: mikhailwas at gmail.com (Mikhail V)
Date: Thu, 8 Dec 2016 00:52:56 +0100
Subject: [Python-ideas] Input characters in strings by decimals (Was:
	Proposal for default character representation)
Message-ID: 

In a past discussion about inputting and printing characters, I proposed
decimal notation instead of hex. Since that discussion got lost in
off-topic talk, I'll try to summarise my idea better here.

I use ASCII only for code input (there are good reasons for that).
Here I'll use Python 3.6 on Windows 7, so I can use print() with Unicode
directly and it now works in the system console.

Suppose I am only starting to program and want to do some character
manipulation. The very first thing I would probably start with is simple
output of the Latin and Cyrillic capital letters:

caps_lat = ""
for o in range(65, 91):
    caps_lat = caps_lat + chr(o)
print(caps_lat)

caps_cyr = ""
for o in range(1040, 1072):
    caps_cyr = caps_cyr + chr(o)
print(caps_cyr)

Which prints:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ

Say I now want to input something directly in code:

s = "first cyrillic letters: " + chr(1040) + chr(1041) + chr(1042)

This works fine and has a clean look. However, it is not very convenient
because of all the typing, and if I generate such strings it adds a bit
more complexity. But in general it is fine, and it is the method I use
currently.

=========
Proposal: I would like the possibility to input characters *by decimals*:

s = "first cyrillic letters: \{1040}\{1041}\{1042}"
or:
s = "first cyrillic letters: \(1040)\(1041)\(1042)"
=========

This is more compact and does not seem to contradict the current escape
characters in Python string literals much: a backslash already starts
some kind of escape in most cases.

Most important for me is that this way I would avoid any hex numbers in
strings, which I find very good for readability, and it is very
convenient for me since I use decimals for processing everywhere (and
encourage everyone to do so).

So this is my proposal; any comments on it are appreciated.

PS:

Currently Python 3 supports these in addition to \x:
(from https://docs.python.org/3/howto/unicode.html)
"""
If you can't enter a particular character in your editor or want to keep
the source code ASCII-only for some reason, you can also use escape
sequences in string literals.

>>> "\N{GREEK CAPITAL LETTER DELTA}"  # Using the character name
>>> "\u0394"                          # Using a 16-bit hex value
>>> "\U00000394"                      # Using a 32-bit hex value
"""
So I have many possibilities, and all of them strangely contradict my
idea of intuitive and readable. Using the character name is readable,
but seriously it is not much of a practical solution for input; it
could, however, be very useful for printing a description of a character.
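For comparison, the closest spellings that already work today (the proposed \{...} form itself is of course not valid Python):

    s = "first cyrillic letters: \u0410\u0411\u0412"
    s = "first cyrillic letters: " + "".join(chr(n) for n in (1040, 1041, 1042))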
Mikhail From prometheus235 at gmail.com Wed Dec 7 19:13:31 2016 From: prometheus235 at gmail.com (Nick Timkovich) Date: Wed, 7 Dec 2016 18:13:31 -0600 Subject: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation) In-Reply-To: References: Message-ID: Out of curiosity, why do you prefer decimal values to refer to Unicode code points? Most references, http://unicode.org/charts/PDF/U0400.pdf (official) or https://en.wikibooks.org/wiki/Unicode/Character_reference/0000-0FFF , prefer to refer to them by hexadecimal as the planes and ranges are broken up by hex values. On Wed, Dec 7, 2016 at 5:52 PM, Mikhail V wrote: > In past discussion about inputing and printing characters, > I was proposing decimal notation instead of hex. > Since the discussion was lost in off-topic talks, I'll try to > summarise my idea better. > > I use ASCII only for code input (there are good reasons for that). > Here I'll use Python 3.6, and Windows 7, so I can use print() with unicode > directly and it works now in system console. > > Suppose I only start programming and want to do some character > manipulation. > The vey first thing I would probably start with is a simple output for > latin and cyrillic capital letters: > > caps_lat = "" > for o in range(65, 91): > caps_lat = caps_lat + chr(o) > print (caps_lat) > > caps_cyr = "" > for o in range(1040, 1072): > caps_cyr = caps_cyr + chr(o) > print (caps_cyr) > > > Which prints: > ABCDEFGHIJKLMNOPQRSTUVWXYZ > ???????????????????????????????? > > > Say, I want now to input something direct in code: > > s = "first cyrillic letters: " + chr(1040) + chr(1041) + chr(1042) > > Which works fine and has clean look. However it is not very convinient > because of much typing and also, if I generate such strings, > adds a bit more complexity. But in general it is fine, and I use this > method currently. > > ========= > Proposal: I would want to have a possibility to input it *by decimals*: > > s = "first cyrillic letters: \{1040}\{1041}\{1042}" > or: > s = "first cyrillic letters: \(1040)\(1041)\(1042)" > > ========= > > This is more compact and seems not very contradictive with > current Python escape characters in string literals. > So backslash is a start of some escaping in most cases. > > For me most important is that in such way I would avoid > any presence of hex numbers in strings, which I find very good > for readability and for me it is very convinient since I use decimals > for processing everywhere (and encourage everyone to do so). > > So this is my proposal, any comments on this are appreciated. > > > PS: > > Currently Python 3 supports these in addition to \x: > (from https://docs.python.org/3/howto/unicode.html) > """ > If you can?t enter a particular character in your editor or want to keep > the source code ASCII-only for some reason, you can also use escape > sequences in string literals. > > >>> "\N{GREEK CAPITAL LETTER DELTA}" # Using the character name > >>> "\u0394" # Using a 16-bit hex value > >>> "\U00000394" # Using a 32-bit hex value > > """ > So I have many possibilities and all of them strangely contradicts with > my image of intuitive and readable. Well, using charater name is readable, > but seriously not much of a practical solution for input, but could be > very useful > for printing description of a character. 
> > > Mikhail > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikhailwas at gmail.com Wed Dec 7 19:22:59 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Thu, 8 Dec 2016 01:22:59 +0100 Subject: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation) In-Reply-To: References: Message-ID: On 8 December 2016 at 01:13, Nick Timkovich wrote: > Out of curiosity, why do you prefer decimal values to refer to Unicode code > points? Most references, http://unicode.org/charts/PDF/U0400.pdf (official) > or https://en.wikibooks.org/wiki/Unicode/Character_reference/0000-0FFF , > prefer to refer to them by hexadecimal as the planes and ranges are broken > up by hex values. Well, there was a huge discussion in October, see the subject name. Just didnt want it to go again in that direction. So in short hex notation not so readable and anyway decimal is kind of standard way to represent numbers and I treat string as a number array when I am processing it, so hex simply is redundant and not needed for me. Mikhail From ethan at stoneleaf.us Wed Dec 7 19:25:08 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 07 Dec 2016 16:25:08 -0800 Subject: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation) In-Reply-To: References: Message-ID: <5848A864.6020005@stoneleaf.us> On 12/07/2016 03:52 PM, Mikhail V wrote: > In past discussion about inputing and printing characters, > I was proposing decimal notation instead of hex. > Since the discussion was lost in off-topic talks, I'll try to > summarise my idea better. While the discussion did range far and wide, one thing that was fairly constant is that the benefit of adding one more way to represent unicode characters is not worth the work involved to make it happen; and that using hexadecimal to reference unicode characters is nearly universal. To sum up: even if you wrote all the code yourself, it would not be accepted. -- ~Ethan~ From python at mrabarnett.plus.com Wed Dec 7 19:52:25 2016 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 8 Dec 2016 00:52:25 +0000 Subject: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation) In-Reply-To: References: Message-ID: On 2016-12-07 23:52, Mikhail V wrote: > In past discussion about inputing and printing characters, > I was proposing decimal notation instead of hex. > Since the discussion was lost in off-topic talks, I'll try to > summarise my idea better. > > I use ASCII only for code input (there are good reasons for that). > Here I'll use Python 3.6, and Windows 7, so I can use print() with unicode > directly and it works now in system console. > > Suppose I only start programming and want to do some character manipulation. > The vey first thing I would probably start with is a simple output for > latin and cyrillic capital letters: > > caps_lat = "" > for o in range(65, 91): > caps_lat = caps_lat + chr(o) > print (caps_lat) > > caps_cyr = "" > for o in range(1040, 1072): > caps_cyr = caps_cyr + chr(o) > print (caps_cyr) > > > Which prints: > ABCDEFGHIJKLMNOPQRSTUVWXYZ > ???????????????????????????????? 
> > > Say, I want now to input something direct in code: > > s = "first cyrillic letters: " + chr(1040) + chr(1041) + chr(1042) > > Which works fine and has clean look. However it is not very convinient > because of much typing and also, if I generate such strings, > adds a bit more complexity. But in general it is fine, and I use this > method currently. > > ========= > Proposal: I would want to have a possibility to input it *by decimals*: > > s = "first cyrillic letters: \{1040}\{1041}\{1042}" > or: > s = "first cyrillic letters: \(1040)\(1041)\(1042)" > > ========= > It's usually the case that escapes are \ followed by an ASCII-range letter or digit; \ followed by anything else makes it a literal, even if it's a metacharacter, e.g. " terminates a string that starts with ", but \" is a literal ", so I don't like \{...}. Perl doesn't have \u... or \U..., it has \x{...} instead, and Python already has \N{...}, so: s = "first cyrillic letters: \d{1040}\d{1041}\d{1042}" might be better, but I'm still -1 because hex is usual when referring to Unicode codepoints. From tjreedy at udel.edu Wed Dec 7 19:53:52 2016 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 7 Dec 2016 19:53:52 -0500 Subject: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation) In-Reply-To: References: Message-ID: On 12/7/2016 7:22 PM, Mikhail V wrote: > On 8 December 2016 at 01:13, Nick Timkovich wrote: >> Out of curiosity, why do you prefer decimal values to refer to Unicode code >> points? Most references, http://unicode.org/charts/PDF/U0400.pdf (official) >> or https://en.wikibooks.org/wiki/Unicode/Character_reference/0000-0FFF , >> prefer to refer to them by hexadecimal as the planes and ranges are broken >> up by hex values. > > Well, there was a huge discussion in October, see the subject name. > Just didnt want it to go again in that direction. > So in short hex notation not so readable and anyway decimal is > kind of standard way to represent numbers and I treat string as a number array > when I am processing it, so hex simply is redundant and not needed for me. I sympathize with your preference, but ... Perhap the hex numbers would bother you less if you thought of them as 'serial numbers'. It is standard for 'serial numbers' to include letters. It is also common for digit-letter serial numbers to have meaningful fields, as as do the hex versions of unicode serial numbers. The decimal versions are meaningless except as strict sequencers. -- Terry Jan Reedy From prometheus235 at gmail.com Wed Dec 7 19:57:50 2016 From: prometheus235 at gmail.com (Nick Timkovich) Date: Wed, 7 Dec 2016 18:57:50 -0600 Subject: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation) In-Reply-To: References: Message-ID: > > hex notation not so readable and anyway decimal is kind of standard way to > represent numbers Can you cite some examples of Unicode reference tables I can look up a decimal number in? They seem rare; perhaps in a list as a secondary column, but they're not organized/grouped decimally. Readability counts, and introducing a competing syntax will make it harder for others to read. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mikhailwas at gmail.com Wed Dec 7 21:07:54 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Thu, 8 Dec 2016 03:07:54 +0100 Subject: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation) In-Reply-To: References: Message-ID: On 8 December 2016 at 01:57, Nick Timkovich wrote: >> hex notation not so readable and anyway decimal is kind of standard way to >> represent numbers > > > Can you cite some examples of Unicode reference tables I can look up a > decimal number in? They seem rare; perhaps in a list as a secondary column, > but they're not organized/grouped decimally. Readability counts, and > introducing a competing syntax will make it harder for others to read. There were links to such table in previos discussion. Googling "unicode table decimal" and first link will it be. I think most online tables include decimals as well, usually as tuples of 8-bit decimals. Also earlier the decimal code was the first column in most tables, but it somehow settled in peoples' minds that hex reference should be preferred, for no solid reason IMO. One reason I think due to HTML standards which started to use it in html files long ago and had much influence later, but one should understand, that is just for brevity in most cases. Other reason is, file viewers show hex by default, but that is just misfortune, nothin besides brevity and 4-bit word alignment gives the hex notation unfortunatly, at least in its current typeface. This was discussed actually in that thread. Many people also think they are cool hackers if they make everything in hex :) In some cases it is worth it, but not this case IMO. Mainly for bitwise stuff, but then one should look into binary/trinary/quaternary representation depending on nature of operations and hardware. Yes there is unicode table pagination correspondence in hex reference, but that hardly plays any positive role for real applications, most of the time I need to look in my code and also perform number operations on *specific* ranges and codes, but not on whole pages of the table. This could only play role if I do low-level filtering of large files and want to filter out data after character's page, but that is the only positive thing I can think of, and I don't think it is directly for Python. Imagine some cryptography exercise - you take 27 units, you just give them numbers (0..26) and you do calculations, yes you can view results as hex numbers, but I don't do it and most people don't and should not, since why? It is ugly and not readable. From mikhailwas at gmail.com Wed Dec 7 21:15:06 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Thu, 8 Dec 2016 03:15:06 +0100 Subject: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation) In-Reply-To: References: Message-ID: On 8 December 2016 at 01:52, MRAB wrote: > On 2016-12-07 23:52, Mikhail V wrote: ... >> ========= >> Proposal: I would want to have a possibility to input it *by decimals*: >> >> s = "first cyrillic letters: \{1040}\{1041}\{1042}" >> or: >> s = "first cyrillic letters: \(1040)\(1041)\(1042)" >> >> ========= >> > It's usually the case that escapes are \ followed by an ASCII-range letter > or digit; \ followed by anything else makes it a literal, even if it's a > metacharacter, e.g. " terminates a string that starts with ", but \" is a > literal ", so I don't like \{...}. > > Perl doesn't have \u... 
or \U..., it has \x{...} instead, and Python already
> has \N{...}, so:
>
> s = "first cyrillic letters: \d{1040}\d{1041}\d{1042}"
>
> might be better,

I like this, and I agree it corresponds to the current style better.

> but I'm still -1 because hex is usual when referring to
> Unicode codepoints.

:-(

From boekewurm at gmail.com  Wed Dec  7 21:32:20 2016
From: boekewurm at gmail.com (Matthias welp)
Date: Thu, 8 Dec 2016 03:32:20 +0100
Subject: [Python-ideas] Input characters in strings by decimals (Was:
	Proposal for default character representation)
In-Reply-To: 
References: 
Message-ID: 

Dear Mikhail,

With Python 3.6 you can use format strings to get very close to your
desired behaviour:

    f"{48:c}" == "0"
    f"{<number>:c}" == chr(<number>)

It works with variables too:

    charvalue = 48
    f"{charvalue:c}" == chr(charvalue)  # == "0"

This is only 1 character overhead + 1 character extra per char formatted
compared to your example. And as an extra you can use hex strings
(f"{0x30:c}" == "0") and any other integer literal you might want.

I don't see the added value of making character escapes in a non-default
way only (chars escaped + 1) bytes shorter, with the added maintenance
and development cost. I think that you can do a lot with f-strings, and
using the built-in formatting options you can already get the behaviour
you want in Python 3.6, months earlier than the next opportunity
(Python 3.7).

Check out the formatting options for integers and other built-in types
here:
https://docs.python.org/3.6/library/string.html#format-specification-mini-language

I hope this helps solve your apparent usability problem.

-Matthias

On 8 December 2016 at 03:07, Mikhail V wrote:
> On 8 December 2016 at 01:57, Nick Timkovich wrote:
>>> hex notation not so readable and anyway decimal is kind of standard way to
>>> represent numbers
>>
>> Can you cite some examples of Unicode reference tables I can look up a
>> decimal number in? They seem rare; perhaps in a list as a secondary column,
>> but they're not organized/grouped decimally. Readability counts, and
>> introducing a competing syntax will make it harder for others to read.
>
> There were links to such table in previos discussion. Googling
> "unicode table decimal" and
> first link will it be.
> I think most online tables include decimals as well, usually as tuples
> of 8-bit decimals.
> Also earlier the decimal code was the first column in most tables, but
> it somehow settled in
> peoples' minds that hex reference should be preferred, for no solid reason IMO.
> One reason I think due to HTML standards which started to use it in html files
> long ago and had much influence later, but one should understand,
> that is just for brevity in most cases. Other reason is, file viewers
> show hex by
> default, but that is just misfortune, nothin besides brevity and 4-bit
> word alignment
> gives the hex notation unfortunatly, at least in its current typeface.
> This was discussed actually in that thread.
> Many people also think they are cool hackers if they make everything in hex :)
> In some cases it is worth it, but not this case IMO. Mainly for
> bitwise stuff, but
> then one should look into binary/trinary/quaternary representation
> depending on nature
> of operations and hardware.
>
> Yes there is unicode table pagination correspondence in hex reference,
> but that hardly plays
> any positive role for real applications, most of the time I need to
> look in my code
> and also perform number operations on *specific* ranges and codes, but not
> on whole pages of the table.
This could only play role if I do > low-level filtering of large files > and want to filter out data after character's page, but that is the > only positive thing > I can think of, and I don't think it is directly for Python. > > Imagine some cryptography exercise - you take 27 units, you just give > them numbers (0..26) > and you do calculations, yes you can view results as hex numbers, but > I don't do it and most people > don't and should not, since why? It is ugly and not readable. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From alexander.belopolsky at gmail.com Wed Dec 7 21:36:48 2016 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 7 Dec 2016 21:36:48 -0500 Subject: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation) In-Reply-To: References: Message-ID: On Wed, Dec 7, 2016 at 9:07 PM, Mikhail V wrote: > > it somehow settled in > peoples' minds that hex reference should be preferred, for no solid reason IMO. I may be showing my age, but all the facts that I remember about ASCII codes are in hex: 1. SPACE is 0x20 followed by punctuation symbols. 2. Decimal digits start at 0x30 with '0' = 0x30, '1' = 0x31, ... 3. @ is 0x40 followed by upper-case letter: 'A' = 0x41, 'B' = 0x42, ... 4. Lower-case letters are offset by 0x20 from the uppercase ones: 'a' = 0x61, 'b' = 0x62, ... Unicode is also organized around hexadecimal codes with various scripts positioned in sections that start at round hexadecimal numbers. For example Cyrillic is at 0x0400 through 0x4FF < http://unicode.org/charts/PDF/U0400.pdf>. The only decimal fact I remember about Unicode is that the largest code-point is 1114111 - a palindrome! -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikhailwas at gmail.com Wed Dec 7 22:06:06 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Thu, 8 Dec 2016 04:06:06 +0100 Subject: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation) In-Reply-To: References: Message-ID: On 8 December 2016 at 03:36, Alexander Belopolsky wrote: > > On Wed, Dec 7, 2016 at 9:07 PM, Mikhail V wrote: >> >> it somehow settled in >> peoples' minds that hex reference should be preferred, for no solid reason >> IMO. > > I may be showing my age, but all the facts that I remember about ASCII codes > are in hex: > > 1. SPACE is 0x20 followed by punctuation symbols. > 2. Decimal digits start at 0x30 with '0' = 0x30, '1' = 0x31, ... > 3. @ is 0x40 followed by upper-case letter: 'A' = 0x41, 'B' = 0x42, ... > 4. Lower-case letters are offset by 0x20 from the uppercase ones: 'a' = > 0x61, 'b' = 0x62, ... > > Unicode is also organized around hexadecimal codes with various scripts > positioned in sections that start at round hexadecimal numbers. For example > Cyrillic is at 0x0400 through 0x4FF > . > > The only decimal fact I remember about Unicode is that the largest > code-point is 1114111 - a palindrome! As an aside, I've just noticed that in my example: s = "first cyrillic letters: \{1040}\{1041}\{1042}" s = "first cyrillic letters: \u0410\u0411\u0412" the hex and decimal codes are made up of same digits, such a peculiar coincidence... 
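It is easy to check in the REPL (nothing used here beyond the builtin hex()):

>>> [(n, hex(n)) for n in (1040, 1041, 1042)]
[(1040, '0x410'), (1041, '0x411'), (1042, '0x412')]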
So you were catched up from the beginning with hex, as I see ;) I on the contrary in dark times of learning programming (that was C) always oriented myself on decimal codes and don't regret it now. From mikhailwas at gmail.com Wed Dec 7 22:45:51 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Thu, 8 Dec 2016 04:45:51 +0100 Subject: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation) In-Reply-To: References: Message-ID: On 8 December 2016 at 03:32, Matthias welp wrote: > Dear Mikhail, > > With python3.6 you can use format strings to get very close to your > desired behaviour: > > f"{48:c}" == "0" > f"{:c}" == chr() > > It works with variables too: > > charvalue = 48 > f"{charcvalue:c}" == chr(charvalue) # == "0" > Waaa! This works! > > I hope this helps solve your apparent usability problem. Big big thanks, I didn't now this feature, but I have googled alot about "input characters as decimals" , so it is just added? Another evidence that Python rules! I'll rewrite some code, hope it'll have no side issues. Mikhail From jcgoble3 at gmail.com Wed Dec 7 22:57:52 2016 From: jcgoble3 at gmail.com (Jonathan Goble) Date: Wed, 7 Dec 2016 22:57:52 -0500 Subject: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation) In-Reply-To: References: Message-ID: On Wed, Dec 7, 2016 at 10:45 PM, Mikhail V wrote: > Big big thanks, I didn't now this feature, but I have googled alot > about "input characters as decimals" , so it is just added? > Another evidence that Python rules! Yes, f-strings are a new feature in Python 3.6, which is currently in the release candidate stage. The final release of 3.6.0 (and thus the first stable release with this feature) is scheduled for December 16. From random832 at fastmail.com Wed Dec 7 23:39:42 2016 From: random832 at fastmail.com (Random832) Date: Wed, 07 Dec 2016 23:39:42 -0500 Subject: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation) In-Reply-To: References: Message-ID: <1481171982.1720302.812217169.039D7550@webmail.messagingengine.com> On Wed, Dec 7, 2016, at 22:06, Mikhail V wrote: > So you were catched up from the beginning with hex, as I see ;) > I on the contrary in dark times of learning programming > (that was C) always oriented myself on decimal codes > and don't regret it now. C doesn't support decimal in string literals either, only octal and hex (incidentally octal seems to have been much more common in the environments where C was first invented). I can think of one context where decimal is used for characters, actually, now that I think about it. ANSI/ISO standards for 8-bit character sets often use a 'split' decimal format (i.e. DEL = 7/15 rather than 0x7F or 127.) From mikhailwas at gmail.com Thu Dec 8 00:06:38 2016 From: mikhailwas at gmail.com (Mikhail V) Date: Thu, 8 Dec 2016 06:06:38 +0100 Subject: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation) In-Reply-To: <1481171982.1720302.812217169.039D7550@webmail.messagingengine.com> References: <1481171982.1720302.812217169.039D7550@webmail.messagingengine.com> Message-ID: On 8 December 2016 at 05:39, Random832 wrote: > On Wed, Dec 7, 2016, at 22:06, Mikhail V wrote: >> So you were catched up from the beginning with hex, as I see ;) >> I on the contrary in dark times of learning programming >> (that was C) always oriented myself on decimal codes >> and don't regret it now. 
> > C doesn't support decimal in string literals either, only octal and hex > (incidentally octal seems to have been much more common in the > environments where C was first invented). I can think of one context > where decimal is used for characters, actually, now that I think about > it. ANSI/ISO standards for 8-bit character sets often use a 'split' > decimal format (i.e. DEL = 7/15 rather than 0x7F or 127.) That is true, it does not support decimals in string literals, but I don't remember (it was more than 10 years ago) that I used anything but decimals for text processing in C. So normally load a file in memory, iterate over bytes, compare the value, and so on. And somewhat very foggy in my memory, but at that time most ASCII tables included decimals and they stood normally in the first column, but I can be wrong now, got to google some original tables. Jeez, how positive came this thread out, first Ethan said it will be never implemented, and it turns out it has already been implemented. Christmas magic. From greg.ewing at canterbury.ac.nz Thu Dec 8 00:52:21 2016 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 08 Dec 2016 18:52:21 +1300 Subject: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation) In-Reply-To: References: <1481171982.1720302.812217169.039D7550@webmail.messagingengine.com> Message-ID: <5848F515.2080104@canterbury.ac.nz> Mikhail V wrote: > first Ethan said > it will be never implemented, and it turns out it has already > been implemented. Only by accident -- I don't think anyone anticipated that f-strings would be used that way! -- Greg From p.f.moore at gmail.com Thu Dec 8 04:00:55 2016 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 8 Dec 2016 09:00:55 +0000 Subject: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation) In-Reply-To: References: Message-ID: On 7 December 2016 at 23:52, Mikhail V wrote: > Proposal: I would want to have a possibility to input it *by decimals*: > > s = "first cyrillic letters: \{1040}\{1041}\{1042}" > or: > s = "first cyrillic letters: \(1040)\(1041)\(1042)" > > ========= > > This is more compact and seems not very contradictive with > current Python escape characters in string literals. > So backslash is a start of some escaping in most cases. > > For me most important is that in such way I would avoid > any presence of hex numbers in strings, which I find very good > for readability and for me it is very convinient since I use decimals > for processing everywhere (and encourage everyone to do so). > > So this is my proposal, any comments on this are appreciated. -1. We already have plenty of ways to specify characters in strings[1], we don't need another. If readability is what matters to you, and you (unlike many others) consider hex to be unreadable, use the \N{...} form. Paul [1] Including (ab)using f-strings to hide the use of chr(). From victor.stinner at gmail.com Thu Dec 8 05:27:48 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 8 Dec 2016 11:27:48 +0100 Subject: [Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation) In-Reply-To: References: Message-ID: FYI you can also get a character by its name: >>> import unicodedata >>> unicodedata.name(chr(1040)) 'CYRILLIC CAPITAL LETTER A' >>> "\N{CYRILLIC CAPITAL LETTER A}" '?' 
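And unicodedata.lookup() goes the other way, from the name back to the character:

>>> import unicodedata
>>> unicodedata.lookup('CYRILLIC CAPITAL LETTER A')
'А'
>>> unicodedata.lookup('CYRILLIC CAPITAL LETTER A') == chr(1040)
True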
Victor 2016-12-08 0:52 GMT+01:00 Mikhail V : > In past discussion about inputing and printing characters, > I was proposing decimal notation instead of hex. > Since the discussion was lost in off-topic talks, I'll try to > summarise my idea better. > > I use ASCII only for code input (there are good reasons for that). > Here I'll use Python 3.6, and Windows 7, so I can use print() with unicode > directly and it works now in system console. > > Suppose I only start programming and want to do some character manipulation. > The vey first thing I would probably start with is a simple output for > latin and cyrillic capital letters: > > caps_lat = "" > for o in range(65, 91): > caps_lat = caps_lat + chr(o) > print (caps_lat) > > caps_cyr = "" > for o in range(1040, 1072): > caps_cyr = caps_cyr + chr(o) > print (caps_cyr) > > > Which prints: > ABCDEFGHIJKLMNOPQRSTUVWXYZ > ???????????????????????????????? > > > Say, I want now to input something direct in code: > > s = "first cyrillic letters: " + chr(1040) + chr(1041) + chr(1042) > > Which works fine and has clean look. However it is not very convinient > because of much typing and also, if I generate such strings, > adds a bit more complexity. But in general it is fine, and I use this > method currently. > > ========= > Proposal: I would want to have a possibility to input it *by decimals*: > > s = "first cyrillic letters: \{1040}\{1041}\{1042}" > or: > s = "first cyrillic letters: \(1040)\(1041)\(1042)" > > ========= > > This is more compact and seems not very contradictive with > current Python escape characters in string literals. > So backslash is a start of some escaping in most cases. > > For me most important is that in such way I would avoid > any presence of hex numbers in strings, which I find very good > for readability and for me it is very convinient since I use decimals > for processing everywhere (and encourage everyone to do so). > > So this is my proposal, any comments on this are appreciated. > > > PS: > > Currently Python 3 supports these in addition to \x: > (from https://docs.python.org/3/howto/unicode.html) > """ > If you can?t enter a particular character in your editor or want to keep > the source code ASCII-only for some reason, you can also use escape > sequences in string literals. > >>>> "\N{GREEK CAPITAL LETTER DELTA}" # Using the character name >>>> "\u0394" # Using a 16-bit hex value >>>> "\U00000394" # Using a 32-bit hex value > > """ > So I have many possibilities and all of them strangely contradicts with > my image of intuitive and readable. Well, using charater name is readable, > but seriously not much of a practical solution for input, but could be > very useful > for printing description of a character. 
>
>
> Mikhail
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From abrault at mapgears.com  Thu Dec  8 09:46:36 2016
From: abrault at mapgears.com (Alexandre Brault)
Date: Thu, 8 Dec 2016 09:46:36 -0500
Subject: [Python-ideas] Input characters in strings by decimals (Was:
 Proposal for default character representation)
In-Reply-To: 
References: 
Message-ID: <2d86531d-9297-a55a-f24f-cb111a153bf6@mapgears.com>

On 2016-12-07 09:07 PM, Mikhail V wrote:
> On 8 December 2016 at 01:57, Nick Timkovich wrote:
>>> hex notation not so readable and anyway decimal is kind of standard way to
>>> represent numbers
>>
>> Can you cite some examples of Unicode reference tables I can look up a
>> decimal number in? They seem rare; perhaps in a list as a secondary column,
>> but they're not organized/grouped decimally. Readability counts, and
>> introducing a competing syntax will make it harder for others to read.
> There were links to such a table in the previous discussion. Googling
> "unicode table decimal" and
> the first link will be it.
> I think most online tables include decimals as well, usually as tuples
> of 8-bit decimals.
The fact that you need to specify "unicode table *decimal*" in your
search, and that even then around half of the top results give the table
in hex, to me illustrates quite well how much of a minority opinion
"writing unicode characters in decimal is more logical" is

From mikhailwas at gmail.com  Thu Dec  8 11:06:39 2016
From: mikhailwas at gmail.com (Mikhail V)
Date: Thu, 8 Dec 2016 17:06:39 +0100
Subject: [Python-ideas] Input characters in strings by decimals (Was:
 Proposal for default character representation)
In-Reply-To: <2d86531d-9297-a55a-f24f-cb111a153bf6@mapgears.com>
References: <2d86531d-9297-a55a-f24f-cb111a153bf6@mapgears.com>
Message-ID: 

On 8 December 2016 at 15:46, Alexandre Brault wrote:
>>> Can you cite some examples of Unicode reference tables I can look up a
>>> decimal number in? They seem rare; perhaps in a list as a secondary column,
>>> but they're not organized/grouped decimally. Readability counts, and
>>> introducing a competing syntax will make it harder for others to read.
>> There were links to such a table in the previous discussion. Googling
>> "unicode table decimal" and
>> the first link will be it.
>> I think most online tables include decimals as well, usually as tuples
>> of 8-bit decimals.
> The fact that you need to specify "unicode table *decimal*" in your
> search, and that even then around half of the top results give the table
> in hex, to me illustrates quite well how much of a minority opinion
> "writing unicode characters in decimal is more logical" is

No, I don't need to specify "unicode table *decimal*".

Results for "unicode table" in google:

Top Result # 2:
www.utf8-chartable.de/

Top Result # 4:
http://www.tamasoft.co.jp/en/general-info/index.html

Some sites do not provide any code conversion, but everybody can
do it easily; also, I don't have problems generating a table programmatically.
And I hope it is clear why most people stick to hex (I never argued that BTW),
but it is mostly historical, nothing to do with "logical". There is
just a tendency
to repeat what the majority does, and that is not always good; this case
would be an example.
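For example, a rough sketch of the kind of table generation I mean
(untested, just to illustrate; the helper name and column widths are
arbitrary):

def print_decimal_table(start, stop, per_row=20):
    for row_start in range(start, stop, per_row):
        row = range(row_start, min(row_start + per_row, stop))
        print("  ".join("%6d %s" % (o, chr(o)) for o in row))

print_decimal_table(1040, 1104)   # Cyrillic letters with decimal codes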
From vgr255 at live.ca  Thu Dec  8 11:32:02 2016
From: vgr255 at live.ca (Emanuel Barry)
Date: Thu, 8 Dec 2016 16:32:02 +0000
Subject: [Python-ideas] Input characters in strings by decimals (Was:
 Proposal for default character representation)
In-Reply-To: 
References: <2d86531d-9297-a55a-f24f-cb111a153bf6@mapgears.com>
Message-ID: 

> From: Mikhail V
> Sent: Thursday, December 08, 2016 11:07 AM
> Subject: Re: [Python-ideas] Input characters in strings by decimals (Was:
> Proposal for default character representation)

> No, I don't need to specify "unicode table *decimal*".
>
> Results for "unicode table" in google:
>
> Top Result # 2:
> www.utf8-chartable.de/
>
> Top Result # 4:
> http://www.tamasoft.co.jp/en/general-info/index.html

Except that both of these websites show you hexadecimal notation.

> And I hope it is clear why most people stick to hex (I never argued that BTW),
> but it is mostly historical, nothing to do with "logical".

That's not true. Characters are sorted by ranges. For example, I know that
everything below 0x20 is control code, uppercase ASCII letters start at
0x41 (0x40 is '@') and lowercase ASCII letters start at 0x61 (where 0x60 is
'`') - trivial to remember. I also know that ASCII goes as high as half the
byte range, or 0x7f (half of 0x100). For instance, the first letter of my
name is 0xc9, and anyone can know, at a glance and without knowing my name
or what the letter is, that it's not ASCII.

Also, as far as I know, lowercase letters (ASCII or not) begin some
multiple of 0x10 after the beginning of the uppercase letters (0x20 for
ASCII or latin-1). As such, since I know that 'É' is 0xc9, I can know,
without even looking, that 0xe9 is 'é'. That would be a lot trickier in
decimal to remember and get right.

As an aside, and I don't know this by heart, various sets of characters
begin at fixed points, and knowing those points (when you need to work with
specific sets of characters) can be very useful. If you look at a website
(https://unicode-table.com/ seems good), you can even select ranges of
characters, which conveniently end up being multiples of 0x10 (or 16 in
decimal). If your point is "it's easier to work with numbers ending with
0", then you'll be pleased to know that character sets are actually
designed so that, using hexadecimal notation, you're dealing with numbers
ending with 0! Doing this using decimal notation is clunky at best.

Yours,
\xc9manuel

From rosuav at gmail.com  Thu Dec  8 11:52:23 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 9 Dec 2016 03:52:23 +1100
Subject: [Python-ideas] Input characters in strings by decimals (Was:
 Proposal for default character representation)
In-Reply-To: 
References: <2d86531d-9297-a55a-f24f-cb111a153bf6@mapgears.com>
Message-ID: 

On Fri, Dec 9, 2016 at 3:06 AM, Mikhail V wrote:
> Results for "unicode table" in google:
>
> Top Result # 2:
> www.utf8-chartable.de/
>
> Top Result # 4:
> http://www.tamasoft.co.jp/en/general-info/index.html

Both of those show hex first, and decimal as an additional feature.

> Some sites do not provide any code conversion, but everybody can
> do it easily; also, I don't have problems generating a table programmatically.
> And I hope it is clear why most people stick to hex (I never argued that BTW),
> but it is mostly historical, nothing to do with "logical". There is
> just a tendency
> to repeat what the majority does, and that is not always good; this case
> would be an example.

In the first place, many people have pointed out to you that Unicode
*is* laid out best in hexadecimal.
(Another example: umop apisdn ?! are ¿¡, which are ?! with one high
bit set.) But in the second place, "what the majority does" actually
IS a strong argument. It's called consistency. Why is "\r" a carriage
return? Wouldn't it be more logical to use "\c" for that? Except that
EVERYONE uses \r for it. And the one time in my life that I found
"\123" to mean "{" rather than "S", it was a great frustration for me:

http://rosuav.blogspot.com.au/2012/12/i-want-my-octal.html

And that's the choice between decimal and *octal*, which is a far
less well known base than hex is. I would still prefer octal, because
it's consistent.

So because of consistency, Python needs to support "\u0303" to mean
COMBINING TILDE, and any competing notation has to be in addition to
that. Can you justify the confusion of sometimes working with hex and
sometimes decimal? It's a pretty high bar to attain. You have to show
that decimal isn't just marginally better than hex; you have to show
that there are situations where the value of decimal character
literals is so great that it's worth forcing everyone to learn two
systems. And I'm not convinced you've even hit the first point.

ChrisA

From random832 at fastmail.com  Thu Dec  8 12:29:23 2016
From: random832 at fastmail.com (Random832)
Date: Thu, 08 Dec 2016 12:29:23 -0500
Subject: [Python-ideas] Input characters in strings by decimals (Was:
 Proposal for default character representation)
In-Reply-To: 
References: <2d86531d-9297-a55a-f24f-cb111a153bf6@mapgears.com>
Message-ID: <1481218163.888360.812826041.294747B2@webmail.messagingengine.com>

On Thu, Dec 8, 2016, at 11:06, Mikhail V wrote:
> Some sites do not provide any code conversion, but everybody can
> do it easily; also, I don't have problems generating a table
> programmatically.
> And I hope it is clear why most people stick to hex (I never argued that
> BTW), but it is mostly historical, nothing to do with "logical".

The problem is that there's a logic associated with how the character
sets are designed. The character table works a lot better with rows of
16 than with rows of 10 or 20. In many blocks you get the uppercase
letters lined up above the lowercase letters, for example. And if your
rows are 16 (or 32, though that doesn't work as well for unicode
because e.g. the Cyrillic basic set А-Я/а-я starts from 0x410), then
your row and column labels work better in hex because you've lined up
0x40 above 0x50 and 0x60, which share the last digit, unlike 64/80/96,
and the whole row (or half the row for 32) shares all but the last
digit. And those values are also only off by one bit, too.

Even if we were to arrange the characters themselves in rows of 10/20,
so you've got 30 or 40 characters in an "alphabet row", then you'd have
to add or subtract to change the case, whereas many early character
sets were designed to be able to do this by changing a bit, for
bit-paired keyboards.

What looks better?

Hex:
АБВГДЕЖЗИЙКЛМНОП
РСТУФХЦЧШЩЪЫЬЭЮЯ
абвгдежзийклмноп
рстуфхцчшщъыьэюя

Decimal:
АБВГДЕЖЗИЙКЛМНОПРСТУ
ФХЦЧШЩЪЫЬЭЮЯабвгдежз
ийклмнопрстуфхцчшщъы
ьэюя

And it's only luck that the uppercase Russian alphabet starts at the
beginning of a line. The ASCII section with the English alphabet looks
like this in decimal:

<=>?@ABCDEFGHIJKLMNO
PQRSTUVWXYZ[\]^_`abc
defghijklmnopqrstuvw
xyz

compared to this in hex:

@ABCDEFGHIJKLMNO
PQRSTUVWXYZ[\]^_
`abcdefghijklmno
pqrstuvwxyz

> There is just a tendency
> to repeat what the majority does, and that is not always good; this case
> would be an example.
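(For anyone who wants to experiment with other row widths, the layouts
above can be regenerated with a few lines; a quick sketch:)

def rows(start, stop, width):
    for r in range(start, stop, width):
        print("".join(chr(o) for o in range(r, min(r + width, stop))))

rows(0x410, 0x450, 16)   # the hex-friendly layout, rows of 16
rows(1040, 1104, 20)     # the decimal-friendly layout, rows of 20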
From mertz at gnosis.cx  Thu Dec  8 12:35:32 2016
From: mertz at gnosis.cx (David Mertz)
Date: Thu, 8 Dec 2016 09:35:32 -0800
Subject: [Python-ideas] Input characters in strings by decimals (Was:
 Proposal for default character representation)
In-Reply-To: 
References: 
Message-ID: 

The Unicode Consortium reference entirely lacks decimal values in all their
tables. EVERYTHING is given solely in hex. I'm sure someone somewhere had
created a table with decimal values, but it's very rare.

We should not change Python syntax because exactly one user prefers decimal
representations. At most there can be an external library to cover strings
in whatever manner he wants.

Why is octal being neglected for us old fogeys?! ?

On Dec 7, 2016 6:11 PM, "Mikhail V" wrote:

> On 8 December 2016 at 01:57, Nick Timkovich
> wrote:
> >> hex notation not so readable and anyway decimal is kind of standard way
> to
> >> represent numbers
> >
> >
> > Can you cite some examples of Unicode reference tables I can look up a
> > decimal number in? They seem rare; perhaps in a list as a secondary
> column,
> > but they're not organized/grouped decimally. Readability counts, and
> > introducing a competing syntax will make it harder for others to read.
>
> There were links to such a table in the previous discussion. Googling
> "unicode table decimal" and
> the first link will be it.
> I think most online tables include decimals as well, usually as tuples
> of 8-bit decimals.
> Also earlier the decimal code was the first column in most tables, but
> it somehow settled in
> people's minds that hex reference should be preferred, for no solid reason
> IMO.
> One reason I think is due to HTML standards which started to use it in
> html files
> long ago and had much influence later, but one should understand,
> that is just for brevity in most cases. Another reason is, file viewers
> show hex by
> default, but that is just misfortune; nothing besides brevity and 4-bit
> word alignment
> gives the hex notation unfortunately, at least in its current typeface.
> This was discussed actually in that thread.
> Many people also think they are cool hackers if they make everything in
> hex :)
> In some cases it is worth it, but not this case IMO. Mainly for
> bitwise stuff, but
> then one should look into binary/trinary/quaternary representation
> depending on nature
> of operations and hardware.
>
> Yes there is unicode table pagination correspondence in hex reference,
> but that hardly plays
> any positive role for real applications, most of the time I need to
> look in my code
> and also perform number operations on *specific* ranges and codes, but not
> on whole pages of the table. This could only play role if I do
> low-level filtering of large files
> and want to filter out data after character's page, but that is the
> only positive thing
> I can think of, and I don't think it is directly for Python.
>
> Imagine some cryptography exercise - you take 27 units, you just give
> them numbers (0..26)
> and you do calculations, yes you can view results as hex numbers, but
> I don't do it and most people
> don't and should not, since why? It is ugly and not readable.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mikhailwas at gmail.com  Thu Dec  8 13:37:12 2016
From: mikhailwas at gmail.com (Mikhail V)
Date: Thu, 8 Dec 2016 19:37:12 +0100
Subject: [Python-ideas] Input characters in strings by decimals (Was:
 Proposal for default character representation)
In-Reply-To: 
References: <2d86531d-9297-a55a-f24f-cb111a153bf6@mapgears.com>
Message-ID: 

On 8 December 2016 at 17:52, Chris Angelico wrote:

> In the first place, many people have pointed out to you that Unicode
> *is* laid out best in hexadecimal.

Ok, if it is aligned intentionally on a binary grid, obviously hex numbers
will show some patterns, but who argues? And to be fair, from my examples
for Cyrillic:

Range start points in hex vs decimal:
capitals:  U+0410  #1040
lowercase: U+0430  #1072

So I need one number, 1040, to remember; then if I know that there are 32
letters (except Ё) I just sum 1040 + 32 and get 1072, and this will be
the beginning of the lowercase range. There are of course people who can
efficiently add and subtract in their head with hex, but I am not one of
them (guess who is in the minority here), and there is no need to do it
in this case. So if I know the distances between ranges I can do it all
much more easily in my head. Not a strong argument?

To be more pedantic: if you know the fact that the Russian alphabet has
exactly 33 letters, and not 32 as one could suggest from the unicode
table, you could have noticed also that:

letter Ё is U+0401, and ё is U+0451

This means they are torn away from the other letters and do not even lie
in the range. In practice, this means that if I want to filter against
code ranges, I need to additionally check the values U+0451 and U+0401.
Is it not because someone decided to align the alphabet in such a way?
Alignment is not a bad idea, but it should not contradict common sense.

> You have to show
> that decimal isn't just marginally better than hex; you have to show
> that there are situations where the value of decimal character
> literals is so great that it's worth forcing everyone to learn two
> systems. And I'm not convinced you've even hit the first point.

Frankly I don't fully understand your point here. Everyone knows decimal,
the address of an element in a table is a number, and in most cases I
don't need to learn it by heart, since it is already known and written
in some table on your PC. Also, inputting characters by decimal is a very
common thing: alternate key combos (Alt+0192) are very well established,
and many people *do* learn decimal code points by heart, including me.
So now it is you who wants me to learn two numbering systems for no
reason.

And even with all that said, it is not the strongest argument. Most
important is that hex notation is an ugly circumstance, and in this case
there is too little reason to introduce it in an algorithm which just
checks ranges and specific values. And for *specific single* values it is
absolutely irrelevant which alignment you have. You just choose what is
better readable and/or common for abstract numbers. But that is another
big question, and the current hex notation does not fall into the
category "better readable" anyway.
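A sketch of the kind of range filtering I mean, with the two stray code
points handled explicitly (untested; the helper name is just for
illustration):

def is_russian_letter(ch):
    o = ord(ch)
    # А..я is one contiguous decimal range; Ё and ё sit outside it.
    return 1040 <= o <= 1103 or o in (1025, 1105)

print([c for c in "Ёлка" if is_russian_letter(c)])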
Mikhail

From rosuav at gmail.com  Thu Dec  8 13:45:37 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 9 Dec 2016 05:45:37 +1100
Subject: [Python-ideas] Input characters in strings by decimals (Was:
 Proposal for default character representation)
In-Reply-To: 
References: <2d86531d-9297-a55a-f24f-cb111a153bf6@mapgears.com>
Message-ID: 

On Fri, Dec 9, 2016 at 5:37 AM, Mikhail V wrote:
>> You have to show
>> that decimal isn't just marginally better than hex; you have to show
>> that there are situations where the value of decimal character
>> literals is so great that it's worth forcing everyone to learn two
>> systems. And I'm not convinced you've even hit the first point.
>
> Frankly I don't fully understand your point here.

Let me clarify. When you construct a string, you can already use
escapes to represent characters:

"n\u0303" --> n followed by combining tilde

In order to be consistent with other languages, Python *has* to
support hexadecimal. Plus, Python has _already_ supported hex for some
time. To establish decimal as an alternative, you have to demonstrate
that it is worth having ANOTHER way to do this.

With completely green-field topics, you can debate the merits of one
notation against another, and the overall best one will win. But when
there's a well-established existing notation, you have to justify the
proliferation of notations. You have to show that your new format is
*so much* better than the existing one that it's worth adding it in
parallel. That's quite a high bar - not impossible, obviously, but you
need some very strong justification. At the moment, you're showing
minor advantages to decimal, and other people are showing minor
advantages to hex; but IMO nothing yet has been strong enough to
justify the implementation of a completely new way to do things -
remember, people have to understand *both* in order to read code.

ChrisA

From mikhailwas at gmail.com  Thu Dec  8 14:50:49 2016
From: mikhailwas at gmail.com (Mikhail V)
Date: Thu, 8 Dec 2016 20:50:49 +0100
Subject: [Python-ideas] Input characters in strings by decimals (Was:
 Proposal for default character representation)
In-Reply-To: 
References: <2d86531d-9297-a55a-f24f-cb111a153bf6@mapgears.com>
Message-ID: 

On 8 December 2016 at 19:45, Chris Angelico wrote:

> At the moment, you're showing
> minor advantages to decimal, and other people are showing minor
> advantages to hex; but IMO nothing yet has been strong enough to
> justify the implementation of a completely new way to do things -
> remember, people have to understand *both* in order to read code.

If the arguments in the last post are not strong enough, I think it will
be too hard to make them any stronger. In my eyes the benefits in this
case clearly outweigh the downsides. And anyway, since I can use an
f-string now to input it, probably one can just relax now. And this:

f"{65:c}{66:c}{67:c}" ,

actually looks significantly better than:

"\d{65}\d{66}\d{67}",

And it covers the cases I was addressing with the proposal.
I am happy. +1000 to developers, even if this is an "accidental" feature.
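(A one-line check that the f-string spelling agrees with the other
forms, for anyone curious:)

>>> f"{65:c}{66:c}{67:c}" == "ABC" == "\x41\x42\x43" == chr(65) + chr(66) + chr(67)
True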
From chris.barker at noaa.gov  Sun Dec 11 02:22:42 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Sat, 10 Dec 2016 23:22:42 -0800
Subject: [Python-ideas] Better error messages [was: (no subject)]
In-Reply-To: 
References: <22590.13856.162202.818428@turnbull.sk.tsukuba.ac.jp>
 <22596.50591.129903.980234@turnbull.sk.tsukuba.ac.jp>
Message-ID: 

On Sun, Dec 4, 2016 at 6:35 PM, Chris Angelico wrote:

> I have no specific qualifications, but I teach online;

nor do I, and I teach for a continuing ed program -- not in high school.

But anyway, regardless of official qualifications, good programmers are
not necessarily good teachers of programming. At all.

If it takes a credentialed teacher to get a job in a school, so
> be it - but at least make sure it's someone who knows how to interpret
> the error messages, so that any student who runs into trouble can ask
> the prof.
>

Exactly -- you can't be credentialed to teach Biology, or French, or....
without knowing the subject. That may not yet be true for computer science,
as it is still "new" in high school curricula, but it's still not Python's
job to overcome that.

All that being said -- I don't think we should try to tailor error messages
specifically for newbies in the core interpreter, and the error messages
have gotten a lot better with py3, but they could still use some
improvement -- I would say that suggestions are welcome.

And if they can be made (more) machine readable, so that a beginner's IDE
could enhance them, that would be great.

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From skreft at gmail.com  Sun Dec 11 04:09:36 2016
From: skreft at gmail.com (Sebastian Kreft)
Date: Sun, 11 Dec 2016 20:09:36 +1100
Subject: [Python-ideas] Better error messages [was: (no subject)]
In-Reply-To: 
References: <22590.13856.162202.818428@turnbull.sk.tsukuba.ac.jp>
 <22596.50591.129903.980234@turnbull.sk.tsukuba.ac.jp>
Message-ID: 

Note that there is a draft pep https://www.python.org/dev/peps/pep-0473/
that aims at adding structured data to builtin exceptions.

I've tried implementing some of those but had a couple of test failures
that weren't obvious to me how to solve.

On Dec 11, 2016 13:11, "Chris Barker" wrote:

> On Sun, Dec 4, 2016 at 6:35 PM, Chris Angelico wrote:
>
>> I have no specific qualifications, but I teach online;
>
> nor do I, and I teach for a continuing ed program -- not in high school.
>
> But anyway, regardless of official qualifications, good programmers are
> not necessarily good teachers of programming. At all.
>
> If it takes a credentialed teacher to get a job in a school, so
>> be it - but at least make sure it's someone who knows how to interpret
>> the error messages, so that any student who runs into trouble can ask
>> the prof.
>>
>
> Exactly -- you can't be credentialed to teach Biology, or French, or....
> without knowing the subject. That may not yet be true for computer science,
> as it is still "new" in high school curricula, but it's still not Python's
> job to overcome that.
>
> All that being said -- I don't think we should try to tailor error messages
> specifically for newbies in the core interpreter, and the error messages
> have gotten a lot better with py3, but they could still use some
> improvement -- I would say that suggestions are welcome.
>
> And if they can be made (more) machine readable, so that a beginner's IDE
> could enhance them, that would be great.
>
> -CHB
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ncoghlan at gmail.com  Sun Dec 11 08:48:04 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 11 Dec 2016 23:48:04 +1000
Subject: [Python-ideas] Better error messages [was: (no subject)]
In-Reply-To: 
References: <22590.13856.162202.818428@turnbull.sk.tsukuba.ac.jp>
 <22596.50591.129903.980234@turnbull.sk.tsukuba.ac.jp>
Message-ID: 

On 11 December 2016 at 19:09, Sebastian Kreft wrote:
> Note that there is a draft pep https://www.python.org/dev/peps/pep-0473/
> that aims at adding structured data to builtin exceptions.
>
> I've tried implementing some of those but had a couple of test failures that
> weren't obvious to me how to solve.

If you haven't already, note that it's OK to post proposed patches to
the tracker even when they're still causing test failures - just note
that you know the patch is incomplete, and explain the errors that
you're seeing.

Core developers will often be able to spot relevant problems through
code review, and we're also pretty practiced at interpreting the
sometimes cryptic failures that the test suite can emit.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From steve at pearwood.info  Mon Dec 12 18:45:03 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 13 Dec 2016 10:45:03 +1100
Subject: [Python-ideas] Enhancing vars()
Message-ID: <20161212234502.GA3365@ando.pearwood.info>

In general, directly accessing dunders is a bit of a code smell. (I
exclude writing dunder methods in your classes, of course.) There's
usually a built-in or similar to do the job for you, e.g. instead of
iterator.__next__() we should use next(iterator).

One of the lesser-known ones is vars(obj), which should be used in place
of obj.__dict__.

Unfortunately, vars() is less useful than it might be, since not all
objects have a __dict__. Some objects have __slots__ instead, or even
both. That is considered an implementation detail of the object.

Proposal: enhance vars() to return a proxy to the object namespace,
regardless of whether said namespace is __dict__ itself, or a number of
__slots__, or both. Here is a woefully incomplete and untested prototype:

class VarsProxy(object):
    def __init__(self, obj):
        if not (hasattr(obj, '__dict__') or hasattr(obj, '__slots__')):
            raise TypeError('object has no namespace')
        self._obj = obj

    def __getitem__(self, key):
        slots = getattr(type(self._obj), '__slots__', None)
        # see inspect.getattr_static for a more correct implementation
        if slots is not None and key in slots:
            # return the content of the slot, without any inheritance.
            return getattr(self._obj, key)
        else:
            return self._obj.__dict__[key]

    def __setitem__(self, key, value): ...
    def __delitem__(self, key): ...

One complication: it is possible for the slot and the __dict__ to
both contain the key. In 3.5 that ambiguity is resolved in favour of the
slot:

py> class X:
...     __slots__ = ['spam', '__dict__']
...     def __init__(self):
...         self.spam = 'slot'
...         self.__dict__['spam'] = 'dict'
...
py> x = X()
py> x.spam
'slot'

Although __slots__ are uncommon, this would clearly distinguish
vars(obj) from obj.__dict__ and strongly encourage the use of vars()
over direct access to the dunder attribute.

Thoughts?

-- 
Steve

From tjreedy at udel.edu  Mon Dec 12 21:21:02 2016
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 12 Dec 2016 21:21:02 -0500
Subject: [Python-ideas] Enhancing vars()
In-Reply-To: <20161212234502.GA3365@ando.pearwood.info>
References: <20161212234502.GA3365@ando.pearwood.info>
Message-ID: 

On 12/12/2016 6:45 PM, Steven D'Aprano wrote:
> In general, directly accessing dunders is a bit of a code smell. (I
> exclude writing dunder methods in your classes, of course.) There's
> usually a built-in or similar to do the job for you, e.g. instead of
> iterator.__next__() we should use next(iterator).
>
> One of the lesser-known ones is vars(obj), which should be used in place
> of obj.__dict__.
>
> Unfortunately, vars() is less useful than it might be, since not all
> objects have a __dict__. Some objects have __slots__ instead, or even
> both. That is considered an implementation detail of the object.
>
> Proposal: enhance vars() to return a proxy to the object namespace,
> regardless of whether said namespace is __dict__ itself, or a number of
> __slots__, or both. Here is a woefully incomplete and untested prototype:

+1 I believe this was mentioned as a possibility on some issue, but I
cannot find it. Does vars currently work for things with dict proxies
instead of dicts?

> class VarsProxy(object):
>     def __init__(self, obj):
>         if not (hasattr(obj, '__dict__') or hasattr(obj, '__slots__')):
>             raise TypeError('object has no namespace')
>         self._obj = obj
>
>     def __getitem__(self, key):
>         slots = getattr(type(self._obj), '__slots__', None)
>         # see inspect.getattr_static for a more correct implementation
>         if slots is not None and key in slots:
>             # return the content of the slot, without any inheritance.
>             return getattr(self._obj, key)
>         else:
>             return self._obj.__dict__[key]
>
>     def __setitem__(self, key, value): ...
>     def __delitem__(self, key): ...
>
>
> One complication: it is possible for the slot and the __dict__ to
> both contain the key. In 3.5 that ambiguity is resolved in favour of the
> slot:
>
> py> class X:
> ...     __slots__ = ['spam', '__dict__']
> ...     def __init__(self):
> ...         self.spam = 'slot'
> ...         self.__dict__['spam'] = 'dict'
> ...
> py> x = X()
> py> x.spam
> 'slot'
>
> Although __slots__ are uncommon, this would clearly distinguish
> vars(obj) from obj.__dict__ and strongly encourage the use of vars()
> over direct access to the dunder attribute.
>
> Thoughts?

-- 
Terry Jan Reedy

From ethan at stoneleaf.us  Mon Dec 12 22:35:11 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 12 Dec 2016 19:35:11 -0800
Subject: [Python-ideas] Enhancing vars()
In-Reply-To: <20161212234502.GA3365@ando.pearwood.info>
References: <20161212234502.GA3365@ando.pearwood.info>
Message-ID: <584F6C6F.9000201@stoneleaf.us>

On 12/12/2016 03:45 PM, Steven D'Aprano wrote:

> Proposal: enhance vars() to return a proxy to the object namespace,
> regardless of whether said namespace is __dict__ itself, or a number of
> __slots__, or both.
+1 -- ~Ethan~ From alexander.belopolsky at gmail.com Mon Dec 12 22:45:39 2016 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 12 Dec 2016 22:45:39 -0500 Subject: [Python-ideas] Enhancing vars() In-Reply-To: <20161212234502.GA3365@ando.pearwood.info> References: <20161212234502.GA3365@ando.pearwood.info> Message-ID: On Mon, Dec 12, 2016 at 6:45 PM, Steven D'Aprano wrote: > Proposal: enhance vars() to return a proxy to the object namespace, > regardless of whether said namespace is __dict__ itself, or a number of > __slots__, or both. > How do you propose dealing with classes defined in C? Their objects don't have __slots__. One possibility is to use __dir__ or dir(), but those can return anything and in the past developers were encouraged to put only "useful" attributes in __dir__. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve.dower at python.org Mon Dec 12 23:02:28 2016 From: steve.dower at python.org (Steve Dower) Date: Mon, 12 Dec 2016 20:02:28 -0800 Subject: [Python-ideas] Enhancing vars() In-Reply-To: References: <20161212234502.GA3365@ando.pearwood.info> Message-ID: I'm +1. This bites me far too often. > in the past developers were encouraged to put only "useful" attributes in __dir__. Good. If I'm getting vars() I really only want the useful ones. If I need interesting/secret ones then I'll getattr for them. Cheers, Steve Top-posted from my Windows Phone -----Original Message----- From: "Alexander Belopolsky" Sent: ?12/?12/?2016 19:47 To: "Steven D'Aprano" Cc: "python-ideas" Subject: Re: [Python-ideas] Enhancing vars() On Mon, Dec 12, 2016 at 6:45 PM, Steven D'Aprano wrote: Proposal: enhance vars() to return a proxy to the object namespace, regardless of whether said namespace is __dict__ itself, or a number of __slots__, or both. How do you propose dealing with classes defined in C? Their objects don't have __slots__. One possibility is to use __dir__ or dir(), but those can return anything and in the past developers were encouraged to put only "useful" attributes in __dir__. -------------- next part -------------- An HTML attachment was scrubbed... URL: From marco.buttu at gmail.com Tue Dec 13 04:29:38 2016 From: marco.buttu at gmail.com (Marco Buttu) Date: Tue, 13 Dec 2016 10:29:38 +0100 Subject: [Python-ideas] Enhancing vars() In-Reply-To: <20161212234502.GA3365@ando.pearwood.info> References: <20161212234502.GA3365@ando.pearwood.info> Message-ID: <584FBF82.4080906@oa-cagliari.inaf.it> On 13/12/2016 00:45, Steven D'Aprano wrote: > In general, directly accessing dunders is a bit of a code smell. (I > exclude writing dunder methods in your classes, of course.) There's > usually a built-in or similar to do the job for you, e.g. instead of > iterator.__next__() we should use next(iterator). > > One of the lesser-known ones is vars(obj), which should be used in place > of obj.__dict__. [...] > Proposal: enhance vars() to return a proxy to the object namespace, > regardless of whether said namespace is __dict__ itself, or a number of > __slots__, or both. +1. Would it be possible in the future (Py4?) to change the name `vars` to a more meaningful name? Maybe `namespace`, or something more appropriate. -- Marco Buttu INAF-Osservatorio Astronomico di Cagliari Via della Scienza n. 
5, 09047 Selargius (CA)
Phone: 070 711 80 217
Email: mbuttu at oa-cagliari.inaf.it

From steve at pearwood.info  Tue Dec 13 05:02:23 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 13 Dec 2016 21:02:23 +1100
Subject: [Python-ideas] Enhancing vars()
In-Reply-To: <584FBF82.4080906@oa-cagliari.inaf.it>
References: <20161212234502.GA3365@ando.pearwood.info>
 <584FBF82.4080906@oa-cagliari.inaf.it>
Message-ID: <20161213100223.GC3365@ando.pearwood.info>

On Tue, Dec 13, 2016 at 10:29:38AM +0100, Marco Buttu wrote:

> +1. Would it be possible in the future (Py4?) to change the name `vars`
> to a more meaningful name? Maybe `namespace`, or something more appropriate.

I'm not keen on the name vars() either, but it does make a certain
sense: short for "variables", where "variable" here refers to attributes
of an instance rather than local or global variables.

I'm not sure that namespace is a better name: namespace, it seems to
me, is likely to be used as the name of the target:

    namespace = vars(obj)

But if there is a lot of popular demand for a name change, then I
suppose it could happen. Ask again around Python 3.9 :-)

-- 
Steve

From ncoghlan at gmail.com  Tue Dec 13 06:28:03 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 13 Dec 2016 21:28:03 +1000
Subject: [Python-ideas] Enhancing vars()
In-Reply-To: <20161213100223.GC3365@ando.pearwood.info>
References: <20161212234502.GA3365@ando.pearwood.info>
 <584FBF82.4080906@oa-cagliari.inaf.it>
 <20161213100223.GC3365@ando.pearwood.info>
Message-ID: 

On 13 December 2016 at 20:02, Steven D'Aprano wrote:
> On Tue, Dec 13, 2016 at 10:29:38AM +0100, Marco Buttu wrote:
>
>> +1. Would it be possible in the future (Py4?) to change the name `vars`
>> to a more meaningful name? Maybe `namespace`, or something more appropriate.
>
> I'm not keen on the name vars() either, but it does make a certain
> sense: short for "variables", where "variable" here refers to attributes
> of an instance rather than local or global variables.

It also refers to local and global variables, as vars() is effectively
an alias for locals() if you don't pass an argument, and locals() is
effectively an alias for globals() at module level:

    >>> locals() is globals()
    True
    >>> vars() is globals()
    True
    >>> def f(): return vars() is locals()
    ...
    >>> f()
    True

To be honest, rather than an enhanced vars(), I'd prefer to see a
couple more alternate dict constructors:

    dict.fromattrs(obj, attributes)
    dict.fromitems(obj, keys)

(With the lack of an underscore being due to the precedent set by
dict.fromkeys())

Armed with those, the "give me all the attributes from __dir__"
command would be:

    attrs = dict.from_attrs(obj, dir(obj))

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From p.f.moore at gmail.com  Tue Dec 13 06:53:44 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 13 Dec 2016 11:53:44 +0000
Subject: [Python-ideas] Enhancing vars()
In-Reply-To: 
References: <20161212234502.GA3365@ando.pearwood.info>
 <584FBF82.4080906@oa-cagliari.inaf.it>
 <20161213100223.GC3365@ando.pearwood.info>
Message-ID: 

On 13 December 2016 at 11:28, Nick Coghlan wrote:
> Armed with those, the "give me all the attributes from __dir__"
> command would be:
>
>     attrs = dict.from_attrs(obj, dir(obj))

Which of course can already be spelled as

    attrs = { attr: getattr(obj, attr) for attr in dir(obj) }

There's obviously a speed-up from avoiding repeated getattr calls, but
is speed the key here?
The advantage of an "enhanced vars" is more likely to be ease of discoverability, and I'm not sure dict.fromattrs gives us that benefit. Also, the dict constructor gives a *copy* of the namespace, where the proposal was for the proxy returned by vars() to provide update capability (if I understand the proposal correctly). Paul From turnbull.stephen.fw at u.tsukuba.ac.jp Tue Dec 13 06:56:26 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Tue, 13 Dec 2016 20:56:26 +0900 Subject: [Python-ideas] Enhancing vars() In-Reply-To: References: <20161212234502.GA3365@ando.pearwood.info> <584FBF82.4080906@oa-cagliari.inaf.it> <20161213100223.GC3365@ando.pearwood.info> Message-ID: <22607.57834.626099.299780@turnbull.sk.tsukuba.ac.jp> Nick Coghlan writes: > (With the lack of an underscore being due to the precedent set by > dict.fromkeys()) > > Armed with those, the "give me all the attributes from __dir__" > command would be: > > attrs = dict.from_attrs(obj, dir(obj)) A Urk --------------------+ You sure you want to follow precedent? My fingers really like that typo, too! From matt at getpattern.com Tue Dec 13 13:33:01 2016 From: matt at getpattern.com (Matt Gilson) Date: Tue, 13 Dec 2016 10:33:01 -0800 Subject: [Python-ideas] Enhancing vars() In-Reply-To: References: <20161212234502.GA3365@ando.pearwood.info> <584FBF82.4080906@oa-cagliari.inaf.it> <20161213100223.GC3365@ando.pearwood.info> Message-ID: > It also refers to local and global variables, as vars() is effectively > an alias for locals() if you don't pass an argument, and locals() is > effectively an alias for globals() at module level: > > to sign up! -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Tue Dec 13 17:58:16 2016 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 14 Dec 2016 00:58:16 +0200 Subject: [Python-ideas] Enhancing vars() In-Reply-To: <20161212234502.GA3365@ando.pearwood.info> References: <20161212234502.GA3365@ando.pearwood.info> Message-ID: On 13.12.16 01:45, Steven D'Aprano wrote: > One of the lesser-known ones is vars(obj), which should be used in place > of obj.__dict__. > > Unfortunately, vars() is less useful than it might be, since not all > objects have a __dict__. Some objects have __slots__ instead, or even > both. That is considered an implementation detail of the object. http://bugs.python.org/issue13290 http://mail.python.org/pipermail/python-dev/2012-October/122011.html From steve at pearwood.info Tue Dec 13 18:49:14 2016 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 14 Dec 2016 10:49:14 +1100 Subject: [Python-ideas] Enhancing vars() In-Reply-To: References: <20161212234502.GA3365@ando.pearwood.info> Message-ID: <20161213234913.GD3365@ando.pearwood.info> On Wed, Dec 14, 2016 at 12:58:16AM +0200, Serhiy Storchaka wrote: > On 13.12.16 01:45, Steven D'Aprano wrote: > >One of the lesser-known ones is vars(obj), which should be used in place > >of obj.__dict__. > > > >Unfortunately, vars() is less useful than it might be, since not all > >objects have a __dict__. Some objects have __slots__ instead, or even > >both. That is considered an implementation detail of the object. > > http://bugs.python.org/issue13290 > http://mail.python.org/pipermail/python-dev/2012-October/122011.html Thanks Serhiy! Glad to see I'm not the only one with this idea. I think: - the behaviour of locals() (and vars() when given no argument, where it returns locals()) is anomalous and should not be copied unless we really need to. 
- Other Python implementations don't always emulate the weird behaviour
of locals(), for example I think IronPython locals() is writeable, and
the local variables do change.

steve at orac:~$ ipy
IronPython 2.6 Beta 2 DEBUG (2.6.0.20) on .NET 2.0.50727.1433
Type "help", "copyright", "credits" or "license" for more information.
>>> def test():
...     a = 1
...     locals()['a'] = 99
...     print a
...
>>> test()
99

CPython will print 1 instead. So CPython locals() is an implementation
detail and we shouldn't feel the need to copy its weird behaviour.

When given an object, vars(obj) should return a dict-like object which
is a read/write proxy to the object's namespace. If the object has a
__dict__ but no __slots__, then there's no need to change anything: it
can keep the current behaviour and just return the dict itself:

    assert vars(obj) is obj.__dict__

But if the object has __slots__, with or without a __dict__, then vars
should return a proxy which directs reads and writes to the correct slot
or dict.

It might be helpful to have a slotsproxy object which provides a
dict-like interface to an object with __slots__ but no __dict__, and
build support for both __slots__ and a __dict__ on top of that.

If the object has *neither* __slots__ nor __dict__, vars can probably
raise a TypeError.

-- 
Steve

From steve at pearwood.info  Tue Dec 13 18:54:10 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 14 Dec 2016 10:54:10 +1100
Subject: [Python-ideas] Enhancing vars()
In-Reply-To: 
References: <20161212234502.GA3365@ando.pearwood.info>
Message-ID: <20161213235410.GE3365@ando.pearwood.info>

On Mon, Dec 12, 2016 at 10:45:39PM -0500, Alexander Belopolsky wrote:
> On Mon, Dec 12, 2016 at 6:45 PM, Steven D'Aprano
> wrote:
>
> > Proposal: enhance vars() to return a proxy to the object namespace,
> > regardless of whether said namespace is __dict__ itself, or a number of
> > __slots__, or both.
>
> How do you propose dealing with classes defined in C? Their objects don't
> have __slots__.

I don't see any clean way to do so. Maybe we should have a convention
that such objects provide a __slots__ attribute listing public
attributes, but I'm not too concerned. Let vars(weird_c_object) raise
TypeError, just as it does now.

> One possibility is to use __dir__ or dir(), but those can return anything
> and in the past developers
> were encouraged to put only "useful" attributes in __dir__.

Indeed.

-- 
Steve

From steve at pearwood.info  Tue Dec 13 18:56:46 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 14 Dec 2016 10:56:46 +1100
Subject: [Python-ideas] Enhancing vars()
In-Reply-To: 
References: <20161212234502.GA3365@ando.pearwood.info>
 <584FBF82.4080906@oa-cagliari.inaf.it>
 <20161213100223.GC3365@ando.pearwood.info>
Message-ID: <20161213235646.GF3365@ando.pearwood.info>

On Tue, Dec 13, 2016 at 11:53:44AM +0000, Paul Moore wrote:
[...]
> There's obviously a speed-up from avoiding repeated getattr calls, but
> is speed the key here?

Not for me.

> The advantage of an "enhanced vars" is more likely to be ease of
> discoverability, and I'm not sure dict.fromattrs gives us that
> benefit. Also, the dict constructor gives a *copy* of the namespace,
> where the proposal was for the proxy returned by vars() to provide
> update capability (if I understand the proposal correctly).

Correct.
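To make the idea concrete, here is a rough, untested sketch of the kind
of read/write proxy I have in mind (the class name is illustrative
only, and it deliberately glosses over the slot-versus-dict precedence
subtleties discussed above):

from collections.abc import MutableMapping

class NamespaceProxy(MutableMapping):
    """Dict-like read/write view of an object's own namespace (sketch)."""
    def __init__(self, obj):
        self._obj = obj
    def _names(self):
        cls = type(self._obj)
        # Slots count only once they have actually been assigned to.
        slots = [s for s in getattr(cls, '__slots__', ())
                 if s != '__dict__' and hasattr(self._obj, s)]
        return list(getattr(self._obj, '__dict__', {})) + slots
    def __len__(self):
        return len(self._names())
    def __iter__(self):
        return iter(self._names())
    def __getitem__(self, name):
        if name not in self._names():
            raise KeyError(name)
        return getattr(self._obj, name)
    def __setitem__(self, name, value):
        setattr(self._obj, name, value)
    def __delitem__(self, name):
        if name not in self._names():
            raise KeyError(name)
        delattr(self._obj, name)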
-- 
Steve

From vgr255 at live.ca  Tue Dec 13 19:12:39 2016
From: vgr255 at live.ca (Emanuel Barry)
Date: Wed, 14 Dec 2016 00:12:39 +0000
Subject: [Python-ideas] Enhancing vars()
In-Reply-To: <20161213234913.GD3365@ando.pearwood.info>
References: <20161212234502.GA3365@ando.pearwood.info>
 <20161213234913.GD3365@ando.pearwood.info>
Message-ID: 

> From Steven D'Aprano
> Sent: Tuesday, December 13, 2016 6:49 PM
> To: python-ideas at python.org
> Subject: Re: [Python-ideas] Enhancing vars()
>
> But if the object has __slots__, with or without a __dict__, then vars
> should return a proxy which directs reads and writes to the correct slot
> or dict.
>
> It might be helpful to have a slotsproxy object which provides a
> dict-like interface to an object with __slots__ but no __dict__, and
> build support for both __slots__ and a __dict__ on top of that.

That might be a bit tricky, for example, it's possible that a class has a
`foo` slot *and* a `foo` instance attribute (by virtue of subclasses). What
would you do in that case? Or what if there's a slot that doesn't have any
value (i.e. raises AttributeError on access, but exists on the class
nonetheless), but an instance attribute with the same name exists? And so
on.

> If the object has *neither* __slots__ nor __dict__, vars can probably
> raise a TypeError.

Is that even possible in pure Python? The only object I know that can do
this is `object`, but some other C objects might do that too.

-Emanuel

From steve at pearwood.info  Tue Dec 13 19:40:43 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 14 Dec 2016 11:40:43 +1100
Subject: [Python-ideas] Enhancing vars()
In-Reply-To: 
References: <20161212234502.GA3365@ando.pearwood.info>
 <20161213234913.GD3365@ando.pearwood.info>
Message-ID: <20161214004043.GG3365@ando.pearwood.info>

On Wed, Dec 14, 2016 at 12:12:39AM +0000, Emanuel Barry wrote:
> > From Steven D'Aprano
> > Sent: Tuesday, December 13, 2016 6:49 PM
> > To: python-ideas at python.org
> > Subject: Re: [Python-ideas] Enhancing vars()
> >
> > But if the object has __slots__, with or without a __dict__, then vars
> > should return a proxy which directs reads and writes to the correct slot
> > or dict.
> >
> > It might be helpful to have a slotsproxy object which provides a
> > dict-like interface to an object with __slots__ but no __dict__, and
> > build support for both __slots__ and a __dict__ on top of that.
>
> That might be a bit tricky, for example, it's possible that a class has a
> `foo` slot *and* a `foo` instance attribute (by virtue of subclasses). What
> would you do in that case?

vars() shouldn't need to care about inheritance: it only cares about the
object's own individual namespace, not attributes inherited from the
class or superclasses. That's how vars() works now:

py> class C:
...     cheese = 1
...
py> obj = C()
py> ns = vars(obj)
py> 'cheese' in ns
False

The only difference here is that if the direct parent class has
__slots__, the instance will use them instead of (or in addition to) a
__dict__. We don't need to care about superclass __slots__, because they
aren't inherited.

> Or what if there's a slot that doesn't have any
> value (i.e. raises AttributeError on access, but exists on the class
> nonetheless), but an instance attribute with the same name exists? And so
> on.

Only the *list of slot names* exists on the class. The slots themselves
are part of the instance. Nevertheless, you are right: a slot can be
defined, but not assigned to. That has to be treated as if the slot
didn't exist:

py> class D:
...     __slots__ = ['spam']
...
py> d = D()
py> hasattr(d, 'spam')
False

So I would expect that 'spam' in vars(d) should likewise return False,
until such time that d.spam is assigned to. The same applies even if the
object has a __dict__. The slot always takes precedence, even if the
slot isn't filled in.

py> class E:
...     __slots__ = ['spam', '__dict__']
...
py> e = E()
py> e.__dict__['spam'] = 1
py> hasattr(e, 'spam')
False

> > If the object has *neither* __slots__ nor __dict__, vars can probably
> > raise a TypeError.
>
> Is that even possible in pure Python? The only object I know that can do
> this is `object`, but some other C objects might do that too.

I don't think pure Python classes can do this, at least not without some
metaclass trickery, but certainly `object` itself lacks both __slots__
and instance __dict__, and C objects can do the same (so I'm told).

-- 
Steve

From ncoghlan at gmail.com  Wed Dec 14 03:01:00 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 14 Dec 2016 18:01:00 +1000
Subject: [Python-ideas] Enhancing vars()
In-Reply-To: <20161214004043.GG3365@ando.pearwood.info>
References: <20161212234502.GA3365@ando.pearwood.info>
 <20161213234913.GD3365@ando.pearwood.info>
 <20161214004043.GG3365@ando.pearwood.info>
Message-ID: 

On 14 December 2016 at 10:40, Steven D'Aprano wrote:
> On Wed, Dec 14, 2016 at 12:12:39AM +0000, Emanuel Barry wrote:
>> > From Steven D'Aprano
>> > Sent: Tuesday, December 13, 2016 6:49 PM
>> > To: python-ideas at python.org
>> > Subject: Re: [Python-ideas] Enhancing vars()
>> >
>> > But if the object has __slots__, with or without a __dict__, then vars
>> > should return a proxy which directs reads and writes to the correct slot
>> > or dict.
>> >
>> > It might be helpful to have a slotsproxy object which provides a
>> > dict-like interface to an object with __slots__ but no __dict__, and
>> > build support for both __slots__ and a __dict__ on top of that.
>>
>> That might be a bit tricky, for example, it's possible that a class has a
>> `foo` slot *and* a `foo` instance attribute (by virtue of subclasses). What
>> would you do in that case?
>
> vars() shouldn't need to care about inheritance: it only cares about the
> object's own individual namespace, not attributes inherited from the
> class or superclasses. That's how vars() works now:
>
> py> class C:
> ...     cheese = 1
> ...
> py> obj = C()
> py> ns = vars(obj)
> py> 'cheese' in ns
> False
>
> The only difference here is that if the direct parent class has
> __slots__, the instance will use them instead of (or in addition to) a
> __dict__. We don't need to care about superclass __slots__, because they
> aren't inherited.

If folks genuinely want an attrproxy that provides a dict-like view
over an instance, that's essentially:

from collections import MutableMapping

class AttrProxy(MutableMapping):
    def __init__(self, obj):
        self._obj = obj
    def __len__(self):
        return len(dir(self._obj))
    def __iter__(self):
        for attr in dir(self._obj):
            yield attr
    def __contains__(self, attr):
        return hasattr(self._obj, attr)
    def __getitem__(self, attr):
        return getattr(self._obj, attr)
    def __setitem__(self, attr, value):
        setattr(self._obj, attr, value)
    def __delitem__(self, attr):
        delattr(self._obj, attr)

>>> class C:
...     a = 1
...     b = 2
...     c = 3
...
>>> ns = AttrProxy(C)
>>> list(ns.keys())
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__gt__', '__hash__',
'__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__',
'__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__',
'__str__', '__subclasshook__', '__weakref__', 'a', 'b', 'c']
>>> ns["d"] = 4
>>> C.d
4
>>> C.c
3
>>> del ns["c"]
>>> C.c
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'C' has no attribute 'c'
>>> ns["a"] = 5
>>> C.a
5

Replacing the calls to `dir(self._obj)` with a method that filters out
dunder-methods and updating the other methods to reject them as keys is
also relatively straightforward if people want that behaviour.

Indirecting through dir(), hasattr(), getattr(), setattr() and delattr()
this way means you don't have to worry about the vagaries of the
descriptor protocol or inheritance or instance attributes vs class
attributes or anything else like that, while inheriting from
MutableMapping automatically gives you view-based keys(), values() and
items() implementations.

I wouldn't have any real objection to providing an API that behaves
like this (in simple cases it's functionally equivalent to manipulating
__dict__ directly, while in more complex cases, the attrproxy approach
is likely to just work, whereas __dict__ manipulation may fail).

(Re-using vars() likely wouldn't be appropriate in that case though,
due to the change in the way inheritance is handled)

I *would* object to a new proxy type that duplicated descriptor logic
that's already programmatically accessible in other builtins, or only
selectively supported certain descriptors (like those created for
__slots__) while ignoring others. Pseudo-lookups like
inspect.getattr_static() exist to help out IDEs, debuggers and other
code analysers, rather than as something we want people to be doing as
part of their normal application execution.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From erik.m.bray at gmail.com  Fri Dec 16 07:07:46 2016
From: erik.m.bray at gmail.com (Erik Bray)
Date: Fri, 16 Dec 2016 13:07:46 +0100
Subject: [Python-ideas] New PyThread_tss_ C-API for CPython
Message-ID: 

Greetings all,

I wanted to bring attention to an issue that's been languishing on the
bug tracker since last year, which I think would best be addressed by
changes to CPython's C-API.  The original issue is at
http://bugs.python.org/issue25658, but I have made an effort below in
a sort of proto-PEP to summarize the problem and the proposed
solution.

I haven't written this up in the proper PEP format because I want to
see if the idea has some broader support first, and it's also not
clear to me whether C-API changes (especially to undocumented APIs)
even require their own PEP.

Abstract
========

The proposal is to add a new Thread Local Storage (TLS) API to CPython
which would supersede use of the existing TLS API within the CPython
interpreter, while deprecating the existing API.

Because the existing TLS API is only used internally (it is not
mentioned in the documentation, and the header that defines it,
pythread.h, is not included in Python.h either directly or indirectly),
this proposal probably only affects CPython, but might also affect
other interpreter implementations (PyPy?) that implement parts of the
CPython API.
Specification
=============

The current API for TLS used inside the CPython interpreter consists
of 5 functions:

    PyAPI_FUNC(int) PyThread_create_key(void)
    PyAPI_FUNC(void) PyThread_delete_key(int key)
    PyAPI_FUNC(int) PyThread_set_key_value(int key, void *value)
    PyAPI_FUNC(void *) PyThread_get_key_value(int key)
    PyAPI_FUNC(void) PyThread_delete_key_value(int key)

These would be superseded with a new set of analogous functions:

    PyAPI_FUNC(int) PyThread_tss_create(Py_tss_t *key)
    PyAPI_FUNC(void) PyThread_tss_delete(Py_tss_t key)
    PyAPI_FUNC(int) PyThread_tss_set(Py_tss_t key, void *value)
    PyAPI_FUNC(void *) PyThread_tss_get(Py_tss_t key)
    PyAPI_FUNC(void) PyThread_tss_delete_value(Py_tss_t key)

and includes the definition of a new type Py_tss_t--an opaque type the
specification of which is not given here, and may depend on the
underlying TLS implementation.

The new PyThread_tss_ functions are almost exactly analogous to their
original counterparts with a minor difference: Whereas
PyThread_create_key takes no arguments and returns a TLS key as an
int, PyThread_tss_create takes a Py_tss_t* as an argument, and returns
a Py_tss_t by pointer--the int return value is a status, returning
zero on success and non-zero on failure.

Further, the old PyThread_*_key* functions will be marked as
deprecated.

Additionally, the pthread implementations of the old PyThread_*_key*
functions will either fail or be no-ops on platforms where
sizeof(pthread_key_t) != sizeof(int).

Motivation
==========

The primary problem at issue here is the type of the keys (int) used
for TLS values, as defined by the original PyThread TLS API.

The original TLS API was added to Python by GvR back in 1997, and at
the time the key used to represent a TLS value was an int, and so it
has been to this day. This used CPython's own TLS implementation, the
current generation of which can still be found, largely unchanged, in
Python/thread.c. Support for implementation of the API on top of
native thread implementations (NT and pthreads) was added much later,
and the built-in implementation may still be used on other platforms.

The problem with the choice of int to represent a TLS key is that,
while it was fine for CPython's internal TLS implementation, and
happens to be fine for NT (which uses DWORD), it is not compatible
with the POSIX standard for the pthreads API, which defines
pthread_key_t as an opaque type not further defined by the standard
(as with Py_tss_t described above). This leaves it up to the
underlying implementation how a pthread_key_t value is used to look up
thread-specific data.

This has not generally been a problem for Python's API, as it just
happens that on Linux pthread_key_t is defined as an unsigned int, and
so is fully compatible with Python's TLS API--pthread_key_t's created
by pthread_key_create can be freely cast to ints and back (well, not
really, even this has issues as pointed out by issue #22206).

However, as issue #25658 points out, there are at least some platforms
(namely Cygwin, CloudABI, but likely others as well) which have
otherwise modern and POSIX-compliant pthreads implementations, but are
not compatible with Python's API because their pthread_key_t is
defined in a way that cannot be safely cast to int. In fact, the
possibility of running into this problem was raised by MvL at the time
pthreads TLS was added [1].
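To make the failure mode concrete, here is an illustrative sketch of
the pattern that breaks (assuming a platform where pthread_key_t is
wider than int, or not an integer type at all, in which case the casts
below lose information or do not even compile):

    #include <pthread.h>

    static pthread_key_t real_key;

    int legacy_create_key(void)
    {
        pthread_key_create(&real_key, NULL);
        /* The existing API must funnel the key through an int; on the
         * affected platforms this cast is lossy or invalid. */
        return (int) real_key;
    }

    void *legacy_get(int key)
    {
        /* Casting back does not necessarily recover the original key. */
        return pthread_getspecific((pthread_key_t) key);
    }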
It could be argued that PEP-11 makes specific requirements for supporting a new, not otherwise officially-supported platform (such as CloudABI), and that the status of Cygwin support is currently dubious. However, this places a very high barrier to supporting platforms that are otherwise Linux- and/or POSIX-compatible and where CPython might otherwise "just work" except for this one hurdle which Python itself imposes by way of an API that is not compatible with POSIX (and in fact makes invalid assumptions about pthreads).

Rationale for Proposed Solution
===============================

The use of an opaque type (Py_tss_t) to key TLS values allows the API to be compatible, at least in this regard, with CPython's internal TLS implementation, as well as all present (NT and POSIX) and future (C11?) native TLS implementations supported by CPython, as it allows the definition of Py_tss_t to depend on the underlying implementation.

A new API must be introduced, rather than changing the function signatures of the current API, in order to maintain backwards compatibility. The new API also more clearly groups together these related functions under a single name prefix, "PyThread_tss_". The "tss" in the name stands for "thread-specific storage", and was influenced by the naming and design of the "tss" API that is part of the C11 threads API. However, this is in no way meant to imply compatibility with or support for the C11 threads API, or signal any future intention of supporting C11--it's just the influence for the naming and design.

Changing PyThread_create_key to immediately return a failure status on systems using pthreads where sizeof(int) != sizeof(pthread_key_t) is intended as a sanity check: Currently, PyThread_create_key will report initial success on such systems, but attempts to use the returned key are likely to fail. Although in practice this failure occurs quickly during interpreter startup, it's better to fail immediately at the source of failure (PyThread_create_key) rather than sometime later when use of an invalid key is attempted.

Rejected Ideas
==============

* Do nothing: The status quo is fine because it works on Linux, and platforms wishing to be supported by CPython should follow the requirements of PEP-11. As explained above, while this would be a fair argument if CPython were being asked to make changes to support particular quirks of a specific platform, in this case the platforms in question are only asking to fix a quirk of CPython that prevents it from being used to its full potential on those platforms. The fact that the current implementation happens to work on Linux is a happy accident, and there's no guarantee that this will never change.

* Affected platforms should just configure Python --without-threads: This is a possible temporary workaround to the issue, but only that. Python should not be hobbled on affected platforms despite them being otherwise perfectly capable of running multi-threaded Python.

* Affected platforms should not define Py_HAVE_NATIVE_TLS: This is a more acceptable alternative to the previous idea, and in fact there is a patch to do just that [2]. However, CPython's internal TLS implementation being "slower and clunkier" in general than native implementations still needlessly hobbles performance on affected platforms. At least one other module (tracemalloc) is also broken if Python is built without Py_HAVE_NATIVE_TLS.

* Keep the existing API, but work around the issue by providing a mapping from pthread_key_t values to ints.
A couple of attempts were made at this [3] [4], but this only injects needless complexity and overhead into performance-critical code on platforms that are not currently affected by this issue (such as Linux). Even if use of this workaround were made conditional on platform compatibility, it introduces platform-specific code to maintain, and still has the problem of the previous rejected ideas of needlessly hobbling performance on affected platforms.

Implementation
==============

An initial version of a patch [5] is available on the bug tracker for this issue. The patch was proposed and written by Masayuki Yamamoto, who should be considered a co-author of this proto-PEP, though I have not consulted directly with him in writing this. If he's reading, he should chime in in case I've misrepresented anything.

If you've made it this far, thanks for reading and thank you for your consideration,

Erik

[1] https://bugs.python.org/msg116292
[2] http://bugs.python.org/file45548/configure-pthread_key_t.patch
[3] http://bugs.python.org/file44269/issue25658-1.patch
[4] http://bugs.python.org/file44303/key-constant-time.diff
[5] http://bugs.python.org/file45763/pythread-tss.patch

From zachary.ware+pyideas at gmail.com Fri Dec 16 12:17:31 2016 From: zachary.ware+pyideas at gmail.com (Zachary Ware) Date: Fri, 16 Dec 2016 11:17:31 -0600 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython In-Reply-To: References: Message-ID:

On Fri, Dec 16, 2016 at 6:07 AM, Erik Bray wrote: > Greetings all, > > I wanted to bring attention to an issue that's been languishing on the > bug tracker since last year, which I think would best be addressed by > changes to CPython's C-API. The original issue is at > http://bugs.python.org/issue25658, but I have made an effort below in > a sort of proto-PEP to summarize the problem and the proposed > solution.

I am not familiar enough with the threading implementation to be anything more than moral support, but I am in favor of making some change here. This is a significant blocker to Cygwin support, which is actually fairly close to being supportable.

-- Zach

From solipsis at pitrou.net Fri Dec 16 12:51:02 2016 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 16 Dec 2016 18:51:02 +0100 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython References: Message-ID: <20161216185102.1e8396d4@fsol>

On Fri, 16 Dec 2016 13:07:46 +0100 Erik Bray wrote: > Greetings all, > > I wanted to bring attention to an issue that's been languishing on the > bug tracker since last year, which I think would best be addressed by > changes to CPython's C-API. The original issue is at > http://bugs.python.org/issue25658, but I have made an effort below in > a sort of proto-PEP to summarize the problem and the proposed > solution. > > I haven't written this up in the proper PEP format because I want to > see if the idea has some broader support first, and it's also not > clear to me whether C-API changes (especially to undocumented APIs) > even require their own PEP.

This is a nice detailed write-up and I'm in favour of the proposal.

Regards

Antoine.
From rosuav at gmail.com Fri Dec 16 15:04:25 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 17 Dec 2016 07:04:25 +1100 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython In-Reply-To: References: Message-ID:

On Fri, Dec 16, 2016 at 11:07 PM, Erik Bray wrote: > I haven't written this up in the proper PEP format because I want to > see if the idea has some broader support first, and it's also not > clear to me whether C-API changes (especially to undocumented APIs) > even require their own PEP. >

You're pretty close to proper PEP format. Like others, I don't have enough knowledge of threading internals to speak to the technical side of it, but this is a well-written proposal and I agree in principle with tightening this up. The need for a PEP basically comes down to whether or not it's going to be controversial; a PEP allows you to hash out the details and then present a coherent proposal to Guido (or his delegate) for final approval.

ChrisA

From ma3yuki.8mamo10 at gmail.com Fri Dec 16 15:14:40 2016 From: ma3yuki.8mamo10 at gmail.com (Masayuki YAMAMOTO) Date: Sat, 17 Dec 2016 05:14:40 +0900 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython In-Reply-To: References: Message-ID:

Hi, I'm the patch author, so I don't have anything to add to Erik's draft. I'm delighted at how clearly it explains everything, especially the history of the API and the rationale behind the PEP. Thanks for the great job, Erik!

Cheers, Masayuki

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From turnbull.stephen.fw at u.tsukuba.ac.jp Sat Dec 17 02:21:17 2016 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Sat, 17 Dec 2016 16:21:17 +0900 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython In-Reply-To: References: Message-ID: <22612.59245.346572.695579@turnbull.sk.tsukuba.ac.jp>

Erik Bray writes: > Abstract > ======== > > The proposal is to add a new Thread Local Storage (TLS) API to CPython > which would supersede use of the existing TLS API within the CPython > interpreter, while deprecating the existing API.

Thank you for the analysis! Question:

> Further, the old PyThread_*_key* functions will be marked as > deprecated.

Of course, but:

> Additionally, the pthread implementations of the old > PyThread_*_key* functions will either fail or be no-ops on > platforms where sizeof(pythead_t) != sizeof(int).

Typo "pythead_t" in last line.

I don't understand this. I assume that there are no such platforms supported at present. I would think that when such a platform becomes supported, code supporting "key" functions becomes unsupportable without #ifdefs on that platform, at least directly. So you should either (1) raise UnimplementedError, or (2) provide the API as a wrapper over the new API by making the integer keys indexes into a table of TSS'es, or some such device. I don't understand how (3) "make it a no-op" can be implemented for PyThread_create_key -- return 0 or -1? That would only work if there's a failure return status like 0 or -1, and it seems really dangerous to me since in general a lot of code doesn't check status even though it should. Even for code checking the status, the error message will be suboptimal ("creation failed" vs. "unimplemented").

I gather from references to casting pthread_key_t to unsigned int and back that there's probably code that does this in ways making (2) too dangerous to support. If true, perhaps that should be mentioned here.
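For concreteness, a compatibility shim along the lines of (2) might look something like the following sketch. All names and the fixed table size here are hypothetical, and locking around the table is ignored:

    /* Hypothetical shim: keep the old int-keyed API alive by making
       the ints indexes into a table of the proposed Py_tss_t keys. */
    #define MAX_COMPAT_KEYS 128

    static Py_tss_t compat_keys[MAX_COMPAT_KEYS];
    static int ncompat_keys = 0;

    int
    PyThread_create_key(void)
    {
        if (ncompat_keys >= MAX_COMPAT_KEYS)
            return -1;
        if (PyThread_tss_create(&compat_keys[ncompat_keys]) != 0)
            return -1;
        return ncompat_keys++;  /* the int key is just a table index */
    }

    void *
    PyThread_get_key_value(int key)
    {
        if (key < 0 || key >= ncompat_keys)
            return NULL;
        return PyThread_tss_get(compat_keys[key]);
    }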
From ma3yuki.8mamo10 at gmail.com Sat Dec 17 18:10:21 2016 From: ma3yuki.8mamo10 at gmail.com (Masayuki YAMAMOTO) Date: Sun, 18 Dec 2016 08:10:21 +0900 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython Message-ID:

2016-12-17 18:35 GMT+09:00 Stephen J. Turnbull : > I don't understand this. I assume that there are no such platforms > supported at present. I would think that when such a platform becomes > supported, code supporting "key" functions becomes unsupportable > without #ifdefs on that platform, at least directly. So you should > either (1) raise UnimplementedError, or (2) provide the API as a > wrapper over the new API by making the integer keys indexes into a > table of TSS'es, or some such device. I don't understand how (3) > "make it a no-op" can be implemented for PyThread_create_key -- return > 0 or -1? That would only work if there's a failure return status like > 0 or -1, and it seems really dangerous to me since in general a lot of > code doesn't check status even though it should. Even for code > checking the status, the error message will be suboptimal ("creation > failed" vs. "unimplemented").

PyThread_create_key has always required the user to check the return value, since when key creation fails it returns -1 instead of a valid key value. Therefore, my patch changes PyThread_create_key to always return -1 on platforms where the key cannot safely be cast to int, so that the current API never hands out a valid key value on those platforms. The advantage is that the function specifications don't change, and there is no effect on the currently supported platforms. That is the reason the API doesn't raise any exception.
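In other words, the intended behaviour is roughly the following sketch. This is simplified: the macro name is hypothetical and stands in for the configure-time check in my patch:

    #include <pthread.h>

    int
    PyThread_create_key(void)
    {
    #if defined(PY_PTHREAD_KEY_T_NOT_INT_SAFE)  /* hypothetical macro */
        /* Fail immediately: no int can hold a valid key here. */
        return -1;
    #else
        pthread_key_t key;
        if (pthread_key_create(&key, NULL) != 0)
            return -1;
        return (int)key;
    #endif
    }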
Idea (2) could keep the current API usable on those specific platforms. If it were simple, I would have liked to choose it. However, implementing the current API on top of native TLS on those platforms requires a duplicate implementation for managing keys, and it's ugly (for the same reason given in the last item of the Rejected Ideas in Erik's draft). Thus, I gave up on keeping the feature and decided to implement the "no-op" behaviour, delegating error handling to API users.

Kind regards, Masayuki

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From erik.m.bray at gmail.com Mon Dec 19 05:50:23 2016 From: erik.m.bray at gmail.com (Erik Bray) Date: Mon, 19 Dec 2016 11:50:23 +0100 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython In-Reply-To: References: Message-ID:

On Sun, Dec 18, 2016 at 12:10 AM, Masayuki YAMAMOTO wrote: > 2016-12-17 18:35 GMT+09:00 Stephen J. Turnbull > : >> >> I don't understand this. I assume that there are no such platforms >> supported at present. I would think that when such a platform becomes >> supported, code supporting "key" functions becomes unsupportable >> without #ifdefs on that platform, at least directly. So you should >> either (1) raise UnimplementedError, or (2) provide the API as a >> wrapper over the new API by making the integer keys indexes into a >> table of TSS'es, or some such device. I don't understand how (3) >> "make it a no-op" can be implemented for PyThread_create_key -- return >> 0 or -1? That would only work if there's a failure return status like >> 0 or -1, and it seems really dangerous to me since in general a lot of >> code doesn't check status even though it should. Even for code >> checking the status, the error message will be suboptimal ("creation >> failed" vs. "unimplemented"). > > > PyThread_create_key has required user to check the return value since when > key creation fails, returns -1 instead of valid key value. Therefore, my > patch changes PyThread_create_key that always return -1 on platforms that > cannot cast key to int safely and current API never return valid key value > to these platforms. Its advantage to not change function specifications and > no effect on supported platforms. Hence, this is reason that doesn't raise > any exception on the API. > > (2) of ideas can enable current API on specific-platforms. If it's simple, > I'd have liked to select it. However, work that brings current API using > native TLS to specific-platforms brings duplication implementation that > manages keys, and it's ugly (same reason for Erik's draft, the last item of > Rejected Ideas). Thus, I gave up to keep feature and decided to implement > "no-op", delegate error handling to API users.

Yep--I think it speaks to the sensibleness of that decision that I pretty much read your mind :)

From erik.m.bray at gmail.com Mon Dec 19 05:48:41 2016 From: erik.m.bray at gmail.com (Erik Bray) Date: Mon, 19 Dec 2016 11:48:41 +0100 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython In-Reply-To: <22612.59245.346572.695579@turnbull.sk.tsukuba.ac.jp> References: <22612.59245.346572.695579@turnbull.sk.tsukuba.ac.jp> Message-ID:

On Sat, Dec 17, 2016 at 8:21 AM, Stephen J. Turnbull wrote: > Erik Bray writes: > > > Abstract > > ======== > > > > The proposal is to add a new Thread Local Storage (TLS) API to CPython > > which would supersede use of the existing TLS API within the CPython > > interpreter, while deprecating the existing API. > > Thank you for the analysis!

And thank *you* for the feedback!
> Question: > > > Further, the old PyThread_*_key* functions will be marked as > > deprecated. > > Of course, but: > > > Additionally, the pthread implementations of the old > > PyThread_*_key* functions will either fail or be no-ops on > > platforms where sizeof(pythead_t) != sizeof(int). > > Typo "pythead_t" in last line.

Thanks, yes, that was supposed to be pthread_key_t of course. I think I had a few other typos too.

> I don't understand this. I assume that there are no such platforms > supported at present. I would think that when such a platform becomes > supported, code supporting "key" functions becomes unsupportable > without #ifdefs on that platform, at least directly. So you should > either (1) raise UnimplementedError, or (2) provide the API as a > wrapper over the new API by making the integer keys indexes into a > table of TSS'es, or some such device. I don't understand how (3) > "make it a no-op" can be implemented for PyThread_create_key -- return > 0 or -1? That would only work if there's a failure return status like > 0 or -1, and it seems really dangerous to me since in general a lot of > code doesn't check status even though it should. Even for code > checking the status, the error message will be suboptimal ("creation > failed" vs. "unimplemented").

Masayuki already explained this downthread I think, but I could have probably made that section more precise. The point was that PyThread_create_key should immediately return -1 in this case. This is just a subtle difference from the current situation, which is that PyThread_create_key succeeds, but the key is corrupted by being cast to an int, so that later calls to PyThread_set_key_value and the like fail unexpectedly.

The point is that PyThread_create_key (and we're only talking about the pthread implementation thereof, to be clear) must fail immediately if it can't work correctly. #ifdefs on the platform would not be necessary--instead, Masayuki's patch adds a feature check in configure.ac for sizeof(int) == sizeof(pthread_key_t). It should be noted that even this check is not 100% perfect, as on Linux pthread_key_t is an unsigned int, and so technically can cause Python's signed int key to overflow, but there's already an explicit check for that (which would be kept), and it's also a very unlikely scenario.

> I gather from references to casting pthread_key_t to unsigned int and > back that there's probably code that does this in ways making (2) too > dangerous to support. If true, perhaps that should be mentioned here.

It's not necessarily too dangerous, so much as not worth the trouble, IMO. Simpler to just provide, and immediately use, the new API and make the old one deprecated and explicitly not supported on those platforms where it can't work.

Thanks, Erik

From ncoghlan at gmail.com Mon Dec 19 07:11:07 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 19 Dec 2016 22:11:07 +1000 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython In-Reply-To: <20161216185102.1e8396d4@fsol> References: <20161216185102.1e8396d4@fsol> Message-ID:

On 17 December 2016 at 03:51, Antoine Pitrou wrote: > On Fri, 16 Dec 2016 13:07:46 +0100 > Erik Bray wrote: > > Greetings all, > > > > I wanted to bring attention to an issue that's been languishing on the > > bug tracker since last year, which I think would best be addressed by > > changes to CPython's C-API.
The original issue is at > > http://bugs.python.org/issue25658, but I have made an effort below in > > a sort of proto-PEP to summarize the problem and the proposed > > solution. > > > > I haven't written this up in the proper PEP format because I want to > > see if the idea has some broader support first, and it's also not > > clear to me whether C-API changes (especially to undocumented APIs) > > even require their own PEP. > > This is a nice detailed write-up and I'm in favour of the proposal. > Likewise - we know the status quo isn't right, and the proposed change addresses that. In reviewing the patch on the tracker, the one downside I've found is that due to "pthread_key_t" being an opaque type with no defined sentinel, the consuming code in _tracemalloc.c and pystate.c needed to add separate boolean flag variables to track whether or not the key had been created. (The pthread examples at http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html use pthread_once for a similar effect) I don't see any obvious way around that either, as even using a small struct for native pthread TLS keys would still face the problem of how to initialise the pthread_key_t field. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From erik.m.bray at gmail.com Mon Dec 19 09:45:50 2016 From: erik.m.bray at gmail.com (Erik Bray) Date: Mon, 19 Dec 2016 15:45:50 +0100 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython In-Reply-To: References: <20161216185102.1e8396d4@fsol> Message-ID: On Mon, Dec 19, 2016 at 1:11 PM, Nick Coghlan wrote: > On 17 December 2016 at 03:51, Antoine Pitrou wrote: >> >> On Fri, 16 Dec 2016 13:07:46 +0100 >> Erik Bray wrote: >> > Greetings all, >> > >> > I wanted to bring attention to an issue that's been languishing on the >> > bug tracker since last year, which I think would best be addressed by >> > changes to CPython's C-API. The original issue is at >> > http://bugs.python.org/issue25658, but I have made an effort below in >> > a sort of proto-PEP to summarize the problem and the proposed >> > solution. >> > >> > I haven't written this up in the proper PEP format because I want to >> > see if the idea has some broader support first, and it's also not >> > clear to me whether C-API changes (especially to undocumented APIs) >> > even require their own PEP. >> >> This is a nice detailed write-up and I'm in favour of the proposal. > > > Likewise - we know the status quo isn't right, and the proposed change > addresses that. In reviewing the patch on the tracker, the one downside I've > found is that due to "pthread_key_t" being an opaque type with no defined > sentinel, the consuming code in _tracemalloc.c and pystate.c needed to add > separate boolean flag variables to track whether or not the key had been > created. (The pthread examples at > http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html > use pthread_once for a similar effect) > > I don't see any obvious way around that either, as even using a small struct > for native pthread TLS keys would still face the problem of how to > initialise the pthread_key_t field. Hmm...fair point that it's not pretty. 
One way around it, albeit requiring more work/complexity, would be to extend this proposal to add a new function analogous to pthread_once--say--PyThread_call_once, and an associated Py_once_flag_t From erik.m.bray at gmail.com Mon Dec 19 09:53:42 2016 From: erik.m.bray at gmail.com (Erik Bray) Date: Mon, 19 Dec 2016 15:53:42 +0100 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython In-Reply-To: References: <20161216185102.1e8396d4@fsol> Message-ID: On Mon, Dec 19, 2016 at 3:45 PM, Erik Bray wrote: > On Mon, Dec 19, 2016 at 1:11 PM, Nick Coghlan wrote: >> On 17 December 2016 at 03:51, Antoine Pitrou wrote: >>> >>> On Fri, 16 Dec 2016 13:07:46 +0100 >>> Erik Bray wrote: >>> > Greetings all, >>> > >>> > I wanted to bring attention to an issue that's been languishing on the >>> > bug tracker since last year, which I think would best be addressed by >>> > changes to CPython's C-API. The original issue is at >>> > http://bugs.python.org/issue25658, but I have made an effort below in >>> > a sort of proto-PEP to summarize the problem and the proposed >>> > solution. >>> > >>> > I haven't written this up in the proper PEP format because I want to >>> > see if the idea has some broader support first, and it's also not >>> > clear to me whether C-API changes (especially to undocumented APIs) >>> > even require their own PEP. >>> >>> This is a nice detailed write-up and I'm in favour of the proposal. >> >> >> Likewise - we know the status quo isn't right, and the proposed change >> addresses that. In reviewing the patch on the tracker, the one downside I've >> found is that due to "pthread_key_t" being an opaque type with no defined >> sentinel, the consuming code in _tracemalloc.c and pystate.c needed to add >> separate boolean flag variables to track whether or not the key had been >> created. (The pthread examples at >> http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html >> use pthread_once for a similar effect) >> >> I don't see any obvious way around that either, as even using a small struct >> for native pthread TLS keys would still face the problem of how to >> initialise the pthread_key_t field. > > Hmm...fair point that it's not pretty. One way around it, albeit > requiring more work/complexity, would be to extend this proposal to > add a new function analogous to pthread_once--say--PyThread_call_once, > and an associated Py_once_flag_t Oops--fat-fingered a 'send' command before I finished. So workaround would be to add a PyThread_call_once function, analogous to pthread_once. Yet another interface one needs to implement for a native thread implementation, but not too hard either. For pthreads there's already an obvious analogue that can be wrapped directly. For other platforms that don't have a direct analogue a (naive) implementation is still fairly simple: All you need in Py_once_flag_t is a boolean flag with an associated mutex, and a sentinel value analogous to PTHREAD_ONCE_INIT. Best, Erik From ncoghlan at gmail.com Tue Dec 20 03:26:24 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 20 Dec 2016 18:26:24 +1000 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython In-Reply-To: References: <20161216185102.1e8396d4@fsol> Message-ID: On 20 December 2016 at 00:53, Erik Bray wrote: > On Mon, Dec 19, 2016 at 3:45 PM, Erik Bray wrote: > >> Likewise - we know the status quo isn't right, and the proposed change > >> addresses that. 
In reviewing the patch on the tracker, the one downside > I've > >> found is that due to "pthread_key_t" being an opaque type with no > defined > >> sentinel, the consuming code in _tracemalloc.c and pystate.c needed to > add > >> separate boolean flag variables to track whether or not the key had been > >> created. (The pthread examples at > >> http://pubs.opengroup.org/onlinepubs/009695399/ > functions/pthread_key_create.html > >> use pthread_once for a similar effect) > >> > >> I don't see any obvious way around that either, as even using a small > struct > >> for native pthread TLS keys would still face the problem of how to > >> initialise the pthread_key_t field. > > > > Hmm...fair point that it's not pretty. One way around it, albeit > > requiring more work/complexity, would be to extend this proposal to > > add a new function analogous to pthread_once--say--PyThread_call_once, > > and an associated Py_once_flag_t > > Oops--fat-fingered a 'send' command before I finished. > > So workaround would be to add a PyThread_call_once function, > analogous to pthread_once. Yet another interface one needs to > implement for a native thread implementation, but not too hard either. > For pthreads there's already an obvious analogue that can be wrapped > directly. For other platforms that don't have a direct analogue a > (naive) implementation is still fairly simple: All you need in > Py_once_flag_t is a boolean flag with an associated mutex, and a > sentinel value analogous to PTHREAD_ONCE_INIT. > Yeah, I think I'd prefer that - it aligns nicely with the way pthreads are defined, and means we can be more prescriptive about how to use the new API correctly for key declarations (we're currently a bit vague about exactly how to handle that in the current TLS API). With that addition, I think it will be worth turning your initial post here into a PR to the peps repo, though - not to resolve any particular controversy, but rather as an easier to find reference for the design rationale than a mailing list thread or a tracker issue. (I'd also be happy to volunteer as BDFL-Delegate, since I'm already reviewing the patch on the tracker) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From erik.m.bray at gmail.com Tue Dec 20 08:30:27 2016 From: erik.m.bray at gmail.com (Erik Bray) Date: Tue, 20 Dec 2016 14:30:27 +0100 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython In-Reply-To: References: <20161216185102.1e8396d4@fsol> Message-ID: On Tue, Dec 20, 2016 at 9:26 AM, Nick Coghlan wrote: > On 20 December 2016 at 00:53, Erik Bray wrote: >> >> On Mon, Dec 19, 2016 at 3:45 PM, Erik Bray wrote: >> >> Likewise - we know the status quo isn't right, and the proposed change >> >> addresses that. In reviewing the patch on the tracker, the one downside >> >> I've >> >> found is that due to "pthread_key_t" being an opaque type with no >> >> defined >> >> sentinel, the consuming code in _tracemalloc.c and pystate.c needed to >> >> add >> >> separate boolean flag variables to track whether or not the key had >> >> been >> >> created. (The pthread examples at >> >> >> >> http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html >> >> use pthread_once for a similar effect) >> >> >> >> I don't see any obvious way around that either, as even using a small >> >> struct >> >> for native pthread TLS keys would still face the problem of how to >> >> initialise the pthread_key_t field. 
>> > >> > Hmm...fair point that it's not pretty. One way around it, albeit >> > requiring more work/complexity, would be to extend this proposal to >> > add a new function analogous to pthread_once--say--PyThread_call_once, >> > and an associated Py_once_flag_t >> >> Oops--fat-fingered a 'send' command before I finished. >> >> So workaround would be to add a PyThread_call_once function, >> analogous to pthread_once. Yet another interface one needs to >> implement for a native thread implementation, but not too hard either. >> For pthreads there's already an obvious analogue that can be wrapped >> directly. For other platforms that don't have a direct analogue a >> (naive) implementation is still fairly simple: All you need in >> Py_once_flag_t is a boolean flag with an associated mutex, and a >> sentinel value analogous to PTHREAD_ONCE_INIT. > > Yeah, I think I'd prefer that - it aligns nicely with the way pthreads are > defined, and means we can be more prescriptive about how to use the new API > correctly for key declarations (we're currently a bit vague about exactly > how to handle that in the current TLS API). > > With that addition, I think it will be worth turning your initial post here > into a PR to the peps repo, though - not to resolve any particular > controversy, but rather as an easier to find reference for the design > rationale than a mailing list thread or a tracker issue. > > (I'd also be happy to volunteer as BDFL-Delegate, since I'm already > reviewing the patch on the tracker)

Okay, thanks. I will work on a PR to the PEPs repo, and update the proposal to add the PyThread_call_once idea, with some prescription for how it should be used. Of course, an updated patch will have to follow as well.

This is probably an implementation detail, but ISTM that even with PyThread_call_once, it will be necessary to reset any used once_flags manually in PyOS_AfterFork, essentially for the same reason the autoTLSkey is reset there currently...

Erik

From ma3yuki.8mamo10 at gmail.com Tue Dec 20 10:35:13 2016 From: ma3yuki.8mamo10 at gmail.com (Masayuki YAMAMOTO) Date: Wed, 21 Dec 2016 00:35:13 +0900 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython In-Reply-To: References: <20161216185102.1e8396d4@fsol> Message-ID:

2016-12-20 22:30 GMT+09:00 Erik Bray : > This is probably an implementation detail, but ISTM that even with > PyThread_call_once, it will be necessary to reset any used once_flags > manually in PyOS_AfterFork, essentially for the same reason the > autoTLSkey is reset there currently... >

Deleting the thread keys is done in the *_Fini functions, but the Py_FinalizeEx function that calls the *_Fini functions doesn't terminate the CPython interpreter. Furthermore, a source comment and the documentation describe reinitialization after calling Py_FinalizeEx. [1] [2] That is to say, there is an implicit possibility of reinitialization at the process level, contrary to the name "call_once". Therefore, if the CPython interpreter continues to allow reinitialization, I'd suggest renaming the call_once API to avoid misleading semantics (for example, safe_init or check_init).

Best regards, Masayuki

[1] https://hg.python.org/cpython/file/default/Python/pylifecycle.c#l170 [2] https://docs.python.org/dev/c-api/init.html#c.Py_FinalizeEx

-------------- next part -------------- An HTML attachment was scrubbed...
URL:

From thane.brimhall at gmail.com Tue Dec 20 19:50:57 2016 From: thane.brimhall at gmail.com (Thane Brimhall) Date: Tue, 20 Dec 2016 17:50:57 -0700 Subject: [Python-ideas] api suggestions for the cProfile module Message-ID:

I use cProfile a lot, and would like to suggest three backwards-compatible improvements to the API.

1: When using cProfile on a specific piece of code I often use the enable() and disable() methods. It occurred to me that this would be an obvious place to use a context manager.

2: Enhance the `print_stats` method on Profile to accept more options currently available only through the pstats.Stats class. For example, strip_dirs could be a boolean argument, and limit could accept an int. This would reduce the number of cases you'd need to use the more complex API.

3: I often forget which string keys are available for sorting. It would be nice to add an enum for these so a user could have their linter and IDE check that value pre-runtime. Since it would subclass `str` and `Enum` it would still work with all currently existing code.

The current documentation contains the following code:

    import cProfile, pstats, io
    pr = cProfile.Profile()
    pr.enable()
    # ... do something ...
    pr.disable()
    s = io.StringIO()
    sortby = 'cumulative'
    ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
    ps.print_stats()
    print(s.getvalue())

While the code below doesn't exactly match the functionality above (e.g. not using StringIO), I envision the context manager working like this, along with some adjustments on how to get the stats from the profiler:

    import cProfile, pstats
    with cProfile.Profile() as pr:
        # ... do something ...
        pr.print_stats(sort=pstats.Sort.cumulative, limit=10, strip_dirs=True)

As you can see, the code is shorter and somewhat more self-documenting. The best thing about these suggestions is that as far as I can tell they would be backwards-compatible API additions.

What do you think? Thank you in advance for your time!

/Thane

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From ncoghlan at gmail.com Tue Dec 20 20:10:33 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 21 Dec 2016 11:10:33 +1000 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython In-Reply-To: References: <20161216185102.1e8396d4@fsol> Message-ID:

On 21 December 2016 at 01:35, Masayuki YAMAMOTO wrote: > 2016-12-20 22:30 GMT+09:00 Erik Bray : > >> This is probably an implementation detail, but ISTM that even with >> PyThread_call_once, it will be necessary to reset any used once_flags >> manually in PyOS_AfterFork, essentially for the same reason the >> autoTLSkey is reset there currently... >> > > Deleting threads key is executed on *_Fini functions, but Py_FinalizeEx > function that calls *_Fini functions doesn't terminate CPython interpreter. > Furthermore, source comment and document have said description about > reinitialization after calling Py_FinalizeEx. [1] [2] That is to say there > is an implicit possible that is reinitialization contrary to name > "call_once" on a process level. Therefore, if CPython interpreter continues > to allow reinitialization, I'd suggest to rename the call_once API to avoid > misreading semantics. (for example, safe_init, check_init) >

Ouch, I'd missed that, and I agree it's not a negligible implementation detail - there are definitely applications embedding CPython out there that rely on being able to run multiple Initialize/Finalize cycles in the same process and have everything "just work". It also means using the "PyThread_*" prefix for the initialisation tracking aspect would be misleading, since the life cycle details are:

1. Create the key for the first time if it has never been previously set in the process
2. Destroy and reinit if Py_Finalize gets called
It also means using the "PyThread_*" prefix for the initialisation tracking aspect would be misleading, since the life cycle details are: 1. Create the key for the first time if it has never been previously set in the process 2. Destroy and reinit if Py_Finalize gets called 3. Destroy and reinit if a new subprocess is forked It also means we can't use pthread_once even in the pthread TLS implementation, since it doesn't provide those semantics. So I see two main alternatives here. Option 1: Modify the proposed PyThread_tss_create and PyThread_tss_delete APIs to accept a "bool *init_flag" pointer in addition to their current arguments. If *init_flag is true, then PyThread_tss_create is a no-op, otherwise it sets the flag to true after creating the key. If *init_flag is false, then PyThread_tss_delete is a no-op, otherwise it sets the flag to false after deleting the key. Option 2: Similar to option 1, but using a custom type alias, rather than using a C99 bool directly The closest API we have to these semantics at the moment would be PyGILState_Ensure, so the following API naming might work for option 2: Py_ensure_t Py_ENSURE_NEEDS_INIT Py_ENSURE_INITIALIZED Respectively, these would just be aliases for bool, false, and true. And then modify the proposed PyThread_tss_create and PyThread_tss_delete APIs to accept a "Py_ensure_t *init_flag" in addition to their current arguments. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From erik.m.bray at gmail.com Wed Dec 21 05:01:13 2016 From: erik.m.bray at gmail.com (Erik Bray) Date: Wed, 21 Dec 2016 11:01:13 +0100 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython In-Reply-To: References: <20161216185102.1e8396d4@fsol> Message-ID: On Wed, Dec 21, 2016 at 2:10 AM, Nick Coghlan wrote: > On 21 December 2016 at 01:35, Masayuki YAMAMOTO > wrote: >> >> 2016-12-20 22:30 GMT+09:00 Erik Bray : >>> >>> This is probably an implementation detail, but ISTM that even with >>> PyThread_call_once, it will be necessary to reset any used once_flags >>> manually in PyOS_AfterFork, essentially for the same reason the >>> autoTLSkey is reset there currently... >> >> >> Deleting threads key is executed on *_Fini functions, but Py_FinalizeEx >> function that calls *_Fini functions doesn't terminate CPython interpreter. >> Furthermore, source comment and document have said description about >> reinitialization after calling Py_FinalizeEx. [1] [2] That is to say there >> is an implicit possible that is reinitialization contrary to name >> "call_once" on a process level. Therefore, if CPython interpreter continues >> to allow reinitialization, I'd suggest to rename the call_once API to avoid >> misreading semantics. (for example, safe_init, check_init) > > > Ouch, I'd missed that, and I agree it's not a negligible implementation > detail - there are definitely applications embedding CPython out there that > rely on being able to run multiple Initialize/Finalize cycles in the same > process and have everything "just work". It also means using the > "PyThread_*" prefix for the initialisation tracking aspect would be > misleading, since the life cycle details are: > > 1. Create the key for the first time if it has never been previously set in > the process > 2. Destroy and reinit if Py_Finalize gets called > 3. 
Destroy and reinit if a new subprocess is forked > > It also means we can't use pthread_once even in the pthread TLS > implementation, since it doesn't provide those semantics. > > So I see two main alternatives here. > > Option 1: Modify the proposed PyThread_tss_create and PyThread_tss_delete > APIs to accept a "bool *init_flag" pointer in addition to their current > arguments. > > If *init_flag is true, then PyThread_tss_create is a no-op, otherwise it > sets the flag to true after creating the key. > If *init_flag is false, then PyThread_tss_delete is a no-op, otherwise it > sets the flag to false after deleting the key. > > Option 2: Similar to option 1, but using a custom type alias, rather than > using a C99 bool directly > > The closest API we have to these semantics at the moment would be > PyGILState_Ensure, so the following API naming might work for option 2: > > Py_ensure_t > Py_ENSURE_NEEDS_INIT > Py_ENSURE_INITIALIZED > > Respectively, these would just be aliases for bool, false, and true. > > And then modify the proposed PyThread_tss_create and PyThread_tss_delete > APIs to accept a "Py_ensure_t *init_flag" in addition to their current > arguments. That all sounds good--between the two option 2 looks a bit more explicit. Though what about this? Rather than adding another type, the original proposal could be changed slightly so that Py_tss_t *is* partially defined as a struct consisting of a bool, with whatever the native TLS key is. E.g. typedef struct { bool init_flag; #if defined(_POSIX_THREADS) pthreat_key_t key; #elif defined (NT_THREADS) DWORD key; /* etc... */ } Py_tss_t; Then it's just taking Masayuki's original patch, with the global bool variables, and formalizing that by combining the initialized flag with the key, and requiring the semantics you described above for PyThread_tss_create/delete. For Python's purposes it seems like this might be good enough, with the more general purpose pthread_once-like functionality not required. Best, Erik From erik.m.bray at gmail.com Wed Dec 21 05:04:46 2016 From: erik.m.bray at gmail.com (Erik Bray) Date: Wed, 21 Dec 2016 11:04:46 +0100 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython In-Reply-To: References: <20161216185102.1e8396d4@fsol> Message-ID: On Wed, Dec 21, 2016 at 11:01 AM, Erik Bray wrote: > That all sounds good--between the two option 2 looks a bit more explicit. > > Though what about this? Rather than adding another type, the original > proposal could be changed slightly so that Py_tss_t *is* partially > defined as a struct consisting of a bool, with whatever the native TLS > key is. E.g. > > typedef struct { > bool init_flag; > #if defined(_POSIX_THREADS) > pthreat_key_t key; *pthread_key_t* of course, though I wonder if that was a Freudian slip :) > #elif defined (NT_THREADS) > DWORD key; > /* etc... */ > } Py_tss_t; > > Then it's just taking Masayuki's original patch, with the global bool > variables, and formalizing that by combining the initialized flag with > the key, and requiring the semantics you described above for > PyThread_tss_create/delete. > > For Python's purposes it seems like this might be good enough, with > the more general purpose pthread_once-like functionality not required. Of course, that's not to say it might not be useful for some other purpose, but then it's outside the scope of this discussion as long as it isn't needed for TLS key initialization. 
From ncoghlan at gmail.com Wed Dec 21 11:07:07 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 22 Dec 2016 02:07:07 +1000 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython In-Reply-To: References: <20161216185102.1e8396d4@fsol> Message-ID:

On 21 December 2016 at 20:01, Erik Bray wrote: > On Wed, Dec 21, 2016 at 2:10 AM, Nick Coghlan wrote: > > Option 2: Similar to option 1, but using a custom type alias, rather than > > using a C99 bool directly > > > > The closest API we have to these semantics at the moment would be > > PyGILState_Ensure, so the following API naming might work for option 2: > > > > Py_ensure_t > > Py_ENSURE_NEEDS_INIT > > Py_ENSURE_INITIALIZED > > > > Respectively, these would just be aliases for bool, false, and true. > > > > And then modify the proposed PyThread_tss_create and PyThread_tss_delete > > APIs to accept a "Py_ensure_t *init_flag" in addition to their current > > arguments. > > That all sounds good--between the two option 2 looks a bit more explicit. > > Though what about this? Rather than adding another type, the original > proposal could be changed slightly so that Py_tss_t *is* partially > defined as a struct consisting of a bool, with whatever the native TLS > key is. E.g. > > typedef struct { > bool init_flag; > #if defined(_POSIX_THREADS) > pthreat_key_t key; > #elif defined (NT_THREADS) > DWORD key; > /* etc... */ > } Py_tss_t; > > Then it's just taking Masayuki's original patch, with the global bool > variables, and formalizing that by combining the initialized flag with > the key, and requiring the semantics you described above for > PyThread_tss_create/delete. > > For Python's purposes it seems like this might be good enough, with > the more general purpose pthread_once-like functionality not required. >

Aye, I also thought of that approach, but talked myself out of it since there's no definable default value for pthread_key_t. However, C99 partial initialisation may deal with that for us (by zeroing the memory without actually assigning a typed value to it), and if it does, I agree it would be better to handle the initialisation flag automatically rather than requiring callers to do it.

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From flying-sheep at web.de Fri Dec 23 11:03:15 2016 From: flying-sheep at web.de (Philipp A.) Date: Fri, 23 Dec 2016 16:03:15 +0000 Subject: [Python-ideas] PEP 536 - Call for help and improvement Message-ID:

Hi Python Ideas,

And merry Christmas!

Once upon a time -- in August this year -- I started a (somewhat badly titled) thread about improving the f-string grammar: https://mail.python.org/pipermail/python-ideas/2016-August/041727.html

Luckily it resulted in an interim grammar change that invalidated a misleading property of the original grammar: To the rejoicing of syntax highlighters and humans everywhere, it's no longer possible to escape syntactically relevant characters such as the f-string braces: f'\x7bvariable}'

Now I created a PEP that makes f-strings work just like every other languages'
string interpolation, enabling arbitrary nesting of Python expressions in the expression parts of f-strings: https://github.com/python/peps/blob/master/pep-0536.txt

All I want for Christmas is your help: Please tell me how to improve wording, structure, or clarity of my PEP's message (ideally via PR to https://github.com/flying-sheep/peps)

I fear going forward I will also need guidance for the implementation part, as my only close-to-the-metal experiences are dabbling in C++, and the higher-level language Rust.

Thank you and happy holidays!

Philipp

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From ma3yuki.8mamo10 at gmail.com Fri Dec 23 19:33:00 2016 From: ma3yuki.8mamo10 at gmail.com (Masayuki YAMAMOTO) Date: Sat, 24 Dec 2016 09:33:00 +0900 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython Message-ID:

2016-12-21 19:01 GMT+09:00 Erik Bray : > On Wed, Dec 21, 2016 at 2:10 AM, Nick Coghlan wrote: > > Ouch, I'd missed that, and I agree it's not a negligible implementation > > detail - there are definitely applications embedding CPython out there > that > > rely on being able to run multiple Initialize/Finalize cycles in the same > > process and have everything "just work". It also means using the > > "PyThread_*" prefix for the initialisation tracking aspect would be > > misleading, since the life cycle details are: > > > > 1. Create the key for the first time if it has never been previously set > in > > the process > > 2. Destroy and reinit if Py_Finalize gets called > > 3. Destroy and reinit if a new subprocess is forked > > > > It also means we can't use pthread_once even in the pthread TLS > > implementation, since it doesn't provide those semantics. > > > > So I see two main alternatives here. > > > > Option 1: Modify the proposed PyThread_tss_create and PyThread_tss_delete > > APIs to accept a "bool *init_flag" pointer in addition to their current > > arguments. > > > > If *init_flag is true, then PyThread_tss_create is a no-op, otherwise it > > sets the flag to true after creating the key. > > If *init_flag is false, then PyThread_tss_delete is a no-op, otherwise it > > sets the flag to false after deleting the key. > > > > Option 2: Similar to option 1, but using a custom type alias, rather than > > using a C99 bool directly > > > > The closest API we have to these semantics at the moment would be > > PyGILState_Ensure, so the following API naming might work for option 2: > > > > Py_ensure_t > > Py_ENSURE_NEEDS_INIT > > Py_ENSURE_INITIALIZED > > > > Respectively, these would just be aliases for bool, false, and true. > > > > And then modify the proposed PyThread_tss_create and PyThread_tss_delete > > APIs to accept a "Py_ensure_t *init_flag" in addition to their current > > arguments. > > That all sounds good--between the two option 2 looks a bit more explicit. > > Though what about this? Rather than adding another type, the original > proposal could be changed slightly so that Py_tss_t *is* partially > defined as a struct consisting of a bool, with whatever the native TLS > key is. E.g. > > typedef struct { > bool init_flag; > #if defined(_POSIX_THREADS) > pthreat_key_t key; > #elif defined (NT_THREADS) > DWORD key; > /* etc... */ > } Py_tss_t; > > Then it's just taking Masayuki's original patch, with the global bool > variables, and formalizing that by combining the initialized flag with > the key, and requiring the semantics you described above for > PyThread_tss_create/delete.
> > For Python's purposes it seems like this might be good enough, with > the more general purpose pthread_once-like functionality not required. > > Best, > Erik

As mentioned above, in the current TLS API the thread key uses -1 as its defined invalid value. If the new TLS API inherits the requirement that the key have a defined invalid value, putting the key and the flag into one structure seems semantically correct. In that case, I think the TLS API should supply the defined invalid value (like PTHREAD_ONCE_INIT) to API users. Moreover, the structure gives us an opportunity to assert, via the field name, that the key type is opaque. I think the suggestion would improve the understandability of the API, because a good field name can signal that reading and writing the key directly would be incorrect (even if API users don't read the precautionary note).

Have a nice holiday! Masayuki

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From steve.dower at python.org Sat Dec 24 10:59:51 2016 From: steve.dower at python.org (Steve Dower) Date: Sat, 24 Dec 2016 07:59:51 -0800 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython In-Reply-To: References: Message-ID:

Right. Platforms that have a defined invalid value don't need the struct, and so they can define the type differently. It just means we also need to provide a macro for testing whether it's been created or not, and users should genuinely treat the value as opaque.
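Something like this rough sketch, say for the Windows branch (the macro names are illustrative only, not an actual proposal):

    #include <windows.h>

    /* On Windows, TlsAlloc() never returns TLS_OUT_OF_INDEXES for a
       valid index, so that value can double as the "not yet created"
       sentinel--no struct or separate flag needed. */
    typedef DWORD Py_tss_t;

    #define Py_tss_NEEDS_INIT TLS_OUT_OF_INDEXES
    #define PyThread_tss_is_created(key) ((key) != TLS_OUT_OF_INDEXES)

    int
    PyThread_tss_create(Py_tss_t *key)
    {
        if (PyThread_tss_is_created(*key))
            return 0;                   /* already created: no-op */
        *key = TlsAlloc();
        return PyThread_tss_is_created(*key) ? 0 : -1;
    }

Static declarations would then look like "static Py_tss_t my_key = Py_tss_NEEDS_INIT;".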
Cheers, Steve

Top-posted from my Windows Phone

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From mistersheik at gmail.com Sat Dec 24 14:42:46 2016 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 24 Dec 2016 11:42:46 -0800 (PST) Subject: [Python-ideas] (no subject) In-Reply-To: References: Message-ID:

On Tuesday, November 29, 2016 at 4:08:19 AM UTC-5, Victor Stinner wrote: > > Hi, > > Python is optimized for performance. Formatting an error message has a > cost on performances. > >

Usually, when an exception is hit that will (probably) crash the program, no one cares about less than a microsecond of performance.

> I suggest you to teach your student to use the REPL and use a custom > exception handler: sys.excepthook: > https://docs.python.org/2/library/sys.html#sys.excepthook > > Using a custom exception handler, you can run expensive functions, > like the feature: "suggest len when length is used". > > The problem is then when students have to use a Python without the > custom exception handler. > > Victor > _______________________________________________ > Python-ideas mailing list > Python... at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ >

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From tomuxiong at gmx.com Sat Dec 24 14:57:06 2016 From: tomuxiong at gmx.com (Thomas Nyberg) Date: Sat, 24 Dec 2016 11:57:06 -0800 Subject: [Python-ideas] (no subject) In-Reply-To: References: Message-ID:

On 12/24/2016 11:42 AM, Neil Girdhar wrote: > Usually, when an exception is hit that will (probably) crash the > program, no one cares about less than a microsecond of performance.

I would probably agree with you in the SyntaxError example, but not for the others. Programming with exceptions is totally standard in Python and they are often used in tight loops.
From zaharid at gmail.com  Sun Dec 25 13:24:44 2016
From: zaharid at gmail.com (Zahari Dim)
Date: Sun, 25 Dec 2016 19:24:44 +0100
Subject: [Python-ideas] AtributeError inside __get__
Message-ID:

Hi,

The other day I came across a particularly ugly bug. A simplified case
goes like:

class X:
    @property
    def y(self):
        return self.nonexisting

hasattr(X(),'y')

This returns False because hasattr calls the property, which in turn
raises an AttributeError, which is then used to decide that the property
doesn't exist, even though it does. This is arguably unexpected and
surprising and can be very difficult to understand if it happens
within a large codebase. Given the precedent with generator_stop,
which solves a similar problem for StopIteration, I was wondering if
it would be possible to have the __get__ method convert the
AttributeErrors raised inside it to RuntimeErrors.

The situation with this is a little more complicated because there
could be a (possibly strange) case where one might want to raise an
AttributeError inside __get__. But maybe the specification can be
changed so either `raise ForceAttributeError()` or `return
NotImplemented` achieves the same effect.


Merry Christmas!
Zahari.

From prometheus235 at gmail.com  Sun Dec 25 16:03:23 2016
From: prometheus235 at gmail.com (Nick Timkovich)
Date: Sun, 25 Dec 2016 16:03:23 -0500
Subject: [Python-ideas] AtributeError inside __get__
In-Reply-To:
References:
Message-ID:

Are you saying that hasattr returning False was hiding a bug or is a bug?
The former could be annoying to track down, though hasattr(X, 'y') ==
True. For the latter, having hasattr return False if an AttributeError is
raised would allow the property decorator to retain identical
functionality if it is used to replace a (sometimes) existing attribute.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
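A minimal runnable sketch of the pitfall being described, using the same
toy class from the report (only the print calls are added here):

class X:
    @property
    def y(self):
        return self.nonexisting  # bug: this attribute doesn't exist

obj = X()
print(hasattr(obj, 'y'))  # False, even though X clearly defines y
try:
    obj.y  # accessing it directly surfaces the real error instead
except AttributeError as e:
    print(e)  # 'X' object has no attribute 'nonexisting'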
From victor.stinner at gmail.com  Sun Dec 25 16:51:07 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Sun, 25 Dec 2016 22:51:07 +0100
Subject: [Python-ideas] (no subject)
In-Reply-To:
References:
Message-ID:

On 24 Dec 2016 8:42 PM, "Neil Girdhar" wrote:
> Usually, when an exception is hit that will (probably) crash the
> program, no one cares about less than a microsecond of performance.

Just one example. By design, hasattr(obj, name) raises an exception to
return False.

So it has the cost of building the exception + raise exc + catch it.

Victor
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rosuav at gmail.com  Sun Dec 25 16:59:51 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 26 Dec 2016 08:59:51 +1100
Subject: [Python-ideas] AtributeError inside __get__
In-Reply-To:
References:
Message-ID:

On Mon, Dec 26, 2016 at 8:03 AM, Nick Timkovich wrote:
> Are you saying that hasattr returning False was hiding a bug or is a bug?
> The former could be annoying to track down, though hasattr(X, 'y') == True.
> For the latter, having hasattr return False if an AttributeError is raised
> would allow the property decorator to retain identical functionality if it
> is used to replace a (sometimes) existing attribute.

This was touched on during the StopIteration discussions, but left
aside (it's not really connected, other than that exceptions are used
as a signal). It's more that a property function raising AttributeError
makes it look like it doesn't exist. Worth noting, though: The
confusion only really comes up with hasattr. If you simply try to
access the property, you get an exception that identifies the exact
fault:

>>> X().y
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in y
AttributeError: 'X' object has no attribute 'nonexisting'

Interestingly, the exception doesn't seem to have very useful arguments:

>>> ee.args
("'X' object has no attribute 'nonexisting'",)

So here's a two-part proposal that would solve Zahari's problem:

1) Enhance AttributeError to include arguments for the parts in
quotes, for i18n independence.
2) Provide, in the docs, a hasattr replacement that checks the exception's
args.

The new hasattr would look like this:

def hasattr(obj, name):
    try:
        getattr(obj, name)
        return True
    except AttributeError as e:
        if e.args[1] == obj.__class__.__name__ and e.args[2] == name:
            return False
        raise

Since it's just a recipe in the docs, you could also have a version
that works on current Pythons, but it'd need to do string manipulation
to compare - something like:

def hasattr(obj, name):
    try:
        getattr(obj, name)
        return True
    except AttributeError as e:
        if e.args[0] == "%r object has no attribute %r" % (
                obj.__class__.__name__, name):
            return False
        raise

I can't guarantee that this doesn't get some edge cases wrong, eg if
you have weird characters in your name. But it'll deal with the normal
cases, and it doesn't need any language changes - just paste that at
the top of your file.

Zahari, would this solve your problem?

ChrisA
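To make the behaviour of that recipe concrete, here is how the
current-Python, string-matching version plays out against the buggy
property from the start of the thread (hasattr2 is a hypothetical name
used here only to avoid shadowing the builtin):

def hasattr2(obj, name):
    try:
        getattr(obj, name)
        return True
    except AttributeError as e:
        if e.args[0] == "%r object has no attribute %r" % (
                obj.__class__.__name__, name):
            return False
        raise

class X:
    @property
    def y(self):
        return self.nonexisting

hasattr2(X(), 'y')  # raises AttributeError about 'nonexisting'
                    # instead of silently returning False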
From rosuav at gmail.com  Sun Dec 25 17:01:34 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 26 Dec 2016 09:01:34 +1100
Subject: [Python-ideas] (no subject)
In-Reply-To:
References:
Message-ID:

On Mon, Dec 26, 2016 at 8:51 AM, Victor Stinner wrote:
> On 24 Dec 2016 8:42 PM, "Neil Girdhar" wrote:
>> Usually, when an exception is hit that will (probably) crash the program,
>> no one cares about less than a microsecond of performance.
>
> Just one example. By design, hasattr(obj, name) raises an exception to
> return False.
>
> So it has the cost of building the exception + raise exc + catch it.

Printing an exception to the console can afford to be expensive,
though. So if the work can be pushed into __str__, it won't hurt
anything that try/excepts around it.

ChrisA

From ncoghlan at gmail.com  Sun Dec 25 21:04:58 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 26 Dec 2016 12:04:58 +1000
Subject: [Python-ideas] AtributeError inside __get__
In-Reply-To:
References:
Message-ID:

On 26 December 2016 at 04:24, Zahari Dim wrote:
> Hi,
>
> The other day I came across a particularly ugly bug. A simplified case
> goes like:
>
> class X:
>     @property
>     def y(self):
>         return self.nonexisting
>
> hasattr(X(),'y')
>
> This returns False because hasattr calls the property which in turn
> raises an AttributeError which is used to determine that the property
> doesn't exist, even if it does. This is arguably unexpected and
> surprising and can be very difficult to understand if it happens
> within a large codebase. Given the precedent with generator_stop,
> which solves a similar problem for StopIteration, I was wondering if
> it would be possible to have the __get__ method convert the
> AttributeErrors raised inside it to RuntimeErrors.
>
> The situation with this is a little more complicated because there
> could be a (possibly strange) case where one might want to raise an
> AttributeError inside __get__.

There are a lot of entirely valid properties that look something like
this:

    @property
    def attr(self):
        try:
            return data_store[lookup_key]
        except KeyError:
            raise AttributeError("attr")

And unlike StopIteration (where either "return" or "raise
StopIteration" could be used), that *is* the way for a property method
to indicate "attribute not actually present".

This is one of the many cases where IDEs with some form of static
structural checking really do make development easier - the
"self.nonexisting" would be flagged as non-existent directly in the
editor, even before you attempted to run the code.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From zaharid at gmail.com  Mon Dec 26 06:23:18 2016
From: zaharid at gmail.com (Zahari Dim)
Date: Mon, 26 Dec 2016 12:23:18 +0100
Subject: [Python-ideas] AtributeError inside __get__
In-Reply-To:
References:
Message-ID:

> There are a lot of entirely valid properties that look something like this:
>
>
>     @property
>     def attr(self):
>         try:
>             return data_store[lookup_key]
>         except KeyError:
>             raise AttributeError("attr")

But wouldn't something like this be implemented more commonly with
__getattr__ instead (likely there is more than one such property in a
real example)? Even though __getattr__ has a similar problem (a bad
AttributeError inside can cause many bugs), I'd agree it would probably
be too difficult to change that without breaking a lot of code. For
__get__, the errors are arguably more confusing (e.g. when used with
@property) and the legitimate use case, while existing, seems more
infrequent to me: I did a github search and there was a small number of
cases, but most were for code written in Python 2 anyway.
Here are a couple of valid ones:

https://github.com/dimavitvickiy/server/blob/a9a6ea2a155b56b84d20a199b5948418d0dbf169/orm/decorators.py
https://github.com/dropbox/pyston/blob/75562e57a8ec2f6f7bd0cf52012d49c0dc3d2155/test/tests/static_class_methods.py

Cheers,
Zahari

>
> This is one of the many cases where IDEs with some form of static structural
> checking really do make development easier - the "self.nonexisting" would be
> flagged as non-existent directly in the editor, even before you attempted to
> run the code.

In my particular case, the class had a __getattr__ that generated
properties dynamically. Therefore an IDE was unlikely to be helpful.

>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From bzvi7919 at gmail.com  Mon Dec 26 08:40:19 2016
From: bzvi7919 at gmail.com (Bar Harel)
Date: Mon, 26 Dec 2016 13:40:19 +0000
Subject: [Python-ideas] singledispatch for instance methods
In-Reply-To:
References:
Message-ID:

Any updates on singledispatch for methods?

On Tue, Sep 20, 2016, 5:49 PM Bar Harel wrote:

> At last! Haven't used single dispatch exactly because of that. Thank you
> savior!
> +1
>
> On Tue, Sep 20, 2016, 6:03 AM Tim Mitchell
> wrote:
>
>> Hi All,
>>
>> We have a modified version of singledispatch at work which works for
>> methods as well as functions. We have open-sourced it as methoddispatch
>> (pypi: https://pypi.python.org/pypi/methoddispatch).
>>
>> IMHO I thought it would make a nice addition to python stdlib.
>>
>> What does everyone else think?
>>
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From zaharid at gmail.com  Tue Dec 27 05:11:24 2016
From: zaharid at gmail.com (Zahari)
Date: Tue, 27 Dec 2016 02:11:24 -0800 (PST)
Subject: [Python-ideas] AtributeError inside __get__
In-Reply-To:
References:
Message-ID: <2618d38e-74a0-41f5-8821-de88259136a5@googlegroups.com>

> So here's a two-part proposal that would solve Zahari's problem:
>
> 1) Enhance AttributeError to include arguments for the parts in
> quotes, for i18n independence.
> 2) Provide, in the docs, a hasattr replacement that checks the exception's
> args.
>
> The new hasattr would look like this:
>
> def hasattr(obj, name):
>     try:
>         getattr(obj, name)
>         return True
>     except AttributeError as e:
>         if e.args[1] == obj.__class__.__name__ and e.args[2] == name:
>             return False
>         raise
>
> Since it's just a recipe in the docs, you could also have a version
> that works on current Pythons, but it'd need to do string manipulation
> to compare - something like:
>
> def hasattr(obj, name):
>     try:
>         getattr(obj, name)
>         return True
>     except AttributeError as e:
>         if e.args[0] == "%r object has no attribute %r" % (
>                 obj.__class__.__name__, name):
>             return False
>         raise
>
> I can't guarantee that this doesn't get some edge cases wrong, eg if
> you have weird characters in your name. But it'll deal with the normal
> cases, and it doesn't need any language changes - just paste that at
> the top of your file.
>
> Zahari, would this solve your problem?
>

This looks like a good idea. Note that there is also getattr(X(), 'y',
'default') that would have to behave like this.

Cheers,
Zahari

>
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python... at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rosuav at gmail.com  Tue Dec 27 06:58:44 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 27 Dec 2016 22:58:44 +1100
Subject: [Python-ideas] AtributeError inside __get__
In-Reply-To: <2618d38e-74a0-41f5-8821-de88259136a5@googlegroups.com>
References: <2618d38e-74a0-41f5-8821-de88259136a5@googlegroups.com>
Message-ID:

On Tue, Dec 27, 2016 at 9:11 PM, Zahari wrote:
> This looks like a good idea. Note that there is also getattr(X(), 'y',
> 'default') that would have to behave like this.
>

Forgot about that. Feel free to enhance the hasattr replacement. I
still think the parameterization of AttributeError would be worth
doing, but the two are independent.

ChrisA

From ammar at ammaraskar.com  Tue Dec 27 12:25:48 2016
From: ammar at ammaraskar.com (Ammar Askar)
Date: Tue, 27 Dec 2016 22:25:48 +0500
Subject: [Python-ideas] Function arguments in tracebacks
Message-ID:

Consider the following similar C and Python code and their
tracebacks:

C
-------
int divide(int x, int y, char* some_string) {
    return x / y;
}
int main(...) {
    divide(2, 0, "Hello World");
}
-------
Program received signal SIGFPE, Arithmetic exception.
(gdb) bt
#0  0x00000000004004c4 in divide (x=2, y=0, some_string=0x4005a8
"Hello World") at test.c:2
#1  0x00000000004004e7 in main (argc=1, argv=0x7fffffffe328) at test.c:6

Python
-------
def divide(x, y, some_string):
    return x / y

divide(2, 0, "Hello World")
-------
Traceback (most recent call last):
  File "test.py", line 4, in <module>
  File "test.py", line 2, in divide
ZeroDivisionError: division by zero


By including the function arguments within the traceback, we
can get more information at a glance than we could with just
the names of methods.

This would be pretty cool and stop the occasional "printf"
debugging without cluttering up the traceback too much.

There will definitely need to be some reasonable line length
limit because the repr() of parameters could be really long.
In similar situations gdb replaces the value in the traceback
with an ellipsis, and I believe that's a good solution for python
as well.

Obviously this isn't a great example since the error is immediately
obvious but I think this could be potentially useful in a bunch
of situations.

I've made a quick toy implementation in traceback.c, this is what
it looks like for the script above.

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    divide(2, 0, "Hello World")
  File "test.py", line 2, in divide (x=2, y=0, some_string='Hello World')
    return x / y
ZeroDivisionError: division by zero


== Potential Downsides ==

There's probably a lot more than these, but I could only think of
these so far.

* Private data might be leaked, imagine a

  def login(username, password):
      ...

  method. While function names, source files, and source code can also
  be private, variables can potentially contain all kinds of sensitive
  data.

* A variable that takes a long time to return a string representation may
  significantly slow down the time it takes to generate a traceback.

* We can really only return the state of the variables when the
  traceback is printed, this might result in some slightly un-intuitive
  behavior. (Easier to explain with an example)

  def f(x):
      x = 2
      raise Exception()

  f(1)

  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "<stdin>", line 3, in f (x=2)

  The fact that x is mutated within the function body means that the
  value printed in the traceback is the changed value, which might be
  slightly misleading.


I'd love to hear your thoughts on the idea.
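For readers who want to experiment without a patched traceback.c,
something in the same spirit can be approximated in pure Python with a
custom sys.excepthook. This is only a hedged sketch of the idea, not
Ammar's C implementation; reprlib is used to cap long reprs, echoing
the ellipsis suggestion above:

import sys
import inspect
import reprlib

def excepthook(exc_type, exc, tb):
    # walk the traceback and print each frame with its argument values
    short = reprlib.Repr()
    print('Traceback (most recent call last):')
    while tb is not None:
        frame = tb.tb_frame
        arginfo = inspect.getargvalues(frame)
        args = ', '.join('%s=%s' % (name, short.repr(arginfo.locals[name]))
                         for name in arginfo.args)
        print('  File "%s", line %d, in %s (%s)' % (
            frame.f_code.co_filename, tb.tb_lineno,
            frame.f_code.co_name, args))
        tb = tb.tb_next
    print('%s: %s' % (exc_type.__name__, exc))

sys.excepthook = excepthook

Note that, exactly as described above, this shows the values the
arguments have when the traceback is printed, not the values the
function was called with.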
From jab at math.brown.edu  Tue Dec 27 22:13:59 2016
From: jab at math.brown.edu (jab at math.brown.edu)
Date: Tue, 27 Dec 2016 22:13:59 -0500
Subject: [Python-ideas] incremental hashing in __hash__
Message-ID:

Suppose you have implemented an immutable Position type to represent
the state of a game played on an MxN board, where the board size can
grow quite large.

Or suppose you have implemented an immutable, ordered collection type.
For example, the collections-extended package provides a
frozensetlist[1]. One of my own packages provides a frozen, ordered
bidirectional mapping type.[2]

These types should be hashable so that they can be inserted into sets
and mappings. The order-sensitivity of the contents prevents them from
using the built-in collections.Set._hash() helper in their __hash__
implementations, to keep from unnecessarily causing hash collisions for
objects that compare unequal due only to having a different ordering of
the same set of contained items.

According to
https://docs.python.org/3/reference/datamodel.html#object.__hash__ :

"""
it is advised to mix together the hash values of the components of the
object that also play a part in comparison of objects by packing them
into a tuple and hashing the tuple. Example:

def __hash__(self):
    return hash((self.name, self.nick, self.color))

"""

Applying this advice to the use cases above would require creating an
arbitrarily large tuple in memory before passing it to hash(), which
is then just thrown away. It would be preferable if there were a way
to pass multiple values to hash() in a streaming fashion, such that
the overall hash were computed incrementally, without building up a
large object in memory first.

Should there be better support for this use case? Perhaps hash() could
support an alternative signature, allowing it to accept a stream of
values whose combined hash would be computed incrementally in
*constant* space and linear time, e.g. "hash(items=iter(self))".

In the meantime, what is the best way to incrementally compute a good
hash value for such objects using built-in Python routines? (As a
library author, it would be preferable to use a routine with explicit
support for computing a hash incrementally, rather than having to worry
about how to correctly combine results from multiple calls to
hash(contained_item) in library code. (Simply XORing such results
together would not be order-sensitive, and so wouldn't work.) Using a
routine with explicit support for incremental hashing would allow
libraries to focus on doing one thing well.[3,4,5])

I know that hashlib provides algorithms that support incremental
hashing, but those use at least 128 bits. Since hash() throws out
anything beyond sys.hash_info.hash_bits (e.g. 64) bits, anything in
hashlib seems like overkill. Am I right in thinking that's the wrong
tool for the job?

On the other hand, would binascii.crc32 be suitable, at least for
32-bit systems? (And is there some 64-bit incremental hash algorithm
available for 64-bit systems? It seems Python has no support for crc64
built in.)
For example:

import binascii, struct

class FrozenOrderedCollection:
    def __hash__(self):
        if hasattr(self, '_hashval'):  # Computed lazily.
            return self._hashval
        hv = binascii.crc32(b'FrozenOrderedCollection')
        for i in self:
            hv = binascii.crc32(struct.pack('@l', hash(i)), hv)
        hv &= 0xffffffff
        self._hashval = hv
        return hv

Note that this example illustrates two other common requirements of
these use cases:

(i) lazily computing the hash value on first use, and then caching it
for future use

(ii) priming the overall hash value with some class-specific initial
value, so that if an instance of a different type of collection, which
comprised the same items but which compared unequal, were to compute
its hash value out of the same constituent items, we make sure our
hash value differs.

(On that note, should the documentation in
https://docs.python.org/3/reference/datamodel.html#object.__hash__
quoted above be updated to add this advice? The current advice to
"return hash((self.name, self.nick, self.color))" would cause a hash
collision with a tuple of the same values, even though the tuple should
presumably compare unequal with this object.)

To summarize these questions:

1. Should hash() add support for incremental hashing?

2. In the meantime, what is the best way to compute a hash of a
combination of many values incrementally (in constant space and linear
time), using only what's available in the standard library? Ideally
there is some routine available that uses exactly
hash_info.hash_bits number of bits, and that does the combining of
incremental results for you.

3. Should the
https://docs.python.org/3/reference/datamodel.html#object.__hash__
documentation be updated to include suitable advice for these use
cases, in particular, that the overall hash value should be computed
lazily, incrementally, and should be primed with a class-unique value?

Thanks in advance for a helpful discussion, and best wishes.

Josh

References:

[1] http://collections-extended.lenzm.net/api.html#collections_extended.frozensetlist
[2] https://bidict.readthedocs.io/en/dev/api.html#bidict.frozenorderedbidict
[3] http://stackoverflow.com/questions/2909106/python-whats-a-correct-and-good-way-to-implement-hash#comment28193015_19073010
[4] http://stackoverflow.com/a/2909572/161642
[5] http://stackoverflow.com/a/27952689/161642

From rymg19 at gmail.com  Tue Dec 27 22:28:04 2016
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Tue, 27 Dec 2016 21:28:04 -0600
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To:
References:
Message-ID:

You could always try to make a Python version of the C tuple hashing
function[1] (requires the total # of elements) or PyPy's[2] (seems like
it would allow true incremental hashing). API idea:


hasher = IncrementalHasher()
hasher.add(one_item_to_hash)  # updates hasher.hash property with result
# repeat
return hasher.hash


[1]: https://hg.python.org/cpython/file/dcced3bd22fe/Objects/tupleobject.c#l331
[2]: https://bitbucket.org/pypy/pypy/src/d8febc18447e1f785a384d52413a345d7b3db423/rpython/rlib/objectmodel.py#objectmodel.py-562

--
Ryan (????)
Yoko Shimomura > ryo (supercell/EGOIST) > Hiroyuki Sawano >> everyone else
http://kirbyfan64.github.io/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
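Ryan's IncrementalHasher idea can be sketched in a few lines of pure
Python. This hedged version just chains hash() over 2-tuples rather
than porting the C or RPython internals he links to, so it only
illustrates the proposed API shape, not the CPython algorithm:

class IncrementalHasher:
    def __init__(self):
        self.hash = hash(())  # seed: the hash of an empty tuple

    def add(self, one_item_to_hash):
        # mix the running hash with the next item's hash
        self.hash = hash((self.hash, one_item_to_hash))

hasher = IncrementalHasher()
for item in ('name', 'nick', 'color'):
    hasher.add(item)
print(hasher.hash)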
From jab at math.brown.edu  Wed Dec 28 11:00:51 2016
From: jab at math.brown.edu (jab at math.brown.edu)
Date: Wed, 28 Dec 2016 11:00:51 -0500
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To:
References:
Message-ID:

I actually have been poking around that code already. I also found
https://github.com/vperron/python-superfasthash/blob/master/superfasthash.py
in case of interest.

But it still seems like library authors with this use case should keep
their library code free of implementation details like this, and
instead use a higher-level API provided by Python.

Thanks,
Josh
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From ned at nedbatchelder.com Wed Dec 28 11:48:10 2016 From: ned at nedbatchelder.com (Ned Batchelder) Date: Wed, 28 Dec 2016 11:48:10 -0500 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: Message-ID: On 12/27/16 10:13 PM, jab at math.brown.edu wrote: > Applying this advice to the use cases above would require creating an > arbitrarily large tuple in memory before passing it to hash(), which > is then just thrown away. It would be preferable if there were a way > to pass multiple values to hash() in a streaming fashion, such that > the overall hash were computed incrementally, without building up a > large object in memory first. > > Should there be better support for this use case? Perhaps hash() could > support an alternative signature, allowing it to accept a stream of > values whose combined hash would be computed incrementally in > *constant* space and linear time, e.g. "hash(items=iter(self))". You can write a simple function to use hash iteratively to hash the entire stream in constant space and linear time: def hash_stream(them): val = 0 for it in them: val = hash((val, it)) return val Although this creates N 2-tuples, they come and go, so the memory use won't grow. Adjust the code as needed to achieve canonicalization before iterating. Or maybe I am misunderstanding the requirements? --Ned. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Wed Dec 28 12:10:59 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 28 Dec 2016 09:10:59 -0800 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: Message-ID: <5863F223.3040906@stoneleaf.us> On 12/27/2016 07:13 PM, jab at math.brown.edu wrote: > According to the docs [6]: > > """ > it is advised to mix together the hash values of the components of the > object that also play a part in comparison of objects by packing them > into a tuple and hashing the tuple. Example: > > def __hash__(self): > return hash((self.name, self.nick, self.color)) > > """ > > > Applying this advice to the use cases above would require creating an > arbitrarily large tuple in memory before passing it to hash(), which > is then just thrown away. It would be preferable if there were a way > to pass multiple values to hash() in a streaming fashion, such that > the overall hash were computed incrementally, without building up a > large object in memory first. Part of the reason for creating __hash__ like above is that: - it's simple - it's reliable However, it's not the only way to have a hash algorithm that works; in fact, the beginning of the sentence you quoted says: > The only required property is that objects which compare equal have > the same hash value; In other words, objects that do not compare equal can also have the same hash value (although too much of that will reduce the efficiency of Python's containers). > (ii) priming the overall hash value with some class-specific initial > value, so that if an instance of a different type of collection, which > comprised the same items but which compared unequal, were to compute > its hash value out of the same constituent items, we make sure our > hash value differs. This is unnecessary: hashes are compared first as a way to weed out impossible matches, but when the hashes are the same an actual __eq__ test is still done [7]. 
--
~Ethan~

[6] https://docs.python.org/3/reference/datamodel.html#object.__hash__

[7] some test code to prove above points:

--- 8< ------------------------------------------------------------

from unittest import main, TestCase

class Eggs(object):
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return hash(self.value)
    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.value == other.value
        return NotImplemented

class Spam(object):
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return hash(self.value)
    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.value == other.value
        return NotImplemented

e10 = Eggs(1)
e20 = Eggs(2)
e11 = Eggs(1)
e21 = Eggs(2)

s10 = Spam(1)
s20 = Spam(2)
s11 = Spam(1)
s21 = Spam(2)

bag = {}
bag[e10] = 1
bag[s10] = 2
bag[e20] = 3
bag[s20] = 4

class TestEqualityAndHashing(TestCase):

    def test_equal(self):
        # same class, same value --> equal
        self.assertEqual(e10, e11)
        self.assertEqual(e20, e21)
        self.assertEqual(s10, s11)
        self.assertEqual(s20, s21)

    def test_not_equal(self):
        # different class, same value --> not equal
        self.assertEqual(e10.value, s10.value)
        self.assertNotEqual(e10, s10)
        self.assertEqual(e20.value, s20.value)
        self.assertNotEqual(e20, s20)

    def test_same_hash(self):
        # same class, same value, same hash
        self.assertEqual(hash(e10), hash(e11))
        self.assertEqual(hash(e20), hash(e21))
        self.assertEqual(hash(s10), hash(s11))
        self.assertEqual(hash(s20), hash(s21))

        # different class, same value, same hash
        self.assertEqual(hash(e10), hash(s10))
        self.assertEqual(hash(e11), hash(s11))
        self.assertEqual(hash(e20), hash(s20))
        self.assertEqual(hash(e21), hash(s21))

    def test_as_key(self):
        # different objects from different classes with same hash should
        # still be distinct
        self.assertEqual(len(bag), 4)
        self.assertEqual(bag[e10], 1)
        self.assertEqual(bag[s10], 2)
        self.assertEqual(bag[e20], 3)
        self.assertEqual(bag[s20], 4)

        # different objects from same classes with same hash should not
        # be distinct
        self.assertEqual(bag[e11], 1)
        self.assertEqual(bag[s11], 2)
        self.assertEqual(bag[e21], 3)
        self.assertEqual(bag[s21], 4)

main()

--- 8< ------------------------------------------------------------

From jab at math.brown.edu  Wed Dec 28 12:27:59 2016
From: jab at math.brown.edu (jab at math.brown.edu)
Date: Wed, 28 Dec 2016 12:27:59 -0500
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To:
References:
Message-ID:

On Wed, Dec 28, 2016 at 11:48 AM, Ned Batchelder wrote:

> You can write a simple function to use hash iteratively to hash
> the entire stream in constant space and linear time:
>
>     def hash_stream(them):
>         val = 0
>         for it in them:
>             val = hash((val, it))
>         return val
>
> Although this creates N 2-tuples, they come and go, so the memory
> use won't grow. Adjust the code as needed to achieve
> canonicalization before iterating.
>
> Or maybe I am misunderstanding the requirements?
>

This is better than solutions like
http://stackoverflow.com/a/27952689/161642 in the sense that it's a
little higher level (no bit shifting or magic numbers).

But it's not clear that it's any better in the sense that you're still
rolling your own incremental hash algorithm out of a lower-level
primitive that doesn't document support for this, and therefore taking
responsibility yourself for how well it distributes values into
buckets.

Are you confident this results in good hash performance? Is this
better than a solution built on top of a hash function with an
explicit API for calculating a hash incrementally, such as the crc32
example I included? (And again, this would ideally be a
sys.hash_info.hash_bits-bit algorithm.)

Don't we still probably want either:

1) Python to provide some such hash_stream() function as a built-in,
or failing that,

2) the https://docs.python.org/3/reference/datamodel.html#object.__hash__
documentation to bless this as the recommended solution to this
problem, thereby providing assurance of its performance?

If that makes sense, I'd be happy to file an issue, and include the
start of a patch providing either 1 or 2.

Thanks very much for the helpful response.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
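A quick sanity check of the property jab is after, run against Ned's
recipe as quoted above: the chained construction is order-sensitive,
whereas simply XOR-ing the item hashes together is not (the example
values are arbitrary):

from functools import reduce
from operator import xor

def hash_stream(them):
    val = 0
    for it in them:
        val = hash((val, it))
    return val

# chaining distinguishes orderings (a collision is possible but unlikely)
print(hash_stream([1, 2, 3]) == hash_stream([3, 2, 1]))  # False

# XOR folding does not: any reordering gives the same value
print(reduce(xor, map(hash, [1, 2, 3])) ==
      reduce(xor, map(hash, [3, 2, 1])))  # True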
From jab at math.brown.edu  Wed Dec 28 12:44:55 2016
From: jab at math.brown.edu (jab at math.brown.edu)
Date: Wed, 28 Dec 2016 12:44:55 -0500
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To: <5863F223.3040906@stoneleaf.us>
References: <5863F223.3040906@stoneleaf.us>
Message-ID:

On Wed, Dec 28, 2016 at 12:10 PM, Ethan Furman wrote:

> In other words, objects that do not compare equal can also have the same
> hash value (although too much of that will reduce the efficiency of
> Python's containers).
>

Yes, I realize that unequal objects can return the same hash value with
only performance, and not correctness, suffering. It's the performance
I'm concerned about. That's what I meant by "...to keep from
unnecessarily causing hash collisions..." in my original message, but
sorry this wasn't clearer. We should be able to do this in a way that
doesn't increase hash collisions unnecessarily.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From brett at python.org  Wed Dec 28 14:40:01 2016
From: brett at python.org (Brett Cannon)
Date: Wed, 28 Dec 2016 19:40:01 +0000
Subject: [Python-ideas] Function arguments in tracebacks
In-Reply-To:
References:
Message-ID:

My quick on-vacation response is that attaching more objects to
exceptions is typically viewed as dangerous as it can lead to those
objects being kept alive longer than expected (see the discussions
about richer error messages to see that worry come out for something as
simple as attaching the type to a TypeError).
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
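The keep-alive effect Brett describes can be seen even without any new
exception fields, because a stored exception already retains every frame
(and its locals) through __traceback__. A small, hedged demonstration
with invented names:

def leaky():
    big = list(range(10**6))  # stays reachable via the traceback
    raise ValueError('oops')

saved = None
try:
    leaky()
except ValueError as exc:
    saved = exc  # keeping the exception keeps its frames alive

frame = saved.__traceback__.tb_next.tb_frame  # leaky()'s frame
print('big' in frame.f_locals)  # True: the list is still alive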
From emanuel.landeholm at gmail.com  Wed Dec 28 16:01:55 2016
From: emanuel.landeholm at gmail.com (Emanuel Landeholm)
Date: Wed, 28 Dec 2016 22:01:55 +0100
Subject: [Python-ideas] Function arguments in tracebacks
In-Reply-To:
References:
Message-ID:

I think an argument could be made for including the str() of parameters
of primitive types and with small values (for some value of "primitive"
and "small", can of worms here...). I'm thinking numbers and short
strings. Maybe a flag to control this behaviour? My gut feeling is that
this would be a hack with lots of corner cases and surprises so it would
probably not be very helpful in the general case.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From python at mrabarnett.plus.com  Wed Dec 28 16:23:57 2016
From: python at mrabarnett.plus.com (MRAB)
Date: Wed, 28 Dec 2016 21:23:57 +0000
Subject: [Python-ideas] Function arguments in tracebacks
In-Reply-To:
References:
Message-ID:

On 2016-12-28 21:01, Emanuel Landeholm wrote:
> I think an argument could be made for including the str() of parameters
> of primitive types and with small values (for some value of "primitive"
> and "small", can of worms here...). I'm thinking numbers and short
> strings. Maybe a flag to control this behaviour? My gut feeling is that
> this would be a hack with lots of corner cases and surprises so it would
> probably not be very helpful in the general case.
> Don't you mean the repr or ascii because you'll want 'foo' to print as:
'foo' and not as: foo

From ned at nedbatchelder.com  Wed Dec 28 16:27:07 2016
From: ned at nedbatchelder.com (Ned Batchelder)
Date: Wed, 28 Dec 2016 16:27:07 -0500
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To:
References:
Message-ID: <72243c5a-dc20-6899-9042-fa8a3a45a847@nedbatchelder.com>

On 12/28/16 12:27 PM, jab at math.brown.edu wrote:
> On Wed, Dec 28, 2016 at 11:48 AM, Ned Batchelder wrote:
>
>     You can write a simple function to use hash iteratively to hash
>     the entire stream in constant space and linear time:
>
>         def hash_stream(them):
>             val = 0
>             for it in them:
>                 val = hash((val, it))
>             return val
>
>     Although this creates N 2-tuples, they come and go, so the memory
>     use won't grow. Adjust the code as needed to achieve
>     canonicalization before iterating.
>
>     Or maybe I am misunderstanding the requirements?
>
> This is better than solutions like
> http://stackoverflow.com/a/27952689/161642 in the sense that it's a
> little higher level (no bit shifting or magic numbers).
>
> But it's not clear that it's any better in the sense that you're still
> rolling your own incremental hash algorithm out of a lower-level
> primitive that doesn't document support for this, and therefore taking
> responsibility yourself for how well it distributes values into buckets.
>
> Are you confident this results in good hash performance? Is this
> better than a solution built on top of a hash function with an
> explicit API for calculating a hash incrementally, such as the crc32
> example I included? (And again, this would ideally be
> a sys.hash_info.hash_bits-bit algorithm.)

I don't have the theoretical background to defend this function. But it
seems to me that if we believe that hash((int, thing)) distributes well,
then how could this function not distribute well?

--Ned.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From njs at pobox.com  Wed Dec 28 17:13:50 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 28 Dec 2016 14:13:50 -0800
Subject: [Python-ideas] Function arguments in tracebacks
In-Reply-To:
References:
Message-ID:

On Dec 28, 2016 12:44, "Brett Cannon" wrote:

My quick on-vacation response is that attaching more objects to
exceptions is typically viewed as dangerous as it can lead to those
objects being kept alive longer than expected (see the discussions about
richer error messages to see that worry come out for something as simple
as attaching the type to a TypeError).


This isn't an issue for printing arguments or other locals in tracebacks,
though.
The traceback printing code can access anything in the frame stack.

-n
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mahmoud at hatnote.com  Wed Dec 28 17:22:11 2016
From: mahmoud at hatnote.com (Mahmoud Hashemi)
Date: Wed, 28 Dec 2016 14:22:11 -0800
Subject: [Python-ideas] Function arguments in tracebacks
In-Reply-To:
References:
Message-ID:

On Wed, Dec 28, 2016 at 2:13 PM, Nathaniel Smith wrote:

> On Dec 28, 2016 12:44, "Brett Cannon" wrote:
>
> My quick on-vacation response is that attaching more objects to exceptions
> is typically viewed as dangerous as it can lead to those objects being kept
> alive longer than expected (see the discussions about richer error messages
> to see that worry come out for something as simple as attaching the type to
> a TypeError).
>
>
> This isn't an issue for printing arguments or other locals in tracebacks,
> though. The traceback printing code can access anything in the frame stack.
>
> -n
>

Right. I'd actually be more worried about security leaks than memory
leaks. Imagine you're calling a password-checking function that got bytes
instead of text: what amounts to a type check could leak the plaintext
password.

One rarely sees a C traceback, let alone a textual one, except during
development, whereas Python tracebacks are seen during development and
after deployment.

Mahmoud
https://github.com/mahmoud
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From josephhackman at gmail.com  Wed Dec 28 18:00:06 2016
From: josephhackman at gmail.com (Joseph Hackman)
Date: Wed, 28 Dec 2016 18:00:06 -0500
Subject: [Python-ideas] VT100 style escape codes in Windows
Message-ID:

Hey All!

I propose that Windows CPython flip the bit for VT100 support (colors and
whatnot) for the stdout/stderr streams at startup time.

I believe this behavior is worthwhile because ANSI escape codes are
standard across most of Python's install base, and the alternative for
Windows (using ctypes/win32 to alter the colors) is non-intuitive and
well beyond the scope of most users.

Under Linux/Mac, the terminal always supports what it can, and it's up to
the application to verify escape codes are supported. Under Windows,
applications (Python) must specifically request that escape codes be
enabled. The flag lasts for the duration of the application, and must be
flipped on every launch. It seems many of the built-in windows commands
now operate in this mode.

This change would not impede tools that use the win32 APIs for the
console (such as colorama), and is supported in windows 2000 and up.

The only good alternative I can see is adding colorized/special output as
a proper python feature that actually checks using the terminal
information in *nix and win32.

For more info, please see the issue: http://bugs.python.org/issue29059

Cheers,
Joseph
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From p.f.moore at gmail.com  Wed Dec 28 18:06:22 2016
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 28 Dec 2016 23:06:22 +0000
Subject: [Python-ideas] VT100 style escape codes in Windows
In-Reply-To:
References:
Message-ID:

Would this only apply to recent versions of Windows? (IIRC, the VT100
support is Win10 only). If so, I'd be concerned about scripts that
worked on *some* Windows versions but not others. And in particular,
about scripts written on Unix using raw VT codes rather than using a
portable solution like colorama.

At the point where we can comfortably assume the majority of users are
using a version of Windows that supports VT codes, I'd be OK with it
being the default, but until then I'd prefer it were an opt-in option.
Paul
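For anyone wanting to try this by hand, the ctypes/win32 incantation
Joseph refers to looks roughly like the following hedged sketch (the
constants come from the Windows console API; on non-Windows platforms
the escape codes are assumed to work already):

import sys
import ctypes

ENABLE_VIRTUAL_TERMINAL_PROCESSING = 0x0004
STD_OUTPUT_HANDLE = -11

def enable_vt100():
    if sys.platform != 'win32':
        return True  # assume a VT100-capable terminal elsewhere
    kernel32 = ctypes.windll.kernel32
    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    mode = ctypes.c_uint32()
    if not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        return False
    return bool(kernel32.SetConsoleMode(
        handle, mode.value | ENABLE_VIRTUAL_TERMINAL_PROCESSING))

if enable_vt100():
    print('\x1b[32mthis should be green\x1b[0m')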
From josephhackman at gmail.com  Wed Dec 28 18:33:20 2016
From: josephhackman at gmail.com (Joseph Hackman)
Date: Wed, 28 Dec 2016 18:33:20 -0500
Subject: [Python-ideas] VT100 style escape codes in Windows
In-Reply-To:
References:
Message-ID:

The quick answer is that the MSDN doc indicates support from windows 2000
onward, with no notes for partial compatibility:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms686033(v=vs.85).aspx

I'll build a Windows 7 VM to test.

I believe Python 3.6 is only supported on Vista+ and 3.7 would be Windows
7+ only?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From josephhackman at gmail.com  Wed Dec 28 19:13:48 2016
From: josephhackman at gmail.com (Joseph Hackman)
Date: Wed, 28 Dec 2016 19:13:48 -0500
Subject: [Python-ideas] VT100 style escape codes in Windows
In-Reply-To:
References:
Message-ID:

Welp! You're definitely correct. Ah well.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From random832 at fastmail.com Wed Dec 28 20:30:15 2016 From: random832 at fastmail.com (Random832) Date: Wed, 28 Dec 2016 20:30:15 -0500 Subject: [Python-ideas] VT100 style escape codes in Windows In-Reply-To: References: Message-ID: <1482975015.2963954.831782761.5A3E0FFC@webmail.messagingengine.com> On Wed, Dec 28, 2016, at 18:33, Joseph Hackman wrote: > The quick answer is that the MSDN doc indicates support from windows 2000 > onward, with no notes for partial compatability: > https://msdn.microsoft.com/en-us/library/windows/desktop/ms686033(v=vs.85).aspx That's the function itself (and 2000 is just as far back as the website goes, it's actually existed, with the other modes, since NT 3.1 and Windows 95. The separate code sample page mentions that they are new features since Windows 10 Anniversary Edition. From abedillon at gmail.com Wed Dec 28 22:05:53 2016 From: abedillon at gmail.com (Abe Dillon) Date: Wed, 28 Dec 2016 21:05:53 -0600 Subject: [Python-ideas] Importing public symbols and simultainiously privatizing them, is too noisy In-Reply-To: References: <73a65a22-6440-4fde-ba99-fcd864a652d0@googlegroups.com> <20160317003545.GD8022@ando.pearwood.info> <007b7330-9b57-42a6-b1df-5af2029db1fb@googlegroups.com> Message-ID: > > I avoid __all__ like the plague. Too easy for it to get out of sync with > the API when i forget to add a new symbol. Your API should be one of the most stable parts of your code, no? On Fri, Mar 18, 2016 at 4:29 PM, Chris Barker wrote: > On Wed, Mar 16, 2016 at 6:52 PM, Rick Johnson < > rantingrickjohnson at gmail.com> wrote: > >> > Besides, why is "import x as _x" so special to require special syntax? >> > > It's not :-) I know I do, for instance, > > from matplotlib import pylot as plt > > But have NEVER done the leading underscore thing... > > >> from module import Foo as _Foo, bar as _bar, BAZ as _BAZ, spam as _spam, >> eggs as _eggs >> > > if you are mirroring an entire namespace, or a god fraction of one then > use a module name! > > import module as _mod > > then use _mod.Foo, etc..... > > Now, that may seem like a contrived example, but i've >> witnessed much longer "run-on import lines" than that. >> > > I have too, but I think it's bad style -- if you are importing a LOT of > names from one module, just import the darn module -- giving it a shorter > name if you like. This has become a really standard practice, like: > > import numpy as np > > for instance. > > The intended purpose is to: "automate the privatization of >> public symbols during the import process". >> > > I'm really confused about the use case for "privatization of public > symbols" at all, but again, if you need a lot of them, use the module name > to prefix them. Heck give it a one character name, and then it's hardly > more typing than the underscore... > > -CHB > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From steve at pearwood.info  Thu Dec 29 03:20:00 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 29 Dec 2016 19:20:00 +1100
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To:
References: <5863F223.3040906@stoneleaf.us>
Message-ID: <20161229081959.GA3887@ando.pearwood.info>

On Wed, Dec 28, 2016 at 12:44:55PM -0500, jab at math.brown.edu wrote:
> On Wed, Dec 28, 2016 at 12:10 PM, Ethan Furman wrote:
>
> > In other words, objects that do not compare equal can also have the same
> > hash value (although too much of that will reduce the efficiency of
> > Python's containers).
>
> Yes, I realize that unequal objects can return the same hash value with
> only performance, and not correctness, suffering. It's the performance I'm
> concerned about. That's what I meant by "...to keep from unnecessarily
> causing hash collisions..." in my original message, but sorry this wasn't
> clearer. We should be able to do this in a way that doesn't increase hash
> collisions unnecessarily.

With respect Josh, I feel that this thread is based on premature
optimization. It seems to me that you're *assuming* that anything less
than some theoretically ideal O(1) space O(N) time hash function is
clearly and obviously unsuitable.

Of course I might be completely wrong. Perhaps you have implemented your
own __hash__ methods as suggested by the docs, as well as Ned's version,
and profiled your code and discovered that __hash__ is a significant
bottleneck. In which case, I'll apologise for doubting you, but in my
defence I'll say that the language you have used in this thread so far
gives no hint that you've actually profiled your code and found the
bottleneck.

As I see it, this thread includes a few questions:

(1) What is a good way to generate a hash one item at a time?

I think Ned's answer is "good enough", subject to evidence to the
contrary. If somebody cares to spend the time to analyse it, that's
excellent, but we're volunteers and it's the holiday period and most
people have probably got better things to do. But we shouldn't let the
perfect be the enemy of the good.

But for what it's worth, I've done an *extremely* quick and dirty test
to see whether the incremental hash function gives a good spread of
values, by comparing it to the standard hash() function.

py> import statistics
py> def incrhash(iterable):
...     h = hash(())
...     for x in iterable:
...         h = hash((h, x))
...     return h
...
py>
py> data1 = []
py> data2 = []
py> for i in range(1000):
...     it = range(i, i+100)
...     data1.append(hash(tuple(it)))
...     data2.append(incrhash(it))
...
py> # Are there any collisions?
... len(set(data1)), len(set(data2))
(1000, 1000)
py> # compare spread of values
... statistics.stdev(data1), statistics.stdev(data2)
(1231914201.0980585, 1227850884.443638)
py> max(data1)-min(data1), max(data2)-min(data2)
(4287398438, 4287569008)

Neither the built-in hash() nor the incremental hash gives any
collisions over this (admittedly small) data set, and both have very
similar spreads of values as measured by either the standard deviation
or the statistical range. The means are quite different:

py> statistics.mean(data1), statistics.mean(data2)
(-8577110.944, 2854438.568)

but I don't think that matters. So that's good enough for me.

(2) Should Ned's incremental hash, or some alternative with better
properties, go into the standard library?

I'm not convinced that your examples are common enough that the stdlib
should be burdened with supporting it.
On the other hand, I don't think
it is an especially *large* burden, so perhaps it could be justified.
Count me as sitting on the fence on this one.

Perhaps a reasonable compromise would be to include it as a recipe in
the docs.

(3) If it does go in the stdlib, where should it go?

I'm suspicious of functions that change their behaviour depending on how
they are called, so I'm not keen on your suggestion of adding a second
API to the hash built-in:

    hash(obj)  # return hash of obj

    hash(iterable=obj)  # return incrementally calculated hash of obj

That feels wrong to me. I'd rather add a generator to the itertools
module:

    itertools.iterhash(iterable)  # yield incremental hashes

or, copying the API of itertools.chain, add a method to hash:

    hash.from_iterable(iterable)  # return hash calculated incrementally

--
Steve

From rosuav at gmail.com  Thu Dec 29 03:35:04 2016
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 29 Dec 2016 19:35:04 +1100
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To: <20161229081959.GA3887@ando.pearwood.info>
References: <5863F223.3040906@stoneleaf.us>
 <20161229081959.GA3887@ando.pearwood.info>
Message-ID:

On Thu, Dec 29, 2016 at 7:20 PM, Steven D'Aprano wrote:
> I'd rather add a generator to the itertools
> module:
>
>     itertools.iterhash(iterable)  # yield incremental hashes
>
> or, copying the API of itertools.chain, add a method to hash:
>
>     hash.from_iterable(iterable)  # return hash calculated incrementally

The itertools module is mainly designed to be consumed lazily. The
hash has to be calculated eagerly, so it's not really a good fit for
itertools. The only real advantage of this "hash from iterable" over
hash(tuple(it)) is avoiding the intermediate tuple, so I'd want to see
evidence that that's actually significant.

ChrisA

From erik.m.bray at gmail.com  Thu Dec 29 07:12:58 2016
From: erik.m.bray at gmail.com (Erik Bray)
Date: Thu, 29 Dec 2016 13:12:58 +0100
Subject: [Python-ideas] New PyThread_tss_ C-API for CPython
In-Reply-To:
References: <20161216185102.1e8396d4@fsol>
Message-ID:

On Wed, Dec 21, 2016 at 5:07 PM, Nick Coghlan wrote:
> On 21 December 2016 at 20:01, Erik Bray wrote:
>>
>> On Wed, Dec 21, 2016 at 2:10 AM, Nick Coghlan wrote:
>> > Option 2: Similar to option 1, but using a custom type alias, rather
>> > than using a C99 bool directly
>> >
>> > The closest API we have to these semantics at the moment would be
>> > PyGILState_Ensure, so the following API naming might work for option 2:
>> >
>> >     Py_ensure_t
>> >     Py_ENSURE_NEEDS_INIT
>> >     Py_ENSURE_INITIALIZED
>> >
>> > Respectively, these would just be aliases for bool, false, and true.
>> >
>> > And then modify the proposed PyThread_tss_create and PyThread_tss_delete
>> > APIs to accept a "Py_ensure_t *init_flag" in addition to their current
>> > arguments.
>>
>> That all sounds good--between the two option 2 looks a bit more explicit.
>>
>> Though what about this?  Rather than adding another type, the original
>> proposal could be changed slightly so that Py_tss_t *is* partially
>> defined as a struct consisting of a bool, with whatever the native TLS
>> key is.  E.g.
>>
>>     typedef struct {
>>         bool init_flag;
>>     #if defined(_POSIX_THREADS)
>>         pthread_key_t key;
>>     #elif defined(NT_THREADS)
>>         DWORD key;
>>     /* etc... */
>>     #endif
>>     } Py_tss_t;
>>
>> Then it's just taking Masayuki's original patch, with the global bool
>> variables, and formalizing that by combining the initialized flag with
>> the key, and requiring the semantics you described above for
>> PyThread_tss_create/delete.
>>
>> For Python's purposes it seems like this might be good enough, with
>> the more general purpose pthread_once-like functionality not required.
>
> Aye, I also thought of that approach, but talked myself out of it since
> there's no definable default value for pthread_key_t. However, C99 partial
> initialisation may deal with that for us (by zeroing the memory without
> actually assigning a typed value to it), and if it does, I agree it would be
> better to handle the initialisation flag automatically rather than requiring
> callers to do it.

I think I understand what you're saying here... To be clear, let me
enumerate the three currently supported cases and how they're affected:

1) CPython's TLS: Defines -1 as an uninitialized key (by fact of the
implementation--that the keys are integers starting from zero)
2) pthreads: Does not define an uninitialized default value for
keys, for reasons described at [1] under "Non-Idempotent Data Key
Creation".  I understand their reasoning, though I can't claim to know
specifically what they mean when they say that some implementations
would require the mutual exclusion to be performed on
pthread_getspecific() as well.  I don't know that it applies here.
3) windows: The return value of TlsAlloc() is a DWORD (unsigned int)
and [2] states that its value should be opaque.

So in principle we can cover all cases with an opaque struct that
contains, as its first member, an is_initialized flag.  The tricky
part is how to initialize the rest of the struct (containing the
underlying implementation-specific key).  For 1) and 3) it doesn't
matter--it can just be zero.  For 2) it's trickier because there's no
defined constant value to initialize a pthread_key_t to.

Per Nick's suggestion this can be worked around by relying on C99's
initialization semantics.  Per [3] section 6.7.8, clause 21:

"""
If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration.
"""

How objects with static storage are initialized is described in the
previous page under clause 10, but in practice it boils down to what
you would expect: Everything is initialized to zero, including nested
structs and arrays.

So as long as we can use this feature of C99 then I think that's the
best approach.

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html
[2] https://msdn.microsoft.com/en-us/library/windows/desktop/ms686801(v=vs.85).aspx
[3] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf

From jab at math.brown.edu  Thu Dec 29 15:24:10 2016
From: jab at math.brown.edu (jab at math.brown.edu)
Date: Thu, 29 Dec 2016 15:24:10 -0500
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To: <20161229081959.GA3887@ando.pearwood.info>
References: <5863F223.3040906@stoneleaf.us>
 <20161229081959.GA3887@ando.pearwood.info>
Message-ID:

Thanks for the thoughtful discussion, it's been very interesting.

Hash algorithms seem particularly sensitive and tricky to get right,
with a great deal of research going into choices of constants, etc. and
lots of gotchas. So it seemed worth asking about.
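For reference, the accumulation approach under discussion is essentially
the following (a sketch along the lines of Ned's recipe; the helper name
matches the one used in my gist below):

    from functools import reduce

    def hash_incremental(iterable):
        # Fold each item into a running hash via a two-element tuple,
        # rather than materializing one big tuple up front.
        return reduce(lambda h, item: hash((h, item)), iterable, hash(()))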
If I had to bet on whether repeatedly accumulating pairwise hash() results would maintain the same desired properties that hash(tuple(items)) guarantees, I'd want to get confirmation from someone with expertise in this first, hence my starting this thread. But as you showed, it's certainly possible to do some exploration in the meantime. Prompted by your helpful comparison, I just put together https://gist.github.com/jab/fd78b3acd25b3530e0e21f5aaee3c674 to further compare hash_tuple vs. hash_incremental. Based on that, hash_incremental seems to have a comparable distribution to hash_tuple. I'm not sure if the methodology there is sound, as I'm new to analysis like this. So I'd still welcome confirmation from someone with actual expertise in Python's internal hash algorithms. But so far this sure seems good enough for the use cases I described. Given sufficiently good distribution, I'd expect there to be unanimous agreement that an immutable collection, which could contain arbitrarily many items, should strongly prefer hash_incremental(self) over hash(tuple(self)), for the same reason we use generator comprehensions instead of list comprehensions when appropriate. Please correct me if I'm wrong. +1 for the "hash.from_iterable" API you proposed, if some additional support for this is added to Python. Otherwise +1 for including Ned's recipe in the docs. Again, happy to submit a patch for either of these if it would be helpful. And to be clear, I really appreciate the time that contributors have put into this thread, and into Python in general. Thoughtful responses are always appreciated, and never expected. I'm just interested in learning and in helping improve Python when I might have an opportunity. My Python open source work has been done on a voluntary basis too, and I haven't even gotten to use Python for paid/closed source work in several years, alas. Thanks again, Josh On Thu, Dec 29, 2016 at 3:20 AM, Steven D'Aprano wrote: > On Wed, Dec 28, 2016 at 12:44:55PM -0500, jab at math.brown.edu wrote: > > On Wed, Dec 28, 2016 at 12:10 PM, Ethan Furman > wrote: > > > > > In other words, objects that do not compare equal can also have the > same > > > hash value (although too much of that will reduce the efficiency of > > > Python's containers). > > > > > > > Yes, I realize that unequal objects can return the same hash value with > > only performance, and not correctness, suffering. It's the performance > I'm > > concerned about. That's what I meant by "...to keep from unnecessarily > > causing hash collisions..." in my original message, but sorry this wasn't > > clearer. We should be able to do this in a way that doesn't increase hash > > collisions unnecessarily. > > With respect Josh, I feel that this thread is based on premature > optimization. It seems to me that you're *assuming* that anything less > than some theoretically ideal O(1) space O(N) time hash function is > clearly and obviously unsuitable. > > Of course I might be completely wrong. Perhaps you have implemented your > own __hash__ methods as suggested by the docs, as well as Ned's version, > and profiled your code and discovered that __hash__ is a significant > bottleneck. In which case, I'll apologise for doubting you, but in my > defence I'll say that the language you have used in this thread so far > gives no hint that you've actually profiled your code and found the > bottleneck. > > As I see it, this thread includes a few questions: > > (1) What is a good way to generate a hash one item at a time? 
> > I think Ned's answer is "good enough", subject to evidence to the > contrary. If somebody cares to spend the time to analyse it, that's > excellent, but we're volunteers and its the holiday period and most > people have probably got better things to do. But we shouldn't let the > perfect be the enemy of the good. > > But for what it's worth, I've done an *extremely* quick and dirty test > to see whether the incremental hash function gives a good spread of > values, by comparing it to the standard hash() function. > > > py> import statistics > py> def incrhash(iterable): > ... h = hash(()) > ... for x in iterable: > ... h = hash((h, x)) > ... return h > ... > py> > py> data1 = [] > py> data2 = [] > py> for i in range(1000): > ... it = range(i, i+100) > ... data1.append(hash(tuple(it))) > ... data2.append(incrhash(it)) > ... > py> # Are there any collisions? > ... len(set(data1)), len(set(data2)) > (1000, 1000) > py> # compare spread of values > ... statistics.stdev(data1), statistics.stdev(data2) > (1231914201.0980585, 1227850884.443638) > py> max(data1)-min(data1), max(data2)-min(data2) > (4287398438, 4287569008) > > > Neither the built-in hash() nor the incremental hash gives any > collisions over this (admittedly small) data set, and both have very > similar spreads of values as measured by either the standard deviation > or the statistical range. The means are quite different: > > py> statistics.mean(data1), statistics.mean(data2) > (-8577110.944, 2854438.568) > > but I don't think that matters. So that's good enough for me. > > > (2) Should Ned's incremental hash, or some alternative with better > properties, go into the standard library? > > I'm not convinced that your examples are common enough that the stdlib > should be burdened with supporting it. On the other hand, I don't think > it is an especially *large* burden, so perhaps it could be justified. > Count me as sitting on the fence on this one. > > Perhaps a reasonable compromise would be to include it as a recipe in > the docs. > > > (3) If it does go in the stdlib, where should it go? > > I'm suspicious of functions that change their behaviour depending on how > they are called, so I'm not keen on your suggestion of adding a second > API to the hash built-in: > > hash(obj) # return hash of obj > > hash(iterable=obj) # return incrementally calculated hash of obj > > That feels wrong to me. I'd rather add a generator to the itertools > module: > > itertools.iterhash(iterable) # yield incremental hashes > > or, copying the API of itertools.chain, add a method to hash: > > hash.from_iterable(iterable) # return hash calculated incrementally > > > > -- > Steve -------------- next part -------------- An HTML attachment was scrubbed... URL: From timothy.c.delaney at gmail.com Thu Dec 29 20:59:43 2016 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Fri, 30 Dec 2016 12:59:43 +1100 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: <20161229081959.GA3887@ando.pearwood.info> References: <5863F223.3040906@stoneleaf.us> <20161229081959.GA3887@ando.pearwood.info> Message-ID: On 29 December 2016 at 19:20, Steven D'Aprano wrote: > > With respect Josh, I feel that this thread is based on premature > optimization. It seems to me that you're *assuming* that anything less > than some theoretically ideal O(1) space O(N) time hash function is > clearly and obviously unsuitable. > > Of course I might be completely wrong. 
Perhaps you have implemented your > own __hash__ methods as suggested by the docs, as well as Ned's version, > and profiled your code and discovered that __hash__ is a significant > bottleneck. In which case, I'll apologise for doubting you, but in my > defence I'll say that the language you have used in this thread so far > gives no hint that you've actually profiled your code and found the > bottleneck. > In Josh's defence, the initial use case he put forward is exactly the kind of scenario where it's worthwhile optimising ahead of time. Quite often a poorly implemented hash function doesn't manifest as a problem until you scale up massively - and a developer may not have the capability to scale up to a suitable level in-house, resulting in performance issues at customer sites. I had one particular case (fortunately discovered before going to customers) where a field was included in the equality check, but wasn't part of the hash. Unfortunately, the lack of this one field resulted in objects only being allocated to a few buckets (in a Java HashMap), resulting in every access having to walk a potentially very long chain doing equality comparisons - O(N) behaviour from an amortised O(1) data structure. Unit tests - no worries. Small-scale tests - everything looked fine. Once we started our load tests though everything slowed to a crawl. 100% CPU, throughput at a standstill ... it didn't look good. Adding that one field to the hash resulted in the ability to scale up to hundreds of thousands of objects with minimal CPU. I can't remember if it was millions we tested to (it was around 10 years ago ...). Tim Delaney -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Dec 30 09:55:32 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 31 Dec 2016 00:55:32 +1000 Subject: [Python-ideas] AtributeError inside __get__ In-Reply-To: References: Message-ID: On 26 December 2016 at 21:23, Zahari Dim wrote: > > There are a lot of entirely valid properties that look something like > this: > > > > > > @property > > def attr(self): > > try: > > return data_store[lookup_key] > > except KeyError: > > raise AttributeError("attr") > > But wouldn't something like this be implemented more commonly with > __getattr__ instead (likely there is more than one such property in a > real example)? Even though __getattr__ has a similar problem (a bad > AttributeError inside can cause many bugs), I'd agree it would > probably be too difficult to change that without breaking a lot of > code. For __get__, the errors are arguably more confusing (e.g. when > used with @property) and the legitimate use case, while existing, > seems more infrequent to me: I did a github search and there was a > small number of cases, but most were for code written in python 2 > anyway. Aye, I agree this pattern is far more common in __getattr__ than it is in descriptor __get__ implementations or in property getter implementations. Rather than changing the descriptor protocol in general, I'd personally be more amenable to the idea of *property* catching AttributeError from the functions it calls and turning it into RuntimeError (after a suitable deprecation period). That way folks that really wanted the old behaviour could define their own descriptor that works the same way property does today, whereas if the descriptor protocol itself were to change, there's very little people could do to work around it if it wasn't what they wanted. 
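A rough sketch of what such an opt-in variant could look like (the
subclass name is hypothetical, and this is illustrative rather than how
the builtin would actually be implemented):

    class strict_property(property):
        # Hypothetical property subclass: an AttributeError escaping
        # the getter is converted to RuntimeError, so it can't
        # masquerade as a missing attribute.
        def __get__(self, obj, objtype=None):
            try:
                return super().__get__(obj, objtype)
            except AttributeError as exc:
                raise RuntimeError(
                    "AttributeError inside property getter") from exc

With something like that available, a buggy getter that leaks an
AttributeError would surface as a loud RuntimeError rather than silently
looking like a missing attribute.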
Exploring that possible approach a bit further: - after a deprecation period, the "property" builtin would change to convert any AttributeError raised by the methods it calls into RuntimeError - the current "property" could be renamed "optionalproperty": the methods may raise AttributeError to indicate the attribute isn't *really* present, even though the property is defined at the class level - the deprecation warning would indicate that the affected properties should switch to using optionalproperty instead Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Fri Dec 30 10:10:03 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 31 Dec 2016 02:10:03 +1100 Subject: [Python-ideas] AtributeError inside __get__ In-Reply-To: References: Message-ID: On Sat, Dec 31, 2016 at 1:55 AM, Nick Coghlan wrote: > Rather than changing the descriptor protocol in general, I'd personally be > more amenable to the idea of *property* catching AttributeError from the > functions it calls and turning it into RuntimeError (after a suitable > deprecation period). That way folks that really wanted the old behaviour > could define their own descriptor that works the same way property does > today, whereas if the descriptor protocol itself were to change, there's > very little people could do to work around it if it wasn't what they wanted. > Actually, that makes a lot of sense. And since "property" isn't magic syntax, you could take it sooner: from somewhere import property and toy with it that way. What module would be appropriate, though? ChrisA From ncoghlan at gmail.com Fri Dec 30 10:28:47 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 31 Dec 2016 01:28:47 +1000 Subject: [Python-ideas] Function arguments in tracebacks In-Reply-To: References: Message-ID: On 29 December 2016 at 08:13, Nathaniel Smith wrote: > On Dec 28, 2016 12:44, "Brett Cannon" wrote: > > My quick on-vacation response is that attaching more objects to exceptions > is typically viewed as dangerous as it can lead to those objects being kept > alive longer than expected (see the discussions about richer error messages > to see that worry come out for something as simple as attaching the type to > a TypeError). > > > This isn't an issue for printing arguments or other locals in tracebacks, > though. The traceback printing code can access anything in the frame stack. > Right, the reasons for the discrepancy here are purely pragmatic ones: - the default traceback printing machinery in CPython is written in C, and we don't currently have readily available tools at that layer to print a nice structured argument list the way gdb does for C functions (and there are good reasons for us to want the interpreter to be able to print tracebacks even if it's in a sufficiently unhealthy state that the "traceback" module won't run, so delegating the problem to Python level tooling isn't an answer for CPython) - displaying local variables in runtime tracebacks (as opposed to in interactive debuggers like gdb) is a known security risk that we don't currently provide good tools for handling in the standard library (e.g. we don't offer str and bytes subclasses with opaque representations that don't reveal their contents). Even if we did offer them, they'd still be opt-in for reasons of usability when working with data that *isn't* security sensitive. 
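For illustration, such an opaque type could be as simple as the
following sketch (the name is hypothetical, and a real design would need
to think about the str operations that return new plain str instances):

    class OpaqueStr(str):
        # Hypothetical str subclass whose repr hides its contents, so
        # tracebacks that display local variables don't leak secrets.
        def __repr__(self):
            return '<OpaqueStr: contents hidden>'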
However, neither of those arguments applies to the "where" command in pdb, and that doesn't currently display this kind of information either: >>> def f(x, y, message): ... return x/y, message ... >>> f(2, 0, "Hello world") Traceback (most recent call last): File "", line 1, in File "", line 2, in f ZeroDivisionError: division by zero >>> import pdb; pdb.pm() > (2)f() (Pdb) w (1)()->None > (2)f() (Pdb) pdb already knows what the arguments are, as it can print them if you ask for them explicitly: (Pdb) args x = 2 y = 0 message = 'Hello world' So I think this kind of change may make a lot of sense as an RFE for pdb's "where" command (with the added bonus that projects like pdbpp could make it available to earlier Python versions as well). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Dec 30 11:05:36 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 31 Dec 2016 02:05:36 +1000 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython In-Reply-To: References: <20161216185102.1e8396d4@fsol> Message-ID: On 29 December 2016 at 22:12, Erik Bray wrote: > 1) CPython's TLS: Defines -1 as an uninitialized key (by fact of the > implementation--that the keys are integers starting from zero) > 2) pthreads: Does not definite an uninitialized default value for > keys, for reasons described at [1] under "Non-Idempotent Data Key > Creation". I understand their reasoning, though I can't claim to know > specifically what they mean when they say that some implementations > would require the mutual-exclusion to be performed on > pthread_getspecific() as well. I don't know that it applies here. > That section is a little weird, as they describe two requests (one for a known-NULL default value, the other for implicit synchronisation of key creation to prevent race conditions), and only provide the justification for rejecting one of them (the second one). If I've understood correctly, the situation they're worried about there is that pthread_key_create() has to be called at least once-per-process, but must be called before *any* call to pthread_getspecific or pthread_setspecific for a given key. If you do "implicit init" rather than requiring the use of an explicit mechanism like pthread_once (or our own Py_Initialize and module import locks), then you may take a small performance hit as either *every* thread then has to call pthread_key_create() to ensure the key exists before using it, or else pthread_getspecific() and pthread_setspecific() have to become potentially blocking calls. Neither of those is desirable, so it makes sense to leave that part of the problem to the API client. In our case, we don't want the implicit synchronisation, we just want the known-NULL default value so the "Is it already set?" check can be moved inside the library function. > 3) windows: The return value of TlsAlloc() is a DWORD (unsigned int) > and [2] states that its value should be opaque. > > So in principle we can cover all cases with an opaque struct that > contains, as its first member, an is_initialized flag. The tricky > part is how to initialize the rest of the struct (containing the > underlying implementation-specific key). For 1) and 3) it doesn't > matter--it can just be zero. For 2) it's trickier because there's no > defined constant value to initialize a pthread_key_t to. 
> > Per Nick's suggestion this can be worked around by relying on C99's > initialization semantics. Per [3] section 6.7.8, clause 21: > > """ > If there are fewer initializers in a brace-enclosed list than there > are elements or members of an aggregate, or fewer characters in a > string literal used to initialize an array of known size than there > are elements in the array, the remainder of the aggregate shall be > initialized implicitly the same as objects that have static storage > duration. > """ > > How objects with static storage are initialized is described in the > previous page under clause 10, but in practice it boils down to what > you would expect: Everything is initialized to zero, including nested > structs and arrays. > > So as long as we can use this feature of C99 then I think that's the > best approach. > I checked PEP 7 to see exactly which features we've added to the approved C dialect, and designated initialisers are already on the list: https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html So I believe that would allow the initializer to be declared as something like: #define Py_tss_NEEDS_INIT {.is_initialized = false} Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Dec 30 11:24:46 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 31 Dec 2016 02:24:46 +1000 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: <5863F223.3040906@stoneleaf.us> <20161229081959.GA3887@ando.pearwood.info> Message-ID: On 29 December 2016 at 18:35, Chris Angelico wrote: > On Thu, Dec 29, 2016 at 7:20 PM, Steven D'Aprano > wrote: > > I'd rather add a generator to the itertools > > module: > > > > itertools.iterhash(iterable) # yield incremental hashes > > > > or, copying the API of itertools.chain, add a method to hash: > > > > hash.from_iterable(iterable) # return hash calculated incrementally > > The itertools module is mainly designed to be consumed lazily. The > hash has to be calculated eagerly, so it's not really a good fit for > itertools. I understood the "iterhash" suggestion to be akin to itertools.accumulate: >>> for value, tally in enumerate(accumulate(range(10))): print(value, tally) ... 0 0 1 1 2 3 3 6 4 10 5 15 6 21 7 28 8 36 9 45 However, I think including Ned's recipe (or something like it) in https://docs.python.org/3/reference/datamodel.html#object.__hash__ as a tool for avoiding large temporary tuple allocations may be a better way to start off as: 1. It's applicable to all currently released versions of Python, not just to 3.7+ 2. It provides more scope for people to experiment with their own variants of the idea before committing to a *particular* version somewhere in the standard library Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From jab at math.brown.edu Fri Dec 30 12:29:55 2016 From: jab at math.brown.edu (jab at math.brown.edu) Date: Fri, 30 Dec 2016 12:29:55 -0500 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: <5863F223.3040906@stoneleaf.us> <20161229081959.GA3887@ando.pearwood.info> Message-ID: Updating the docs sounds like the more important change for now, given 3.7+. 
But before the docs make an official recommendation for that recipe, were the analyses that Steve and I did sufficient to confirm that its hash distribution and performance is good enough at scale, or is more rigorous analysis necessary? I've been trying to find a reasonably detailed and up-to-date reference on Python hash() result requirements and analysis methodology, with instructions on how to confirm if they're met, but am still looking. Would find that an interesting read if it's out there. But I'd take just an authoritative thumbs up here too. Just haven't heard one yet. And regarding any built-in support that might get added, I just want to make sure Ryan Gonzalez's proposal (the first reply on this thread) didn't get buried: hasher = IncrementalHasher() hasher.add(one_item_to_hash) # updates hasher.hash property with result # repeat return hasher.hash I think this is the only proposal so far that actually adds an explicit API for performing an incremental update. (i.e. The other "hash_stream(iterable)" -style proposals are all-or-nothing.) This would bring Python's built-in hash() algorithm's support up to parity with the other algorithms in the standard library (hashlib, crc32). Maybe that's valuable? -------------- next part -------------- An HTML attachment was scrubbed... URL: From erik.m.bray at gmail.com Fri Dec 30 12:38:05 2016 From: erik.m.bray at gmail.com (Erik Bray) Date: Fri, 30 Dec 2016 18:38:05 +0100 Subject: [Python-ideas] New PyThread_tss_ C-API for CPython In-Reply-To: References: <20161216185102.1e8396d4@fsol> Message-ID: On Fri, Dec 30, 2016 at 5:05 PM, Nick Coghlan wrote: > On 29 December 2016 at 22:12, Erik Bray wrote: >> >> 1) CPython's TLS: Defines -1 as an uninitialized key (by fact of the >> implementation--that the keys are integers starting from zero) >> 2) pthreads: Does not definite an uninitialized default value for >> keys, for reasons described at [1] under "Non-Idempotent Data Key >> Creation". I understand their reasoning, though I can't claim to know >> specifically what they mean when they say that some implementations >> would require the mutual-exclusion to be performed on >> pthread_getspecific() as well. I don't know that it applies here. > > > That section is a little weird, as they describe two requests (one for a > known-NULL default value, the other for implicit synchronisation of key > creation to prevent race conditions), and only provide the justification for > rejecting one of them (the second one). Right, that is confusing to me as well. I'm guessing the reason for rejecting the first is in part a way to force us to recognize the second issue. > If I've understood correctly, the situation they're worried about there is > that pthread_key_create() has to be called at least once-per-process, but > must be called before *any* call to pthread_getspecific or > pthread_setspecific for a given key. If you do "implicit init" rather than > requiring the use of an explicit mechanism like pthread_once (or our own > Py_Initialize and module import locks), then you may take a small > performance hit as either *every* thread then has to call > pthread_key_create() to ensure the key exists before using it, or else > pthread_getspecific() and pthread_setspecific() have to become potentially > blocking calls. Neither of those is desirable, so it makes sense to leave > that part of the problem to the API client. > > In our case, we don't want the implicit synchronisation, we just want the > known-NULL default value so the "Is it already set?" 
check can be moved
> inside the library function.

Okay, we're on the same page here then.  I just wanted to make sure
there wasn't anything else I was missing in Python's case.

>> 3) windows: The return value of TlsAlloc() is a DWORD (unsigned int)
>> and [2] states that its value should be opaque.
>>
>> So in principle we can cover all cases with an opaque struct that
>> contains, as its first member, an is_initialized flag.  The tricky
>> part is how to initialize the rest of the struct (containing the
>> underlying implementation-specific key).  For 1) and 3) it doesn't
>> matter--it can just be zero.  For 2) it's trickier because there's no
>> defined constant value to initialize a pthread_key_t to.
>>
>> Per Nick's suggestion this can be worked around by relying on C99's
>> initialization semantics.  Per [3] section 6.7.8, clause 21:
>>
>> """
>> If there are fewer initializers in a brace-enclosed list than there
>> are elements or members of an aggregate, or fewer characters in a
>> string literal used to initialize an array of known size than there
>> are elements in the array, the remainder of the aggregate shall be
>> initialized implicitly the same as objects that have static storage
>> duration.
>> """
>>
>> How objects with static storage are initialized is described in the
>> previous page under clause 10, but in practice it boils down to what
>> you would expect: Everything is initialized to zero, including nested
>> structs and arrays.
>>
>> So as long as we can use this feature of C99 then I think that's the
>> best approach.
>
> I checked PEP 7 to see exactly which features we've added to the approved C
> dialect, and designated initialisers are already on the list:
> https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html
>
> So I believe that would allow the initializer to be declared as something
> like:
>
>     #define Py_tss_NEEDS_INIT {.is_initialized = false}

Great!  One could argue about whether or not the designated initializer
syntax also incorporates omitted fields, but it would seem strange to
insist that it doesn't.

Have a happy new year,

Erik

From turnbull.stephen.fw at u.tsukuba.ac.jp  Fri Dec 30 12:39:53 2016
From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Sat, 31 Dec 2016 02:39:53 +0900
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To:
References: <5863F223.3040906@stoneleaf.us>
 <20161229081959.GA3887@ando.pearwood.info>
Message-ID: <22630.39913.941657.111305@turnbull.sk.tsukuba.ac.jp>

jab at math.brown.edu writes:

 > But as you showed, it's certainly possible to do some exploration in the
 > meantime. Prompted by your helpful comparison, I just put together
 > https://gist.github.com/jab/fd78b3acd25b3530e0e21f5aaee3c674 to further
 > compare hash_tuple vs. hash_incremental.

It's been a while :-) since I read Knuth[1] (and that was when Knuth
was authoritative on this subject 8^O), but neither methodology is
particularly helpful here.  The ideal is effectively a uniform
distribution on a circle, which has no mean.  Therefore standard
deviation is also uninteresting, since its definition uses the mean.
The right test is something like a χ² (chi-squared) test against a
uniform distribution.  The scatter plots (of hash against test data)
simply show the same thing, without the precision of a statistical
test.  (Note: do not deprecate the computer visualization + human eye
+ human brain system.  It is the best known machine for detecting
significant patterns and anomalies, though YMMV.)
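To make the χ² suggestion concrete, a crude version of such a
uniformity check might look like the following sketch (the bucket count
and the simple modulus reduction are arbitrary illustrative choices,
not a prescribed methodology):

    from collections import Counter

    def chi_squared_stat(hashes, nbuckets=256):
        # Compare observed bucket occupancy against the uniform expectation.
        counts = Counter(h % nbuckets for h in hashes)
        expected = len(hashes) / nbuckets
        return sum((counts[b] - expected) ** 2 / expected
                   for b in range(nbuckets))

The resulting statistic would then be compared against the χ²
distribution with nbuckets - 1 degrees of freedom; a very large value
suggests the hashes are far from uniform.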
The eye-and-brain system is not very good at detecting high-dimensional
patterns, though.

However, a uniform distribution on random data is nowhere near good
enough for a hash function.  It actually needs to be "chaotic" in the
sense that it also spreads "nearby" data out, where "nearby" here would
probably mean that, viewed as (say) 4000-bit strings, less than m% of
the bits (for small m) differ, since in real data you usually expect a
certain amount of structure (continuity in time, clustering around a
mean, line noise in a communication channel), so that you'd be likely
to get lots of "clusters" of very similar data.  You don't want them
pounding on a small subset of buckets.

 > I'm not sure if the methodology there is sound, as I'm new to
 > analysis like this.

Even decades later, starting with Knuth[1] can't hurt. :-)

 > Given sufficiently good distribution, I'd expect there to be unanimous
 > agreement that an immutable collection, which could contain arbitrarily
 > many items, should strongly prefer hash_incremental(self) over
 > hash(tuple(self)), for the same reason we use generator comprehensions
 > instead of list comprehensions when appropriate. Please correct me if I'm
 > wrong.

I think that's a misleading analogy.  For very large collections where
we already have the data, duplicating the data is very expensive.
Furthermore, since the purpose of hashing is making equality
comparisons cheap, this is likely to happen in a loop.  On the other
hand, if there are going to be a *lot* of "large" collections being
stored, then they're actually not all that large compared to the
system, and you might not actually care that much about the cost of the
ephemeral tuples, because the real cost is in an n^2 set of
comparisons.  From the point of view of NumPy, this is an "are you
kidding?" argument because large datasets are its stock in trade, but
for core Python, this may be sufficiently esoteric that it should be
delegated to a third-party package.  On balance, the arguments that
Steven d'Aprano advanced for having a statistics module in the stdlib
vs. importing pandas apply here.

In particular, I think there are a huge number of options for an
iterative hash.  For example, Ned chained 2-tuples.  But perhaps for
efficient time in bounded space you want to use bounded but larger
tuples.  I don't know -- and that's the point.  If we have a TOOWTDI
API like hash.from_iterable then smart people (and people with time on
their hands to do exhaustive experiments!) can tune that over time, as
has been done with the sort API.

Another option is yielding partials, as Steven says:

 > > itertools.iterhash(iterable)  # yield incremental hashes

That's a very interesting idea, though I suspect it rarely would make a
big performance improvement.

I'm not sure I like the "hash.from_iterable" name for this API.  The
problem is that if it's not a concrete collection[3], then you throw
away the data.  If the hash algorithm happens to suck for certain data,
you'll get a lot of collisions, and conclude that your data is much
more concentrated than it actually is.  I find it hard to imagine a use
case where you have large data where you only care about whether two
data points are different (cf. equality comparisons for floats).  You
want to know how they're different.  So I think I would prefer to name
it "hash.from_collection" or similar.  Of course whether the
implementation actually raises on a generator or other non-concrete
iterable is a fine point.

Footnotes:
[1]  Of course I mean The Art of Computer Programming, Ch. 3.

[2]  Including the newly ordered dicts, maybe?
Somebody tweet @dabeaz! What evil can he do with this? [3] Yeah, I know, "re-iterables". But we don't have a definition, let alone an API for identifying, those Things. From ethan at stoneleaf.us Fri Dec 30 14:52:47 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 30 Dec 2016 11:52:47 -0800 Subject: [Python-ideas] AtributeError inside __get__ In-Reply-To: References: Message-ID: <5866BB0F.9090301@stoneleaf.us> On 12/30/2016 06:55 AM, Nick Coghlan wrote: > Rather than changing the descriptor protocol in general, I'd personally be > more amenable to the idea of *property* catching AttributeError from the > functions it calls and turning it into RuntimeError (after a suitable > deprecation period). That way folks that really wanted the old behaviour > could define their own descriptor that works the same way property does > today, whereas if the descriptor protocol itself were to change, there's > very little people could do to work around it if it wasn't what they wanted. Sounds good to me. :) -- ~Ethan~ From ethan at stoneleaf.us Fri Dec 30 14:53:33 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 30 Dec 2016 11:53:33 -0800 Subject: [Python-ideas] AtributeError inside __get__ In-Reply-To: References: Message-ID: <5866BB3D.6010102@stoneleaf.us> On 12/30/2016 07:10 AM, Chris Angelico wrote: > Actually, that makes a lot of sense. And since "property" isn't magic > syntax, you could take it sooner: > > from somewhere import property > > and toy with it that way. > > What module would be appropriate, though? Well, DynamicClassAttribute is kept in the types module, so that's probably the place to put optionalproperty as well. -- ~Ethan~ From bussonniermatthias at gmail.com Fri Dec 30 14:59:33 2016 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Fri, 30 Dec 2016 20:59:33 +0100 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: <5863F223.3040906@stoneleaf.us> <20161229081959.GA3887@ando.pearwood.info> Message-ID: On Fri, Dec 30, 2016 at 5:24 PM, Nick Coghlan wrote: > > I understood the "iterhash" suggestion to be akin to itertools.accumulate: > > >>> for value, tally in enumerate(accumulate(range(10))): print(value, ... It reminds me of hmac[1]/hashlib[2], with the API : h.update(...) before a .digest(). It is slightly lower level than a `from_iterable`, but would be a bit more flexible. If the API were kept similar things would be easier to remember. -- M [1]: https://docs.python.org/3/library/hmac.html#hmac.HMAC.update [2]: https://docs.python.org/3/library/hashlib-blake2.html#module-hashlib From christian at python.org Fri Dec 30 15:54:59 2016 From: christian at python.org (Christian Heimes) Date: Fri, 30 Dec 2016 21:54:59 +0100 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: <5863F223.3040906@stoneleaf.us> <20161229081959.GA3887@ando.pearwood.info> Message-ID: On 2016-12-30 20:59, Matthias Bussonnier wrote: > On Fri, Dec 30, 2016 at 5:24 PM, Nick Coghlan wrote: >> >> I understood the "iterhash" suggestion to be akin to itertools.accumulate: >> >> >>> for value, tally in enumerate(accumulate(range(10))): print(value, ... > > It reminds me of hmac[1]/hashlib[2], with the API : h.update(...) > before a .digest(). > It is slightly lower level than a `from_iterable`, but would be a bit > more flexible. > If the API were kept similar things would be easier to remember. Hi, I'm the author of PEP 456 (SipHash24) and one of the maintainers of the hashlib module. 
Before we come up with a new API or recipe, I would like to understand
the problem first. Why does the OP consider hash(large_tuple) a
performance issue? If you have an object with lots of members that
affect both __hash__ and __eq__, then __hash__ is really the least of
your concerns. The hash has to be computed just once and then will stay
the same over the lifetime of the object. Once computed, the hash can
be cached.

On the other hand, __eq__ is called at least once for every successful
hash lookup. In the worst case it is called n-1 times for a dict of
size n, for a match *and* a hashmap miss. Every __eq__ call has to
compare between 1 and m member attributes. For a dict of 1,000 elements
with 1,000 members each, that's just 1,000 hash computations with
roughly 8 kB of memory allocation, but almost a million comparisons in
the worst case.

A hasher object adds further overhead, e.g. object allocation, creation
of a bound method for each call, etc. It's also less CPU cache friendly
than the linear data structure of a tuple.

Christian

From ma3yuki.8mamo10 at gmail.com  Fri Dec 30 17:24:05 2016
From: ma3yuki.8mamo10 at gmail.com (Masayuki YAMAMOTO)
Date: Sat, 31 Dec 2016 07:24:05 +0900
Subject: [Python-ideas] New PyThread_tss_ C-API for CPython
Message-ID:

I have read the discussion, and I agree with using a structure as
Py_tss_t instead of a platform-specific data type. Just as Steve said,
Py_tss_t should be genuinely treated as an opaque type, and the key
state checking should be provided via macros or inline functions with a
name like PyThread_tss_is_created.

Well, I'd like to pin down the specification a bit more :) If
PyThread_tss_create is called with an already-created key, it is a
no-op, but should the function report success or failure? In my
opinion, it is better to return failure, because calling
PyThread_tss_create multiple times for one key most likely indicates
incorrect code. Under this rule, PyThread_tss_is_created should return
a value as follows:

(A) False from after defining with Py_tss_NEEDS_INIT until
PyThread_tss_create is called
(B) True after a call to PyThread_tss_create succeeds
(C) Unchanged before and after a call to PyThread_tss_create fails
(D) False after calling PyThread_tss_delete, regardless of timing
(E) For other functions, the return value of PyThread_tss_is_created
does not change before and after the call

I also think it is better to write a test for these state transitions
of Py_tss_t.

Kind regards,
Masayuki

2016-12-31 2:38 GMT+09:00 Erik Bray:

> On Fri, Dec 30, 2016 at 5:05 PM, Nick Coghlan wrote:
> > On 29 December 2016 at 22:12, Erik Bray wrote:
> >>
> >> 1) CPython's TLS: Defines -1 as an uninitialized key (by fact of the
> >> implementation--that the keys are integers starting from zero)
> >> 2) pthreads: Does not define an uninitialized default value for
> >> keys, for reasons described at [1] under "Non-Idempotent Data Key
> >> Creation". I understand their reasoning, though I can't claim to know
> >> specifically what they mean when they say that some implementations
> >> would require the mutual exclusion to be performed on
> >> pthread_getspecific() as well. I don't know that it applies here.
> >
> > That section is a little weird, as they describe two requests (one for a
> > known-NULL default value, the other for implicit synchronisation of key
> > creation to prevent race conditions), and only provide the justification
> > for rejecting one of them (the second one).
>
> Right, that is confusing to me as well.
I'm guessing the reason for > rejecting the first is in part a way to force us to recognize the > second issue. > > > If I've understood correctly, the situation they're worried about there > is > > that pthread_key_create() has to be called at least once-per-process, but > > must be called before *any* call to pthread_getspecific or > > pthread_setspecific for a given key. If you do "implicit init" rather > than > > requiring the use of an explicit mechanism like pthread_once (or our own > > Py_Initialize and module import locks), then you may take a small > > performance hit as either *every* thread then has to call > > pthread_key_create() to ensure the key exists before using it, or else > > pthread_getspecific() and pthread_setspecific() have to become > potentially > > blocking calls. Neither of those is desirable, so it makes sense to leave > > that part of the problem to the API client. > > > > In our case, we don't want the implicit synchronisation, we just want the > > known-NULL default value so the "Is it already set?" check can be moved > > inside the library function. > > Okay, we're on the same page here then. I just wanted to make sure > there wasn't anything else I was missing in Python's case. > > >> 3) windows: The return value of TlsAlloc() is a DWORD (unsigned int) > >> and [2] states that its value should be opaque. > >> > >> So in principle we can cover all cases with an opaque struct that > >> contains, as its first member, an is_initialized flag. The tricky > >> part is how to initialize the rest of the struct (containing the > >> underlying implementation-specific key). For 1) and 3) it doesn't > >> matter--it can just be zero. For 2) it's trickier because there's no > >> defined constant value to initialize a pthread_key_t to. > >> > >> Per Nick's suggestion this can be worked around by relying on C99's > >> initialization semantics. Per [3] section 6.7.8, clause 21: > >> > >> """ > >> If there are fewer initializers in a brace-enclosed list than there > >> are elements or members of an aggregate, or fewer characters in a > >> string literal used to initialize an array of known size than there > >> are elements in the array, the remainder of the aggregate shall be > >> initialized implicitly the same as objects that have static storage > >> duration. > >> """ > >> > >> How objects with static storage are initialized is described in the > >> previous page under clause 10, but in practice it boils down to what > >> you would expect: Everything is initialized to zero, including nested > >> structs and arrays. > >> > >> So as long as we can use this feature of C99 then I think that's the > >> best approach. > > > > > > > > I checked PEP 7 to see exactly which features we've added to the > approved C > > dialect, and designated initialisers are already on the list: > > https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html > > > > So I believe that would allow the initializer to be declared as something > > like: > > > > #define Py_tss_NEEDS_INIT {.is_initialized = false} > > Great! One could argue about whether or not the designated > initializer syntax also incorporates omitted fields, but it would seem > strange to insist that it doesn't. > > Have a happy new year, > > Erik > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From jab at math.brown.edu  Fri Dec 30 18:36:37 2016
From: jab at math.brown.edu (jab at math.brown.edu)
Date: Fri, 30 Dec 2016 18:36:37 -0500
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To:
References: <5863F223.3040906@stoneleaf.us>
 <20161229081959.GA3887@ando.pearwood.info>
Message-ID:

On Fri, Dec 30, 2016 at 3:54 PM, Christian Heimes wrote:

> Hi,
>
> I'm the author of PEP 456 (SipHash24) and one of the maintainers of the
> hashlib module.
>
> Before we come up with a new API or recipe, I would like to understand
> the problem first. Why does the OP consider hash(large_tuple) a
> performance issue? If you have an object with lots of members that
> affect both __hash__ and __eq__, then __hash__ is really the least of
> your concerns. The hash has to be computed just once and then will stay
> the same over the lifetime of the object. Once computed, the hash can
> be cached.
>
> On the other hand, __eq__ is called at least once for every successful
> hash lookup. In the worst case it is called n-1 times for a dict of
> size n, for a match *and* a hashmap miss. Every __eq__ call has to
> compare between 1 and m member attributes. For a dict of 1,000 elements
> with 1,000 members each, that's just 1,000 hash computations with
> roughly 8 kB of memory allocation, but almost a million comparisons in
> the worst case.
>
> A hasher object adds further overhead, e.g. object allocation, creation
> of a bound method for each call, etc. It's also less CPU cache friendly
> than the linear data structure of a tuple.

Thanks for joining the discussion, great to have your input.

In the use cases I described, the objects' members are ordered. So in
the unlikely event that two objects hash to the same value but are
unequal, the __eq__ call should be cheap, because they probably differ
in length or on their first member, and can skip further comparison.
After a successful hash lookup of an object that's already in a set or
dict, a successful identity check avoids an expensive equality check.
Perhaps I misunderstood?

Here is some example code:

    from functools import reduce

    class FrozenOrderedCollection:

        ...

        def __eq__(self, other):
            if not isinstance(other, FrozenOrderedCollection):
                return NotImplemented
            if len(self) != len(other):
                return False
            return all(i == j for (i, j) in zip(self, other))

        def __hash__(self):
            # Compute the hash lazily, on first use, and cache it.
            if hasattr(self, '_hashval'):
                return self._hashval
            hash_initial = hash(self.__class__)
            self._hashval = reduce(lambda h, i: hash((h, i)), self, hash_initial)
            return self._hashval

Is it the case that internally, the code is mostly there to compute the
hash of such an object in incremental fashion, and it's just a matter
of figuring out the right API to expose it?

If creating an arbitrarily large tuple, passing that to hash(), and
then throwing it away really is the best practice, then perhaps that
should be explicitly documented, since I think many would find that
counterintuitive?

@Stephen J. Turnbull, thank you for your input -- still digesting, but
very interesting.

Thanks again to everyone for the helpful discussion.
-------------- next part --------------
An HTML attachment was scrubbed...
From ethan at stoneleaf.us Fri Dec 30 19:20:43 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 30 Dec 2016 16:20:43 -0800
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To: 
References: <5863F223.3040906@stoneleaf.us>
 <20161229081959.GA3887@ando.pearwood.info>
Message-ID: <5866F9DB.9020207@stoneleaf.us>

On 12/30/2016 03:36 PM, jab at math.brown.edu wrote:

> In the use cases I described, the objects' members are ordered. So in
> the unlikely event that two objects hash to the same value but are
> unequal, the __eq__ call should be cheap, because they probably differ
> in length or on their first member, and can skip further comparison.
> After a successful hash lookup of an object that's already in a set or
> dict, a successful identity check avoids an expensive equality check.
> Perhaps I misunderstood?

If you are relying on an identity check for equality then no two
FrozenOrderedCollection instances can be equal. Was that your intention?
If it was, then just hash the instance's id() and you're done.

If that wasn't your intention then, while there can certainly be a few
quick checks to rule out equality (such as length), if those match, the
expensive full equality check will have to happen.

--
~Ethan~

From jab at math.brown.edu Fri Dec 30 19:31:39 2016
From: jab at math.brown.edu (jab at math.brown.edu)
Date: Fri, 30 Dec 2016 19:31:39 -0500
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To: <5866F9DB.9020207@stoneleaf.us>
References: <5863F223.3040906@stoneleaf.us>
 <20161229081959.GA3887@ando.pearwood.info>
 <5866F9DB.9020207@stoneleaf.us>
Message-ID: 

On Fri, Dec 30, 2016 at 7:20 PM, Ethan Furman wrote:

> On 12/30/2016 03:36 PM, jab at math.brown.edu wrote:
>
>> In the use cases I described, the objects' members are ordered. So in the
>> unlikely event that two objects hash to the same value but are unequal, the
>> __eq__ call should be cheap, because they probably differ in length or on
>> their first member, and can skip further comparison. After a successful
>> hash lookup of an object that's already in a set or dict, a successful
>> identity check avoids an expensive equality check. Perhaps I misunderstood?
>
> If you are relying on an identity check for equality then no two
> FrozenOrderedCollection instances can be equal. Was that your intention?
> If it was, then just hash the instance's id() and you're done.

No, I was talking about the identity check done by a set or dict when
doing a lookup to check if the object in a hash bucket is identical to
the object being looked up. In that case, there is no need for the set
or dict to even call __eq__. Right?

The FrozenOrderedCollection.__eq__ implementation I sketched out was as
intended -- no identity check there.

> If that wasn't your intention then, while there can certainly be a few
> quick checks to rule out equality (such as length) if those match, the
> expensive full equality check will have to happen.

I think we're misunderstanding each other, but at the risk of saying the
same thing again: Because it's an ordered collection, the equality check
for two unequal instances with the same hash value will very likely be
able to bail after comparing lengths or the first items. With a good
hash function, the odds of two unequal instances both hashing to the
same value and having their first N items equal should be vanishingly
small, no?
-------------- next part --------------
An HTML attachment was scrubbed...
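URL: 

The identity shortcut described here is easy to observe on CPython (the
counting class below is ours, written just for the demonstration):

    class EqCounter:
        """Counts __eq__ calls to show the identity fast path."""
        calls = 0

        def __hash__(self):
            return 42

        def __eq__(self, other):
            EqCounter.calls += 1
            return self is other

    a = EqCounter()
    s = {a}
    print(a in s)           # True
    print(EqCounter.calls)  # 0 on CPython: the hash matched and then the
                            # identity check matched, so __eq__ never ran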
From ethan at stoneleaf.us Fri Dec 30 20:04:58 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 30 Dec 2016 17:04:58 -0800
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To: 
References: <5863F223.3040906@stoneleaf.us>
 <20161229081959.GA3887@ando.pearwood.info>
 <5866F9DB.9020207@stoneleaf.us>
Message-ID: <5867043A.5040302@stoneleaf.us>

On 12/30/2016 04:31 PM, jab at math.brown.edu wrote:
> On Fri, Dec 30, 2016 at 7:20 PM, Ethan Furman wrote:
>> If you are relying on an identity check for equality then no two
>> FrozenOrderedCollection instances can be equal. Was that your
>> intention? If it was, then just hash the instance's
>> id() and you're done.
>
> No, I was talking about the identity check done by a set or dict
> when doing a lookup to check if the object in a hash bucket is
> identical to the object being looked up. In that case, there is
> no need for the set or dict to even call __eq__. Right?

No. It is possible to have two keys be equal but different -- an easy
example is 1 and 1.0; they both hash the same, equal the same, but are
not identical. dict has to check equality when two different objects
hash the same but have non-matching identities.

--
~Ethan~

From jab at math.brown.edu Fri Dec 30 20:10:14 2016
From: jab at math.brown.edu (jab at math.brown.edu)
Date: Fri, 30 Dec 2016 20:10:14 -0500
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To: <5867043A.5040302@stoneleaf.us>
References: <5863F223.3040906@stoneleaf.us>
 <20161229081959.GA3887@ando.pearwood.info>
 <5866F9DB.9020207@stoneleaf.us>
 <5867043A.5040302@stoneleaf.us>
Message-ID: 

On Fri, Dec 30, 2016 at 8:04 PM, Ethan Furman wrote:

> No. It is possible to have two keys be equal but different -- an easy
> example is 1 and 1.0; they both hash the same, equal the same, but are not
> identical. dict has to check equality when two different objects hash the
> same but have non-matching identities.

Python 3.6.0 (default, Dec 24 2016, 00:01:50)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> d = {1: 'int', 1.0: 'float'}
>>> d
{1: 'float'}

IPython 5.1.0 -- An enhanced Interactive Python.

In [1]: class Foo:
   ...:     def __eq__(self, other):
   ...:         return True
   ...:     def __init__(self, val):
   ...:         self.val = val
   ...:     def __repr__(self):
   ...:         return '<Foo %s>' % self.val
   ...:     def __hash__(self):
   ...:         return 42
   ...:

In [2]: f1 = Foo(1)

In [3]: f2 = Foo(2)

In [4]: x = {f1: 1, f2: 2}

In [5]: x
Out[5]: {<Foo 1>: 2}

I'm having trouble showing that two equal but nonidentical objects can
both be in the same dict.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jab at math.brown.edu Fri Dec 30 21:12:02 2016
From: jab at math.brown.edu (jab at math.brown.edu)
Date: Fri, 30 Dec 2016 21:12:02 -0500
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To: 
References: <5863F223.3040906@stoneleaf.us>
 <20161229081959.GA3887@ando.pearwood.info>
 <5866F9DB.9020207@stoneleaf.us>
 <5867043A.5040302@stoneleaf.us>
Message-ID: 

On Fri, Dec 30, 2016 at 8:10 PM, wrote:
> On Fri, Dec 30, 2016 at 8:04 PM, Ethan Furman wrote:
>> No. It is possible to have two keys be equal but different -- an easy
>> example is 1 and 1.0; they both hash the same, equal the same, but are not
>> identical. dict has to check equality when two different objects hash the
>> same but have non-matching identities.
>>
> ...
> I'm having trouble showing that two equal but nonidentical objects can > both be in the same dict. > (Because they can't, though) your point stands that Python had to call __eq__ in these cases. But with instances of large, immutable, ordered collections, an application could either: 1. accept that it might create a duplicate, equivalent instance of an existing instance with sufficient infrequency that it's okay taking the performance hit, or 2. try to avoid creating duplicate instances in the first place, using the existing, equivalent instance it created as a singleton. Then a set or dict lookup could use the identity check, and avoid the expensive __eq__ call. I think it's much more important to focus on what happens with unequal instances here, since there are usually many more of them. And with them, the performance hit of the __eq__ calls definitely does not necessarily dominate that of __hash__. Right? -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Fri Dec 30 21:21:36 2016 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 30 Dec 2016 18:21:36 -0800 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: <5863F223.3040906@stoneleaf.us> <20161229081959.GA3887@ando.pearwood.info> <5866F9DB.9020207@stoneleaf.us> <5867043A.5040302@stoneleaf.us> Message-ID: <58671630.6020203@stoneleaf.us> On 12/30/2016 06:12 PM, jab at math.brown.edu wrote: > ... your point stands that Python had to call __eq__ in these cases. > > But with instances of large, immutable, ordered collections, an > application could either: > > 1. accept that it might create a duplicate, equivalent instance of > an existing instance with sufficient infrequency that it's okay > taking the performance hit, or > > 2. try to avoid creating duplicate instances in the first place, > using the existing, equivalent instance it created as a singleton. > Then a set or dict lookup could use the identity check, and avoid > the expensive __eq__ call. > I think it's much more important to focus on what happens with > unequal instances here, since there are usually many more of them. > And with them, the performance hit of the __eq__ calls definitely > does not necessarily dominate that of __hash__. Right? I don't think so. As someone else said, a hash can be calculated once and then cached, but __eq__ has to be called every time. Depending on the clustering of your data that could be quick... or not. -- ~Ethan~ From jab at math.brown.edu Fri Dec 30 21:47:54 2016 From: jab at math.brown.edu (jab at math.brown.edu) Date: Fri, 30 Dec 2016 21:47:54 -0500 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: <58671630.6020203@stoneleaf.us> References: <5863F223.3040906@stoneleaf.us> <20161229081959.GA3887@ando.pearwood.info> <5866F9DB.9020207@stoneleaf.us> <5867043A.5040302@stoneleaf.us> <58671630.6020203@stoneleaf.us> Message-ID: On Fri, Dec 30, 2016 at 9:21 PM, Ethan Furman wrote: > I don't think so. As someone else said, a hash can be calculated once and > then cached, but __eq__ has to be called every time. Depending on the > clustering of your data that could be quick... or not. > __eq__ only has to be called when a hash bucket is non-empty. In that case, it may be O(n) in pathological cases, but it could also be O(1) every time. On the other hand, __hash__ has to be called on every lookup, is O(n) on the first call, and ideally O(1) after. 
I'd expect that __eq__ may often not dominate, and avoiding an
unnecessary large tuple allocation on the first __hash__ call could be
helpful. If a __hash__ implementation can avoid creating an arbitrarily
large tuple unnecessarily and still perform well, why not save the
space? If the resulting hash distribution is worse, that's another
story, but it seems worth documenting one way or the other, since the
current docs give memory-conscious Python programmers pause for this
use case.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ethan at stoneleaf.us Fri Dec 30 22:08:27 2016
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 30 Dec 2016 19:08:27 -0800
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To: 
References: <5863F223.3040906@stoneleaf.us>
 <20161229081959.GA3887@ando.pearwood.info>
 <5866F9DB.9020207@stoneleaf.us>
 <5867043A.5040302@stoneleaf.us>
 <58671630.6020203@stoneleaf.us>
Message-ID: <5867212B.2050304@stoneleaf.us>

On 12/30/2016 06:47 PM, jab at math.brown.edu wrote:

> __eq__ only has to be called when a hash bucket is non-empty. In that
> case, it may be O(n) in pathological cases, but it could also be O(1)
> every time. On the other hand, __hash__ has to be called on every
> lookup, is O(n) on the first call, and ideally O(1) after. I'd expect
> that __eq__ may often not dominate, and avoiding an unnecessary large
> tuple allocation on the first __hash__ call could be helpful. If a
> __hash__ implementation can avoid creating an arbitrarily large tuple
> unnecessarily and still perform well, why not save the space? If the
> resulting hash distribution is worse, that's another story, but it
> seems worth documenting one way or the other, since the current docs
> give memory-conscious Python programmers pause for this use case.

A quick list of what we have agreed on:

- __hash__ is called once for every dict/set lookup
- __hash__ is calculated once because we are caching the value
- __eq__ is being called an unknown number of times, but can be quite
  expensive in terms of time
- items with the same hash are probably cheap in terms of __eq__ (?)

So maybe this will work?

    def __hash__(self):
        return hash(self.name) * hash(self.nick) * hash(self.color)

In other words, don't create a new tuple, just use the ones you already
have and toss in a couple maths operations. (and have somebody vet
that ;)

--
~Ethan~

From jab at math.brown.edu Fri Dec 30 22:24:08 2016
From: jab at math.brown.edu (jab at math.brown.edu)
Date: Fri, 30 Dec 2016 22:24:08 -0500
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To: <5867212B.2050304@stoneleaf.us>
References: <5863F223.3040906@stoneleaf.us>
 <20161229081959.GA3887@ando.pearwood.info>
 <5866F9DB.9020207@stoneleaf.us>
 <5867043A.5040302@stoneleaf.us>
 <58671630.6020203@stoneleaf.us>
 <5867212B.2050304@stoneleaf.us>
Message-ID: 

On Fri, Dec 30, 2016 at 10:08 PM, Ethan Furman wrote:

> So maybe this will work?
>
>     def __hash__(self):
>         return hash(self.name) * hash(self.nick) * hash(self.color)
>
> In other words, don't create a new tuple, just use the ones you already
> have and toss in a couple maths operations. (and have somebody vet that ;)

See the "Simply XORing such results together would not be
order-sensitive, and so wouldn't work" from my original post. (Like XOR,
multiplication is also commutative.)

e.g. Since FrozenOrderedCollection([1, 2]) !=
FrozenOrderedCollection([2, 1]), we should try to avoid making their
hashes equal, or else we increase collisions unnecessarily.
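A quick interactive check makes the commutativity problem concrete (the
helper below is ours, just for illustration):

    # Any commutative mixing step (multiplication, XOR) makes every
    # reordering of the same items collide, unlike the tuple hash.
    def mul_hash(items):
        h = 1
        for item in items:
            h *= hash(item)
        return h

    print(mul_hash([1, 2, 3]) == mul_hash([3, 2, 1]))  # True: guaranteed collision
    print(hash((1, 2, 3)) == hash((3, 2, 1)))          # False: position is mixed in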
More generally, I think the trick is to get an even, chaotic distribution into sys.hash_info.hash_bits out of this hash algorithm, and I think simply multiplying hash values together like this wouldn't do that. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Dec 30 22:30:58 2016 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 30 Dec 2016 19:30:58 -0800 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: <5863F223.3040906@stoneleaf.us> <20161229081959.GA3887@ando.pearwood.info> Message-ID: On Fri, Dec 30, 2016 at 9:29 AM, wrote: > Updating the docs sounds like the more important change for now, given 3.7+. > But before the docs make an official recommendation for that recipe, were > the analyses that Steve and I did sufficient to confirm that its hash > distribution and performance is good enough at scale, or is more rigorous > analysis necessary? > > I've been trying to find a reasonably detailed and up-to-date reference on > Python hash() result requirements and analysis methodology, with > instructions on how to confirm if they're met, but am still looking. Would > find that an interesting read if it's out there. But I'd take just an > authoritative thumbs up here too. Just haven't heard one yet. I'm not an expert on hash table implementation subtleties, but I can quote Objects/dictobject.c: "Most hash schemes depend on having a "good" hash function, in the sense of simulating randomness. Python doesn't ..." https://github.com/python/cpython/blob/d0a2f68a5f972e4474f5ecce182752f6f7a22178/Objects/dictobject.c#L133 (Basically IIUC the idea is that Python dicts use a relatively sophisticated probe sequence to resolve collisions which allows the use of relatively "poor" hashes. There are interesting trade-offs here...) -n -- Nathaniel J. Smith -- https://vorpus.org From rosuav at gmail.com Fri Dec 30 22:35:29 2016 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 31 Dec 2016 14:35:29 +1100 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: <5863F223.3040906@stoneleaf.us> <20161229081959.GA3887@ando.pearwood.info> <5866F9DB.9020207@stoneleaf.us> <5867043A.5040302@stoneleaf.us> <58671630.6020203@stoneleaf.us> <5867212B.2050304@stoneleaf.us> Message-ID: On Sat, Dec 31, 2016 at 2:24 PM, wrote: > See the "Simply XORing such results together would not be order-sensitive, > and so wouldn't work" from my original post. (Like XOR, multiplication is > also commutative.) > > e.g. Since FrozenOrderedCollection([1, 2]) != FrozenOrderedCollection([2, > 1]), we should try to avoid making their hashes equal, or else we increase > collisions unnecessarily. How likely is it that you'll have this form of collision, rather than some other? Remember, collisions *will* happen, so don't try to eliminate them all; just try to minimize the chances of *too many* collisions. So if you're going to be frequently dealing with (1,2,3) and (1,3,2) and (2,1,3) and (3,1,2), then sure, you need to care about order; but otherwise, one possible cause of a collision is no worse than any other. Keep your algorithm simple, and don't sweat the details that you aren't sure matter. 
ChrisA

From jab at math.brown.edu Sat Dec 31 00:13:07 2016
From: jab at math.brown.edu (jab at math.brown.edu)
Date: Sat, 31 Dec 2016 00:13:07 -0500
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To: 
References: <5863F223.3040906@stoneleaf.us>
 <20161229081959.GA3887@ando.pearwood.info>
Message-ID: 

On Fri, Dec 30, 2016 at 10:30 PM, Nathaniel Smith wrote:

> ... "Most hash schemes depend on having a "good" hash function, in the
> sense of simulating randomness. Python doesn't ..."
> https://github.com/python/cpython/blob/d0a2f68a/Objects/dictobject.c#L133
> ...

Thanks for that link, fascinating to read the rest of that comment!!

Btw, the part you quoted seemed more like a defense of what followed,
i.e. the choice to make hash(some_int) == some_int. I'm not sure how
much the part you quoted applies generally. e.g. The
https://docs.python.org/3/reference/datamodel.html#object.__hash__ docs
don't say, "Don't worry about your __hash__ implementation, dict's
collision resolution strategy is so good it probably doesn't matter
anyway." But they also don't have any discussion of the tradeoffs you
mentioned that might be worth considering.

What should you do when there are arbitrarily many "components of the
object that play a part in comparison of objects"? The
"hash((self._name, self._nick, self._color))" code snippet is the only
example the docs give. This leaves people who have use cases like mine
wondering whether it's still advised to scale this up to the arbitrarily
many items that instances of their class can contain. If not, then what
is advised? Passing a tuple of fewer items to a single hash() call, e.g.
hash(tuple(islice(self, CUTOFF)))? Ned's recipe of pairwise-accumulating
hash() results over all the items? Or only pairwise-accumulating up to a
certain cutoff? Stephen J. Turnbull's idea to use fewer accumulation
steps and larger-than-2-tuples? Passing all the items into some other
cheap, built-in hash algorithm that actually has an incremental update
API (crc32?)?

Still hoping someone can give some authoritative advice, and hope it's
still reasonable to be asking. If not, I'll cut it out.

On Fri, Dec 30, 2016 at 10:35 PM, Chris Angelico wrote:

> How likely is it that you'll have this form of collision, rather than
> some other? Remember, collisions *will* happen, so don't try to
> eliminate them all; just try to minimize the chances of *too many*
> collisions. So if you're going to be frequently dealing with (1,2,3)
> and (1,3,2) and (2,1,3) and (3,1,2), then sure, you need to care about
> order; but otherwise, one possible cause of a collision is no worse
> than any other. Keep your algorithm simple, and don't sweat the
> details that you aren't sure matter.

In the context of writing a collections library, and not an application,
it should work well for a diversity of workloads that your users could
throw at it. In that context, it's hard to know what to do with advice
like this. "Heck, just hash the first three items and call it a day!"
-------------- next part --------------
An HTML attachment was scrubbed...
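URL: 

Of the alternatives listed above, crc32 does have a genuinely
incremental interface in the stdlib. A sketch of streaming item hashes
through it (illustrative only -- CRC-32 is just 32 bits wide and was not
designed as a general-purpose hash, so collisions would be far more
frequent than with hash()):

    import zlib

    def crc32_of_items(iterable):
        """Fold each item's hash() into a running CRC-32, one at a time."""
        crc = 0
        for item in iterable:
            # Feed the 8-byte little-endian encoding of the item's hash.
            crc = zlib.crc32(hash(item).to_bytes(8, 'little', signed=True), crc)
        return crc

    print(crc32_of_items([1, 2, 3]) == crc32_of_items([3, 2, 1]))  # False: order-sensitive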
From steve at pearwood.info Sat Dec 31 01:57:05 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 31 Dec 2016 17:57:05 +1100
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To: 
References: <5866F9DB.9020207@stoneleaf.us>
 <5867043A.5040302@stoneleaf.us>
 <58671630.6020203@stoneleaf.us>
Message-ID: <20161231065705.GF3887@ando.pearwood.info>

On Fri, Dec 30, 2016 at 09:47:54PM -0500, jab at math.brown.edu wrote:

> __eq__ only has to be called when a hash bucket is non-empty. In that case,
> it may be O(n) in pathological cases, but it could also be O(1) every time.
> On the other hand, __hash__ has to be called on every lookup, is O(n) on
> the first call, and ideally O(1) after. I'd expect that __eq__ may often
> not dominate, and avoiding an unnecessary large tuple allocation on the
> first __hash__ call could be helpful.

Sorry to be the broken record repeating himself, but this sounds
*exactly* like premature optimization here. My suggestion is that you
are overthinking things, or at least worrying about issues before you've
got any evidence that they are going to be real issues.

I expect that the amount of time and effort you've spent in analysing
the theoretical problems here and writing to this list is more than the
time it would have taken for you to write the simplest __hash__ method
that could work (using the advice in the docs to make a tuple). You
could have implemented that in five minutes, and be running code by now.

Of course I understand that performance issues may not be visible when
you have 100 or 100 thousand items but may be horrific when you have 100
million. I get that. But you're aware of the possibility, so you can
write a performance test that generates 100 million objects and tests
performance. *If* you find an actual problem, then you can look into
changing your __hash__ method. You could come back here and talk about
actual performance issues instead of hypothetical issues, or you could
hire an expert to tune your hash function. (And if you do pay for an
expert to solve this, please consider giving back to the community.)

Remember that the specific details of __hash__ should be an internal
implementation detail for your class. You shouldn't fear changing the
hash algorithm as often as you need to, including in bug fix releases.
You don't have to wait for the perfect hash function, just a "good
enough for now" one to get started.

I'm not trying to be dismissive of your concerns. They may be real
issues that you have to solve. I'm just saying, you should check your
facts first rather than solve hypothetical problems. I have seen, and
even written, far too much pessimized code in the name of "this must be
better, it stands to reason" to give much credence to theoretical
arguments about performance.

--
Steve

From steve at pearwood.info Sat Dec 31 02:00:21 2016
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 31 Dec 2016 18:00:21 +1100
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To: <5867212B.2050304@stoneleaf.us>
References: <5866F9DB.9020207@stoneleaf.us>
 <5867043A.5040302@stoneleaf.us>
 <58671630.6020203@stoneleaf.us>
 <5867212B.2050304@stoneleaf.us>
Message-ID: <20161231070021.GG3887@ando.pearwood.info>

On Fri, Dec 30, 2016 at 07:08:27PM -0800, Ethan Furman wrote:

> So maybe this will work?
>
>     def __hash__(self):
>         return hash(self.name) * hash(self.nick) * hash(self.color)

I don't like the multiplications. If any of the three hashes return
zero, the overall hash will be zero.
I think you need better mixing than that. Look at tuple:

py> hash((0, 1, 2))
-421559672
py> hash(0) * hash(1) * hash(2)
0

--
Steve

From ncoghlan at gmail.com Sat Dec 31 02:33:52 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 31 Dec 2016 17:33:52 +1000
Subject: [Python-ideas] AtributeError inside __get__
In-Reply-To: <5866BB3D.6010102@stoneleaf.us>
References: <5866BB3D.6010102@stoneleaf.us>
Message-ID: 

On 31 December 2016 at 05:53, Ethan Furman wrote:

> On 12/30/2016 07:10 AM, Chris Angelico wrote:
>
>> Actually, that makes a lot of sense. And since "property" isn't magic
>> syntax, you could take it sooner:
>>
>> from somewhere import property
>>
>> and toy with it that way.
>>
>> What module would be appropriate, though?
>
> Well, DynamicClassAttribute is kept in the types module, so that's
> probably the place to put optionalproperty as well.

I'd also be OK with just leaving it as a builtin.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ncoghlan at gmail.com Sat Dec 31 02:42:14 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 31 Dec 2016 17:42:14 +1000
Subject: [Python-ideas] New PyThread_tss_ C-API for CPython
In-Reply-To: 
References: 
Message-ID: 

On 31 December 2016 at 08:24, Masayuki YAMAMOTO wrote:

> I have read the discussion, and I agree that a structure should be used
> for Py_tss_t instead of a platform-specific data type. Just as Steve
> said, Py_tss_t should be genuinely treated as an opaque type, and the
> key-state checking should be provided via macros or inline functions
> with a name like PyThread_tss_is_created. Well, I'd resolve the
> specification a bit more :)
>
> If PyThread_tss_create is called with an already-created key, it is a
> no-op, but should the function succeed or fail? In my opinion, it is
> better to return a failure, because there is a high possibility that
> code calling PyThread_tss_create multiple times for one key is
> incorrect.

That's not what we currently do for the EnsureGIL autoTLS key and the
tracemalloc key though - the reentrant key creation is part of
"create-if-needed" flows where the key creation is silently skipped if
the key already exists. Changing that would require some further
research into how we ended up with the current approach, while carrying
it over into the new API design would be the default option.

> In this view, PyThread_tss_is_created should return a value as follows:
>
> (A) False from after defining the key with Py_tss_NEEDS_INIT until
> PyThread_tss_create is called
> (B) True after a call to PyThread_tss_create succeeds
> (C) Unchanged before and after a call to PyThread_tss_create that fails
> (D) False after calling PyThread_tss_delete, regardless of timing
> (E) For the other functions, the return value of PyThread_tss_is_created
> does not change before and after the call
>
> I think it would be better to write a test for the state of the
> Py_tss_t.

I agree it would be good to add more test cases for this scenario to the
test suite.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ncoghlan at gmail.com Sat Dec 31 07:17:28 2016
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 31 Dec 2016 22:17:28 +1000
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To: 
References: <5863F223.3040906@stoneleaf.us>
 <20161229081959.GA3887@ando.pearwood.info>
Message-ID: 

On 31 December 2016 at 15:13, wrote:

> On Fri, Dec 30, 2016 at 10:35 PM, Chris Angelico wrote:
>
>> How likely is it that you'll have this form of collision, rather than
>> some other? Remember, collisions *will* happen, so don't try to eliminate
>> them all; just try to minimize the chances of *too many* collisions. So if
>> you're going to be frequently dealing with (1,2,3) and (1,3,2) and (2,1,3)
>> and (3,1,2), then sure, you need to care about order; but otherwise, one
>> possible cause of a collision is no worse than any other. Keep your
>> algorithm simple, and don't sweat the details that you aren't sure matter.
>
> In the context of writing a collections library, and not an application,
> it should work well for a diversity of workloads that your users could
> throw at it. In that context, it's hard to know what to do with advice like
> this. "Heck, just hash the first three items and call it a day!"

Yes, this is essentially what we're suggesting you do - start with a
"good enough" hash that may have scalability problems (e.g. due to
memory copying) or mathematical distribution problems (e.g. due to a
naive mathematical combination of values), and then improve it over time
based on real world usage of the library.

Alternatively, you could take the existing tuple hashing algorithm and
reimplement that in pure Python:
https://hg.python.org/cpython/file/tip/Objects/tupleobject.c#l336

Based on microbenchmarks, you could then find the size breakpoint where
it makes sense to switch between "hash(tuple(self))" (with memory
copying, but a more optimised implementation of the algorithm) and a
pure Python "tuple_hash(self)". In either case, caching the result on
the instance would minimise the computational cost over the lifetime of
the object.

Cheers,
Nick.

P.S. Having realised that the tuple hash *algorithm* can be re-used
without actually creating a tuple, I'm more amenable to the idea of
exposing a "hash.from_iterable" callable that produces the same result
as "hash(tuple(iterable))" without actually creating the intermediate
tuple.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From brenbarn at brenbarn.net Fri Dec 30 20:13:02 2016
From: brenbarn at brenbarn.net (Brendan Barnwell)
Date: Fri, 30 Dec 2016 17:13:02 -0800
Subject: [Python-ideas] incremental hashing in __hash__
In-Reply-To: <5867043A.5040302@stoneleaf.us>
References: <5863F223.3040906@stoneleaf.us>
 <20161229081959.GA3887@ando.pearwood.info>
 <5866F9DB.9020207@stoneleaf.us>
 <5867043A.5040302@stoneleaf.us>
Message-ID: <5867061E.8070102@brenbarn.net>

On 2016-12-30 17:04, Ethan Furman wrote:
> On 12/30/2016 04:31 PM, jab at math.brown.edu wrote:
>>> On Fri, Dec 30, 2016 at 7:20 PM, Ethan Furman wrote:
>>>>> If you are relying on an identity check for equality then no
>>>>> two FrozenOrderedCollection instances can be equal. Was that
>>>>> your intention? If it was, then just hash the instance's
>>>>> id() and you're done.
>>>
>>> No, I was talking about the identity check done by a set or dict
>>> when doing a lookup to check if the object in a hash bucket is
>>> identical to the object being looked up.
In that case, there is >>> no need for the set or dict to even call __eq__. Right? > No. It is possible to have two keys be equal but different -- an > easy example is 1 and 1.0; they both hash the same, equal the same, > but are not identical. dict has to check equality when two different > objects hash the same but have non-matching identities. I think that is the same as what he said. The point is that if they *are* the same object, you *don't* need to check equality. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown From ericsnowcurrently at gmail.com Sat Dec 31 17:38:53 2016 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 31 Dec 2016 15:38:53 -0700 Subject: [Python-ideas] AtributeError inside __get__ In-Reply-To: References: <5866BB3D.6010102@stoneleaf.us> Message-ID: On Sat, Dec 31, 2016 at 12:33 AM, Nick Coghlan wrote: > On 31 December 2016 at 05:53, Ethan Furman wrote: >> On 12/30/2016 07:10 AM, Chris Angelico wrote: >>> What module would be appropriate, though? >> >> Well, DynamicClassAttribute is kept in the types module, so that's >> probably the place to put optionalproperty as well. > > I'd also be OK with just leaving it as a builtin. FWIW, I've felt for a while that the "types" module is becoming a catchall for stuff that would be more appropriate in a new "classtools" module (a la functools). I suppose that's what "types" has become, but I personally prefer the separate modules that make the distinction and would rather that "types" looked more like it does in 2.7. Perhaps this would be a good time to get that ball rolling or maybe it's just too late. I'd like to think it's the former, especially since I consider "classtools" a module that has room to grow. -eric From jab at math.brown.edu Sat Dec 31 17:39:41 2016 From: jab at math.brown.edu (jab at math.brown.edu) Date: Sat, 31 Dec 2016 17:39:41 -0500 Subject: [Python-ideas] incremental hashing in __hash__ In-Reply-To: References: <5863F223.3040906@stoneleaf.us> <20161229081959.GA3887@ando.pearwood.info> Message-ID: On Sat, Dec 31, 2016 at 7:17 AM, Nick Coghlan wrote: > On 31 December 2016 at 15:13, wrote: > >> On Fri, Dec 30, 2016 at 10:35 PM, Chris Angelico >> wrote: >> >>> How likely is it that you'll have this form of collision, rather than >>> some other? Remember, collisions *will* happen, so don't try to eliminate >>> them all; just try to minimize the chances of *too many* collisions. So if >>> you're going to be frequently dealing with (1,2,3) and (1,3,2) and (2,1,3) >>> and (3,1,2), then sure, you need to care about order; but otherwise, one >>> possible cause of a collision is no worse than any other. Keep your >>> algorithm simple, and don't sweat the details that you aren't sure matter. >> >> >> In the context of writing a collections library, and not an application, >> it should work well for a diversity of workloads that your users could >> throw at it. In that context, it's hard to know what to do with advice like >> this. "Heck, just hash the first three items and call it a day!" >> > > Yes, this is essentially what we're suggesting you do - start with a "good > enough" hash that may have scalability problems (e.g. due to memory > copying) or mathematical distribution problems (e.g. due to a naive > mathematical combination of values), and then improve it over time based on > real world usage of the library. 
> Alternatively, you could take the existing tuple hashing algorithm and
> reimplement that in pure Python:
> https://hg.python.org/cpython/file/tip/Objects/tupleobject.c#l336
>
> Based on microbenchmarks, you could then find the size breakpoint where
> it makes sense to switch between "hash(tuple(self))" (with memory
> copying, but a more optimised implementation of the algorithm) and a
> pure Python "tuple_hash(self)". In either case, caching the result on
> the instance would minimise the computational cost over the lifetime of
> the object.
>
> Cheers,
> Nick.
>
> P.S. Having realised that the tuple hash *algorithm* can be re-used
> without actually creating a tuple, I'm more amenable to the idea of
> exposing a "hash.from_iterable" callable that produces the same result
> as "hash(tuple(iterable))" without actually creating the intermediate
> tuple.

Nice! I just realized, similar to tupleobject.c's "tuplehash" routine, I
think the frozenset_hash algorithm (implemented in setobject.c) can't be
reused without actually creating a frozenset either. As mentioned, a set
hashing algorithm is exposed as collections.Set._hash() in
_collections_abc.py, which can be passed an iterable, but that routine
is implemented in Python. So here again it seems like you have to choose
between either creating an ephemeral copy of your data so you can use
the fast C routine, or streaming your data to a slower Python
implementation. At least in this case the Python implementation is
built-in though.

Given my current shortage of information, for now I'm thinking of
handling this problem in my library by exposing a parameter that users
can tune if needed. See bidict/_frozen.py in
https://github.com/jab/bidict/commit/485bf98#diff-215302d205b9f3650d58ee0337f77297,
and check out the _HASH_NITEMS_MAX attribute. I have to run for now, but
thanks again everyone, and happy new year!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
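As a concrete coda to Nick's suggestion quoted above, a pure-Python port
of the 3.6-era tuplehash algorithm from Objects/tupleobject.c might look
like the sketch below. It accepts any sized iterable, so no intermediate
tuple is allocated; the masking emulates C's unsigned wraparound
arithmetic. (This is an unofficial sketch: anyone relying on it should
verify it against hash(tuple(...)) on their own build.)

    import sys

    _WIDTH = sys.hash_info.width      # bits in a hash value (64 on most builds)
    _MASK = (1 << _WIDTH) - 1

    def tuple_hash(collection):
        """Compute hash(tuple(collection)) without building the tuple."""
        n = len(collection)
        x = 0x345678
        mult = 1000003                # _PyHASH_MULTIPLIER
        for item in collection:
            n -= 1
            y = hash(item) & _MASK    # view the signed hash as unsigned
            x = ((x ^ y) * mult) & _MASK
            mult = (mult + 82520 + n + n) & _MASK
        x = (x + 97531) & _MASK
        if x & (1 << (_WIDTH - 1)):   # reinterpret the result as signed
            x -= 1 << _WIDTH
        return -2 if x == -1 else x   # a hash of -1 is reserved for errors

    # Expected to agree with the built-in tuple hash, e.g.:
    # tuple_hash([1, 2, 3]) == hash((1, 2, 3))

As with the earlier snippets, caching the computed value on the instance
keeps the amortized cost of repeated lookups at O(1).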