Refactoring in a large code base

Fri Jan 22 07:34:37 EST 2016

On Fri, Jan 22, 2016 at 11:04 PM, Rustom Mody <rustompmody at gmail.com> wrote:
> On Friday, January 22, 2016 at 4:49:19 PM UTC+5:30, Chris Angelico wrote:
>> On Fri, Jan 22, 2016 at 9:19 PM, Marko Rauhamaa wrote:
>> > The know-how, vision and skill are apparently very rare. On the product
>> > management side, we have the famous case of Steve Jobs, who simply told
>> > the engineers to go back to the drawing boards when he didn't like the
>> > user experience. Most others would have simply surrendered to the
>> > mediocre designs and shipped the product.
>> >
>> > We need similar code sanity management. Developers are given much too
>> > much power to mess up the source code. That's why "legacy" is considered
>> > a four-letter word among developers.
>>
>> So what do you do with a huge program? Do you send it back to the
>> developers and say "Do this in fewer lines of code"?
>>
>> CPython is a large and complex program. How do you propose doing it "right"?
>
> Put thus 'generistically', this is a rhetorical question and makes Marko look
> like he's making a really foolish point.
>
> Specifically, what little I've seen under the CPython hood looked distinctly
> improvable. E.g.:
>
> 1. My suggestion to have the docs on generator functions vs. generator
> objects cleaned up had no takers.
> 2. My students trying to work inside the lexer made a mess because the extant
> lexer is a mess. I.e., while Python 3 *claims* to accept Unicode input, the
> actual lexer is an ASCII lexer special-cased for Unicode rather than one that
> pre-lexes UTF-8 to Unicode.
>
> These are just specific examples that I am familiar with.

Yes, there are some parts of CPython that can be improved. That's true
of every large project (it's said that every program has at least one
bug and could be shortened by at least one instruction, from which it
can be deduced that every program can be reduced to a single
instruction that doesn't work).

Regarding lexers specifically, I have never seen any full-size
language parser that I've wanted to tinker with. They're always highly
optimized pieces of code, dealing with innumerable edge and corner
cases, and exploring them is always like dipping my toe into something
that's either ice-cold water or highly caustic acid, and I can't tell
which.
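
If you just want to poke at the observable behaviour without opening
up the C code, here's a minimal sketch using the stdlib tokenize
module. Caveats: tokenize is not necessarily the same code as the C
lexer being discussed, and the one-line source string is a made-up
example. All it shows is that a non-ASCII identifier comes through as
an ordinary NAME token.

```python
import io
import tokenize

# Hypothetical one-line source using a non-ASCII identifier
# ("na\u00efve" is "naive" with a diaeresis on the i).
source = "na\u00efve = 1\n"

# tokenize.tokenize() takes a readline callable that returns bytes,
# so encode the source as UTF-8 and wrap it in a BytesIO.
readline = io.BytesIO(source.encode("utf-8")).readline
tokens = list(tokenize.tokenize(readline))

# The first token reports the detected source encoding.
encoding = tokens[0].string

# Collect the identifiers the tokenizer found.
names = [tok.string for tok in tokens if tok.type == tokenize.NAME]

print(encoding, names)
```

That tells you what the tokenizer accepts, not how cleanly it gets
there, which is the part being complained about.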

> Chris' general point still stands, viz. take the large and complex program
> that is CPython and clean up these messinesses: you will still have a large
> and complex program.

Right. You could definitely spin off *some* of CPython into a separate
project (flip through the standard library - quite a few of those
modules, if proposed for stdlib inclusion today, would be rejected as
"better on PyPI"). My point isn't that it can't be improved; it's
that there's an irreducible complexity to it that exceeds the "rewrite
in a quarter" mark by a huge margin.

ChrisA