Import order question

Tue Feb 18 20:57:34 EST 2014

On Wed, Feb 19, 2014 at 9:17 AM, Tim Chase
<python.list at tim.thechases.com> wrote:
> On 2014-02-19 08:49, Chris Angelico wrote:
>> > Are you telling me you're willing to search through a single
>> > file containing 3,734 lines of code (yes, Tkinter) looking
>> > for a method named "destroy" of a class named "OptionMenu"
>>
>> At my last job, I had a single C++ file of roughly 5K lines, and
>> it wasn't at all unmanageable. Probably wouldn't have been a
>> problem to have another order of magnitude on that. What sort of
>> wimpy text editor are you using that you can't find what you're
>> looking for in a single large file?
>
> Even the venerable "ed" handles files of those sizes without batting
> an eye.  I just opened a 2MB XML file (50k lines) in ed and jumped
> all around in it with no trouble at all.

It's not just about whether your editor can handle a file of that
size, of course [1], but about how well it helps you find your way
around the codebase.

But ultimately, you're going to have those KLOC somewhere. Whether
they're in one file, a small handful of files, or eight hundred
separate tiny .PHP files (UGH UGH UGH UGH! and yes that was a real
thing and yes I had to search through it wince wince!), you somehow
need to find the one line you want out of, let's say, 50K. That's a
fairly small project by a lot of standards. Somehow, you need to find
one particular function. It might be named beautifully, in which case
a grep-across-files will do the job, or it might not. Maybe you can
see, as a human, that one of the subbranches MUST contain what you're
looking for (one file, or one directory, or something); but that often
requires knowledge of the codebase, which you mightn't have. So
splitting into files doesn't solve anything, and it introduces its own
set of problems.
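
For what it's worth, a grep-across-files for a well-named function is
only a few lines of Python. This is a minimal sketch, not anyone's
actual tooling: the name find_name, its arguments, and the default
".py" extension are all made up for illustration.

```python
import os
import re


def find_name(root, name, extensions=(".py",)):
    """Yield (path, line_number, line) for lines under root that mention name."""
    # \b keeps "destroy" from matching inside "destroyed" or "_destroy_all"
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(tuple(extensions)):
                continue
            path = os.path.join(dirpath, filename)
            # errors="replace" keeps the scan going on files with odd encodings
            with open(path, encoding="utf-8", errors="replace") as handle:
                for line_number, line in enumerate(handle, start=1):
                    if pattern.search(line):
                        yield path, line_number, line.rstrip("\n")
```

Something like "for hit in find_name('Lib', 'destroy'): print(*hit)"
would list every mention, whether the code lives in one file or eight
hundred. It only helps when the name is spelled the way you expect.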

Just for reference, here are a few line counts. They're based on "find
-name \*.FILE_EXT|xargs wc -l" in the base directories of several
projects (using an appropriate extension, as listed).

Small projects:
Proprietary project at my last job, *.cpp: 4782 (one file)
Same project, *.pike: 6559
Gypsum (open source MUD client), *.pike: 5083

Medium size projects:
CPython/Python, *.c: 55329
alsa-lib, *.c: 91131
SciTE, *.cxx: 114785 (34876 + Scintilla's 79909)
Pike, *.pike: 138705

Large projects:
Pike, *.c: 298773
CPython/Lib, *.py: 318232

(See how much of Python is written in Python? Although I can't help
feeling that my figures are wrong somehow. And that figure is ignoring
the test/ subdirectory with another 267872 lines for a total of
586104.)
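
If anyone wants to cross-check counts like these without the shell
pipeline, here is a rough Python equivalent. It is a sketch under
assumptions: count_lines and its arguments are invented names, and it
counts newline bytes the way "wc -l" does. One possible reason for
pipeline figures to look off (a guess, not something confirmed for the
numbers above) is that xargs can split a long file list across several
wc runs, each of which prints its own "total" line.

```python
import os


def count_lines(root, extension):
    """Return (total_lines, per_file) for files under root ending in extension."""
    per_file = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(extension):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as handle:
                # Read in chunks and count newline bytes, as wc -l does
                chunks = iter(lambda: handle.read(1 << 16), b"")
                per_file[path] = sum(chunk.count(b"\n") for chunk in chunks)
    # A single grand total, however many files there are
    return sum(per_file.values()), per_file
```

Calling count_lines("Lib", ".py") returns one total plus a per-file
breakdown, so there is no second "total" line to overlook.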

I've done "search across files" in all of these, bar alsa-lib which I
line-counted just because it happened to be there. (Incidentally, I'm
fully aware that some of those figures are unfair. Overall, I'd say
Pike and Python are comparable-sized projects, but Python's more
heavily written in Python and Pike's more heavily written in C, partly
by using a precompiler that's somewhat along the lines of Cython.
Scintilla includes a whole bunch of individual language lexers, and
it's usually pretty clear that you don't need to look in any of those.
Etc, etc.) It's usually been easy enough to find what I want; in fact,
the few times when it _hasn't_ have generally turned out to be because
of bugs (something had "desrtuct" instead of "destruct" in its name
and I of course couldn't find it). SciTE's search-across-files isn't
the best, but it covers 99%+ of my use cases. For anything else, well,
there's always popping up a terminal and using primitives like grep,
find, and so on.
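
For the misspelled-name case, an exact search will never hit, but a
fuzzy comparison against the identifiers in the codebase can. A small
sketch using the standard library's difflib; near_misses, the 0.8
cutoff, and the idea of feeding it a list of identifiers collected
elsewhere are assumptions for illustration only.

```python
import difflib


def near_misses(wanted, identifiers, cutoff=0.8):
    """Return identifiers that resemble wanted but are not exact matches."""
    # Drop the exact spelling so only suspicious look-alikes come back
    candidates = sorted(set(identifiers) - {wanted})
    return difflib.get_close_matches(wanted, candidates, n=5, cutoff=cutoff)
```

Given the names "destruct", "desrtuct" and "create", asking for near
misses of "destruct" should flag "desrtuct". The cutoff is a judgment
call: set it too low and every vaguely similar name shows up.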

ChrisA

[1] But as Grant hinted, some do have issues.