[Tutor] C++, Java, Python, metrics, etc. (was Re: Re: small program in Python and in C++)

Derrick 'dman' Hudson dman@dman.ddts.net
Wed, 17 Jul 2002 18:42:37 -0500


On Tue, Jul 16, 2002 at 06:11:54PM +0100, alan.gauld@bt.com wrote:
| Hi Dman,
|
| > I thought only ASM needed a line of comment per line of code ;-).
| > Whatever happened to self-documenting code?
|
| It's alive and well, but most corporate coding standards
| require extensive comment block headers at the top of
| each file as well as above each function.

Yeah, those are useful, particularly for the non-obvious functions.

| Thus a 5 line function may have a 15 line header on top! (see our
| local example appended at the end...)

15 lines!?  That's a rather low signal:noise ratio.  At least that is
outside the body of the function :-).

| > | In the real world many functions contain 200 plus lines of
| > | executable code(*) which means maybe 400-500 lines of text.
| >
| > I'm glad I'm not in the "real world" ;-).
|
| Sorry, that sounded a bit patronising. A better term might
| have been "industrial scale" software projects,
| e.g. air traffic control, nuclear reactors, telephone exchanges
| and the like. (Recall we were talking about C++ projects
| in this thread, not Python!)

All of my C++ projects were in class.  Hence saying it isn't "real
world" is rather accurate.  Pretty much all the non-school stuff I did
was either for a small java shop (my first co-op) or a non-profit org.
where "if it works that's what counts" (there's only me and another
programmer/admin, and they're switching accounting systems too.  more
than enough work for the two of us and then some :-)).

| > Yeah, I can see 100 lines of
| >     stream << this->foo
|
| More usually 100 times:
|
|  if (this->foo || this->bar) {
|       stream << this->foo;
|  }
|
| ie 300 lines!
|
| > attribute should have its own self-contained serialization function,
|
| Could be but for primitive types that makes the files even
| bigger and most attributes will be either integers, floats
| or strings - ie the things you can store in an RDBMS.
|
| > | time consuming and error prone. Jumping to the top of the function
| > | is a single keystroke!
| >
| > Which keystroke, btw?  It would be helpful if I knew it.
|
| > I know how to use ctags, but that's quite a few keystrokes
| > (depending on function names and polymorphic usage)
|
| It shouldn't be if your ctags is C++ aware...

I didn't know about ctags back when I had my C++ projects, but I used
it for a large Java project.  If there was an abstract class defining
the method "meth", and 3 concrete subclasses, :tselect would list 4
occurrences of the tag "meth".

| However ctags doesn't help for the current function,

True.

| but [[ should go to the start of a paragraph in text
| mode or the start of the current function in C/C++ mode.

Ahh, cool!

| I just tried it and it relies on you putting your opening
| brace in the leftmost column (which I do normally so it
| works for me  :-) like so:

In vim type ':help [[' and at the bottom of the paragraph about
"section" is a series of maps to handle an indented brace.  I'll be
practicing with this one!

| > | Well written variable declarations should all be commented! :-)
| >
| > It shouldn't need a comment, except in the *few* (right? ;-))
| > not-so-obvious cases :-).
|
| Not so obvious is rarely the case in my experience. What the
| original programmer thinks is obvious very rarely is to somebody
| else later! (Of course the fact that modern compilers tend to
| allow more than 6 characters in variables helps ;-)

Yeah, I think the number of libraries around now would certainly have
overflowed the 6-character universe a long time ago.

| > | Only if reading the code sequentially, if you are jumping in
| > | at random - as is usually the case for maintenance programmers
| > That makes sense.
|
| Just to elaborate. The normal mode of operation for a
| maintenance programmer with a hot fault is to load up
| the executable in the debugger with a core dump (if Unix or MVS)
| and do a stacktrace. They will then navigate up the stack
| which means they arrive in the upper level function
| wherever the call to the lower level one was made, not
| at the entry point.

Yeah, that's what I do when I try running a not-quite-mature app and
it crashes :-).  Then I can submit at least a halfway decent bug
report, and maybe fix or workaround the crash if it is simple enough.

| > | A crash in those circumstances is a good thing! And NULLs help
| > | achieve it!
| >
| > While I agree here, what if you got a 'NameError' exception instead of
| > 'AttributeError: None object has no member foo' _or_ if the operation
|
| We were talking C++ remember. A Name error will be caught at
| compile time in C++.

Not if you declared the name at the top of the function :-).  You
won't even get an "uninitialized variable" warning if you initialize
it to NULL.
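
For comparison, here is a minimal Python sketch of the three cases from
my quoted question.  The function names are made up for illustration;
only the exception types come from Python itself.

```python
def unbound_name():
    """Read a name that was never assigned: fails at the point of use."""
    try:
        return result.foo  # 'result' is not defined anywhere
    except NameError as exc:
        return type(exc).__name__


def none_placeholder():
    """Initialise to None up front: fails only when an attribute is used."""
    result = None
    try:
        return result.foo
    except AttributeError as exc:
        return type(exc).__name__


def silent_none():
    """An operation that is perfectly valid with None, so nothing fails."""
    result = None
    return "value: %s" % result
```

The third function is the case that worries me: the placeholder flows
through without any error, even though it was never meant to be the
final value.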

| > you performed was perfectly valid with None/NULL but that was not the
| > value the variable should have had.
|
| There are obviously a few possible cases where this can happen
| but to be honest they are very rare. If you think about it
| most functions that accept a NULL do so only to indicate
| that a default value be used instead

Yeah, that is true.  I don't do enough large-scale C/C++ work.  I'll
soon be doing more C than I have in a long time.

| > | I see the point of wasted resource but not sure how Java/Python
| > | helps here. If you initialize the variable to a dummy object you
| > Here's an example :
| >
| > // C++
| > class Foo
| > {
| > };
| >
| > void func()
| > {
| >     Foo local_foo ;
|
| Ah yes, non dynamic classes. I forgot about those in C++,
| I hardly ever use them. You are quite right, in those cases
| premature initialisation would waste resource.

Yeah, you trade off the convenience of stack-automatic memory
management for premature initialization (or declaring it later so that
the init isn't premature).

Another fun effect of stack allocation is the ability to subvert the
Singleton pattern.  Alex Martelli has explained on c.l.p how the copy
constructor (which the compiler automatically creates for you) can be
used to copy a singleton to a stack-allocated instance of the class.
Oops.  (just one of his reasons for preferring the Flyweight-Proxy
pattern as a replacement for Singleton)

| > One way to avoid it is to use a pointer, and then allocate the object
| > using 'new' when you need it.  Then you must remember to 'delete' it
| > afterwards.  Also, supposedly, the heap is slower than the stack.
|
| I always use dynamic objects in C++ and the performance
| overhead is minimal. The loss of polymorphism in statically
| declared classes is much more serious IMHO!
|
| > (I also like python's ability to get rid of a local ('del') after
| > you're done with it.  It helps prevent you from erroneously re-using
| > the old data later on -- which I've done before!)
|
| Yes memory management is a perennial problem in C/C++, it's about
| the only good thing I have to say about Java :-)

Mmm hmm.  However, early JVMs (eg jdk 1.1.8) handled the gc so poorly
that it was effectively the same as just never freeing anything.
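
On the 'del' remark quoted above, a small sketch of what I mean.  The
function and variable names are invented for the example; the reuse
inside the 'try' is deliberate, to show what the mistake would do.

```python
def total_then_reuse(values):
    """Sum doubled values, then show that a deleted local cannot be reused."""
    scratch = [v * 2 for v in values]
    total = sum(scratch)
    del scratch  # done with the intermediate list

    try:
        len(scratch)  # accidental reuse of the old data
    except NameError as exc:  # UnboundLocalError is a subclass of NameError
        return total, type(exc).__name__
    return total, None
```

Without the 'del', the stale list would still be there and the later
use would quietly succeed.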

| > Surely if you can actually and accurately quantize the effect, then
| > you can make a better argument than aesthetics.
|
| Unfortunately there is a wealth of objective collected data about
| maintenance programming. It's always seen as a cost rather than
| revenue earner so the bean counters monitor it closely. Sadly
| the mainstream software community seems to ignore it by and
| large! Possibly because it invariably originates in industry
| rather than academia? No surely not... :-)

Of course not.  :-).  (actually, I don't know)  I do know that
comp.lang.python has periodic threads regarding precisely which coding
style is more effective, and they always seem to go 'round and 'round
in circles.

| > benefits of each method and choose the best one.  Since quantization
| > isn't a viable option, experience is the next best decider.
|
| I'm interested in why you think quantization is hard?
| Measuring time to fix, lines changed, code quality metrics
| etc is part and parcel of every maintenance team's work that
| I've ever worked with (erm, that's only 3, all in the same
| organisation BTW!!)

LOC depends on many factors that tend to be project-specific.  For
example, the programming language used, the libraries used (the
convenience of their APIs), and the design of the system all
contribute.  Not to mention the fact that "line" is not well-defined
(across the field).  If LOC is weighted too heavily (eg programmer
performance reviews) then the programmers will have a tendency to
skew it.
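
To illustrate how loosely "line" is defined, here is a rough sketch
that counts the same C-like snippet three ways.  The function name, the
sample text and the three definitions are all just assumptions for the
example, and the semicolon count is naive (it would also count
semicolons inside comments and strings).

```python
def count_lines(source):
    """Return three different 'LOC' figures for the same source text."""
    lines = source.splitlines()
    return {
        "physical": len(lines),                                   # every line
        "non_blank": sum(1 for line in lines if line.strip()),    # skip blanks
        "semicolons": source.count(";"),                          # 1 ';' == 1 line
    }


SAMPLE = """\
// add two numbers
int add(int a, int b)
{
    return a + b;
}

int main() { int x = add(1, 2); return x; }
"""

counts = count_lines(SAMPLE)
```

Three reasonable definitions give three different numbers for a
seven-line file, and that is before anyone has a reason to game them.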

Code quality is also hard to measure quantitatively.  The measurements
I've seen (McCabe Cyclomatic Number and something else I forget)
didn't seem to be very effective.  They also seemed to require more
effort than writing the software in the first place!
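
For anyone who hasn't met the cyclomatic number: it is essentially one
plus the number of decision points in the code.  Below is a rough
approximation for Python source using the standard 'ast' module.  The
choice of which nodes count as decisions is my own simplification, not
an official definition, and the sample function is made up.

```python
import ast

# Node types treated as one decision point each (a simplifying assumption).
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp,
                   ast.ExceptHandler, ast.comprehension)


def cyclomatic_estimate(source):
    """Estimate the cyclomatic number: 1 + number of decision points."""
    count = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISION_NODES):
            count += 1
        elif isinstance(node, ast.BoolOp):
            # 'a or b or c' adds two extra branches
            count += len(node.values) - 1
    return count


SAMPLE = """
def classify(n):
    if n < 0 or n > 100:
        return "out of range"
    for d in (2, 3):
        if n % d == 0:
            return "divisible"
    return "other"
"""

estimate = cyclomatic_estimate(SAMPLE)
```

The number says how many paths there are to test, but nothing about
whether the code is readable or correct.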

The main thing I learned in my "Software Process and Product Metrics"
course is that software is hard to measure.  (it was also a boring
course and the prof. didn't help that at all)

I'm sure that a large enough organization (neither of the orgs I
worked for qualify) will have a defined process and metrics used to
keep the paperwork flowing, but I am skeptical as to their real
usefulness.

| > My comments here aren't intended as arguments, just clarification of
| > what I had said before and of the "other" perspective.  (and some of
| > it is meant to be a bit humorous)
|
| Me too, I must apologize if my earlier post sounded a bit
| pompous, it's an unfortunate tendency I have. They were meant
| entirely as my personal experience of maintaining large scale
| projects. (The first was 3,500,000 lines of C/C++, the second
| 500,000 lines of C++ and the 3rd 350,000 lines of COBOL)

I understood your comments to be based on your work experience,
whereas I don't have that.

The largest C++ project I've worked on had 4 developers with 3,000
lines hand-coded and 3,100 lines generated by glade--.  (where 1
semicolon == 1 line, and that was an after-the-fact "I wonder how big
this thing is" measurement)  I haven't measured any of the other
projects I worked on, which were mainly java or "trivial".  I did work
on a few sizeable java projects, but most were so short lived that
they had no maintenance (school projects.  once you have a grade,
you're done.)  At my last job I worked on the most significant java
project I have worked on, but we didn't measure it for size.  At my
current job I have mostly been pulling things together, and also
making a couple of zope-based web apps.  They're not huge, though, and
it would be hard to measure LOC with the mix of python scripts, page
templates, and SQL.  An LOC measurement would also have no relation to
your projects since zope already provides lots and lots of glue for
the bits that I actually wrote.

| Hopefully there are some points of interest to the general
| Python community in here too, hidden amongst the rantings :-)

Yes, hopefully.

| That has to be filled in for every function regardless
| of how small... A similar but bigger template is filled
| in at the top of every file including the full RCS log...

RCS?  Don't you mean CVS? ;-).  You don't put all your files in one
big directory and work on them one person at a time, now do you?
(I'm just picking on the limitations of RCS that CVS solves)

-D

--
Emacs is a nice operating system, it lacks a decent editor though

http://dman.ddts.net/~dman/
