Newbie question on code vetting

Thu May 4 23:26:00 EDT 2006

I have no deep connections to any open source projects.  I do however know
quite a few engineers.  Bear that in mind.

william.boquist at gte.net wrote:
> It seems to me that Open Source generally would be more pervasive if there
> was more transparency with respect to the practices observed within the
> projects.  What possible harm could there be in letting the world know how 
> decisions to incorporate code are reached? 

I don't think it's a question of transparency but of effort.  Documenting the
processes takes time which many probably feel is better spent on functional
aspects of the project.  And for what benefit?  Are open source projects
more concerned about approval or quality?  Besides, most commercial
products have zero transparency in their development processes and it
doesn't hinder their market acceptance.

> The goal of collaborative 
> development is to build a body of code with many minds that is better than
> the body of code that could be built by any subset of them. The same
> principle could be applied to identification of best practices for
> committers across projects. 

If those practices are identifiable and repeatable, then sure, maybe
projects could be more productive following a "best practices" approach. 
OTOH if successful projects function more as little fiefdoms run by 
benevolent dictators, the reasons for success may be too idiosyncratic and
happenstance to translate to other projects.  

Note I said more productive, as in allowing the code to improve more
quickly.  My impression is that most engineers don't want to invest time in
IP rules and hate it when legal shenanigans impinge on their development.
They're generally
good at providing attribution and avoiding intentional misuses, but won't
go out of their way to check sources.  Nor should they, as 1) that's not
their job and 2) it's a very difficult task (unless you know of some master
database containing all the world's code).  Hence IP verification "best
practices" are likely to be summarily ignored as a foolish waste of time.

Closed commercial projects fare no better; indeed, they may be even worse,
since the risk of getting caught is lower.  Google for the number of times
GPLed code has turned up, unlicensed, in some commercial product.

> To me, being unable to reach an understanding
> of the practices is analogous to being unable to see and run the JUnit
> suites on a bunch of classes - being in the position of assuming that
> there is coverage, but not being able to understand how much or how
> thorough.

Transparency can be valuable to an outsider, but I don't see how most
projects would have the time, resources, or inclination to provide it.

> I think it is obvious that if every consumer of the code who has an
> interest in controlling risk has to reinvent the wheel, there will be a
> lot of effort wasted on redundant work. 

Sure, but this task is better handled by vendors like Red Hat than by
individual projects.  IP verification is no easy matter to handle, and
vendors have a financial incentive to perform the checks.  Vendors can also
offer indemnity, which individual projects can't.

> Why not have the project publish a 
> document that says "here are the practices by which we manage our code
> base - take it or leave it". Just as most licenses are variations on a few
> (GPL, LGPL, CPL, etc.), it seems to me that very quickly, a set of common
> management practices would evolve if most projects published, perhaps with
> a few variations.

I suspect part of the reason you have trouble finding answers about IP
issues is that there are none to give.  I doubt most projects follow
anything resembling a formal process for verifying sources.  It's more
likely left up to individual contributors, and probably runs along the
lines of "if it doesn't look obviously ripped-off, then it's ok".

> With regard to the issue of trust, how can I either trust or decide not to
> trust in an information vacuum?

By looking at indirect sources of evidence.  How much trust do others put in
this project?  (Google, for one, uses Python heavily.)  How do most open
source projects function?  How do most engineers handle IP issues?  Can I
hire an auditor to sample a representative portion of the code base for IP
issues?  Has anyone else already done so?  How often are open source
projects accused of IP violations?  How serious are they and what are the
outcomes?  Do closed projects handle things any differently?  What
assurances do they really provide?

Risk analysis means there's risk.  Unknowns are inherent.  Work around them
the best you can.

> I may be splitting hairs, but my 
> understanding is that belief despite absence of evidence is faith, not
> trust. Trust is the result of observation, and I want to be able to
> observe.

Faith is a belief that something is true without evidence.  Trust is relying
on something/someone to perform a certain way.  Blindly believing a project
to be free of IP issues is faith.  Expecting the committers to take the same
reasonable precautions most engineers do is trust.  The first is foolhardy;
the second is rational, especially when you make your own safeguards
against that trust being violated.  Faith is binary; trust is by degree.

What you're asking isn't unreasonable, but it's also not within the scope of
most open source projects.  Nor should it be.