[Python-Dev] Split unicodeobject.c into subfiles

Thu Oct 25 17:13:53 CEST 2012

On 10/24/2012 03:15 PM, Nick Coghlan wrote:
> Breaking such files up into separately compiled modules serves two 
> purposes:
>
> 1. It proves that the code *isn't* a tangled monolithic mess;
> 2. It enlists the compilation toolchain's assistance in ensuring that 
> remains the case in the future.
>

Either the code is a "tangled monolithic mess" or it isn't.  If it is, 
then let's fix that, regardless of the size of the file.  If it isn't, I 
don't see breaking up the code among multiple files as providing any 
benefit.  And I see no need for the toolchain's assistance to help us do 
something without benefit.  The line count of the file is essentially 
unrelated to its inherent quality or maintainability.

> We are not special snow flakes - good software engineering practice is 
> advisable for us as well, so a big +1 from me for breaking up the 
> monstrosity that is unicodeobject.c and lowering the barrier to entry 
> for hacking on the individual pieces. This should come with a large 
> block comment in unicodeobject.c explaining how the pieces are put 
> back together again.
>

I'm all for good software engineering practice.  But can you cite 
objective reasons why large source files are provably bad?  Not "tangled 
monolithic messes", not poorly-factored code.  I agree that those are 
bad--but so far nobody has proposed that either of those is true about 
unicodeobject.c (unless you are implicitly doing so above), nor have 
they proposed credible remedies.  All I've seen is that unicodeobject.c 
is a large file, and some people want to break it up into smaller 
files.  I have yet to see anything but handwaving as justification.  For 
example, what is this barrier to entry you suggest exists to hacking on 
the str object, that will apparently be dispelled simply by splitting 
one file into multiple files?

Someone proposed breaking up unicodeobject.c into three distinct 
subsystems and putting those in separate files.  I still don't agree.  
It seems natural to me to have everything associated with the str object 
in one file, just as we do with every other object I can think of.  If 
this were a genuinely good idea, we should consider doing it with every 
similar object.  But nobody is proposing that.  My guess is because the 
other files in CPython are "small enough".  At which point we're right 
back to the primary motivation simply being the line count of 
unicodeobject.c, as a purely aesthetic and subjective judgment.

//arry/