A Python script to put CTAN into git (from DVDs)

Jakub Narebski jnareb at gmail.com
Mon Nov 7 16:50:23 EST 2011


The following message is a courtesy copy of an article
that has been posted to comp.text.tex as well.

Jonathan Fine <jfine at pytex.org> writes:
> On 06/11/11 20:28, Jakub Narebski wrote:
> 
> > Note that for gitPAN each "distribution" (usually but not always
> > corresponding to single Perl module) is in separate repository.
> > The dependencies are handled by CPAN / CPANPLUS / cpanm client
> > (i.e. during install).
> 
> Thank you for your interest, Jakub, and also for this information.
> With TeX there's a difficulty which Perl, I think, does not have.  With
> TeX we process documents, which may demand specific versions of
> packages. LaTeX users are concerned that moving on to a later version
> will cause documents to break.

How can you demand a specific version of a package?

In the "\usepackage[options]{packages}[version]" LaTeX command, the
<version> argument specifies the _minimal_ (oldest acceptable) version.
The same is true for Perl "use Module VERSION LIST".
 
Nevertheless, while with "use Module VERSION" / "use Module VERSION LIST"
you can only request a minimal version of a Perl module, the META
build-time spec can require an exact version of a prerequisite package:

http://p3rl.org/CPAN::Meta::Spec

  Version Ranges
  ~~~~~~~~~~~~~~

  Some fields (prereq, optional_features) indicate the particular
  version(s) of some other module that may be required as a
  prerequisite. This section details the Version Range type used to
  provide this information.

  The simplest format for a Version Range is just the version number
  itself, e.g. 2.4. This means that *at least* version 2.4 must be
  present. To indicate that *any* version of a prerequisite is okay,
  even if the prerequisite doesn't define a version at all, use the
  version 0.

  Alternatively, a version range *may* use the operators < (less than),
  <= (less than or equal), > (greater than), >= (greater than or
  equal), == (equal), and != (not equal). For example, the
  specification < 2.0 means that any version of the prerequisite less
  than 2.0 is suitable.

  For more complicated situations, version specifications *may* be
  AND-ed together using commas. The specification >= 1.2, != 1.5, <
  2.0 indicates a version that must be *at least* 1.2, *less than* 2.0,
  and *not equal to* 1.5.

> > Putting the whole DVD (is it the "TeX Live" DVD, by the way?) into a single
> > repository would put quite a bit of stress on git; it was created for
> > software development (although admittedly of large project like Linux
> > kernel), not 4GB+ trees.
> 
> I'm impressed by how well git manages it.  It took about 15 minutes to
> build the 4GB tree, and it was disk speed rather than CPU which was
> the bottleneck.

I still think that using a modified contrib/fast-import/import-zips.py
(or import-tars.perl, or import-directories.perl) would be a better
solution here...
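A minimal Python sketch of the fast-import approach, written in the
spirit of import-zips.py but not copied from it: it turns the contents
of one zip file into a single commit in git fast-import's stream format.
The function names, the use of the current time as commit date, and
the omission of C-style quoting (needed for paths containing newlines
or starting with a double quote) are all assumptions of this sketch.

```python
import io
import time
import zipfile


def _data(payload):
    """Encode a payload with fast-import's exact-byte-count 'data' command."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def zip_to_fast_import(zip_bytes, ref, committer, message):
    """Return a fast-import stream committing every file in a zip archive.

    `committer` is a string like 'Name <email>'.  Assumption: the commit
    date is "now" in UTC; a DVD import might use the DVD's mtime instead.
    """
    out = io.BytesIO()
    out.write(b"commit %s\n" % ref.encode())
    out.write(b"committer %s %d +0000\n" % (committer.encode(), int(time.time())))
    out.write(_data(message.encode()))
    # Start from an empty tree so files removed upstream disappear as well.
    out.write(b"deleteall\n")
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # git tracks files; directories are implied by paths
            out.write(b"M 100644 inline %s\n" % info.filename.encode())
            out.write(_data(archive.read(info)))
    out.write(b"\n")
    return out.getvalue()
```

The returned bytes would be fed to the standard input of
"git fast-import" run inside the target repository; fast-import then
writes blobs, trees and the commit directly into a pack, without
going through the index or a working tree.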
 
[...]
> We may be at cross purposes.  My first task is to get the DVD tree into
> git, performing necessary transformations such as expanding zip files
> along the way.  Breaking the content into submodules can, I believe,
> be done afterwards.

'reposurgeon' might help there... or might not.  The same goes for the
git-subtree tool.

But now I understand that you are just building tree objects and
creating references to them (with an implicit ordering given by their
names, I guess).  This is meant to be the starting point for further
work, isn't it?
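For illustration, tree objects are content-addressed, which is why
identical files on successive DVDs are stored only once.  Below is a
minimal pure-Python sketch of how git derives blob and tree ids; it is
not the code of the script under discussion (which is not quoted here),
the helper names are hypothetical, and it ignores executable bits and
symlinks (every file gets mode 100644).

```python
import hashlib


def git_object_id(kind, body):
    """SHA-1 over git's object header ('<kind> <size>\\0') plus the body."""
    header = b"%s %d\x00" % (kind.encode(), len(body))
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content):
    return git_object_id("blob", content)


def tree_id(entries):
    """Hash a tree given as {name: (mode, hex_id)}.

    Modes are strings such as '100644' (file) or '40000' (subdirectory).
    Git sorts entries as if subdirectory names ended with '/'.
    """
    def sort_key(name):
        mode = entries[name][0]
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b""
    for name in sorted(entries, key=sort_key):
        mode, hex_id = entries[name]
        body += b"%s %s\x00" % (mode.encode(), name.encode()) + bytes.fromhex(hex_id)
    return git_object_id("tree", body)


def snapshot_tree_id(files):
    """Root tree id for a {path: bytes} snapshot, such as a DVD listing."""
    root = {}
    for path, content in files.items():
        *dirs, leaf = path.split("/")
        node = root
        for d in dirs:
            node = node.setdefault(d, {})
        node[leaf] = content

    def build(node):
        entries = {}
        for name, value in node.items():
            if isinstance(value, dict):
                entries[name] = ("40000", build(value))
            else:
                entries[name] = ("100644", blob_id(value))
        return tree_id(entries)

    return build(root)
```

As a sanity check, blob_id(b"hello\n") gives the same id as
"echo hello | git hash-object --stdin", and tree_id({}) is git's
well-known empty tree, 4b825dc642cb6eb9a060e54bf8d69288fbee4904.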

> With DVDs from several years it could take several hours to load
> everything into git.  For myself, I'd like to do that once, more or
> less as a batch process, and then move on to the more interesting
> topics. Getting the DVD contents into git is already a significant
> piece of work.
> 
> Once done, I can then move on to what you're interested in, which is
> organising the material.  And I hope that others in the TeX community
> will get involved with that, because I'm not building this repository
> just for myself.

[...]

> > > In addition, many TeX users have a TeX DVD.  If they import it into a
> > > git repository (using for example my script) then the update from 2011
> > > to 2012 would require much less bandwidth.
> >
> > ???
> 
> A quick way to bring your TeX distribution up to date is to do a delta
> with a later distribution, and download the difference.  That's what
> git does, and it does it well.  So I'm keen to convert a TeX DVD into
> a git repository, and then differences can be downloaded.

Here perhaps you should take a look at the git-based 'bup' backup system.

Anyway, I am not sure, but for git to be able to generate deltas well
you may need a DAG of commits, so that Git can notice what you have and
what you have not.  Trees alone might not be enough here (!).
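The "what you have and what you have not" point can be illustrated with
a toy model: if the client's old snapshot is fully known, the data to
transfer is whatever the new snapshot contains that the old one lacks.
In real git the fetch negotiation learns this by exchanging commits the
client already has and walking the commit graph, which is why a chain
of commits matters; this sketch skips the negotiation entirely, models
only whole-file reuse (no deltas against similar files), and its names
are hypothetical.

```python
import hashlib


def blob_id(content):
    """Git blob id: SHA-1 of 'blob <size>\\0' followed by the content."""
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()


def missing_blobs(have, want):
    """Blob ids in snapshot `want` that a holder of snapshot `have` lacks.

    Both snapshots map paths to file contents.  Tree and commit objects
    are ignored here; only file contents are compared.
    """
    known = {blob_id(data) for data in have.values()}
    return {blob_id(data) for data in want.values()} - known
```

Note that a file which merely moved to a new path costs nothing to
transfer, because its blob id depends only on its content.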
 
> > Commit = tree + parent + metadata.
> 
> Actually, any number of parents, including none.  What metadata do I
> have to provide?  At this time nothing, I think, beyond that provided
> by the name of a reference (to the root of a tree).

Metadata = commit message (here you can e.g. put the official name of
the DVD), author and committer info (name, email, date and time,
timezone; the date and time you can take from the mtime / creation time
of the DVD).
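A minimal sketch of what such a commit object looks like in raw form,
and how its id follows from it, assuming author and committer are the
same person at the same time.  The function name, identity and message
are hypothetical; in practice the timestamp could come from something
like os.stat() on the DVD mount point, and the tree id from whatever
builds the DVD's tree.

```python
import hashlib


def commit_object(tree, parents, name, email, timestamp, tz, message):
    """Serialise a git commit and return (object_id, raw_body).

    `parents` may be empty (a root commit) or hold any number of ids;
    `tz` is an offset string such as '+0100'.  Author and committer are
    set to the same identity and time in this sketch.
    """
    ident = "%s <%s> %d %s" % (name, email, timestamp, tz)
    lines = ["tree %s" % tree]
    lines += ["parent %s" % p for p in parents]  # zero or more parents
    lines += ["author %s" % ident, "committer %s" % ident]
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    header = b"commit %d\x00" % len(body)
    return hashlib.sha1(header + body).hexdigest(), body
```

Chaining each year's DVD commit to the previous one through `parents`
is what turns a set of trees into the DAG of commits mentioned above.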

[cut]
 
-- 
Jakub Narębski


