Unpacking TAR 1.14/1.15 archives on Windows (first step towards own static HTML version of Wikipedia)

Claudio Grondi claudio.grondi at freenet.de
Sat Aug 20 16:01:13 EDT 2005


Thank you both (Martin and Diez) for your help.

The 17 GByte TAR archive was unpacked
without problems the way you suggested.

Let's summarize:

# The Python tarfile module can't extract files from
newer TAR archives (archived with tar 1.14 or later).

# The core of my problem was that I was not aware
how easy it is to install and work with Cygwin, and that to
get Cygwin's tar.exe to work it is necessary to use the
provided Cygwin bash-3.00 shell and NOT the Windows
command shell (DOS-box).
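
Regarding the first point: whether the tarfile module of a given Python
version can read a given archive can be probed before starting a long
extraction. This is only a minimal sketch; the function name
can_read_archive is made up for illustration, and it only walks the
member headers without unpacking anything.

```python
import tarfile


def can_read_archive(path: str) -> bool:
    """Return True if tarfile can walk every member header of the archive.

    Hypothetical helper: it does not extract any data, it only checks
    that the archive opens and that its headers can be iterated.
    """
    try:
        with tarfile.open(path, "r") as archive:
            for _member in archive:
                pass  # reading each header is the whole point of the probe
        return True
    except (tarfile.TarError, OSError):
        # TarError covers unreadable or unsupported headers,
        # OSError covers a missing or inaccessible file.
        return False
```

If the probe returns False, falling back to an external tar as described
below is the safer route.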

The recipe for unpacking Wikipedia media files
provided as TAR archives when using Microsoft
Windows is:

Step 1. download http://sources.redhat.com/cygwin/setup.exe

Step 2a. run the downloaded setup.exe which goes online and
  lets you choose which packages should be installed

Step 2b. in addition to the suggested packages, select
  the  tar  package version 1.15 for installation

Step 3. use the Cygwin icon on the Desktop or the menu entry
  Start-Programs-Cygwin-Cygwin Bash Shell to start a Cygwin shell and type:

bash-3.00$ ./bin/tar.exe --extract \
    --directory=/cygdrive/i/wikipedia/en/media \
    -f /cygdrive/j/download.wikimedia.org/archives/images/en/20050530_upload.tar

where i: and j: are the drive letters of the appropriate Windows drives.

The media files stored in the TAR archive
  j:\download.wikimedia.org\archives\images\en\20050530_upload.tar
will be unpacked to the directory
  i:\wikipedia\en\media
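
The mapping between Windows paths and the /cygdrive paths used in the
command above can be sketched in Python. This is only an illustration
under assumptions: the function names to_cygdrive and tar_extract_args
are made up, the /cygdrive prefix is taken from the paths shown in this
post, and the argument list assumes a tar that is reachable as "tar"
inside the Cygwin shell.

```python
from pathlib import PureWindowsPath


def to_cygdrive(windows_path: str) -> str:
    """Map a path like 'j:\\dir\\file' to '/cygdrive/j/dir/file'.

    Hypothetical helper; it assumes the '/cygdrive' prefix seen in the
    command above and rejects paths without a drive letter.
    """
    p = PureWindowsPath(windows_path)
    drive = p.drive  # e.g. 'j:'
    if len(drive) != 2 or drive[1] != ":":
        raise ValueError(f"expected a path with a drive letter: {windows_path!r}")
    # p.parts[0] is the drive anchor; the remaining parts are the components.
    return "/".join(["/cygdrive", drive[0].lower(), *p.parts[1:]])


def tar_extract_args(archive: str, target_dir: str) -> list:
    """Build the argument list for the extraction (nothing is executed)."""
    return [
        "tar",  # assumption: tar is on the PATH of the Cygwin shell
        "--extract",
        f"--directory={to_cygdrive(target_dir)}",
        "-f",
        to_cygdrive(archive),
    ]


if __name__ == "__main__":
    args = tar_extract_args(
        r"j:\download.wikimedia.org\archives\images\en\20050530_upload.tar",
        r"i:\wikipedia\en\media",
    )
    print(" ".join(args))
```

The sketch only assembles the command line; actually running it still
has to happen from within the Cygwin bash shell as described in Step 3.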

Step 4. wait, wait, wait ... (how long depends mostly on the
speed of your hard drives; on my system with USB drives it took
approx. one hour)

Step 5.  BE HAPPY  :))
  and enjoy it, because you have mastered a step towards
  your own static HTML version of Wikipedia.
  The problems with extracting the content from the MySQL
  database dumps will kill you soon. And if they don't, especially
  for non-English languages (like German, Polish, Russian)
  and with math formulas converted to pictures,
  all done on a Windows system, __PLEASE__ share your
  know-how with me !!!
  (the only useful information I found on the Internet about it
  were the postings in a thread I started myself some time
  ago:
http://www.pythonforum.org/ftopic19424_Wikipedia___conversion_of_in_SQL_database_stored_data_to_HTM.html
)

Claudio

"Diez B. Roggisch" <deets at nospam.web.de> wrote in message
news:3mp16gF17vuggU1 at uni-berlin.de...
> Claudio Grondi wrote:
> > remember. I work in a Windows command shell
> > (DOS-box) and mount says:
> > j: on /cygdrive/j , but I don't know how to write
> > the entire path
> > "j:\o\archives\images\dump.tar",
> > so that the file can be found by tar.exe and
> > unpacked to "i:\images" .
> > tar.exe --extract --directory=tmp -f j:/o/archives/images/dump.tar
> > results in:
> > /usr/bin/tar: j\:/o/archives/images/dump.tar: Cannot open: Input/Output error
> > telling
> > tar.exe --extract --directory=tmp -f /cygdrive/j/o/archives/images/dump.tar
> > doesn't work either.
>
>
> Try the cygpath-command like this:
>
> echo `cygpath c:\\some\\windows\\path`
>
> That should yield
>
> /cygdrive/c/some/windows/path
>
> Alternatively, do something like this
>
> mkdir -p /mnt/j
>
> mount j: /mnt/j
>
> Then /mnt/j should be the root for all files under j:
>
> HTH Diez




