Wikipedia - conversion of data stored in an SQL database to HTML

Claudio Grondi claudio.grondi at freenet.de
Mon Mar 21 17:49:05 EST 2005


<http://tinyurl.com/692pt> redirects (if not just down) to
http://en.wikipedia.org/wiki/Wikipedia:Database_download#Static_HTML_tree_dumps_for_mirroring_or_CD_distribution

From this page I can see only one tool (not a couple) that is actually
available to download and use:

http://www.tommasoconforti.com/ - the home of Wiki2static

Wiki2static (version 0.61, 2nd Aug 2004)
http://www.tommasoconforti.com/wiki/wiki2static.tar.gz
is a Perl script that converts a Wikipedia SQL dump
into an HTML tree suitable for offline browsing or CD distribution.

I failed to find any documentation, so I was forced to experiment
with the script settings directly:

  $main_prefix = "u:/WikiMedia-Static-HTML/";  # target directory for the generated HTML tree
  $wiki_language = "pl";                       # language code of the dump (here: Polish)

and then ran it (from the script's current directory):
> wiki2static.pl Q:\WikiMedia-MySQL-Dump\pl\20040727_cur_table.sql
to test the script on a comparatively small SQL dump (112 MByte).

The script has now been running for over half an hour
and has so far created 1,555 folders and
generated 527 files with a total size of 6 MBytes,
while consuming only 16 seconds of CPU time.
I estimate the time until the script finishes at approximately
6 hours for a 100 MByte file, which would mean 120 hours
for the 2 GByte file of the English dump ...
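(That 120-hour figure is just a linear scale-up of my 6-hour estimate;
the back-of-the-envelope calculation, assuming the run time grows
linearly with dump size, looks like this:)

  # Back-of-the-envelope extrapolation; assumes run time scales
  # linearly with dump size, which is probably optimistic.
  hours_per_100_mbyte = 6.0      # my estimate from the 112 MByte test run
  english_dump_mbyte = 2000.0    # roughly 2 GByte for the English cur dump
  hours = english_dump_mbyte / 100.0 * hours_per_100_mbyte
  print "approx. %.0f hours (%.1f days)" % (hours, hours / 24.0)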

Any further hints? What am I doing wrong?

(After one hour of runtime there are now 1,627 folders and 1,307 files
with a total size of 15.6 MB, and only 20 seconds of CPU time have
been consumed, even though half an hour ago I raised the priority of
the process to high on my W2K box running Perl 5.8.3.)
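For anyone who wants to track the same figures on their own run,
a quick Python walk over the output tree would do; the path is simply
the $main_prefix from my settings above:

  import os

  root = "u:/WikiMedia-Static-HTML/"   # $main_prefix from the script settings
  folders = files = total_bytes = 0
  for dirpath, dirnames, filenames in os.walk(root):
      folders += len(dirnames)
      files += len(filenames)
      for name in filenames:
          total_bytes += os.path.getsize(os.path.join(dirpath, name))
  print "%d folders, %d files, %.1f MB" % (
      folders, files, total_bytes / (1024.0 * 1024.0))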

Claudio
P.S.
>> I loaded all of the Wikipedia data into a local MySQL server a while
>> back without any problems.
What was the size of the dump file you imported into
the MySQL database? Importing only the current-version
dump, which "a while back" was still smaller
than 2 GByte (i.e. skipping the history dump),
causes no problems with MySQL here either.
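Once the dump sits in a local MySQL server, pulling the records back
out from Python is at least straightforward. A minimal sketch (the
database name, credentials and output naming are only placeholders
for illustration; it assumes the old MediaWiki schema with its cur
table plus the MySQLdb module, and it merely wraps the raw wiki
markup in HTML instead of rendering it):

  import MySQLdb

  # Placeholder connection details; the pre-1.5 MediaWiki schema keeps
  # the current article text in the `cur` table.
  db = MySQLdb.connect(host="localhost", user="wiki", passwd="secret",
                       db="wikidb")
  cursor = db.cursor()
  cursor.execute("SELECT cur_title, cur_text FROM cur"
                 " WHERE cur_namespace = 0 LIMIT 10")
  for title, text in cursor.fetchall():
      page = open("%s.html" % title.replace("/", "_"), "w")
      page.write("<html><head><title>%s</title></head><body>\n"
                 "<pre>%s</pre>\n</body></html>" % (title, text))
      page.close()
  cursor.close()
  db.close()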

"Leif K-Brooks" <eurleif at ecritters.biz> schrieb im Newsbeitrag
news:3a8hlmF68iogqU1 at individual.net...
> Claudio Grondi wrote:
> > Is there an already available script/tool able to extract records and
> > generate proper HTML code out of the data stored in the Wikipedia SQL
> > data base?
>
> They're not in Python, but there are a couple of tools available here:
> <http://tinyurl.com/692pt>.
>
> > By the way: has someone succeeded in installation of a local
> > Wikipedia server?
>
> I loaded all of the Wikipedia data into a local MySQL server a while
> back without any problems. I haven't attempted to run Mediawiki on top
> of that, but I don't see why that wouldn't work.




