Re: Wikipedia - conversion of in SQL database stored data to HTML

From: Claudio Grondi (claudio.grondi_at_freenet.de)
Date: 03/21/05


Date: Mon, 21 Mar 2005 22:49:05 -0000


<http://tinyurl.com/692pt> redirects (if not just down) to
http://en.wikipedia.org/wiki/Wikipedia:Database_download#Static_HTML_tree_dumps_for_mirroring_or_CD_distribution

On this page I see only one tool (not a couple) that is available
to download and use:

http://www.tommasoconforti.com/ the home of Wiki2static

Wiki2static (version 0.61, 2nd Aug 2004)
http://www.tommasoconforti.com/wiki/wiki2static.tar.gz
is a Perl script to convert a Wikipedia SQL dump
into an html tree suitable for offline browsing or CD distribution.

I failed to find any documentation, so I had to experiment
with the script settings myself:

  $main_prefix = "u:/WikiMedia-Static-HTML/";
  $wiki_language = "pl";

and then ran it (from the script's own directory) with:
  wiki2static.pl Q:\WikiMedia-MySQL-Dump\pl\20040727_cur_table.sql
to test the script on a small (112 MByte) SQL dump file.

The script has now been running for over half an hour
and has so far created 1,555 folders and
generated 527 files with a total size of 6 MBytes,
consuming only 16 seconds of CPU time.
I estimate that the script will need approx.
6 hours for a 100 MByte file, which gives 120 hours
for the 2 GByte file of the English dump ...
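
The estimate above is a plain linear extrapolation. A minimal sketch of that arithmetic, assuming runtime grows in proportion to dump size and taking 2 GByte as 2000 MByte (the function name is hypothetical, chosen only for illustration):

```python
def estimate_hours(dump_mb, hours_per_100_mb=6.0):
    """Linear extrapolation: assumes runtime is proportional to dump size."""
    return dump_mb / 100.0 * hours_per_100_mb

# 6 hours per 100 MByte is the rough figure from the post, not a measurement.
print(estimate_hours(100))    # 6.0
print(estimate_hours(2000))   # 120.0
```

If the script's cost per article is not constant (for example because directories grow large), the real figure could differ considerably from this estimate.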

Any further hints? What am I doing wrong?

(After one hour of runtime there are now 1,627 folders and
1,307 files with a total size of 15.6 MB, and only
20 seconds of CPU time have been consumed, even though
I raised the priority of the process to high
half an hour ago. This is on my W2K box running
perl 5.8.3.)
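
The CPU figures can be put next to the wall-clock times to see how little of the runtime is spent computing. A rough sketch, assuming exactly 30 and 60 minutes of wall-clock time (the post says "over half an hour", so the first ratio is an upper bound):

```python
def cpu_utilization(cpu_seconds, wall_seconds):
    """Fraction of wall-clock time the process actually spent on the CPU."""
    return cpu_seconds / wall_seconds

after_30_min = cpu_utilization(16, 30 * 60)   # 16 s CPU in about half an hour
after_60_min = cpu_utilization(20, 60 * 60)   # 20 s CPU in one hour

print(f"{after_30_min:.2%}")   # 0.89%
print(f"{after_60_min:.2%}")   # 0.56%
```

A utilization below 1% would be consistent with the process spending most of its time waiting on something other than the CPU (disk I/O is one possibility, but that is an assumption and was not measured here), which would also explain why raising the priority made no visible difference.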

Claudio
P.S.
>> I loaded all of the Wikipedia data into a local MySQL server a while
>> back without any problems.
What was the size of the dump file you imported into
the MySQL database? Importing only the current
version (skipping the history dump), which "a while back"
was smaller than 2 GByte, causes no problems with MySQL.

"Leif K-Brooks" <eurleif@ecritters.biz> wrote in message
news:3a8hlmF68iogqU1@individual.net...
> Claudio Grondi wrote:
> > Is there an already available script/tool able to extract records and
> > generate proper HTML code out of the data stored in the Wikipedia SQL
> > data base?
>
> They're not in Python, but there are a couple of tools available here:
> <http://tinyurl.com/692pt>.
>
> > By the way: has someone succeeded in installation of a local
> > Wikipedia server?
>
> I loaded all of the Wikipedia data into a local MySQL server a while
> back without any problems. I haven't attempted to run Mediawiki on top
> of that, but I don't see why that wouldn't work.


