Re: Everipedia 2.0 coming over like a supernova

Posted: Wed Aug 21, 2019 11:11 pm
by ericbarbour
Abd wrote:What is resource intensive is a full article dump, which contains the exact text of every revision.

Sir, you have a gift for understatement... A full dump of all the SQL/XML data is available, but it adds up to terabytes of info. I have NO idea how to reassemble this. (Don't bother with the HTML dumps; the last update was in 2008.)

They do no one any favors by offering "partial dumps" that are currently being processed.
https://dumps.wikimedia.org/backup-index-bydb.html

The most recent complete dump is only available on mirrors (that I can see), dates from August 1, and damn is it big.
https://ftp.acc.umu.se/mirror/wikimedia ... /20190801/

Re: Everipedia 2.0 coming over like a supernova

Posted: Thu Aug 22, 2019 9:47 am
by Abd
ericbarbour wrote:
Abd wrote:What is resource intensive is a full article dump, which contains the exact text of every revision.

Sir, you have a gift for understatement... A full dump of all the SQL/XML data is available, but it adds up to terabytes of info. I have NO idea how to reassemble this. (Don't bother with the HTML dumps; the last update was in 2008.)

They do no one any favors by offering "partial dumps" that are currently being processed.
https://dumps.wikimedia.org/backup-index-bydb.html

The most recent complete dump is only available on mirrors (that I can see), dates from August 1, and damn is it big.
https://ftp.acc.umu.se/mirror/wikimedia ... /20190801/


Yeah. I recovered the Wikiversity content deleted by that rampaging 'crat, using full WV dumps. To extract the pages of interest, I had to write a utility that parses the dump text, line by line, and pulls those pages out. PITA. I could find no utility that allows page recovery from the full site dumps. One might think. . . .
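
For anyone trying the same thing, here is a minimal sketch of a line-by-line extractor of the kind described above. It is not the utility mentioned in this post, just an illustration under some assumptions: the dump is a MediaWiki XML export in which `<page>` and `</page>` each sit on their own line and `<title>` fits on one line, and the function names `open_dump` and `extract_pages` are made up for this example. It never loads the whole dump, and it buffers a page only while that page is wanted, so memory use tracks the largest wanted page rather than the size of the dump.

```python
import bz2
import html
import re
from typing import Iterable, Iterator, List, Optional, Set, Tuple

# Assumption: each page's <title> element sits on a single line.
TITLE_RE = re.compile(r"<title>(.*?)</title>")


def open_dump(path: str):
    """Open a dump as text; .bz2 files are decompressed on the fly."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def extract_pages(lines: Iterable[str], wanted: Set[str]) -> Iterator[Tuple[str, str]]:
    """Yield (title, raw page XML) for every page whose title is in `wanted`.

    The raw XML runs from <page> to </page> and so includes every
    revision of that page that the dump contains.
    """
    in_page = False
    keep: Optional[bool] = None   # None until the title has been seen
    title: Optional[str] = None
    buffer: List[str] = []

    for line in lines:
        stripped = line.strip()

        if not in_page:
            # Skip everything outside <page> elements (site info, etc.).
            if stripped == "<page>":
                in_page, keep, title, buffer = True, None, None, [line]
            continue

        if keep is None:
            # Still looking for the title of the current page.
            buffer.append(line)
            match = TITLE_RE.search(line)
            if match:
                # Titles are XML-escaped in the dump (e.g. &amp;).
                title = html.unescape(match.group(1))
                keep = title in wanted
                if not keep:
                    buffer = []   # unwanted page: stop buffering
        elif keep:
            buffer.append(line)

        if stripped == "</page>":
            if keep and title is not None:
                yield title, "".join(buffer)
            in_page, buffer = False, []
```

Usage would be something like `for title, xml in extract_pages(open_dump(path), {"Some page title"}): ...`, where `path` points at a pages-meta-history file from the dump directory. Matching the tags as plain text instead of using a real XML parser is a deliberate shortcut to mirror the line-by-line approach; it relies on the export escaping `<` inside page text, so a stray `</page>` line should not occur in article content.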