Everipedia 2.0 coming over like a supernova

You can talk about anything related to Wikipedia criticism here.
ericbarbour
Sucks Admin
Posts: 1954
Joined: Sat Feb 25, 2017 1:56 am
Has thanked: 69 times
Been thanked: 149 times

Re: Everipedia 2.0 coming over like a supernova

Post by ericbarbour » Wed Aug 21, 2019 11:11 pm

Abd wrote:What is resource intensive is a full article dump, which contains the exact text of every revision.

Sir, you have a gift for understatement.....a full dump of all the SQL/XML data is available, but adds up to terabytes of info. I have NO idea how to reassemble this. (Don't bother with the HTML dumps, the last update was in 2008.)

They do no one any favors by offering "partial dumps" that are currently being processed.
https://dumps.wikimedia.org/backup-index-bydb.html

The most recent complete dump is only available on mirrors (that I can see), dates from August 1, and damn is it big.
https://ftp.acc.umu.se/mirror/wikimedia ... /20190801/

Abd
Sucks Warrior
Posts: 727
Joined: Mon Nov 27, 2017 11:22 pm
Has thanked: 70 times
Been thanked: 35 times

Re: Everipedia 2.0 coming over like a supernova

Post by Abd » Thu Aug 22, 2019 9:47 am

ericbarbour wrote:
Abd wrote:What is resource intensive is a full article dump, which contains the exact text of every revision.

Sir, you have a gift for understatement.....a full dump of all the SQL/XML data is available, but adds up to terabytes of info. I have NO idea how to reassemble this. (Don't bother with the HTML dumps, the last update was in 2008.)

They do no one any favors by offering "partial dumps" that are currently being processed.
https://dumps.wikimedia.org/backup-index-bydb.html

The most recent complete dump is only available on mirrors (that I can see), dates from August 1, and damn is it big.
https://ftp.acc.umu.se/mirror/wikimedia ... /20190801/


Yeah. I recovered the content from Wikiversity, deleted by that rampaging 'crat, using the full WV dumps. To extract the pages of interest, I had to write a utility that parses the dump text line by line and pulls them out. PITA. I could find no existing utility that recovers individual pages from the full site dumps. One might think...
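[The page-extraction utility Abd describes can be sketched without loading the dump whole. Below is a minimal Python example, assuming the standard MediaWiki XML export format; the namespace URI and the set of wanted titles are illustrative assumptions, and real dumps may use a different export schema version. It streams the file with `iterparse` so memory stays flat even on multi-gigabyte inputs.]

```python
import io
import xml.etree.ElementTree as ET

# Namespace used by MediaWiki export files (the version number varies
# by dump date; check the root element of your actual dump).
NS = "{http://www.mediawiki.org/xml/export-0.10/}"

def extract_pages(stream, wanted_titles):
    """Stream a MediaWiki XML dump and return {title: latest_text}
    for the requested pages.

    iterparse yields elements as they close; clearing each <page>
    after use keeps memory usage flat on huge dumps.
    """
    found = {}
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == NS + "page":
            title = elem.findtext(NS + "title")
            if title in wanted_titles:
                # In full-history dumps revisions appear oldest-first,
                # so the last <revision> holds the newest text.
                revs = elem.findall(NS + "revision")
                found[title] = revs[-1].findtext(NS + "text") if revs else ""
            elem.clear()  # free the subtree before the next page
    return found

# Tiny in-memory stand-in for a real dump file (hypothetical content).
sample = io.BytesIO(b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Deleted page</title>
    <revision><text>old text</text></revision>
    <revision><text>final text</text></revision>
  </page>
  <page>
    <title>Other page</title>
    <revision><text>irrelevant</text></revision>
  </page>
</mediawiki>""")

pages = extract_pages(sample, {"Deleted page"})
```

[For a real multi-terabyte dump you would pass a decompressing file object (e.g. `bz2.open("enwiki-...-pages-meta-history.xml.bz2", "rb")`) instead of the in-memory sample; the streaming logic is the same.]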
