Why does the Wayback Machine Only Seem to Archive Wikipedia?

RichardMan
Sucks Noob
Posts: 1
Joined: Wed May 05, 2021 4:28 pm

Post by RichardMan » Wed May 05, 2021 4:36 pm

The Wayback Machine only ever seems to archive Wikipedia, while every other website gets only a few measly pages archived. Even when websites are about to be taken down and urgently need saving, they only get partially archived. Yet Wikipedia gets obsessively archived down to the most obscure subpages and files. If GeoCities, Yahoo Groups, Yahoo Answers, various modding sites, etc. had been archived the way Wikipedia is, not a single page of them would have been lost. But no, those are not Wikipedia, so they're not important and do not get the same privileges. Don't tell me this is because it's a wiki and so has an export function, because there are a lot of wikis that are now only partially archived or even lost forever that I am sure could have been downloaded in full.

ericbarbour
Sucks Admin
Posts: 4547
Joined: Sat Feb 25, 2017 1:56 am
Location: The ass-tral plane
Has thanked: 1099 times
Been thanked: 1797 times

Re: Why does the Wayback Machine Only Seem to Archive Wikipedia?

Post by ericbarbour » Thu May 06, 2021 1:13 am

I've posted this before, but here we go again. Archive.org does NOT take snaps of all Wikipedia pages. In fact, it leaves the vast bulk of them out of the database, especially pre-2010.

From the book wiki:
Founded in 1996 by Internet champion Brewster Kahle, the Internet Archive and its associated Wayback Machine are the largest nonprofit archive of noncopyrighted information and captures of websites.

Sadly, the Wayback Machine is useless for studying Wikipedia's history, because of Wikimedia's early adoption of the "Nofollow" tag on all its websites, preventing search engines (and the Internet Archive) from crawling and taking full snapshots of their wikis. As of 2017, Wayback takes dozens of snapshots of article content of wikipedia.org every year but does not save histories, old revisions, or some administrative pages. See Brion Vibber.

Needless to say, Wikipedians scrape information from the Archive routinely. As posted on this thread on Reddit's r/wikiinaction, September 2017:

"Why are you surprised? Would you like to know how much Brewster Kahle likes Wikipedia, and has been aping their management "style"?"

"Sure."

"1. He was listed in the notorious leaked list of major WMF donors from 2011. "-50917,"Kahle, Brewster",Brewster,Kahle,Individual " He was listed right next to "almost a Who's Who of wealth and power in America: Warren Buffett, Richard Branson, Amazon founder Jeff Bezos, former U.S. president Jimmy Carter, the principals of Google as well as their "captive organization" the Tides Foundation, Michael Dell, Craigslist, Yahoo co-founders Jerry Yang and David Filo, Apple co-founder Steve Wozniak, Flickr co-founder Caterina Fake, George Soros, former Wikia CEO Gil Penchina, Facebook principals Mark Zuckerberg and Sean Parker, Yahoo CEO Marissa Mayer, gamer hero Gabe Newell, venture capitalist Vinod Khosla and his wife, Paul Hewson (aka "Bono"), and various co-founders of Sun Microsystems, plus a number of other venture capitalists and tech industry players. And Kate Garvey, who married Jimbo a few months later. This list helps to explain why Wales and his insider gang feel invulnerable."

"2. His WP article is VERY carefully manicured and protected. It has the usual litany of "single-purpose accounts" and IP-address editing I usually associate with paid or other COI editing. (https://en.wikipedia.org/wiki/Special:C ... .goldsmith used to work for Kahle at the Internet Archive...and this is anyone's guess: https://en.wikipedia.org/wiki/User:TBMforeverNowhere) The Internet Archive article shows similar patterns. (https://en.wikipedia.org/w/index.php?ti ... on=history) and so does this (https://en.wikipedia.org/w/index.php?ti ... on=history).

"3. there are 30 pics of him on Commons at present: (https://commons.wikimedia.org/wiki/Cate ... ster_Kahle) and 192 pics of the Archive's HQ: (https://commons.wikimedia.org/wiki/Cate ... adquarters) and plenty more.

"4. He has spoken at Wikimania: https://wikimania2006.wikimedia.org/wik ... ster_Kahle

"5. His Internet Archive and Wikimedia have been cooperating for years: (https://blog.wikimedia.org/2016/10/26/i ... ken-links/) (https://archive.org/details/wikimediado ... &tab=about)"

Whups, now there are 146 photos of Kahle on Wikimedia Commons.

ihatewaggawagga
Sucks
Posts: 16
Joined: Mon Apr 19, 2021 3:59 am
Has thanked: 4 times

Re: Why does the Wayback Machine Only Seem to Archive Wikipedia?

Post by ihatewaggawagga » Thu May 06, 2021 10:08 am

I feel the same
