Related to Wikipedia in the broadest sense, both as input for AI models and for its own likely future.
https://www.ft.com/content/ae507468-7f5 ... a81c6bf4a5
AI model collapse
-
- Sucks Fan
- Posts: 248
- Joined: Thu Jun 27, 2024 5:19 pm
- Has thanked: 2 times
- Been thanked: 55 times
Re: AI model collapse
Looks like a paywall, though I suppose I could sign up for their trial. Tried an archive site, but the paywall is still there.
The title seems dubious. Companies like Google and Facebook, NGOs like Wikipedia (despite what they might say), and government agencies like the NSA must hold tremendous amounts of data, and for most of these organizations it's probably their most valuable asset. There's no lack of data per se, but the public has access to very little of it, and as such the public is at a large disadvantage. Things like LLMs receive a lot of favorable press, but I don't think they can or should replace a site like Wikipedia, even though Wikipedia is shite. They're a one-way, one-to-many form of communication. They simulate discourse, which is ideal for the propagandist but detrimental to the public interest. AI is dangerous, just not in the way one might expect.
A few notes, not having read the article, but in general:
- LLMs that cite sources will probably come about soon. I've seen this talking point before, but there doesn't seem to be any large technical obstacle (a toy sketch follows this list).
- I get the sense that many search results already point to websites generated by LLMs.
- It's becoming hard to talk with actual people on the internet and have a real conversation.
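On the first bullet: here is a minimal retrieve-then-cite sketch of why citation isn't architecturally hard. The corpus, the word-overlap scoring, and the citation format are all invented for illustration; a real system would use an actual retriever and pass the retrieved text to the model as context.
[code]
# Toy retrieval-then-cite loop: the generator never has to "remember"
# sources, because the retriever hands them over alongside the text.
CORPUS = {
    "doc-1": "Model collapse occurs when models train on model output.",
    "doc-2": "Wikipedia is a major source of LLM training data.",
}

def retrieve(query, k=1):
    # Rank documents by crude word overlap with the query (a stand-in
    # for a real embedding-based retriever).
    def score(text):
        return len(set(query.lower().split()) & set(text.lower().split()))
    return sorted(CORPUS, key=lambda d: score(CORPUS[d]), reverse=True)[:k]

def answer_with_citation(query):
    doc_ids = retrieve(query)
    # A real system would feed the retrieved passages to the LLM and ask
    # it to answer from them; here we just echo them with inline citations.
    return " ".join(f"{CORPUS[d]} [{d}]" for d in doc_ids)

print(answer_with_citation("why do models collapse"))
[/code]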
-
- Sucks Mod
- Posts: 626
- Joined: Wed Jul 26, 2017 3:24 am
- Has thanked: 786 times
- Been thanked: 382 times
Re: AI model collapse
There ya go.
It's about what happens to AI models when AI-generated content is among their inputs.
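A minimal sketch of that feedback loop, assuming nothing beyond the article's premise: fit a simple model to data, replace the data with the model's own samples, and repeat. With a plain Gaussian standing in for an LLM, the fitted spread tends to shrink over the generations as the tails of the original distribution get lost.
[code]
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: "real" data, a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=50)

for gen in range(1, 101):
    # "Train" a model on the current data (here the model is just a
    # Gaussian fitted by its sample mean and standard deviation).
    mu, sigma = data.mean(), data.std()
    # The next generation sees only the previous model's output,
    # with no fresh real data mixed back in.
    data = rng.normal(loc=mu, scale=sigma, size=50)
    if gen % 10 == 0:
        print(f"gen {gen:3d}: mu={mu:+.3f} sigma={sigma:.3f}")

# In expectation sigma shrinks every generation (the fitted variance is
# a biased, noisy estimate), so the distribution's tails are gradually
# lost -- a toy version of the degeneration the paper describes.
[/code]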
But since you'd like to discuss something else.
The opposite: the developers of at least ChatGPT added a hack which instructs it to not name its sources. This is to obscure plagiarism and the associated potential copyright claims. From their perspective this is one of the points, maybe even the main point, of chatbots: it provides a layer of plausible deniability for the developers which would not exist if they just copy-pasted the content they steal.
Some Wikipedia content is like this too.
-
- Sucks Fan
- Posts: 248
- Joined: Thu Jun 27, 2024 5:19 pm
- Has thanked: 2 times
- Been thanked: 55 times
Re: AI model collapse
That's not a hard problem, particularly for one of these organizations. I imagine they have a fair idea of what's original and what's LLM slop. LLMs do not write original material.
Whether or not it attempts to cite sources would depend largely (or perhaps entirely) on the training set (and perhaps a penalty term in the objective function, sketched below). It probably wouldn't be hard to achieve either outcome.

Your point seems like a good one, though. The engineers, and generally whoever is responsible for a given LLM, can always use the "black box" characteristic as an excuse, presuming an artificial-neural-network-based model, which of course they all are. The general public understands AI even less and has been primed with large quantities of sci-fi schlock, so the excuse is even more believable from their perspective.

The usefulness of LLMs is probably quite limited outside of advertising, surveillance, propaganda and other such deceit, and thus of little value to the general public. I suppose this is part of why I lost interest in applied AI. While it is interesting in a theoretical sense, layer-stacking and organizing datasets is essentially clerical work and gets very boring very fast, and the field is very over-saturated.
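For what it's worth, here is a minimal sketch of what a "penalty term" for citations could look like. Everything in it (the [CITE] token id, the weight lam, the form of the penalty) is made up for illustration; it is not any lab's actual objective.
[code]
import torch
import torch.nn.functional as F

CITE_TOKEN_ID = 7  # hypothetical vocabulary id for a "[CITE]" marker

def loss_with_citation_penalty(logits, targets, lam=0.1):
    """Next-token cross-entropy plus a penalty that grows when the model
    is unlikely to ever emit the citation token in the sequence.

    logits:  (batch, seq_len, vocab) raw model outputs
    targets: (batch, seq_len)        reference token ids
    lam:     weight of the citation penalty
    """
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         targets.reshape(-1))
    probs = logits.softmax(dim=-1)
    # P(citation token never appears) ~= prod over positions of (1 - p_t);
    # penalising this pushes probability toward citing at least once.
    p_never_cites = (1.0 - probs[..., CITE_TOKEN_ID]).prod(dim=-1)
    return ce + lam * p_never_cites.mean()

# Smoke test with random numbers in place of a real model's outputs.
logits = torch.randn(2, 16, 100, requires_grad=True)
targets = torch.randint(0, 100, (2, 16))
print(loss_with_citation_penalty(logits, targets))
[/code]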
-
- Sucks Mod
- Posts: 626
- Joined: Wed Jul 26, 2017 3:24 am
- Has thanked: 786 times
- Been thanked: 382 times
Re: AI model collapse
Same story.
https://www.popsci.com/technology/ai-tr ... gibberish/
Here's the actual paper in Nature.
https://www.nature.com/articles/s41586-024-07566-y
-
- Sucks Fan
- Posts: 248
- Joined: Thu Jun 27, 2024 5:19 pm
- Has thanked: 2 times
- Been thanked: 55 times
Re: AI model collapse
boredbird wrote: ↑Fri Jul 26, 2024 1:37 am
Same story.
https://www.popsci.com/technology/ai-tr ... gibberish/
Here's the actual paper in Nature.
https://www.nature.com/articles/s41586-024-07566-y
More or less what I'd have expected. Thanks for the links, though.