For WMF employee / slave nonsense, developer hijinks, and MediaWiki and related software screw-ups.
-
ericbarbour
- Sucks Admin
- Posts: 5272
- Joined: Sat Feb 25, 2017 1:56 am
- Location: The ass-tral plane
- Has thanked: 1427 times
- Been thanked: 2203 times
Post
by ericbarbour » Thu May 11, 2023 6:17 am
From my experience working with the Wiktionary HTML dumps, I can say that the data quality is quite poor: there are stale and missing entries (https://phabricator.wikimedia.org/T305407). There are also entire namespaces excluded from the dumps, and more recently there have been issues with the dumps not getting updated.
So it depends what kind of processing you need to do; in general I find the parsing to be much easier. Hopefully they'll manage to sort out the problems.
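For anyone curious what "easier parsing" looks like in practice, here's a minimal Python sketch of pulling plain text out of dump records. It assumes the Wikimedia Enterprise NDJSON layout (one JSON object per line, title in `name`, rendered page under `article_body.html`); those field names should be checked against an actual dump file, and a real dump arrives as a tar.gz of such NDJSON files rather than the inline sample used here.

```python
import json
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def text(self):
        return "".join(self.parts)

def parse_dump_lines(lines):
    """Yield (title, plain_text) pairs from NDJSON dump lines.

    Field names ('name', 'article_body' -> 'html') follow the
    Wikimedia Enterprise HTML dump layout; verify against a real file.
    """
    for line in lines:
        rec = json.loads(line)
        html = rec.get("article_body", {}).get("html", "")
        extractor = TextExtractor()
        extractor.feed(html)
        yield rec.get("name", ""), extractor.text()

# Stand-in for one dump line; a real dump is a tar.gz of NDJSON files.
sample = [json.dumps({"name": "cat",
                      "article_body": {"html": "<p>A small feline.</p>"}})]
for title, text in parse_dump_lines(sample):
    print(title, "->", text)   # cat -> A small feline.
```

The point being: with the HTML dumps you get already-rendered pages, so there's no need to expand templates or interpret wikitext yourself, which is where the "much easier" comes from.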
Ah ha ha ha ha ha
-
Bbb23sucks
- Sucker
- Posts: 1440
- Joined: Fri Jan 06, 2023 9:08 am
- Location: The Astral Plane
- Has thanked: 1497 times
- Been thanked: 311 times
Post
by Bbb23sucks » Thu May 11, 2023 6:18 am
ericbarbour wrote: ↑Thu May 11, 2023 6:17 am
From my experience working with the Wiktionary HTML dumps, I can say that the data quality is quite poor: there are stale and missing entries (https://phabricator.wikimedia.org/T305407). There are also entire namespaces excluded from the dumps, and more recently there have been issues with the dumps not getting updated.
So it depends what kind of processing you need to do; in general I find the parsing to be much easier. Hopefully they'll manage to sort out the problems.
Ah ha ha ha ha ha
As usual - STILL NOT FIXED after over a year!