WikiWho
A service that provides authorship attribution of Wikipedia articles
| Group: | Wiki Education Foundation |
|---|---|
| Team members: | MusikAnimal (WMF) • Sage (Wiki Ed) • TheresNoTime |
| Backlog: | Phabricator board |
WikiWho is a service providing authorship attribution using a content persistence algorithm. It was first developed by the Karlsruhe Institute of Technology and GESIS – Leibniz Institute for the Social Sciences. In August 2021, it was moved to Wikimedia Cloud Services infrastructure, and it is now maintained and further developed by Community Tech and the Wiki Education Foundation.
The core functionality of WikiWho is to parse the complete set of historical revisions of a wiki page in order to determine, at token level, who wrote, removed, or reinserted each piece of text, and in which revision. This means that for every token (such as a word), its individual addition, removal, and reintroduction history becomes available.
The original algorithm working behind the scenes is described in a WWW 2014 paper, along with an extensive evaluation resulting in 95% accuracy on fairly revision-rich articles. The current code version is available on GitHub.
In a nutshell, the approach divides each revision into hierarchically nested paragraph, sentence, and token elements and tracks their appearance through the content graph it builds over all revisions. It is currently implemented for wikitext, but in principle it can run on any kind of text (although the tokenization rules might have to be adapted).
Example of how the token metadata is generated
In this way, it becomes possible to track, for every single token, all original additions, deletions, re-insertions, and re-deletions, together with the revision in which each took place. This in turn makes it possible to infer the editor, timestamp, and other metadata of those revisions. In addition, each token retains a unique ID, making it possible to distinguish two tokens with identical strings in different text positions.
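The token history described above can be illustrated with a toy tracker. This is a minimal sketch and not the WikiWho algorithm itself: it skips the paragraph and sentence levels and simply diffs token lists with Python's standard `difflib`. The names `Token`, `ToyTracker`, `origin_rev`, `ins`, and `outs`, as well as the lowercasing tokenizer, are assumptions made for illustration only.

```python
import difflib
import re
from dataclasses import dataclass, field


@dataclass
class Token:
    token_id: int          # unique ID, stable across revisions
    text: str
    origin_rev: int        # revision in which the token was first added
    ins: list = field(default_factory=list)   # revisions where it was re-inserted
    outs: list = field(default_factory=list)  # revisions where it was removed


def tokenize(text):
    # Illustrative rule: lowercase words and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())


class ToyTracker:
    def __init__(self):
        self.tokens = []   # every token ever seen, indexed by token_id
        self.current = []  # tokens of the latest revision, in text order

    def add_revision(self, rev_id, text):
        words = tokenize(text)
        old = self.current
        matcher = difflib.SequenceMatcher(
            a=[t.text for t in old], b=words, autojunk=False
        )
        # Tokens absent before this revision are candidates for re-insertion.
        present_ids = {t.token_id for t in old}
        absent = [t for t in self.tokens if t.token_id not in present_ids]
        new_current = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                new_current.extend(old[i1:i2])
                continue
            for tok in old[i1:i2]:      # tokens leaving the text
                tok.outs.append(rev_id)
            for word in words[j1:j2]:   # tokens entering the text
                match = next((t for t in absent if t.text == word), None)
                if match is not None:   # seen before: record a re-insertion
                    absent.remove(match)
                    match.ins.append(rev_id)
                    new_current.append(match)
                else:                   # never seen: create a new token
                    tok = Token(len(self.tokens), word, rev_id)
                    self.tokens.append(tok)
                    new_current.append(tok)
        self.current = new_current


tracker = ToyTracker()
tracker.add_revision(1, "the cat saw the dog")
tracker.add_revision(2, "the cat saw dog")       # second "the" removed
tracker.add_revision(3, "the cat saw the dog")   # second "the" re-inserted
second_the = tracker.tokens[3]
```

After these three revisions, `second_the` still has its original ID and origin revision, with one entry each in `outs` and `ins`, while the first "the" (ID 0) is a separate token with an empty history. Mapping revision IDs to editors and timestamps would then be a plain lookup in the revision metadata.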
The WikiWho APIs and OpenAPI documentation may be found at https://wikiwho.wmcloud.org/.
The WhoColor API can be thought of as an additional service on top of the core WikiWho data described above, and it is available for the same languages. The same term descriptions as above apply.
Instead of annotated, tokenized wikitext, the WhoColor API delivers annotated HTML of a wiki article that can be read by a browser.
Annotations available per token (realized via <span>...</span> tags) include:
- The original revision and author
- The changes applied
- The authors "present" in the current revision and the percentage of its words that each originally wrote
- The revision list with metadata
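As a rough illustration of the annotations listed above, the sketch below computes per-author percentages from token-level attribution and wraps each token in a `<span>`. The token records, the `data-author` and `data-rev` attribute names, and the functions `author_shares` and `annotate` are all hypothetical; the actual WhoColor markup and response format may differ.

```python
from collections import Counter
from html import escape

# Hypothetical token records: (text, original author, original revision ID).
tokens = [
    ("WikiWho", "Alice", 101),
    ("tracks", "Alice", 101),
    ("every", "Bob", 102),
    ("token", "Alice", 101),
]


def author_shares(tokens):
    """Return the percentage of tokens originally written by each author."""
    counts = Counter(author for _, author, _ in tokens)
    total = sum(counts.values())
    return {author: 100.0 * n / total for author, n in counts.items()}


def annotate(tokens):
    """Wrap every token in a <span> carrying its attribution metadata."""
    return " ".join(
        f'<span data-author="{escape(author)}" data-rev="{rev}">'
        f"{escape(text)}</span>"
        for text, author, rev in tokens
    )


shares = author_shares(tokens)
html_out = annotate(tokens)
```

A front end could then color each span by author and display `shares` as a legend, which is the general idea behind an authorship-highlighting view.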
The WhoColor API is what powers the Who Wrote That? tool as well as features within the Wiki Education Dashboard and Programs and Events Dashboard.
Supported languages:
- Afrikaans Wikipedia (af)
- Albanian Wikipedia (sq)
- Alemannisch Wikipedia (als)
- Arabic Wikipedia (ar)
- Basque Wikipedia (eu)
- Belarusian Wikipedia (be)
- Bengali Wikipedia (bn)
- Bosnian Wikipedia (bs)
- Bulgarian Wikipedia (bg)
- Chechen Wikipedia (ce)
- Croatian Wikipedia (hr)
- Czech Wikipedia (cs)
- Danish Wikipedia (da)
- Dutch Wikipedia (nl)
- English Wikipedia (en)
- Esperanto Wikipedia (eo)
- Estonian Wikipedia (et)
- Finnish Wikipedia (fi)
- French Wikipedia (fr)
- Galician Wikipedia (gl)
- Georgian Wikipedia (ka)
- German Wikipedia (de)
- Greek Wikipedia (el)
- Hebrew Wikipedia (he)
- Hindi Wikipedia (hi)
- Hungarian Wikipedia (hu)
- Indonesian Wikipedia (id)
- Italian Wikipedia (it)
- Kazakh Wikipedia (kk)
- Kurdish Wikipedia (ku)
- Latvian Wikipedia (lv)
- Lithuanian Wikipedia (lt)
- Lower Sorbian Wikipedia (dsb)
- Macedonian Wikipedia (mk)
- Malay Wikipedia (ms)
- Malayalam Wikipedia (ml)
- Marathi Wikipedia (mr)
- Nepali Wikipedia (ne)
- Persian Wikipedia (fa)
- Polish Wikipedia (pl)
- Portuguese Wikipedia (pt)
- Romanian Wikipedia (ro)
- Russian Wikipedia (ru)
- Serbian Wikipedia (sr)
- Simple English Wikipedia (simple)
- Slovak Wikipedia (sk)
- Slovenian Wikipedia (sl)
- Spanish Wikipedia (es)
- Swahili Wikipedia (sw)
- Swedish Wikipedia (sv)
- Tagalog Wikipedia (tl)
- Tajik Wikipedia (tg)
- Tamil Wikipedia (ta)
- Telugu Wikipedia (te)
- Thai Wikipedia (th)
- Turkish Wikipedia (tr)
- Ukrainian Wikipedia (uk)
- Urdu Wikipedia (ur)
- Uzbek Wikipedia (uz)
- Venetian Wikipedia (vec)
- Vietnamese Wikipedia (vi)
- Welsh Wikipedia (cy)