A Practical Approach to Language Complexity: A Wikipedia Case Study
Figure 1
Word-level statistical analysis of Main and Simple.
Condition WB, as explained in the Methods section. left: Zipf’s law for the Main (black) and Simple (red) samples. middle: Heaps’ law (same colors). The exponents are 0.72±0.01 (Main) and 0.69±0.01 (Simple). right: Comparison of token frequencies in the two samples for 300 randomly selected words (“S” and “M” stand for Simple and Main, respectively); the correlation coefficient is C = 0.985. All three diagrams show that the two samples have statistically almost identical vocabulary richness.
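The three quantities in the caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, tokenization is assumed to have been done already, the Heaps exponent is estimated here by an ordinary least-squares fit in log-log space, and C is assumed to be a Pearson coefficient computed on relative frequencies. The paper's actual fitting procedure and its treatment of condition WB may differ.

```python
import math
from collections import Counter


def rank_frequencies(tokens):
    """Token counts in descending order: the Zipf rank-frequency curve."""
    return sorted(Counter(tokens).values(), reverse=True)


def heaps_curve(tokens):
    """Pairs (n, V(n)): number of distinct words V after the first n tokens."""
    seen = set()
    curve = []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        curve.append((n, len(seen)))
    return curve


def loglog_slope(points):
    """Least-squares slope of log(y) against log(x).

    Applied to a Heaps curve, this estimates the exponent beta in
    V(n) ~ n**beta (assumed fitting method, not taken from the paper).
    """
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def frequency_correlation(tokens_a, tokens_b, words):
    """Correlate relative frequencies of the given words across two samples."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    freq_a = [counts_a[w] / len(tokens_a) for w in words]
    freq_b = [counts_b[w] / len(tokens_b) for w in words]
    return pearson(freq_a, freq_b)
```

Under these assumptions, `loglog_slope(heaps_curve(tokens))` would give a Heaps exponent for one sample, and `frequency_correlation(simple_tokens, main_tokens, words)` with a random selection of 300 words would give a value comparable in spirit to C. Here `simple_tokens` and `main_tokens` are placeholder names for the two tokenized samples.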