Corpus Factory Method

This page describes a corpus-building method that Sketch Engine no longer uses, although Sketch Engine still contains older corpora built with it, mainly the WaC corpora. Sketch Engine now builds corpora using the method used for the TenTen corpora, which is documented separately.


Corpus Factory is a method for developing large general-language corpora that can be applied to many languages.

Corpus Factory performs the following steps to collect a corpus of a language


Bibliography

Adam Kilgarriff, Siva Reddy, Jan Pomikálek and Avinesh PVS (2010). A corpus factory for many languages. In LREC workshop on Web Services and Processing Pipelines, Malta, May 2010.