GitHub - Docma-TU/tosca: Tools for Statistical Content Analysis
tosca
Tools for Statistical Content Analysis
created at TU Dortmund University.
About
tosca is a framework for statistical methods in content analysis. It offers a pipeline for preprocessing text corpora and for modeling them via a link to the implementation of Latent Dirichlet Allocation from the lda package. Plot routines for both pre- and post-modeled corpora support the descriptive analysis of text corpora and topic models. In addition, the package provides an implementation of Chang's intruder words and intruder topics, as well as reasoned sampling of text ids to obtain effective sets of texts for human labeling/coding, with regard to the accuracy of estimating precision and recall.
Installation
For examples of how to use tosca, see the vignette.
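As a minimal sketch, assuming a released version is available on CRAN under the name tosca and the development version lives in the Docma-TU/tosca GitHub repository, installation could look like the following. Using the remotes package for the GitHub install is an assumption here, not something this README prescribes.

```r
# Assumption: a released version is on CRAN under the name "tosca".
install.packages("tosca")

# Assumption: the development version is installed from the GitHub
# repository Docma-TU/tosca via the remotes package.
# install.packages("remotes")  # uncomment if remotes is not installed yet
remotes::install_github("Docma-TU/tosca")

# Open an overview of the vignettes shipped with the installed package.
browseVignettes(package = "tosca")
```

Only one of the two install calls is needed; the second replaces the CRAN release with the development version.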
Citation
For a BibTeX entry please use citation(package = "tosca").
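A short sketch of how the citation could be retrieved from an R session, assuming tosca is already installed. toBibtex() comes from R's utils package and is used here only to restrict the output to the BibTeX entry.

```r
# Print the full citation information for the installed package.
citation(package = "tosca")

# Show only the BibTeX entry, e.g. for pasting into a .bib file.
toBibtex(citation(package = "tosca"))
```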
Contribution
This R package is licensed under the GPLv3. For feature requests, issues, and bug reports, please use the issue tracker.