hipster-philology/pie-models: Models for various tasks to use with Pie

Pie Models

This repository contains pretrained models for Pie (A Framework for Joint Learning of Sequence Labeling Tasks).

More on Pie: https://github.com/emanjavacas/pie

Find a model

Models are arranged by language. TODO: add a JSON documentation file for each model.

German (de)

german-ren.model.tar: Lemmatizer pretrained on a subset of the Referenzkorpus Mittelniederdeutsch/Niederrheinisch: https://www.slm.uni-hamburg.de/ren.html

Spanish (es)

spanish-AnCora.model.tar: Lemmatizer pretrained on the AnCora corpus for Spanish (part of Universal Dependencies)

Old French (fro)

french-Geste.model.tar: Lemmatizer pretrained on the Geste corpus

fro-poslemmes_cat-lemma-2019_01_22-02_34_11.tar: Lemmatizer and POS tagger trained on the Geste corpus and other Old French data from the École des chartes.

Target task: lemma. 
Accuracy on test data
  lemma: 0.9383
  pos: 0.9473

fro-poslemmes_cat-lemma-2019_01_23-00_34_12: same as the previous one, but using pre-trained word embeddings from a large unlabelled corpus.

Target task: lemma. 
Accuracy on test data
  lemma: 0.9409
  pos: 0.9468

fro-poslemmes_cat-lemma-2019_01_24-00_05_57.tar: same as the previous one, but using convolutions (CNN) for the character embeddings.

Target task: lemma. 
Accuracy on test data
  lemma: 0.9462
  pos: 0.9509

model_fro_poslemmesmorph.tar: POS-tagger, lemmatizer and morphological analyzer trained on the Geste corpus
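
The test accuracies reported above for the three fro-poslemmes_cat-lemma models can be compared side by side. The following is a minimal sketch that only restates the figures from this README; the dictionary keys are the timestamps from the model names, and the helper names `best_model` and `gain_in_points` are hypothetical, introduced here for illustration.

```python
# Test accuracies copied from this README for the three Old French models,
# keyed by the timestamp that appears in each model name.
REPORTED = {
    "2019_01_22-02_34_11": {"lemma": 0.9383, "pos": 0.9473},  # baseline
    "2019_01_23-00_34_12": {"lemma": 0.9409, "pos": 0.9468},  # + pretrained word embeddings
    "2019_01_24-00_05_57": {"lemma": 0.9462, "pos": 0.9509},  # + CNN character embeddings
}


def best_model(task):
    """Return the timestamp of the model with the highest reported accuracy for `task`."""
    return max(REPORTED, key=lambda name: REPORTED[name][task])


def gain_in_points(task, baseline, candidate):
    """Return the accuracy difference in percentage points, rounded to two decimals."""
    difference = REPORTED[candidate][task] - REPORTED[baseline][task]
    return round(difference * 100, 2)


if __name__ == "__main__":
    for task in ("lemma", "pos"):
        print(task, "best:", best_model(task))
```

On these figures the most recent model scores highest on both tasks, while adding pretrained word embeddings alone raises lemma accuracy and leaves POS accuracy marginally lower than the baseline.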

Latin (lat)

capitula.model.tar: Lemmatizer pretrained on a non-open-source dataset of Medieval Latin

Turkish (tur)

turkish-IMST.model.tar: Lemmatizer pretrained on the IMST corpus for Turkish (part of Universal Dependencies)
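
Most of the models listed above are distributed as .tar files. Before loading one with Pie, it can be useful to check what a downloaded file contains. The sketch below assumes the file is an ordinary tar archive readable by Python's standard `tarfile` module; it makes no assumption about which members Pie stores inside, and the function name `list_archive_members` is hypothetical.

```python
import tarfile


def list_archive_members(archive_path):
    """Return the names of all members stored in a tar archive.

    Assumption: the model file is a standard tar archive. tarfile.open
    detects compression automatically in its default read mode.
    """
    with tarfile.open(archive_path) as archive:
        return archive.getnames()


if __name__ == "__main__":
    import sys

    # Usage: python inspect_model.py path/to/some-model.tar
    if len(sys.argv) > 1:
        for name in list_archive_members(sys.argv[1]):
            print(name)
```

If the file is not a valid tar archive, `tarfile.open` raises `tarfile.ReadError`, which is a quick way to spot an incomplete download.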

Example config file for training a lemmatizer

lemma.config.json is an example config file for training a lemmatizer to reasonably good accuracy.
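
To see which settings the example config defines before adapting it, the file can be loaded and its top-level keys listed. This is a minimal sketch that assumes lemma.config.json is plain JSON with an object at the top level; if the file contains comments, Python's strict `json` parser will reject it and the comments would need to be stripped first. The function name `top_level_keys` is hypothetical, and no particular key names are assumed.

```python
import json


def top_level_keys(config_path):
    """Load a JSON config file and return its top-level keys in sorted order."""
    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)
    if not isinstance(config, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(config)


if __name__ == "__main__":
    import sys

    # Usage: python show_config.py lemma.config.json
    if len(sys.argv) > 1:
        for key in top_level_keys(sys.argv[1]):
            print(key)
```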

PIE

Installation

For more information, check the Pie repository, but in short:

virtualenv env -p python3.7
source env/bin/activate
pip3 install -r requirements.txt