
CUSIM


Superfast CUDA implementation of Word2Vec and Latent Dirichlet Allocation (LDA)

Introduction

This project aims to speed up various ML models (e.g. topic modeling, word embedding) with CUDA. It can be thought of as a GPU version of gensim. As a first step, I implemented the most widely used word embedding model, word2vec, and the most representative topic model, LDA (Latent Dirichlet Allocation).

Requirements

How to install

1. Clone the repository and its submodules:

git clone git@github.com:js1010/cusim.git && cd cusim && git submodule update --init

2. Install the requirements:

pip install -r requirements.txt

3. Generate the protobuf code:

python -m grpc_tools.protoc --python_out cusim/ --proto_path cusim/proto/ config.proto

4. Install the package:

python setup.py install
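After the install step, a quick sanity check can confirm that the package is visible to the current interpreter. The sketch below is not part of the project: `check_environment` is a hypothetical helper, and the `nvcc` lookup rests on the assumption that a CUDA toolkit on `PATH` is relevant to the build, which this README does not state.

```python
import importlib.util
import shutil


def check_environment(package="cusim"):
    """Report whether `package` is importable and whether nvcc is on PATH.

    Hypothetical helper for illustration; it only inspects the environment
    and does not import or run any cusim code.
    """
    return {
        # find_spec returns None when a top-level package cannot be located.
        "package_installed": importlib.util.find_spec(package) is not None,
        # Assumption: the CUDA compiler is expected on PATH.
        "nvcc_on_path": shutil.which("nvcc") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

If `package_installed` comes back as `no`, the most likely cause is that `python setup.py install` ran under a different interpreter or virtual environment than the one running the check.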

How to use

Performance

| attr | 1 worker (gensim) | 2 workers (gensim) | 4 workers (gensim) | 8 workers (gensim) | NVIDIA T4 (cusim) |
|---|---|---|---|---|---|
| training time (sec) | 892.596 | 544.212 | 310.727 | 226.472 | 16.162 |
| pearson | 0.487832 | 0.487696 | 0.482821 | 0.487136 | 0.492101 |
| spearman | 0.500846 | 0.506214 | 0.501048 | 0.506718 | 0.479468 |

| attr | 1 worker (gensim) | 2 workers (gensim) | 4 workers (gensim) | 8 workers (gensim) | NVIDIA T4 (cusim) |
|---|---|---|---|---|---|
| training time (sec) | 586.545 | 340.489 | 220.804 | 146.23 | 33.9173 |
| pearson | 0.354448 | 0.353952 | 0.352398 | 0.352925 | 0.360436 |
| spearman | 0.369146 | 0.369365 | 0.370565 | 0.365822 | 0.355204 |

| attr | 1 worker (gensim) | 2 workers (gensim) | 4 workers (gensim) | 8 workers (gensim) | NVIDIA T4 (cusim) |
|---|---|---|---|---|---|
| training time (sec) | 250.135 | 155.121 | 103.57 | 73.8073 | 6.20787 |
| pearson | 0.309651 | 0.321803 | 0.324854 | 0.314255 | 0.480298 |
| spearman | 0.294047 | 0.308723 | 0.318293 | 0.300591 | 0.480971 |

| attr | 1 worker (gensim) | 2 workers (gensim) | 4 workers (gensim) | 8 workers (gensim) | NVIDIA T4 (cusim) |
|---|---|---|---|---|---|
| training time (sec) | 176.923 | 100.369 | 69.7829 | 49.9274 | 9.90391 |
| pearson | 0.18772 | 0.193152 | 0.204509 | 0.187924 | 0.368202 |
| spearman | 0.243975 | 0.24587 | 0.260531 | 0.237441 | 0.358042 |

| attr | gensim (8 vCPUs) | cusim (NVIDIA T4) |
|---|---|---|
| training time (sec) | 447.376 | 76.6972 |
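
To read the training times as speedup factors, divide each gensim time by the cusim time. The sketch below does this for the first table in this section only; the numbers are copied from that table, while the names `gensim_times`, `cusim_time`, and `speedup` are illustrative and do not come from the project.

```python
# Training times in seconds, copied from the first table in this section.
# Keys are the number of gensim workers.
gensim_times = {1: 892.596, 2: 544.212, 4: 310.727, 8: 226.472}
cusim_time = 16.162  # NVIDIA T4 (cusim)


def speedup(baseline_sec, candidate_sec):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_sec / candidate_sec


# Speedup of cusim over gensim for each worker count.
speedups = {workers: speedup(t, cusim_time) for workers, t in gensim_times.items()}

for workers, factor in speedups.items():
    print(f"gensim with {workers} worker(s): cusim is {factor:.1f}x faster")
```

The same division applies to the other tables; only the input times change.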

Future tasks