GitHub - ivan-bilan/The-NLP-Pandect: A comprehensive reference for all topics related to Natural Language Processing

The-NLP-Pandect

This pandect (πανδέκτης is Ancient Greek for encyclopedia) was created to help you find almost anything related to Natural Language Processing that is available online.

Note: Quick legend on available resource types:

⭐ - open source project, usually a GitHub repository with its number of stars

📙 - resource you can read, usually a blog post or a paper

🗂️ - a collection of additional resources

🔱 - non-open source tool, framework or paid service

🎥️ - a resource you can watch

🎙️ - a resource you can listen to

Table of Contents

| 📇 Main Section | 🗃️ Sub-sections Sample |
| --- | --- |
| NLP Resources | Paper Summaries, Conference Summaries, NLP Datasets |
| NLP Podcasts | NLP-only Podcasts, Podcasts with many NLP Episodes |
| NLP Newsletters | - |
| NLP Meetups | - |
| NLP YouTube Channels | - |
| NLP Benchmarks | General NLU, Question Answering, Multilingual |
| Research Resources | Resource on Transformer Models, Distillation and Pruning, Automated Summarization |
| Industry Resources | Best Practices for NLP Systems, MLOps for NLP |
| Speech Recognition | General Resources, Text to Speech, Speech to Text, Datasets |
| Topic Modeling | Blogs, Frameworks, Repositories and Projects |
| Keyword Extraction | Text Rank, Rake, Other Approaches |
| Responsible NLP | NLP and ML Interpretability, Ethics, Bias, and Equality in NLP, Adversarial Attacks for NLP |
| NLP Frameworks | General Purpose, Data Augmentation, Machine Translation, Adversarial Attacks, Dialog Systems & Speech, Entity and String Matching, Non-English Frameworks, Text Annotation |
| Learning NLP | Courses, Books, Tutorials |
| NLP Communities | - |
| Other NLP Topics | Tokenization, Data Augmentation, Named Entity Recognition, Error Correction, AutoML/AutoNLP, Text Generation |

The-NLP-Resources

Note: Section keywords: paper summaries, compendium, awesome list

Compendiums and awesome lists on the topic of NLP:

NLP Conferences, Paper Summaries and Paper Compendiums:

Papers and Paper Summaries

Conference Summaries

NLP Progress and NLP Tasks:

NLP Datasets:

Word and Sentence embeddings:

Notebooks, Scripts and Repositories

Non-English resources and Compendiums

Pre-trained NLP models

NLP History

General

2020 Year in Review

The-NLP-Podcasts

🔙 Back to the Table of Contents

NLP-only podcasts

Many NLP episodes

Some NLP episodes

The-NLP-Newsletter

The-NLP-Meetups

The-NLP-Youtube

The-NLP-Benchmarks

General NLU

Summarization

Question Answering

Multilingual and Non-English Benchmarks

Bio, Law, and other scientific domains

Transformer Efficiency

Speech Processing

Other

The-NLP-Research

General

Embeddings

Repositories

Blogs

Cross-lingual Word and Sentence Embeddings

Byte Pair Encoding

Transformer-based Architectures

General

Transformer

BERT

Other Transformer Variants

T5

BigBird

Reformer / Linformer / Longformer / Performers

Switch Transformer

GPT-family

General

GPT-3

Learning Resources

Applications

Open-source Efforts

Other

Distillation, Pruning and Quantization

Reading Material

Tools

Automated Summarization

Knowledge Graphs and NLP

The-NLP-Industry

Note: Section keywords: best practices, MLOps

Best Practices for building NLP Projects

MLOps for NLP

MLOps, especially when applied to NLP, is a set of best practices around automating various parts of the workflow when building and deploying NLP pipelines.

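In general, MLOps for NLP includes having the following processes in place, each covered in its own subsection below: data versioning, experiment tracking, a model registry, automated and behavioral testing, model deployability and serving, and data and model observability.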

Additionally, there are two more components that are not as prevalent for NLP and are mostly used for Computer Vision and other sub-fields of AI: feature stores and metadata management.

MLOps Compilations & Awesome Lists

Reading Material

Learning Material

MLOps Communities

Data Versioning

Experiment Tracking

Model Registry

Automated Testing and Behavioral Testing

Model Deployability and Serving

Model Debugging

Model Accuracy Prediction

Data and Model Observability

General

Model Centric

Data Centric

Feature Stores

Metadata Management

MLOps Frameworks

Transformer-based Architectures

General

Multi-GPU Transformers

Training Transformers Effectively

Embeddings as a Service

NLP Recipes Industrial Applications:

The-NLP-Speech

Note: Section keywords: speech recognition

General Speech Recognition

Text to Speech / Speech Generation

Speech to Text

Datasets

The-NLP-Topics

Note: Section keywords: topic modeling

Blogs

Frameworks for Topic Modeling

Repositories

Keyword-Extraction

Note: Section keywords: keyword extraction

Text Rank

RAKE - Rapid Automatic Keyword Extraction

Other Approaches

Further Reading

Responsible-NLP

Note: Section keywords: ethics, responsible NLP

NLP and ML Interpretability

NLP-centric

General

Ethics, Bias, and Equality in NLP

Adversarial Attacks for NLP

Hate Speech Analysis

The-NLP-Frameworks

Note: Section keywords: frameworks

General Purpose

Data Augmentation

Adversarial NLP Attacks & Behavioral Testing

Transformer-oriented

Dialogue Systems and Speech

Word/Sentence-embeddings oriented

Social Media Oriented

Phonetics

Morphology

Multi-lingual tools

Distributed NLP / Multi-GPU NLP

Machine Translation

Entity and String Matching

Discourse Analysis

PII scrubbing

Hashtag Segmentation

Non-English oriented

Japanese

Thai

Chinese

Ukrainian

Other

Text Data Labelling & Classification

The-NLP-Learning

Note: Section keywords: learn NLP

General

Courses

Books

Tutorials

The-NLP-Communities

Other-NLP-Topics

Tokenization

Data Augmentation and Weak Supervision

Libraries and Frameworks

Reading Material and Tutorials

Named Entity Recognition (NER)

Relation Extraction

Coreference Resolution

Sentiment Analysis

Domain Adaptation

Low Resource NLP

Spell Correction / Error Correction

Style Transfer for NLP

Automata Theory for NLP

Obscene words detection

Reddit Analysis

Skill Detection

Reinforcement Learning for NLP

AutoML / AutoNLP

OCR - Optical Character Recognition

Document AI

Text Generation

Title / Headlines Generation

NLP research reproducibility

License CC0

Attributions

Resources

Icons

Fonts

The Pandect Series also includes