Natural language processing

From David's Wiki
Jump to navigation Jump to search

Natural language processing (NLP)

Classical NLP

The Classical NLP consists of creating a pipeline using processors to create annotations from text files.
Below is an example of a few processors.

  • Tokenization
    • Convert a paragraph of test or a file into an array of words.
  • Part-of-speech annotation
  • Named Entity Recognition

Machine Learning

Datasets and Challenges


The Stanford Question Answering Dataset. There are two versions of this dataset, 1.1 and 2.0.


Attention is all you need paper
A neural network architecture by Google. It is currently the best at NLP tasks and has mostly replaced RNNs for these tasks.

Google Bert

Github Link Paper Blog Post
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A pretrained NLP neural network. Note the code is written in TensorFlow 1.


Apache OpenNLP