Natural language processing

From David's Wiki
Revision as of 21:42, 13 November 2019 by David (talk | contribs) (→‎Transformer)

Natural language processing (NLP)

Classical NLP

Classical NLP consists of building a pipeline of processors, each of which adds annotations to the input text.
Below are a few example processors.

  • Tokenization
    • Convert a paragraph of text or a file into an array of words.
  • Part-of-speech annotation
  • Named Entity Recognition
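
The pipeline idea above can be sketched in a few lines of Python. This is a toy illustration, not how any particular toolkit implements it: the `Annotation` class, the processor function names, and the tiny lexicon are all hypothetical, and real pipelines use trained models for tagging and entity recognition rather than lookups and capitalization heuristics.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Holds the text plus whatever each processor has added so far."""
    text: str
    tokens: list = field(default_factory=list)
    pos_tags: list = field(default_factory=list)
    entities: list = field(default_factory=list)


def tokenize(doc):
    # Split into words and single punctuation marks.
    doc.tokens = re.findall(r"\w+|[^\w\s]", doc.text)
    return doc


# Hypothetical toy lexicon; a real tagger is a trained statistical model.
TOY_LEXICON = {"the": "DET", "a": "DET", "visited": "VERB", "in": "ADP"}


def tag_pos(doc):
    tags = []
    for tok in doc.tokens:
        if not re.fullmatch(r"\w+", tok):
            tags.append("PUNCT")
        else:
            # Unknown words default to NOUN in this toy tagger.
            tags.append(TOY_LEXICON.get(tok.lower(), "NOUN"))
    doc.pos_tags = tags
    return doc


def recognize_entities(doc):
    # Toy heuristic: every run of consecutive capitalized tokens is an entity.
    entities, run = [], []
    for tok in doc.tokens + [""]:  # empty sentinel flushes the last run
        if tok[:1].isupper():
            run.append(tok)
        elif run:
            entities.append(" ".join(run))
            run = []
    doc.entities = entities
    return doc


# The pipeline is just an ordered list of processors.
PIPELINE = [tokenize, tag_pos, recognize_entities]


def run_pipeline(text):
    doc = Annotation(text)
    for processor in PIPELINE:
        doc = processor(doc)
    return doc


doc = run_pipeline("Alice visited New York in May.")
print(doc.tokens)
print(list(zip(doc.tokens, doc.pos_tags)))
print(doc.entities)
```

Each processor reads the annotations produced by earlier ones, which is why tokenization comes first.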

Machine Learning

Datasets and Challenges


The Stanford Question Answering Dataset (SQuAD). There are two versions of this dataset, 1.1 and 2.0; version 2.0 adds questions that cannot be answered from the given passage.
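
As a rough sketch of what working with the dataset looks like: both versions ship as JSON with nested `data`, `paragraphs`, and `qas` entries, and version 2.0 files additionally carry an `is_impossible` flag on each question. The sample record below is made up for illustration and is not taken from the dataset, and `iter_questions` is a hypothetical helper name.

```python
import json

# Made-up sample in the SQuAD JSON layout (not real dataset content).
SAMPLE = json.loads("""
{
  "version": "v2.0",
  "data": [{
    "title": "Example",
    "paragraphs": [{
      "context": "The wiki page was last edited in November.",
      "qas": [
        {"id": "q1", "question": "When was the page last edited?",
         "is_impossible": false,
         "answers": [{"text": "November", "answer_start": 33}]},
        {"id": "q2", "question": "Who founded the wiki?",
         "is_impossible": true, "answers": []}
      ]
    }]
  }]
}
""")


def iter_questions(squad):
    """Yield (context, question, answer_or_None) for every question."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # Version 1.1 files have no "is_impossible" key, so default to False.
                if qa.get("is_impossible", False):
                    yield context, qa["question"], None
                    continue
                answer = qa["answers"][0]
                # answer_start is a character offset into the context.
                start = answer["answer_start"]
                span = context[start:start + len(answer["text"])]
                yield context, qa["question"], span


for _, question, answer in iter_questions(SAMPLE):
    print(question, "->", answer)
```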


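Attention Is All You Need (paper)
The Transformer is a neural network architecture introduced by Google researchers. As of this revision (2019), Transformer-based models achieve the best results on many NLP tasks and have mostly replaced RNNs for these tasks.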
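
The core operation in the paper is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. A minimal single-head sketch in plain Python follows. It leaves out the learned projections, multiple heads, and masking, and the function names are illustrative rather than taken from any library.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
queries = [[5.0, 0.0]]  # points toward the first key
print(attention(queries, keys, values))
```

Because the query aligns with the first key, most of the weight lands on the first value vector. Unlike an RNN, every position is compared with every other position in one step, with no recurrence.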

Guides and explanations

Google Bert

GitHub link, paper, blog post
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A pretrained Transformer-based language model from Google. Note that the official code is written in TensorFlow 1.


Apache OpenNLP
