__FORCETOC__
Natural language processing (NLP) is the field concerned with enabling computers to process and understand human language.


==Classical NLP==
Classical NLP consists of building a pipeline of processors that create annotations from text files.<br>
Below are a few example processors (a minimal pipeline sketch follows the list).<br>
* Tokenization
** Convert a paragraph of text or a file into an array of words (tokens).
* Part-of-speech annotation
* Named Entity Recognition
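A minimal sketch of such a pipeline using spaCy (an assumed toolkit, not one named on this page), assuming the <code>en_core_web_sm</code> model has been downloaded:
<syntaxhighlight lang="python">
# Sketch of a classical-style annotation pipeline using spaCy (assumed toolkit).
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")          # tokenizer + tagger + NER pipeline
doc = nlp("Google released BERT in October 2018.")

# Tokenization: the text is split into an array of tokens.
tokens = [token.text for token in doc]

# Part-of-speech annotation: one coarse POS tag per token.
pos_tags = [(token.text, token.pos_) for token in doc]

# Named Entity Recognition: spans labelled ORG, DATE, etc.
entities = [(ent.text, ent.label_) for ent in doc.ents]

print(tokens)
print(pos_tags)
print(entities)
</syntaxhighlight>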
==Machine Learning==
===Datasets and Challenges===
====SQuAD====
[https://rajpurkar.github.io/SQuAD-explorer/ Link]<br>
The Stanford Question Answering Dataset. There are two versions of this dataset, 1.1 and 2.0; version 2.0 adds unanswerable questions.
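For illustration, one common way to load SQuAD is through the Hugging Face <code>datasets</code> package (an assumption; the raw JSON files from the link above work as well):
<syntaxhighlight lang="python">
# Sketch: loading SQuAD 1.1 via the Hugging Face "datasets" package
# (assumed tooling, not part of the official SQuAD release).
from datasets import load_dataset

squad = load_dataset("squad")               # use "squad_v2" for version 2.0
example = squad["train"][0]

print(example["question"])                  # the question string
print(example["context"])                   # the paragraph containing the answer
print(example["answers"])                   # answer text(s) and character offsets
</syntaxhighlight>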
===Transformer===
[https://arxiv.org/abs/1706.03762 Attention is all you need paper]<br>
A neural network architecture from Google built entirely on attention, with no recurrence.
It currently achieves state-of-the-art results on most NLP tasks and has largely replaced RNNs for them.
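The core operation of the architecture is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V in the paper. A NumPy sketch of that single operation (not the full model):
<syntaxhighlight lang="python">
# Sketch of scaled dot-product attention, the core operation of the
# Transformer: softmax(Q K^T / sqrt(d_k)) V  (from "Attention Is All You Need").
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v). Returns (seq_len, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                         # weighted sum of values

# Toy example: 4 tokens with 8-dimensional queries, keys, and values.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)             # (4, 8)
</syntaxhighlight>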
===Google BERT===
[https://github.com/google-research/bert GitHub Link]<br>
[https://arxiv.org/abs/1810.04805 Paper]<br>
[https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html Blog Post]<br>
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding<br>
A pretrained language representation model that can be fine-tuned for downstream NLP tasks such as question answering and classification.
Note that the official code is written in TensorFlow 1.
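Besides the original TensorFlow 1 repository, a pretrained BERT checkpoint can also be loaded through the Hugging Face <code>transformers</code> package (an assumption; this is a third-party port, not the code linked above):
<syntaxhighlight lang="python">
# Sketch: loading pretrained BERT via Hugging Face "transformers"
# (assumed third-party tooling, not the original TensorFlow 1 repository).
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Natural language processing with BERT.", return_tensors="pt")
outputs = model(**inputs)

# One contextual embedding per (sub)word token.
print(outputs.last_hidden_state.shape)      # (1, num_tokens, 768)
</syntaxhighlight>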
==Libraries==
===Apache OpenNLP===
[https://opennlp.apache.org/ Link]<br>
A Java toolkit for common NLP tasks such as tokenization, sentence segmentation, part-of-speech tagging, and named entity recognition.
