__FORCETOC__
Natural language processing (NLP) is the field of getting computers to analyze, understand, and generate human language.
==Classical NLP==
Classical NLP consists of building a pipeline of processors that produce annotations from text files.<br>
Below are a few example processors; a minimal pipeline sketch follows the list.<br>
* Tokenization
** Converts a paragraph of text or a file into an array of tokens (words).
* Part-of-speech annotation
* Named Entity Recognition
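A minimal sketch of such a pipeline, assuming the Python NLTK library (not mentioned elsewhere in this article) as the processor implementation:
<syntaxhighlight lang="python">
import nltk

# One-time resource downloads (standard NLTK resource names).
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
nltk.download("maxent_ne_chunker")
nltk.download("words")

text = "Google released BERT in 2018."

tokens = nltk.word_tokenize(text)   # Tokenization
tagged = nltk.pos_tag(tokens)       # Part-of-speech annotation
entities = nltk.ne_chunk(tagged)    # Named Entity Recognition

print(tokens)    # ['Google', 'released', 'BERT', 'in', '2018', '.']
print(tagged)    # e.g. [('Google', 'NNP'), ('released', 'VBD'), ...]
print(entities)  # an nltk.Tree with named-entity chunks marked
</syntaxhighlight>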
==Machine Learning==
===Datasets and Challenges===
====SQuAD====
[https://rajpurkar.github.io/SQuAD-explorer/ Link]<br>
The Stanford Question Answering Dataset, a reading-comprehension benchmark in which each answer is a span of text from a Wikipedia passage. There are two versions of the dataset, 1.1 and 2.0; version 2.0 adds unanswerable questions.
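The dataset is distributed as JSON files; the sketch below walks the published schema (the filename is a placeholder, and the field names come from the SQuAD format, not this article):
<syntaxhighlight lang="python">
import json

# Placeholder filename; the actual files are downloadable from the link above.
with open("train-v2.0.json") as f:
    squad = json.load(f)

for article in squad["data"]:
    for paragraph in article["paragraphs"]:
        context = paragraph["context"]
        for qa in paragraph["qas"]:
            # Version 2.0 marks unanswerable questions; 1.1 has no such field.
            if qa.get("is_impossible", False):
                continue
            for answer in qa["answers"]:
                start = answer["answer_start"]
                text = answer["text"]
                # Every answer is a character span of the paragraph's context.
                assert context[start:start + len(text)] == text
</syntaxhighlight>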
===Transformer===
[https://arxiv.org/abs/1706.03762 "Attention Is All You Need" paper]<br>
A neural network architecture introduced by Google researchers in the paper above, built entirely around attention mechanisms.
It has largely replaced RNNs for NLP tasks and underlies current state-of-the-art NLP models.
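Its core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, from the linked paper. A minimal NumPy sketch of that single operation (array shapes are illustrative assumptions; multi-head attention and the rest of the architecture are omitted):
<syntaxhighlight lang="python">
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (queries, keys) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key dimension
    return weights @ V                              # weighted sum of value vectors

# Illustrative shapes: 3 query positions, 4 key/value positions, dimension 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
</syntaxhighlight>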
===Google BERT===
[https://github.com/google-research/bert GitHub Link]<br>
[https://arxiv.org/abs/1810.04805 Paper]<br>
[https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html Blog Post]<br>
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding<br>
A pre-trained language-representation model built from Transformer encoders; it is fine-tuned for downstream NLP tasks such as question answering.
Note that the code in the repository is written in TensorFlow 1.
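A minimal usage sketch, assuming the Hugging Face ''transformers'' library and PyTorch (neither is part of the linked TensorFlow 1 repository) to load one of the released checkpoints:
<syntaxhighlight lang="python">
from transformers import BertTokenizer, BertModel

# "bert-base-uncased" is one of the pre-trained checkpoints released by Google.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Natural language processing with BERT.", return_tensors="pt")
outputs = model(**inputs)

# One contextual embedding per WordPiece token: (batch, sequence length, 768).
print(outputs.last_hidden_state.shape)
</syntaxhighlight>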
==Libraries==
===Apache OpenNLP===
[https://opennlp.apache.org/ Link]