Natural language processing
__FORCETOC__
Natural language processing (NLP)
==Classical NLP==
Classical NLP consists of building a pipeline of processors that create annotations from text files.<br>
Below are a few example processors.<br>
* Tokenization
** Convert a paragraph of text or a file into an array of words (tokens).
* Part-of-speech annotation
* Named Entity Recognition
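The pipeline idea above can be sketched in a few lines of Python. This is a toy illustration, not Apache OpenNLP or any other library: `Document`, `tokenize`, `pos_tag`, `recognize_entities`, `run_pipeline` and the `TOY_LEXICON` word list are hypothetical names, and the tagging and entity rules are crude heuristics standing in for the trained models a real pipeline would use. Each processor takes the document, adds one layer of annotations, and passes it on.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    """Text plus the annotation layers that processors attach to it."""
    text: str
    tokens: List[str] = field(default_factory=list)
    pos_tags: List[str] = field(default_factory=list)
    entities: List[str] = field(default_factory=list)


def tokenize(doc: Document) -> Document:
    # Split into runs of word characters and single punctuation marks.
    doc.tokens = re.findall(r"\w+|[^\w\s]", doc.text)
    return doc


# Hypothetical toy lexicon; real taggers use trained statistical models.
TOY_LEXICON = {"the": "DET", "a": "DET", "in": "ADP", "visited": "VERB"}


def pos_tag(doc: Document) -> Document:
    tags = []
    for tok in doc.tokens:
        if re.fullmatch(r"[^\w\s]", tok):
            tags.append("PUNCT")
        elif tok.lower() in TOY_LEXICON:
            tags.append(TOY_LEXICON[tok.lower()])
        elif tok[0].isupper():
            tags.append("PROPN")  # crude rule: capitalized unknown word = proper noun
        else:
            tags.append("NOUN")
    doc.pos_tags = tags
    return doc


def recognize_entities(doc: Document) -> Document:
    # Crude rule: each run of consecutive proper nouns is one named entity.
    entities, current = [], []
    for tok, tag in zip(doc.tokens, doc.pos_tags):
        if tag == "PROPN":
            current.append(tok)
        elif current:
            entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    doc.entities = entities
    return doc


def run_pipeline(text: str, processors: List[Callable[[Document], Document]]) -> Document:
    doc = Document(text)
    for processor in processors:  # each processor adds one annotation layer
        doc = processor(doc)
    return doc


doc = run_pipeline("Ada Lovelace visited London.", [tokenize, pos_tag, recognize_entities])
print(doc.tokens)    # ['Ada', 'Lovelace', 'visited', 'London', '.']
print(doc.pos_tags)  # ['PROPN', 'PROPN', 'VERB', 'PROPN', 'PUNCT']
print(doc.entities)  # ['Ada Lovelace', 'London']
```

Order matters in such a pipeline: here entity recognition reads the part-of-speech layer, so `pos_tag` must run before `recognize_entities`.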
==Machine Learning==
===Datasets and Challenges===
====SQuAD====
[https://rajpurkar.github.io/SQuAD-explorer/ Link]<br>
The Stanford Question Answering Dataset. There are two versions of this dataset, 1.1 and 2.0; version 2.0 adds questions that cannot be answered from the given passage.
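SQuAD is distributed as JSON. The sketch below assumes the nested layout of the public files (`data` → `paragraphs` → `qas`), where each answer is given as its text plus `answer_start`, a character offset into the passage, and version 2.0 marks unanswerable questions with an `is_impossible` flag. The record is a made-up example rather than an entry from the dataset, and `iter_examples` is a hypothetical helper.

```python
import json

# A made-up record in the SQuAD JSON layout (not taken from the dataset).
SAMPLE = """
{
  "version": "v2.0",
  "data": [{
    "title": "Example",
    "paragraphs": [{
      "context": "The Stanford Question Answering Dataset was released in 2016.",
      "qas": [
        {"id": "q1", "question": "When was the dataset released?",
         "answers": [{"text": "2016", "answer_start": 56}],
         "is_impossible": false},
        {"id": "q2", "question": "Who funded the dataset?",
         "answers": [], "is_impossible": true}
      ]
    }]
  }]
}
"""


def iter_examples(squad):
    """Yield (question, answer text or None) pairs from a SQuAD-style dict."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # Version 1.1 has no "is_impossible" key: every question is answerable.
                if qa.get("is_impossible", False):
                    yield qa["question"], None
                    continue
                answer = qa["answers"][0]
                start = answer["answer_start"]
                span = context[start:start + len(answer["text"])]
                # answer_start is a character offset, so the slice must match the text.
                if span != answer["text"]:
                    raise ValueError(f"answer_start mismatch for question {qa['id']}")
                yield qa["question"], span


for question, answer in iter_examples(json.loads(SAMPLE)):
    print(question, "->", answer)
```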
===Transformer===
{{ main | Transformer (machine learning model)}}
[https://arxiv.org/abs/1706.03762 Attention Is All You Need paper]<br>
A neural network architecture introduced by researchers at Google that uses self-attention and encoder-decoder attention.
Transformer-based models currently achieve state-of-the-art results on most NLP tasks and have largely replaced RNNs for these tasks.
However, attention makes its computational complexity quadratic in the sequence length, i.e. the number of input or output tokens.
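The attention operation at the core of the architecture, softmax(Q K^T / sqrt(d_k)) V in the paper's notation, can be sketched in plain Python. This is a simplified illustration under stated assumptions: `attention` is a hypothetical helper, and the learned query/key/value projections, multiple heads, and masking of the real model are omitted (the inputs are used directly as Q, K and V). The score matrix has one entry per (query, key) pair, which is where the quadratic cost comes from.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    # One score per (query, key) pair: a len(queries) x len(keys) matrix.
    scores = [[sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
              for q in queries]
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted average of the value vectors.
    outputs = [[sum(w * v[j] for w, v in zip(row, values)) for j in range(len(values[0]))]
               for row in weights]
    return outputs, weights


# Self-attention: queries, keys and values all come from the same sequence.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = attention(x, x, x)
print(len(w), "x", len(w[0]))  # 3 x 3: grows with the square of the sequence length

# Encoder-decoder attention: decoder queries attend over encoder keys and values.
decoder_states = [[0.5, 0.5], [1.0, 0.0]]
out2, w2 = attention(decoder_states, x, x)
print(len(w2), "x", len(w2[0]))  # 2 x 3
```

Doubling the sequence length quadruples the number of entries in `scores`, which is the quadratic complexity described above.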
;Guides and explanations
* [https://nlp.seas.harvard.edu/2018/04/03/attention.html The Annotated Transformer]
* [https://www.youtube.com/watch?v=iDulhoQ2pro YouTube video]
===Google Bert===
{{main| BERT (language model)}}
[https://github.com/google-research/bert GitHub Link]
[https://arxiv.org/abs/1810.04805 Paper]
[https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html Blog Post]<br>
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding<br>
A Transformer encoder pretrained on unlabeled text with masked language modeling and next-sentence prediction; the pretrained network can then be fine-tuned for downstream NLP tasks.
Note that the official code is written in TensorFlow 1.
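BERT's masked language modeling objective corrupts the input before asking the model to recover it. The paper selects 15% of token positions; each selected token is replaced by `[MASK]` 80% of the time, by a random token 10% of the time, and left unchanged 10% of the time. The sketch below is a simplification, not the official TensorFlow code: `mask_tokens` is a hypothetical helper, it selects each token independently with probability `mask_prob` rather than a fixed count per sequence, and it ignores WordPiece tokenization and special tokens such as `[CLS]` and `[SEP]`.

```python
import random
from typing import List, Optional, Tuple

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens: List[str], vocab: List[str], mask_prob: float = 0.15,
                rng: Optional[random.Random] = None) -> Tuple[List[str], List[Optional[str]]]:
    """Return (corrupted tokens, labels); labels hold the original token at
    selected positions and None elsewhere (positions left out of the loss)."""
    rng = rng or random.Random()
    corrupted: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK_TOKEN)        # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: random vocabulary token
            else:
                corrupted.append(tok)               # 10%: keep the original token
        else:
            labels.append(None)
            corrupted.append(tok)
    return corrupted, labels


tokens = "the cat sat on the mat".split()
corrupted, labels = mask_tokens(tokens, vocab=tokens, mask_prob=0.5, rng=random.Random(0))
print(corrupted)
print(labels)
```

Keeping some selected tokens unchanged or randomized means the model cannot assume that only `[MASK]` positions need predicting, which the paper motivates as reducing the mismatch with fine-tuning, where `[MASK]` never appears.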
====Albert====
[https://github.com/google-research/google-research/tree/master/albert GitHub]<br>
;A Lite BERT for Self-supervised Learning of Language Representations
ALBERT is a parameter-reduced version of BERT. It shrinks the model mainly through factorized embedding parameterization and cross-layer parameter sharing.
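ALBERT's two main parameter-reduction techniques can be illustrated with parameter-count arithmetic. Factorized embedding parameterization replaces one V x H embedding matrix (vocabulary size by hidden size) with a V x E matrix followed by an E x H projection, and cross-layer parameter sharing lets all encoder layers reuse one set of weights. The sizes below (V = 30,000, H = 768, E = 128, 12 layers) are illustrative; the helper names are hypothetical, the layer count assumes a standard Transformer encoder layer with a 4H feed-forward block, and position/segment embeddings and the pooler are ignored.

```python
def embedding_params(vocab_size: int, hidden_size: int, embedding_size: int = 0) -> int:
    """Token-embedding parameters (no biases).

    embedding_size == 0: BERT-style single V x H matrix.
    otherwise: ALBERT-style V x E matrix plus an E x H projection.
    """
    if embedding_size == 0:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


def transformer_layer_params(hidden: int) -> int:
    """Parameters in one standard encoder layer with a 4H feed-forward block."""
    ffn = 4 * hidden
    attention = 4 * (hidden * hidden + hidden)           # Q, K, V, output projections + biases
    feed_forward = hidden * ffn + ffn + ffn * hidden + hidden
    layer_norms = 2 * (2 * hidden)                       # two LayerNorms, gain and bias each
    return attention + feed_forward + layer_norms


def encoder_params(per_layer: int, num_layers: int, share_across_layers: bool) -> int:
    # With cross-layer sharing, every layer reuses the same weights.
    return per_layer if share_across_layers else per_layer * num_layers


V, H, E, LAYERS = 30_000, 768, 128, 12
print(f"{embedding_params(V, H):,} vs {embedding_params(V, H, E):,}")  # 23,040,000 vs 3,938,304
layer = transformer_layer_params(H)
print(f"{encoder_params(layer, LAYERS, False):,} vs {encoder_params(layer, LAYERS, True):,}")
# 85,054,464 vs 7,087,872
```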
==Libraries==
===Apache OpenNLP===
[https://opennlp.apache.org/ Link]
Latest revision as of 19:48, 15 January 2021