Natural language processing
__FORCETOC__
Natural language processing (NLP)
==Classical NLP==
Classical NLP consists of building a pipeline of processors that create annotations from text files.<br>
Below are a few example processors; a sketch of such a pipeline in code follows the list.<br>
* Tokenization
** Converts a paragraph of text or a file into an array of words.
* Part-of-speech annotation
* Named Entity Recognition
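The same pipeline idea can be sketched in code. The example below uses spaCy, a library not mentioned above, and assumes its small English model <code>en_core_web_sm</code> is installed; it is only an illustration of chaining processors, not a reference implementation.
<syntaxhighlight lang="python">
# Minimal sketch of a classical NLP pipeline using spaCy (an assumption; any
# processor library would do). Requires:
#   pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")  # tokenizer + tagger + NER in one pipeline
doc = nlp("Barack Obama was born in Hawaii in 1961.")

tokens = [token.text for token in doc]                    # tokenization
pos_tags = [(token.text, token.pos_) for token in doc]    # part-of-speech annotation
entities = [(ent.text, ent.label_) for ent in doc.ents]   # named entity recognition

print(tokens)
print(pos_tags)
print(entities)
</syntaxhighlight>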
==Machine Learning==
===Datasets and Challenges===
====SQuAD====
[https://rajpurkar.github.io/SQuAD-explorer/ Link]<br>
The Stanford Question Answering Dataset, a reading-comprehension benchmark. There are two versions of this dataset: 1.1 and 2.0, where 2.0 adds questions that have no answer in the passage.
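As a rough illustration of how the dataset is organised, the sketch below walks the SQuAD JSON structure (articles, paragraphs, and question-answer pairs). The file name is an assumption; the actual file is available from the link above.
<syntaxhighlight lang="python">
import json

# Sketch: iterate over a SQuAD-style JSON file.
# "dev-v2.0.json" is an assumed local file name, not part of this article.
with open("dev-v2.0.json") as f:
    squad = json.load(f)

for article in squad["data"]:
    for paragraph in article["paragraphs"]:
        context = paragraph["context"]          # the passage the questions refer to
        for qa in paragraph["qas"]:
            # In version 2.0 some questions are deliberately unanswerable
            # and are marked with the "is_impossible" flag.
            if qa.get("is_impossible", False):
                continue
            answers = [a["text"] for a in qa["answers"]]
            print(qa["question"], "->", answers[:1])
</syntaxhighlight>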
===Transformer===
{{main|Transformer (machine learning model)}}
[https://arxiv.org/abs/1706.03762 Attention Is All You Need paper]
A neural network architecture introduced by Google which uses encoder-decoder attention and self-attention.
It achieves state-of-the-art results on many NLP tasks and has largely replaced RNNs for them.
However, its computational complexity is quadratic in the number of input and output tokens due to attention.
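To make the quadratic cost concrete: scaled dot-product attention compares every token with every other token, so the score matrix has shape n × n for n tokens. The NumPy sketch below illustrates this; it omits the learned projection matrices and multi-head structure of the real architecture.
<syntaxhighlight lang="python">
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
    The score matrix Q K^T has shape (n, n), hence the quadratic cost."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                    # (n, d_v)

n, d = 5, 8                                               # 5 tokens, dimension 8
X = np.random.randn(n, d)
out = scaled_dot_product_attention(X, X, X)               # self-attention: Q = K = V
print(out.shape)                                          # (5, 8)
</syntaxhighlight>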
;Guides and explanations
* [https://nlp.seas.harvard.edu/2018/04/03/attention.html The Annotated Transformer]
* [https://www.youtube.com/watch?v=iDulhoQ2pro YouTube video]
===Google Bert===
{{main|BERT (language model)}}
[https://github.com/google-research/bert GitHub link]<br>
[https://arxiv.org/abs/1810.04805 Paper]<br>
[https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html Blog post]<br>
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding<br>
A pretrained language-representation model that can be fine-tuned for downstream NLP tasks.
Note that the official code is written in TensorFlow 1.
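Because the official repository targets TensorFlow 1, a common way to experiment with BERT today is the Hugging Face <code>transformers</code> library, which is not part of the official code; the sketch below assumes it and PyTorch are installed.
<syntaxhighlight lang="python">
# Sketch: contextual embeddings from a pretrained BERT model.
# Assumes the third-party "transformers" library and PyTorch are installed;
# this is not the official google-research/bert TensorFlow 1 code.
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Natural language processing with BERT.", return_tensors="pt")
outputs = model(**inputs)

# One contextual vector per token: (batch, sequence length, hidden size = 768).
print(outputs.last_hidden_state.shape)
</syntaxhighlight>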
====Albert====
[https://github.com/google-research/google-research/tree/master/albert GitHub]<br>
;A Lite BERT for Self-supervised Learning of Language Representations
ALBERT is a parameter-reduced version of BERT, achieved mainly through factorized embedding parameterization and cross-layer parameter sharing.
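A back-of-the-envelope calculation shows one part of the reduction, the factorized embedding: instead of a V × H embedding matrix, ALBERT uses a V × E matrix followed by an E × H projection. The numbers below approximate BERT-base and are assumptions for illustration only.
<syntaxhighlight lang="python">
# Rough arithmetic for ALBERT's factorized embedding parameterization.
# V (vocabulary), H (hidden size), E (embedding size) approximate BERT-base
# and the ALBERT paper; exact values are an assumption.
V, H, E = 30000, 768, 128

bert_embedding_params = V * H            # direct V x H embedding: 23,040,000
albert_embedding_params = V * E + E * H  # factorized V x E + E x H: 3,938,304

print(bert_embedding_params, albert_embedding_params)
</syntaxhighlight>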
==Libraries==
===Apache OpenNLP===
[https://opennlp.apache.org/ Link]