Natural language processing: Difference between revisions
Revision as of 20:24, 5 November 2019
Natural language processing (NLP)
Classical NLP
Classical NLP consists of building a pipeline of processors that produce annotations from text files.
Below are a few example processors.
- Tokenization
- Convert a paragraph of text or a file into an array of words.
- Part-of-speech annotation
- Named Entity Recognition
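The pipeline idea above can be sketched in a few lines of Python. This is only an illustration, not the API of any particular toolkit: the names `tokenize`, `run_pipeline`, and the `token_count` processor are hypothetical, and the tokenizer is a simple regular-expression splitter rather than a production-grade one.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


def run_pipeline(text, processors):
    """Run each (name, processor) pair in order and collect the annotations."""
    annotations = {"text": text}
    for name, processor in processors:
        # Each processor may read the annotations produced by earlier steps.
        annotations[name] = processor(annotations)
    return annotations


# Hypothetical processors: a tokenizer followed by a step that depends on it.
processors = [
    ("tokens", lambda ann: tokenize(ann["text"])),
    ("token_count", lambda ann: len(ann["tokens"])),
]

result = run_pipeline("Stanford is in California.", processors)
print(result["tokens"])
print(result["token_count"])
```

Part-of-speech and named-entity processors would slot into the same list, each reading the `tokens` annotation and adding its own entry.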
Machine Learning
Datasets and Challenges
SQuAD
Link
The Stanford Question Answering Dataset. There are two versions of this dataset, 1.1 and 2.0.
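As a rough illustration of how the two versions differ, the sketch below walks a tiny hand-written sample laid out like the published SQuAD JSON files. The sample text, the question IDs, and the helper name `iter_examples` are made up for this example. The assumption is that version 2.0 marks unanswerable questions with an `is_impossible` flag that version 1.1 files do not carry.

```python
import json

# Hand-written sample in the SQuAD layout (not real dataset content).
sample = json.loads("""
{
  "version": "v2.0",
  "data": [{
    "title": "Example",
    "paragraphs": [{
      "context": "The Transformer was introduced in 2017.",
      "qas": [
        {"id": "q1", "question": "When was the Transformer introduced?",
         "is_impossible": false,
         "answers": [{"text": "2017", "answer_start": 34}]},
        {"id": "q2", "question": "Who funded the work?",
         "is_impossible": true, "answers": []}
      ]
    }]
  }]
}
""")


def iter_examples(dataset):
    """Yield (id, question, answer_span) tuples; the span is None if unanswerable."""
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # Assumed to be absent in 1.1 files, so default to answerable.
                if qa.get("is_impossible", False):
                    yield qa["id"], qa["question"], None
                    continue
                answer = qa["answers"][0]
                start = answer["answer_start"]
                # Recover the answer from the context by character offset.
                yield qa["id"], qa["question"], context[start:start + len(answer["text"])]


examples = list(iter_examples(sample))
for example in examples:
    print(example)
```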
Transformer
Attention is all you need paper
A neural network architecture by Google that relies on attention rather than recurrence.
As of this revision, Transformer-based models achieve the best results on many NLP tasks and have largely replaced RNNs for them.
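The core operation of the Transformer is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The following is a minimal pure-Python sketch of that single operation, written with plain lists for readability. The function names and toy inputs are illustrative, and a full Transformer layer also includes multi-head projections, residual connections, and feed-forward sublayers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of row vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Toy inputs: the query matches the first key more closely than the second.
queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
output = attention(queries, keys, values)
print(output)
```

Because every query attends to every key in one step, the computation does not have to proceed token by token as it does in an RNN.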
Google BERT
Github Link
Paper
Blog Post
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A pretrained NLP neural network built on the Transformer architecture.
Note that the code in the linked GitHub repository is written in TensorFlow 1.