
Overview

awesome-nlp

  • NLP overview and benchmarks
  • Papers with Code - https://paperswithcode.com/sota/sentiment-analysis-on-imdb

Pre-trained word embeddings

fastText, GloVe (see the loading sketch below)

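As a quick illustration, here is a minimal sketch of loading pre-trained GloVe vectors with gensim's downloader; the library choice and the model name `glove-wiki-gigaword-100` are assumptions for the example, not something the links above prescribe:

```python
# Minimal sketch: load pre-trained GloVe vectors via gensim's downloader.
# 'glove-wiki-gigaword-100' is one of several available pre-trained sets.
import gensim.downloader as api

vectors = api.load('glove-wiki-gigaword-100')  # downloads on first use
print(vectors.most_similar('king', topn=3))    # nearest neighbours in vector space
```
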
Pre-trained models

  • BERT (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
  • XLNet (from Google/CMU) released with the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding by Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov and Quoc V. Le.
  • RoBERTa (from Facebook) released with the paper RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du et al.
  • DistilBERT (from HuggingFace) released together with the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT by Victor Sanh, Lysandre Debut and Thomas Wolf.

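All of these models are available through the HuggingFace transformers library. A minimal sketch below runs sentiment analysis with a fine-tuned DistilBERT; the pipeline task and checkpoint name are illustrative choices, not something the list above prescribes:

```python
# Minimal sketch: sentiment analysis with a pre-trained DistilBERT via the
# HuggingFace transformers pipeline API.
from transformers import pipeline

classifier = pipeline('sentiment-analysis',
                      model='distilbert-base-uncased-finetuned-sst-2-english')
print(classifier('I love NLP!'))  # e.g. [{'label': 'POSITIVE', 'score': ...}]
```
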
Courses

Stanford NLP

Others

GitHub awesome-nlp list

Visualisation

Pre-trained word embedding demo

Pre-processing

nltk tokenize (see the TweetTokenizer sketch after this list):

  • emoticons
  • URLs

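A minimal sketch with NLTK's TweetTokenizer, which keeps emoticons and URLs as single tokens instead of splitting them; the example tweet is made up:

```python
# Minimal sketch: tweet-aware tokenization with NLTK's TweetTokenizer.
from nltk.tokenize import TweetTokenizer

tokenizer = TweetTokenizer(preserve_case=False,  # lowercase regular words
                           strip_handles=True,   # drop @mentions
                           reduce_len=True)      # shorten looong character runs
tweet = "@user loving this :) see https://example.com #nlp"
print(tokenizer.tokenize(tweet))
# roughly: ['loving', 'this', ':)', 'see', 'https://example.com', '#nlp']
```
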
Techniques

We can see that a standard stop words list (such as NLTK's English list) contains some words that could be important in some contexts. These could be words like i, not, between, because, won, against. You might need to customize the stop words list for some applications.

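A minimal sketch of such customization, assuming NLTK's English stop word list: remove the sentiment-bearing words from the default list before filtering.

```python
# Minimal sketch: keep sentiment-bearing words out of the stop word list.
import nltk
nltk.download('stopwords', quiet=True)
from nltk.corpus import stopwords

keep = {'i', 'not', 'between', 'because', 'won', 'against'}
custom_stopwords = set(stopwords.words('english')) - keep

tokens = ['i', 'am', 'not', 'happy', 'with', 'the', 'service']
print([t for t in tokens if t not in custom_stopwords])
# -> ['i', 'not', 'happy', 'service']
```
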
For the punctuation, we saw earlier that certain groupings like ':)' and '...' should be retained when dealing with tweets because they are used to express emotions. In other contexts, like medical analysis, these should be removed.

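A minimal sketch of context-dependent punctuation removal: drop tokens made entirely of punctuation unless they appear on a whitelist of emotive groupings. The whitelist contents are an illustrative choice.

```python
# Minimal sketch: strip punctuation tokens but keep whitelisted emoticons.
import string

def remove_punctuation(tokens, keep=(':)', ':(', '...')):
    # a token survives if it is whitelisted or has any non-punctuation char
    return [t for t in tokens if t in keep
            or not all(ch in string.punctuation for ch in t)]

tokens = ['great', 'movie', ':)', '...', '!', ',']
print(remove_punctuation(tokens))           # ['great', 'movie', ':)', '...']
print(remove_punctuation(tokens, keep=()))  # ['great', 'movie']
```
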
Topic modelling

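No library is named here; as one common option, a minimal LDA sketch with gensim on a toy corpus (the corpus and topic count are made up for illustration):

```python
# Minimal sketch: LDA topic modelling with gensim on a tiny toy corpus.
from gensim import corpora
from gensim.models import LdaModel

docs = [['cat', 'dog', 'pet'],
        ['python', 'code', 'nlp'],
        ['dog', 'vet', 'pet']]
dictionary = corpora.Dictionary(docs)           # word <-> id mapping
corpus = [dictionary.doc2bow(d) for d in docs]  # bag-of-words vectors

lda = LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
for topic_id, words in lda.print_topics():
    print(topic_id, words)
```
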
My template

  • print strings in different colors (see the sketch after this list)

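A minimal sketch using raw ANSI escape codes, with no third-party dependency; this works in most terminals:

```python
# Minimal sketch: print strings in different colors with ANSI escape codes.
COLORS = {'red': '31', 'green': '32', 'yellow': '33', 'blue': '34'}

def print_color(text, color='red'):
    print(f"\033[{COLORS[color]}m{text}\033[0m")  # \033[0m resets styling

print_color('something failed', 'red')
print_color('all good', 'green')
```
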
TODO

  • English grammar: noun, pronoun, etc.
