Kaggle - NLP
Jigsaw Rate Severity of Toxic Comments
Description
This competition is about sentiment analysis. It continues the previous Jigsaw competition, which classified whether a comment is toxic; this one rates how toxic a comment is.
The end goal is a severity rating that agrees with the expert annotators in the field.
The related-work section points to a relevant dataset; the paper “Constructing Interval Variables via Faceted Rasch Measurement and Multitask Deep Learning: a Hate Speech Application” may help as well.
The competition name comes from Google Jigsaw.
TODO
Pre-trained word embedding
word2vec
All comprehensive More concise
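As a sketch of how pre-trained embeddings like word2vec are typically used for comments, one common baseline is to average the word vectors into a fixed-size feature. The tiny 3-d vectors below are made-up stand-ins for real pre-trained embeddings:

```python
# Hypothetical toy embeddings; a real run would load word2vec/GloVe vectors.
EMBEDDINGS = {
    "you":    [0.1, 0.0, 0.2],
    "are":    [0.0, 0.1, 0.1],
    "stupid": [0.9, 0.8, 0.7],
    "nice":   [0.1, 0.9, 0.1],
}
DIM = 3

def comment_vector(comment: str) -> list[float]:
    """Average the vectors of known words; zeros if none are known."""
    vecs = [EMBEDDINGS[w] for w in comment.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]

print(comment_vector("you are stupid"))
```

Out-of-vocabulary words are simply skipped here; real pipelines often track an OOV rate as an extra feature.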
Text cleaning
https://github.com/jfilter/clean-text
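The clean-text library linked above handles this; a minimal hand-rolled cleaner in the same spirit (not the library's actual API) looks like:

```python
import re
import unicodedata

def clean_comment(text: str) -> str:
    """Minimal comment cleaner: normalize unicode, lowercase,
    drop URLs, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # strip URLs
    text = re.sub(r"[^a-z0-9'\s]", " ", text)   # keep word-ish characters
    text = re.sub(r"\s+", " ", text).strip()    # collapse whitespace
    return text

print(clean_comment("You're SO dumb!!! see https://example.com :("))
# → "you're so dumb see"
```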
Jigsaw-Ridge Ensemble + TFIDF + FastText [0.868]
- data augmentation with nlpaug
[RAPIDS] TFIDF_linear_model_ensemble
For duplicated words, collapse the repeats to a single occurrence and add a new column/feature counting the repetitions. E.g. a single "happy" reads as less intense than "happy" repeated five times.
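That duplicate-collapsing idea could be sketched as follows (a rough illustration, not the notebook's implementation):

```python
def collapse_repeats(text: str) -> tuple[str, int]:
    """Collapse runs of the same word ("happy happy happy" -> "happy")
    and return the cleaned text plus the number of dropped repeats,
    which can become a new feature column."""
    collapsed, extra = [], 0
    for w in text.split():
        if collapsed and w == collapsed[-1]:
            extra += 1          # count each dropped duplicate
        else:
            collapsed.append(w)
    return " ".join(collapsed), extra

print(collapse_repeats("happy happy happy happy happy day"))
# → ('happy day', 4)
```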
Past winning solution
Toxic Comment Classification Challenge
Reading
Pre-trained word embedding
- https://github.com/Hironsan/awesome-embedding-models
Comprehensive pre-processing
- https://www.kaggle.com/vinayakshanawad/text-preprocess-py
- https://www.kaggle.com/xbf6xbf/processing-helps-boosting-about-0-0005-on-lb
- machinelearningmastery
Misc
Start with a simple model; here I used: Incredibly Simple Naive Bayes [0.768]
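The linked notebook isn't reproduced here, but as a sketch of what a bag-of-words Naive Bayes baseline does, here is a tiny from-scratch multinomial NB with add-one smoothing on made-up toy data:

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes classifier with add-one smoothing
    and return a predict function."""
    classes = sorted(set(labels))
    vocab = {w for d in docs for w in d.split()}
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        counts[y].update(d.split())
    total = {c: sum(counts[c].values()) for c in classes}

    def predict(doc):
        def score(c):
            # log P(c) + sum of log P(word | c) over known words
            s = math.log(prior[c])
            for w in doc.split():
                if w in vocab:
                    s += math.log((counts[c][w] + 1) / (total[c] + len(vocab)))
            return s
        return max(classes, key=score)

    return predict

# Toy data: label 0 = benign, 1 = toxic
predict = train_nb(
    ["you are great", "nice work", "you are an idiot", "stupid idiot"],
    [0, 0, 1, 1],
)
print(predict("stupid idiot"))  # → 1
print(predict("nice work"))     # → 0
```

In practice the same idea is usually run through scikit-learn's TF-IDF vectorizer plus MultinomialNB rather than hand-rolled.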