TextAttack Model Zoo
TextAttack includes pretrained models for a range of common NLP tasks. This makes it easier for users to get started with TextAttack, and it enables a fairer comparison of attacks from the literature.
All evaluation results were obtained by running textattack eval on each model's default dataset split: the test set if its labels are available, otherwise the eval/validation set. You can use this command to verify the accuracies yourself, for example: textattack eval --model roberta-base-mr.
The code for the LSTM and wordCNN models is available in textattack.models.helpers. All other models are transformers imported from the transformers package. To evaluate all TextAttack pretrained models, invoke textattack eval without specifying a model: textattack eval --num-examples 1000.
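If you script these evaluations, the minimal sketch below assembles the same command lines and runs them with Python's standard subprocess module. The helper names build_eval_command and run_eval are hypothetical; the only CLI flags used are --model and --num-examples, which appear in the commands above, and run_eval assumes textattack is installed and on your PATH.

```python
import subprocess
from typing import List, Optional


def build_eval_command(model: Optional[str] = None,
                       num_examples: Optional[int] = None) -> List[str]:
    """Assemble argv for `textattack eval` (hypothetical helper).

    Leaving `model` as None mirrors the "evaluate all pretrained models"
    invocation; flags are only added when a value is given.
    """
    cmd = ["textattack", "eval"]
    if model is not None:
        cmd += ["--model", model]
    if num_examples is not None:
        cmd += ["--num-examples", str(num_examples)]
    return cmd


def run_eval(model: Optional[str] = None,
             num_examples: Optional[int] = None) -> int:
    """Run the command and return its exit code (assumes textattack is on PATH)."""
    completed = subprocess.run(build_eval_command(model, num_examples), check=False)
    return completed.returncode


if __name__ == "__main__":
    # Only print the command, so this runs even without textattack installed.
    print(" ".join(build_eval_command(model="roberta-base-mr")))
```

Building the argument list separately from running it keeps the command easy to inspect or log before launching a long evaluation.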
All evaluations shown are on the full validation or test set up to 1000 examples.
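The split-selection rule and the 1000-example cap described above can be written as two small functions. This is a sketch of the rule as this page states it, not TextAttack's own code, and both function names are hypothetical. GLUE test splits are distributed without gold labels, which is why the GLUE entries below are evaluated on validation.

```python
from typing import Iterable


def choose_eval_split(available: Iterable[str], labeled: Iterable[str]) -> str:
    """Prefer the test split when it has labels; otherwise fall back to validation/eval."""
    available, labeled = set(available), set(labeled)
    if "test" in available and "test" in labeled:
        return "test"
    for name in ("validation", "eval"):
        if name in available:
            return name
    raise ValueError("no labeled test split and no validation/eval split")


def num_evaluated(split_size: int, cap: int = 1000) -> int:
    """Number of examples scored: the whole split, but at most `cap` examples."""
    return min(split_size, cap)
```

For example, the SST-2 validation split has 872 examples, so all of them are scored (hence 737/872 for lstm-sst2), while a larger split such as the 25,000-example IMDB test set is cut off at 1000.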
LSTM
- AG News (lstm-ag-news)
  - nlp dataset ag_news, split test
    - Successes: 914/1000
    - Accuracy: 91.40%
- IMDB (lstm-imdb)
  - nlp dataset imdb, split test
    - Successes: 883/1000
    - Accuracy: 88.30%
- Movie Reviews [Rotten Tomatoes] (lstm-mr)
  - nlp dataset rotten_tomatoes, split validation
    - Successes: 807/1000
    - Accuracy: 80.70%
  - nlp dataset rotten_tomatoes, split test
    - Successes: 781/1000
    - Accuracy: 78.10%
- SST-2 (lstm-sst2)
  - nlp dataset glue, subset sst2, split validation
    - Successes: 737/872
    - Accuracy: 84.52%
- Yelp Polarity (lstm-yelp)
  - nlp dataset yelp_polarity, split test
    - Successes: 922/1000
    - Accuracy: 92.20%
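Each accuracy in these listings is the success count divided by the number of examples evaluated, assuming "Successes" counts predictions that match the gold label. The minimal sketch below reproduces the reported percentages from the success counts; the function name accuracy_str is hypothetical.

```python
def accuracy_str(successes: int, total: int) -> str:
    """Format successes/total as a percentage with two decimals, e.g. '84.52%'."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must be between 0 and total")
    return f"{100.0 * successes / total:.2f}%"


# SST-2 validation for lstm-sst2: 737 of 872 examples predicted correctly.
print(accuracy_str(737, 872))  # → 84.52%
```

Recomputing an entry this way is a quick consistency check when copying numbers out of the tables.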
wordCNN
- AG News (cnn-ag-news)
  - nlp dataset ag_news, split test
    - Successes: 910/1000
    - Accuracy: 91.00%
- IMDB (cnn-imdb)
  - nlp dataset imdb, split test
    - Successes: 863/1000
    - Accuracy: 86.30%
- Movie Reviews [Rotten Tomatoes] (cnn-mr)
  - nlp dataset rotten_tomatoes, split validation
    - Successes: 794/1000
    - Accuracy: 79.40%
  - nlp dataset rotten_tomatoes, split test
    - Successes: 768/1000
    - Accuracy: 76.80%
- SST-2 (cnn-sst2)
  - nlp dataset glue, subset sst2, split validation
    - Successes: 721/872
    - Accuracy: 82.68%
- Yelp Polarity (cnn-yelp)
  - nlp dataset yelp_polarity, split test
    - Successes: 913/1000
    - Accuracy: 91.30%
albert-base-v2
- AG News (albert-base-v2-ag-news)
  - nlp dataset ag_news, split test
    - Successes: 943/1000
    - Accuracy: 94.30%
- CoLA (albert-base-v2-cola)
  - nlp dataset glue, subset cola, split validation
    - Successes: 829/1000
    - Accuracy: 82.90%
- IMDB (albert-base-v2-imdb)
  - nlp dataset imdb, split test
    - Successes: 913/1000
    - Accuracy: 91.30%
- Movie Reviews [Rotten Tomatoes] (albert-base-v2-mr)
  - nlp dataset rotten_tomatoes, split validation
    - Successes: 882/1000
    - Accuracy: 88.20%
  - nlp dataset rotten_tomatoes, split test
    - Successes: 851/1000
    - Accuracy: 85.10%
- Quora Question Pairs (albert-base-v2-qqp)
  - nlp dataset glue, subset qqp, split validation
    - Successes: 914/1000
    - Accuracy: 91.40%
- Recognizing Textual Entailment (albert-base-v2-rte)
  - nlp dataset glue, subset rte, split validation
    - Successes: 211/277
    - Accuracy: 76.17%
- SNLI (albert-base-v2-snli)
  - nlp dataset snli, split test
    - Successes: 883/1000
    - Accuracy: 88.30%
- SST-2 (albert-base-v2-sst2)
  - nlp dataset glue, subset sst2, split validation
    - Successes: 807/872
    - Accuracy: 92.55%
- STS-b (albert-base-v2-stsb)
  - nlp dataset glue, subset stsb, split validation
    - Pearson correlation: 0.9041359738552746
    - Spearman correlation: 0.8995912861209745
- WNLI (albert-base-v2-wnli)
  - nlp dataset glue, subset wnli, split validation
    - Successes: 42/71
    - Accuracy: 59.15%
- Yelp Polarity (albert-base-v2-yelp)
  - nlp dataset yelp_polarity, split test
    - Successes: 963/1000
    - Accuracy: 96.30%
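STS-b is a regression task (sentence pairs scored for similarity on a 0 to 5 scale), so its entries report Pearson and Spearman correlations between predicted and gold scores instead of an accuracy. The pure-Python sketch below illustrates the standard definitions: Pearson on the raw values, Spearman as Pearson on the ranks, with tied values sharing their average rank. It is not TextAttack's evaluation code; in practice scipy.stats.pearsonr and scipy.stats.spearmanr compute the same statistics.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Covariance over the product of standard deviations (the 1/n factors cancel)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the first one in the run.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

A monotonic but non-linear relationship, such as y = x² on positive inputs, gives a Spearman correlation of exactly 1.0 but a Pearson correlation below 1.0, which is why reporting both is informative.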
bert-base-uncased
- AG News (bert-base-uncased-ag-news)
  - nlp dataset ag_news, split test
    - Successes: 942/1000
    - Accuracy: 94.20%
- CoLA (bert-base-uncased-cola)
  - nlp dataset glue, subset cola, split validation
    - Successes: 812/1000
    - Accuracy: 81.20%
- IMDB (bert-base-uncased-imdb)
  - nlp dataset imdb, split test
    - Successes: 919/1000
    - Accuracy: 91.90%
- MNLI matched (bert-base-uncased-mnli)
  - nlp dataset glue, subset mnli, split validation_matched
    - Successes: 840/1000
    - Accuracy: 84.00%
- Movie Reviews [Rotten Tomatoes] (bert-base-uncased-mr)
  - nlp dataset rotten_tomatoes, split validation
    - Successes: 876/1000
    - Accuracy: 87.60%
  - nlp dataset rotten_tomatoes, split test
    - Successes: 838/1000
    - Accuracy: 83.80%
- MRPC (bert-base-uncased-mrpc)
  - nlp dataset glue, subset mrpc, split validation
    - Successes: 358/408
    - Accuracy: 87.75%
- QNLI (bert-base-uncased-qnli)
  - nlp dataset glue, subset qnli, split validation
    - Successes: 904/1000
    - Accuracy: 90.40%
- Quora Question Pairs (bert-base-uncased-qqp)
  - nlp dataset glue, subset qqp, split validation
    - Successes: 924/1000
    - Accuracy: 92.40%
- Recognizing Textual Entailment (bert-base-uncased-rte)
  - nlp dataset glue, subset rte, split validation
    - Successes: 201/277
    - Accuracy: 72.56%
- SNLI (bert-base-uncased-snli)
  - nlp dataset snli, split test
    - Successes: 894/1000
    - Accuracy: 89.40%
- SST-2 (bert-base-uncased-sst2)
  - nlp dataset glue, subset sst2, split validation
    - Successes: 806/872
    - Accuracy: 92.43%
- STS-b (bert-base-uncased-stsb)
  - nlp dataset glue, subset stsb, split validation
    - Pearson correlation: 0.8775458937815515
    - Spearman correlation: 0.8773251339980935
- WNLI (bert-base-uncased-wnli)
  - nlp dataset glue, subset wnli, split validation
    - Successes: 40/71
    - Accuracy: 56.34%
- Yelp Polarity (bert-base-uncased-yelp)
  - nlp dataset yelp_polarity, split test
    - Successes: 963/1000
    - Accuracy: 96.30%
distilbert-base-cased
- CoLA (distilbert-base-cased-cola)
  - nlp dataset glue, subset cola, split validation
    - Successes: 786/1000
    - Accuracy: 78.60%
- MRPC (distilbert-base-cased-mrpc)
  - nlp dataset glue, subset mrpc, split validation
    - Successes: 320/408
    - Accuracy: 78.43%
- Quora Question Pairs (distilbert-base-cased-qqp)
  - nlp dataset glue, subset qqp, split validation
    - Successes: 908/1000
    - Accuracy: 90.80%
- SNLI (distilbert-base-cased-snli)
  - nlp dataset snli, split test
    - Successes: 861/1000
    - Accuracy: 86.10%
- SST-2 (distilbert-base-cased-sst2)
  - nlp dataset glue, subset sst2, split validation
    - Successes: 785/872
    - Accuracy: 90.02%
- STS-b (distilbert-base-cased-stsb)
  - nlp dataset glue, subset stsb, split validation
    - Pearson correlation: 0.8421540899520146
    - Spearman correlation: 0.8407155030382939
distilbert-base-uncased
- AG News (distilbert-base-uncased-ag-news)
  - nlp dataset ag_news, split test
    - Successes: 944/1000
    - Accuracy: 94.40%
- CoLA (distilbert-base-uncased-cola)
  - nlp dataset glue, subset cola, split validation
    - Successes: 786/1000
    - Accuracy: 78.60%
- IMDB (distilbert-base-uncased-imdb)
  - nlp dataset imdb, split test
    - Successes: 903/1000
    - Accuracy: 90.30%
- MNLI matched (distilbert-base-uncased-mnli)
  - nlp dataset glue, subset mnli, split validation_matched
    - Successes: 817/1000
    - Accuracy: 81.70%
- MRPC (distilbert-base-uncased-mrpc)
  - nlp dataset glue, subset mrpc, split validation
    - Successes: 350/408
    - Accuracy: 85.78%
- QNLI (distilbert-base-uncased-qnli)
  - nlp dataset glue, subset qnli, split validation
    - Successes: 860/1000
    - Accuracy: 86.00%
- Recognizing Textual Entailment (distilbert-base-uncased-rte)
  - nlp dataset glue, subset rte, split validation
    - Successes: 180/277
    - Accuracy: 64.98%
- STS-b (distilbert-base-uncased-stsb)
  - nlp dataset glue, subset stsb, split validation
    - Pearson correlation: 0.8421540899520146
    - Spearman correlation: 0.8407155030382939
- WNLI (distilbert-base-uncased-wnli)
  - nlp dataset glue, subset wnli, split validation
    - Successes: 40/71
    - Accuracy: 56.34%
roberta-base
- AG News (roberta-base-ag-news)
  - nlp dataset ag_news, split test
    - Successes: 947/1000
    - Accuracy: 94.70%
- CoLA (roberta-base-cola)
  - nlp dataset glue, subset cola, split validation
    - Successes: 857/1000
    - Accuracy: 85.70%
- IMDB (roberta-base-imdb)
  - nlp dataset imdb, split test
    - Successes: 941/1000
    - Accuracy: 94.10%
- Movie Reviews [Rotten Tomatoes] (roberta-base-mr)
  - nlp dataset rotten_tomatoes, split validation
    - Successes: 899/1000
    - Accuracy: 89.90%
  - nlp dataset rotten_tomatoes, split test
    - Successes: 883/1000
    - Accuracy: 88.30%
- MRPC (roberta-base-mrpc)
  - nlp dataset glue, subset mrpc, split validation
    - Successes: 371/408
    - Accuracy: 91.18%
- QNLI (roberta-base-qnli)
  - nlp dataset glue, subset qnli, split validation
    - Successes: 917/1000
    - Accuracy: 91.70%
- Recognizing Textual Entailment (roberta-base-rte)
  - nlp dataset glue, subset rte, split validation
    - Successes: 217/277
    - Accuracy: 78.34%
- SST-2 (roberta-base-sst2)
  - nlp dataset glue, subset sst2, split validation
    - Successes: 820/872
    - Accuracy: 94.04%
- STS-b (roberta-base-stsb)
  - nlp dataset glue, subset stsb, split validation
    - Pearson correlation: 0.906067852162708
    - Spearman correlation: 0.9025045272903051
- WNLI (roberta-base-wnli)
  - nlp dataset glue, subset wnli, split validation
    - Successes: 40/71
    - Accuracy: 56.34%
xlnet-base-cased
- CoLA (xlnet-base-cased-cola)
  - nlp dataset glue, subset cola, split validation
    - Successes: 800/1000
    - Accuracy: 80.00%
- IMDB (xlnet-base-cased-imdb)
  - nlp dataset imdb, split test
    - Successes: 957/1000
    - Accuracy: 95.70%
- Movie Reviews [Rotten Tomatoes] (xlnet-base-cased-mr)
  - nlp dataset rotten_tomatoes, split validation
    - Successes: 908/1000
    - Accuracy: 90.80%
  - nlp dataset rotten_tomatoes, split test
    - Successes: 876/1000
    - Accuracy: 87.60%
- MRPC (xlnet-base-cased-mrpc)
  - nlp dataset glue, subset mrpc, split validation
    - Successes: 363/408
    - Accuracy: 88.97%
- Recognizing Textual Entailment (xlnet-base-cased-rte)
  - nlp dataset glue, subset rte, split validation
    - Successes: 196/277
    - Accuracy: 70.76%
- STS-b (xlnet-base-cased-stsb)
  - nlp dataset glue, subset stsb, split validation
    - Pearson correlation: 0.883111673280641
    - Spearman correlation: 0.8773439961182335
- WNLI (xlnet-base-cased-wnli)
  - nlp dataset glue, subset wnli, split validation
    - Successes: 41/71
    - Accuracy: 57.75%
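Because every SST-2 model in this zoo is scored on the same 872-example GLUE validation split, their accuracies can be compared directly, which is the kind of like-for-like comparison the zoo is meant to support. The sketch below copies those accuracies from the listings into a dictionary and ranks them. The dictionary and the rank_models helper are hypothetical conveniences, not part of TextAttack.

```python
from typing import Dict, List

# SST-2 validation accuracies (%) copied from the listings on this page.
SST2_ACCURACY: Dict[str, float] = {
    "lstm-sst2": 84.52,
    "cnn-sst2": 82.68,
    "albert-base-v2-sst2": 92.55,
    "bert-base-uncased-sst2": 92.43,
    "distilbert-base-cased-sst2": 90.02,
    "roberta-base-sst2": 94.04,
}


def rank_models(scores: Dict[str, float]) -> List[str]:
    """Model names ordered from highest to lowest score."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)


for name in rank_models(SST2_ACCURACY):
    print(f"{name}: {SST2_ACCURACY[name]:.2f}%")
```

Ranking only within one dataset and split keeps the comparison fair; mixing splits (for example Rotten Tomatoes validation versus test) would not.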