TextAttack Model Zoo
More details at https://textattack.readthedocs.io/en/latest/3recipes/models.html
TextAttack includes pretrained models for several common NLP tasks. This makes it easier for users to get started with TextAttack, and it enables a fairer comparison of attacks from the literature.
All evaluation results were obtained by running textattack eval on each model's default test dataset (the test set if labels are available, otherwise the eval/validation set). You can use this command to verify the accuracies yourself, for example: textattack eval --model roberta-base-mr.
The code for the LSTM and wordCNN models is available in textattack.models.helpers. All other models are transformers imported from the transformers package. To evaluate all TextAttack pretrained models, invoke textattack eval without specifying a model: textattack eval --num-examples 1000.
All evaluations shown use the full validation or test set, capped at 1000 examples.
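Because each split is capped at 1000 examples, the denominator behind every accuracy figure is min(split size, 1000): 1000 for large test sets such as AG News (7,600 test examples), but the whole split for smaller ones, such as 872 for SST-2 validation, 408 for MRPC, 277 for RTE and 71 for WNLI. A minimal sketch of that arithmetic follows; the function name accuracy_percent is illustrative, and the example figures are taken from the listings below.

```python
def accuracy_percent(successes: int, split_size: int, cap: int = 1000) -> float:
    """Accuracy in percent over the first min(split_size, cap) examples, rounded to 2 decimals."""
    evaluated = min(split_size, cap)
    if not 0 <= successes <= evaluated:
        raise ValueError("successes must lie between 0 and the number of evaluated examples")
    return round(100 * successes / evaluated, 2)


print(accuracy_percent(914, 7600))  # AG News test, capped at 1000 -> 91.4
print(accuracy_percent(737, 872))   # SST-2 validation, all 872 examples -> 84.52
print(accuracy_percent(211, 277))   # RTE validation, all 277 examples -> 76.17
```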
LSTM
- AG News (lstm-ag-news)
  - Hugging Face datasets: ag_news, split test
    - Successes: 914/1000
    - Accuracy: 91.40%
- IMDB (lstm-imdb)
  - Hugging Face datasets: imdb, split test
    - Successes: 883/1000
    - Accuracy: 88.30%
- Movie Reviews [Rotten Tomatoes] (lstm-mr)
  - Hugging Face datasets: rotten_tomatoes, split validation
    - Successes: 807/1000
    - Accuracy: 80.70%
  - Hugging Face datasets: rotten_tomatoes, split test
    - Successes: 781/1000
    - Accuracy: 78.10%
- SST-2 (lstm-sst2)
  - Hugging Face datasets: glue, subset sst2, split validation
    - Successes: 737/872
    - Accuracy: 84.52%
- Yelp Polarity (lstm-yelp)
  - Hugging Face datasets: yelp_polarity, split test
    - Successes: 922/1000
    - Accuracy: 92.20%
wordCNN
- AG News (cnn-ag-news)
  - Hugging Face datasets: ag_news, split test
    - Successes: 910/1000
    - Accuracy: 91.00%
- IMDB (cnn-imdb)
  - Hugging Face datasets: imdb, split test
    - Successes: 863/1000
    - Accuracy: 86.30%
- Movie Reviews [Rotten Tomatoes] (cnn-mr)
  - Hugging Face datasets: rotten_tomatoes, split validation
    - Successes: 794/1000
    - Accuracy: 79.40%
  - Hugging Face datasets: rotten_tomatoes, split test
    - Successes: 768/1000
    - Accuracy: 76.80%
- SST-2 (cnn-sst2)
  - Hugging Face datasets: glue, subset sst2, split validation
    - Successes: 721/872
    - Accuracy: 82.68%
- Yelp Polarity (cnn-yelp)
  - Hugging Face datasets: yelp_polarity, split test
    - Successes: 913/1000
    - Accuracy: 91.30%
albert-base-v2
- AG News (albert-base-v2-ag-news)
  - Hugging Face datasets: ag_news, split test
    - Successes: 943/1000
    - Accuracy: 94.30%
- CoLA (albert-base-v2-cola)
  - Hugging Face datasets: glue, subset cola, split validation
    - Successes: 829/1000
    - Accuracy: 82.90%
- IMDB (albert-base-v2-imdb)
  - Hugging Face datasets: imdb, split test
    - Successes: 913/1000
    - Accuracy: 91.30%
- Movie Reviews [Rotten Tomatoes] (albert-base-v2-mr)
  - Hugging Face datasets: rotten_tomatoes, split validation
    - Successes: 882/1000
    - Accuracy: 88.20%
  - Hugging Face datasets: rotten_tomatoes, split test
    - Successes: 851/1000
    - Accuracy: 85.10%
- Quora Question Pairs (albert-base-v2-qqp)
  - Hugging Face datasets: glue, subset qqp, split validation
    - Successes: 914/1000
    - Accuracy: 91.40%
- Recognizing Textual Entailment (albert-base-v2-rte)
  - Hugging Face datasets: glue, subset rte, split validation
    - Successes: 211/277
    - Accuracy: 76.17%
- SNLI (albert-base-v2-snli)
  - Hugging Face datasets: snli, split test
    - Successes: 883/1000
    - Accuracy: 88.30%
- SST-2 (albert-base-v2-sst2)
  - Hugging Face datasets: glue, subset sst2, split validation
    - Successes: 807/872
    - Accuracy: 92.55%
- STS-b (albert-base-v2-stsb)
  - Hugging Face datasets: glue, subset stsb, split validation
    - Pearson correlation: 0.9041359738552746
    - Spearman correlation: 0.8995912861209745
- WNLI (albert-base-v2-wnli)
  - Hugging Face datasets: glue, subset wnli, split validation
    - Successes: 42/71
    - Accuracy: 59.15%
- Yelp Polarity (albert-base-v2-yelp)
  - Hugging Face datasets: yelp_polarity, split test
    - Successes: 963/1000
    - Accuracy: 96.30%
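The STS-b rows in these listings report correlations instead of Successes and Accuracy because STS-b is a regression task: each sentence pair has a gold similarity score from 0 to 5. Pearson correlation measures linear agreement between predicted and gold scores, while Spearman correlation is Pearson computed on ranks, so it only rewards getting the ordering right. The following is a minimal pure-Python sketch of both statistics (tied values get average ranks); it illustrates the metrics and is not TextAttack's own evaluation code. The gold and predicted scores are made-up examples.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(sxx * syy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


gold = [0.0, 1.5, 2.5, 4.0, 5.0]  # made-up gold similarity scores (0-5)
pred = [0.4, 1.0, 3.0, 3.5, 4.9]  # made-up model predictions
print(round(pearson(gold, pred), 4))
print(round(spearman(gold, pred), 4))  # predictions keep the gold ordering
```

In this toy example the predictions preserve the gold ordering exactly, so Spearman is perfect while Pearson is slightly lower because the errors are not perfectly linear.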
bert-base-uncased
- AG News (bert-base-uncased-ag-news)
  - Hugging Face datasets: ag_news, split test
    - Successes: 942/1000
    - Accuracy: 94.20%
- CoLA (bert-base-uncased-cola)
  - Hugging Face datasets: glue, subset cola, split validation
    - Successes: 812/1000
    - Accuracy: 81.20%
- IMDB (bert-base-uncased-imdb)
  - Hugging Face datasets: imdb, split test
    - Successes: 919/1000
    - Accuracy: 91.90%
- MNLI matched (bert-base-uncased-mnli)
  - Hugging Face datasets: glue, subset mnli, split validation_matched
    - Successes: 840/1000
    - Accuracy: 84.00%
- Movie Reviews [Rotten Tomatoes] (bert-base-uncased-mr)
  - Hugging Face datasets: rotten_tomatoes, split validation
    - Successes: 876/1000
    - Accuracy: 87.60%
  - Hugging Face datasets: rotten_tomatoes, split test
    - Successes: 838/1000
    - Accuracy: 83.80%
- MRPC (bert-base-uncased-mrpc)
  - Hugging Face datasets: glue, subset mrpc, split validation
    - Successes: 358/408
    - Accuracy: 87.75%
- QNLI (bert-base-uncased-qnli)
  - Hugging Face datasets: glue, subset qnli, split validation
    - Successes: 904/1000
    - Accuracy: 90.40%
- Quora Question Pairs (bert-base-uncased-qqp)
  - Hugging Face datasets: glue, subset qqp, split validation
    - Successes: 924/1000
    - Accuracy: 92.40%
- Recognizing Textual Entailment (bert-base-uncased-rte)
  - Hugging Face datasets: glue, subset rte, split validation
    - Successes: 201/277
    - Accuracy: 72.56%
- SNLI (bert-base-uncased-snli)
  - Hugging Face datasets: snli, split test
    - Successes: 894/1000
    - Accuracy: 89.40%
- SST-2 (bert-base-uncased-sst2)
  - Hugging Face datasets: glue, subset sst2, split validation
    - Successes: 806/872
    - Accuracy: 92.43%
- STS-b (bert-base-uncased-stsb)
  - Hugging Face datasets: glue, subset stsb, split validation
    - Pearson correlation: 0.8775458937815515
    - Spearman correlation: 0.8773251339980935
- WNLI (bert-base-uncased-wnli)
  - Hugging Face datasets: glue, subset wnli, split validation
    - Successes: 40/71
    - Accuracy: 56.34%
- Yelp Polarity (bert-base-uncased-yelp)
  - Hugging Face datasets: yelp_polarity, split test
    - Successes: 963/1000
    - Accuracy: 96.30%
distilbert-base-cased
- CoLA (distilbert-base-cased-cola)
  - Hugging Face datasets: glue, subset cola, split validation
    - Successes: 786/1000
    - Accuracy: 78.60%
- MRPC (distilbert-base-cased-mrpc)
  - Hugging Face datasets: glue, subset mrpc, split validation
    - Successes: 320/408
    - Accuracy: 78.43%
- Quora Question Pairs (distilbert-base-cased-qqp)
  - Hugging Face datasets: glue, subset qqp, split validation
    - Successes: 908/1000
    - Accuracy: 90.80%
- SNLI (distilbert-base-cased-snli)
  - Hugging Face datasets: snli, split test
    - Successes: 861/1000
    - Accuracy: 86.10%
- SST-2 (distilbert-base-cased-sst2)
  - Hugging Face datasets: glue, subset sst2, split validation
    - Successes: 785/872
    - Accuracy: 90.02%
- STS-b (distilbert-base-cased-stsb)
  - Hugging Face datasets: glue, subset stsb, split validation
    - Pearson correlation: 0.8421540899520146
    - Spearman correlation: 0.8407155030382939
distilbert-base-uncased
- AG News (distilbert-base-uncased-ag-news)
  - Hugging Face datasets: ag_news, split test
    - Successes: 944/1000
    - Accuracy: 94.40%
- CoLA (distilbert-base-uncased-cola)
  - Hugging Face datasets: glue, subset cola, split validation
    - Successes: 786/1000
    - Accuracy: 78.60%
- IMDB (distilbert-base-uncased-imdb)
  - Hugging Face datasets: imdb, split test
    - Successes: 903/1000
    - Accuracy: 90.30%
- MNLI matched (distilbert-base-uncased-mnli)
  - Hugging Face datasets: glue, subset mnli, split validation_matched
    - Successes: 817/1000
    - Accuracy: 81.70%
- MRPC (distilbert-base-uncased-mrpc)
  - Hugging Face datasets: glue, subset mrpc, split validation
    - Successes: 350/408
    - Accuracy: 85.78%
- QNLI (distilbert-base-uncased-qnli)
  - Hugging Face datasets: glue, subset qnli, split validation
    - Successes: 860/1000
    - Accuracy: 86.00%
- Recognizing Textual Entailment (distilbert-base-uncased-rte)
  - Hugging Face datasets: glue, subset rte, split validation
    - Successes: 180/277
    - Accuracy: 64.98%
- STS-b (distilbert-base-uncased-stsb)
  - Hugging Face datasets: glue, subset stsb, split validation
    - Pearson correlation: 0.8421540899520146
    - Spearman correlation: 0.8407155030382939
- WNLI (distilbert-base-uncased-wnli)
  - Hugging Face datasets: glue, subset wnli, split validation
    - Successes: 40/71
    - Accuracy: 56.34%
roberta-base
- AG News (roberta-base-ag-news)
  - Hugging Face datasets: ag_news, split test
    - Successes: 947/1000
    - Accuracy: 94.70%
- CoLA (roberta-base-cola)
  - Hugging Face datasets: glue, subset cola, split validation
    - Successes: 857/1000
    - Accuracy: 85.70%
- IMDB (roberta-base-imdb)
  - Hugging Face datasets: imdb, split test
    - Successes: 941/1000
    - Accuracy: 94.10%
- Movie Reviews [Rotten Tomatoes] (roberta-base-mr)
  - Hugging Face datasets: rotten_tomatoes, split validation
    - Successes: 899/1000
    - Accuracy: 89.90%
  - Hugging Face datasets: rotten_tomatoes, split test
    - Successes: 883/1000
    - Accuracy: 88.30%
- MRPC (roberta-base-mrpc)
  - Hugging Face datasets: glue, subset mrpc, split validation
    - Successes: 371/408
    - Accuracy: 91.18%
- QNLI (roberta-base-qnli)
  - Hugging Face datasets: glue, subset qnli, split validation
    - Successes: 917/1000
    - Accuracy: 91.70%
- Recognizing Textual Entailment (roberta-base-rte)
  - Hugging Face datasets: glue, subset rte, split validation
    - Successes: 217/277
    - Accuracy: 78.34%
- SST-2 (roberta-base-sst2)
  - Hugging Face datasets: glue, subset sst2, split validation
    - Successes: 820/872
    - Accuracy: 94.04%
- STS-b (roberta-base-stsb)
  - Hugging Face datasets: glue, subset stsb, split validation
    - Pearson correlation: 0.906067852162708
    - Spearman correlation: 0.9025045272903051
- WNLI (roberta-base-wnli)
  - Hugging Face datasets: glue, subset wnli, split validation
    - Successes: 40/71
    - Accuracy: 56.34%
xlnet-base-cased
- CoLA (xlnet-base-cased-cola)
  - Hugging Face datasets: glue, subset cola, split validation
    - Successes: 800/1000
    - Accuracy: 80.00%
- IMDB (xlnet-base-cased-imdb)
  - Hugging Face datasets: imdb, split test
    - Successes: 957/1000
    - Accuracy: 95.70%
- Movie Reviews [Rotten Tomatoes] (xlnet-base-cased-mr)
  - Hugging Face datasets: rotten_tomatoes, split validation
    - Successes: 908/1000
    - Accuracy: 90.80%
  - Hugging Face datasets: rotten_tomatoes, split test
    - Successes: 876/1000
    - Accuracy: 87.60%
- MRPC (xlnet-base-cased-mrpc)
  - Hugging Face datasets: glue, subset mrpc, split validation
    - Successes: 363/408
    - Accuracy: 88.97%
- Recognizing Textual Entailment (xlnet-base-cased-rte)
  - Hugging Face datasets: glue, subset rte, split validation
    - Successes: 196/277
    - Accuracy: 70.76%
- STS-b (xlnet-base-cased-stsb)
  - Hugging Face datasets: glue, subset stsb, split validation
    - Pearson correlation: 0.883111673280641
    - Spearman correlation: 0.8773439961182335
- WNLI (xlnet-base-cased-wnli)
  - Hugging Face datasets: glue, subset wnli, split validation
    - Successes: 41/71
    - Accuracy: 57.75%