I have updated the head docstring of (almost) all code files by adding text from the old API rst files, so we really don't need those files under 3import. The only exception is the attack recipes, so I keep that as a single item in the sidebar. Besides, I just did a comprehensive cleanup of all URLs by using the :ref:
TextAttack Model Zoo
TextAttack includes pre-trained models for a range of common NLP tasks. This makes it easier for users to get started with TextAttack. It also enables a fairer comparison of attacks from the literature.
All evaluation results were obtained using textattack eval to evaluate models on their default
test dataset (test set, if labels are available, otherwise, eval/validation set). You can use
this command to verify the accuracies for yourself: for example, textattack eval --model roberta-base-mr.
The code for the LSTM and wordCNN models is available in textattack.models.helpers. All other models are transformers
imported from the transformers package. To evaluate all
TextAttack pretrained models, invoke textattack eval without specifying a model: textattack eval --num-examples 1000.
All evaluations shown are on the full validation or test set up to 1000 examples.
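To script the evaluation commands described above, the following is a minimal sketch that only assembles the command line and runs it if the CLI is installed. The helper names build_eval_command and run_eval are hypothetical, and combining --model with --num-examples in a single call is an assumption; the text above shows each flag separately.

```python
import shlex
import shutil
import subprocess


def build_eval_command(model=None, num_examples=1000):
    """Build the argv list for `textattack eval` (hypothetical helper).

    With model=None the --model flag is omitted, which, per the text above,
    evaluates every pretrained model.
    """
    command = ["textattack", "eval"]
    if model is not None:
        command += ["--model", model]
    # Assumption: --num-examples can be combined with --model.
    command += ["--num-examples", str(num_examples)]
    return command


def run_eval(model=None, num_examples=1000):
    """Run the command if the CLI is on PATH; return its exit code, else None."""
    command = build_eval_command(model, num_examples)
    if shutil.which(command[0]) is None:
        print("textattack not found; would run:", shlex.join(command))
        return None
    return subprocess.run(command, check=False).returncode


# Show the command for a single model without executing it.
print(shlex.join(build_eval_command("roberta-base-mr")))
```

Calling run_eval("roberta-base-mr") would execute the command when TextAttack is installed; building the argument list separately keeps the command easy to inspect before running a long evaluation.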
LSTM
- AG News (lstm-ag-news)
    - datasets dataset ag_news, split test
        - Successes: 914/1000
        - Accuracy: 91.40%
- IMDB (lstm-imdb)
    - datasets dataset imdb, split test
        - Successes: 883/1000
        - Accuracy: 88.30%
- Movie Reviews [Rotten Tomatoes] (lstm-mr)
    - datasets dataset rotten_tomatoes, split validation
        - Successes: 807/1000
        - Accuracy: 80.70%
    - datasets dataset rotten_tomatoes, split test
        - Successes: 781/1000
        - Accuracy: 78.10%
- SST-2 (lstm-sst2)
    - datasets dataset glue, subset sst2, split validation
        - Successes: 737/872
        - Accuracy: 84.52%
- Yelp Polarity (lstm-yelp)
    - datasets dataset yelp_polarity, split test
        - Successes: 922/1000
        - Accuracy: 92.20%
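Each accuracy figure in these listings is the success count divided by the number of evaluated examples. As a small worked check, assuming two-decimal rounding (the helper name accuracy_percent is hypothetical):

```python
def accuracy_percent(successes, total):
    """Return accuracy as a percentage string with two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{100 * successes / total:.2f}%"


# Values taken from the LSTM entries above.
print(accuracy_percent(914, 1000))  # 91.40%
print(accuracy_percent(737, 872))   # 84.52%
```

The SST-2 validation split has only 872 examples, which is why its denominator is smaller than the 1000-example cap used elsewhere.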
wordCNN
- AG News (cnn-ag-news)
    - datasets dataset ag_news, split test
        - Successes: 910/1000
        - Accuracy: 91.00%
- IMDB (cnn-imdb)
    - datasets dataset imdb, split test
        - Successes: 863/1000
        - Accuracy: 86.30%
- Movie Reviews [Rotten Tomatoes] (cnn-mr)
    - datasets dataset rotten_tomatoes, split validation
        - Successes: 794/1000
        - Accuracy: 79.40%
    - datasets dataset rotten_tomatoes, split test
        - Successes: 768/1000
        - Accuracy: 76.80%
- SST-2 (cnn-sst2)
    - datasets dataset glue, subset sst2, split validation
        - Successes: 721/872
        - Accuracy: 82.68%
- Yelp Polarity (cnn-yelp)
    - datasets dataset yelp_polarity, split test
        - Successes: 913/1000
        - Accuracy: 91.30%
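The dataset, subset, and split named in each entry correspond to the arguments of load_dataset in the Hugging Face datasets package. The sketch below shows that mapping; the helper names are hypothetical, and it assumes the dataset identifiers listed here still resolve on the Hub, which may not hold for newer releases that use namespaced names.

```python
def load_dataset_args(dataset, subset=None, split="test"):
    """Translate a listing entry into (args, kwargs) for datasets.load_dataset."""
    # The subset (for example "sst2" within "glue") is the optional second
    # positional argument of load_dataset.
    args = (dataset,) if subset is None else (dataset, subset)
    return args, {"split": split}


def load_eval_split(dataset, subset=None, split="test"):
    """Download one split with the `datasets` package (needs network access)."""
    # Imported lazily so load_dataset_args works without the package installed.
    from datasets import load_dataset

    args, kwargs = load_dataset_args(dataset, subset, split)
    return load_dataset(*args, **kwargs)


# Arguments for the SST-2 entry: dataset glue, subset sst2, split validation.
print(load_dataset_args("glue", "sst2", "validation"))
```

For example, load_eval_split("rotten_tomatoes", split="validation") would fetch the split used for the first Movie Reviews figure.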
albert-base-v2
- AG News (albert-base-v2-ag-news)
    - datasets dataset ag_news, split test
        - Successes: 943/1000
        - Accuracy: 94.30%
- CoLA (albert-base-v2-cola)
    - datasets dataset glue, subset cola, split validation
        - Successes: 829/1000
        - Accuracy: 82.90%
- IMDB (albert-base-v2-imdb)
    - datasets dataset imdb, split test
        - Successes: 913/1000
        - Accuracy: 91.30%
- Movie Reviews [Rotten Tomatoes] (albert-base-v2-mr)
    - datasets dataset rotten_tomatoes, split validation
        - Successes: 882/1000
        - Accuracy: 88.20%
    - datasets dataset rotten_tomatoes, split test
        - Successes: 851/1000
        - Accuracy: 85.10%
- Quora Question Pairs (albert-base-v2-qqp)
    - datasets dataset glue, subset qqp, split validation
        - Successes: 914/1000
        - Accuracy: 91.40%
- Recognizing Textual Entailment (albert-base-v2-rte)
    - datasets dataset glue, subset rte, split validation
        - Successes: 211/277
        - Accuracy: 76.17%
- SNLI (albert-base-v2-snli)
    - datasets dataset snli, split test
        - Successes: 883/1000
        - Accuracy: 88.30%
- SST-2 (albert-base-v2-sst2)
    - datasets dataset glue, subset sst2, split validation
        - Successes: 807/872
        - Accuracy: 92.55%
- STS-b (albert-base-v2-stsb)
    - datasets dataset glue, subset stsb, split validation
        - Pearson correlation: 0.9041359738552746
        - Spearman correlation: 0.8995912861209745
- WNLI (albert-base-v2-wnli)
    - datasets dataset glue, subset wnli, split validation
        - Successes: 42/71
        - Accuracy: 59.15%
- Yelp Polarity (albert-base-v2-yelp)
    - datasets dataset yelp_polarity, split test
        - Successes: 963/1000
        - Accuracy: 96.30%
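STS-b is a regression task, so its entries report Pearson and Spearman correlation between predicted and gold similarity scores instead of accuracy. The following is a minimal pure-Python sketch of both metrics, not the evaluation code TextAttack itself uses; the sample scores are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of std devs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical gold and predicted similarity scores.
gold = [0.0, 1.5, 2.0, 3.5, 5.0]
predicted = [0.2, 1.0, 2.6, 3.1, 4.9]
print(pearson(gold, predicted), spearman(gold, predicted))
```

Because the hypothetical predictions preserve the ordering of the gold scores, the Spearman value is 1 even though the Pearson value is below 1; the two metrics diverge whenever the relationship is monotonic but not exactly linear.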
bert-base-uncased
- AG News (bert-base-uncased-ag-news)
    - datasets dataset ag_news, split test
        - Successes: 942/1000
        - Accuracy: 94.20%
- CoLA (bert-base-uncased-cola)
    - datasets dataset glue, subset cola, split validation
        - Successes: 812/1000
        - Accuracy: 81.20%
- IMDB (bert-base-uncased-imdb)
    - datasets dataset imdb, split test
        - Successes: 919/1000
        - Accuracy: 91.90%
- MNLI matched (bert-base-uncased-mnli)
    - datasets dataset glue, subset mnli, split validation_matched
        - Successes: 840/1000
        - Accuracy: 84.00%
- Movie Reviews [Rotten Tomatoes] (bert-base-uncased-mr)
    - datasets dataset rotten_tomatoes, split validation
        - Successes: 876/1000
        - Accuracy: 87.60%
    - datasets dataset rotten_tomatoes, split test
        - Successes: 838/1000
        - Accuracy: 83.80%
- MRPC (bert-base-uncased-mrpc)
    - datasets dataset glue, subset mrpc, split validation
        - Successes: 358/408
        - Accuracy: 87.75%
- QNLI (bert-base-uncased-qnli)
    - datasets dataset glue, subset qnli, split validation
        - Successes: 904/1000
        - Accuracy: 90.40%
- Quora Question Pairs (bert-base-uncased-qqp)
    - datasets dataset glue, subset qqp, split validation
        - Successes: 924/1000
        - Accuracy: 92.40%
- Recognizing Textual Entailment (bert-base-uncased-rte)
    - datasets dataset glue, subset rte, split validation
        - Successes: 201/277
        - Accuracy: 72.56%
- SNLI (bert-base-uncased-snli)
    - datasets dataset snli, split test
        - Successes: 894/1000
        - Accuracy: 89.40%
- SST-2 (bert-base-uncased-sst2)
    - datasets dataset glue, subset sst2, split validation
        - Successes: 806/872
        - Accuracy: 92.43%
- STS-b (bert-base-uncased-stsb)
    - datasets dataset glue, subset stsb, split validation
        - Pearson correlation: 0.8775458937815515
        - Spearman correlation: 0.8773251339980935
- WNLI (bert-base-uncased-wnli)
    - datasets dataset glue, subset wnli, split validation
        - Successes: 40/71
        - Accuracy: 56.34%
- Yelp Polarity (bert-base-uncased-yelp)
    - datasets dataset yelp_polarity, split test
        - Successes: 963/1000
        - Accuracy: 96.30%
distilbert-base-cased
- CoLA (distilbert-base-cased-cola)
    - datasets dataset glue, subset cola, split validation
        - Successes: 786/1000
        - Accuracy: 78.60%
- MRPC (distilbert-base-cased-mrpc)
    - datasets dataset glue, subset mrpc, split validation
        - Successes: 320/408
        - Accuracy: 78.43%
- Quora Question Pairs (distilbert-base-cased-qqp)
    - datasets dataset glue, subset qqp, split validation
        - Successes: 908/1000
        - Accuracy: 90.80%
- SNLI (distilbert-base-cased-snli)
    - datasets dataset snli, split test
        - Successes: 861/1000
        - Accuracy: 86.10%
- SST-2 (distilbert-base-cased-sst2)
    - datasets dataset glue, subset sst2, split validation
        - Successes: 785/872
        - Accuracy: 90.02%
- STS-b (distilbert-base-cased-stsb)
    - datasets dataset glue, subset stsb, split validation
        - Pearson correlation: 0.8421540899520146
        - Spearman correlation: 0.8407155030382939
distilbert-base-uncased
- AG News (distilbert-base-uncased-ag-news)
    - datasets dataset ag_news, split test
        - Successes: 944/1000
        - Accuracy: 94.40%
- CoLA (distilbert-base-uncased-cola)
    - datasets dataset glue, subset cola, split validation
        - Successes: 786/1000
        - Accuracy: 78.60%
- IMDB (distilbert-base-uncased-imdb)
    - datasets dataset imdb, split test
        - Successes: 903/1000
        - Accuracy: 90.30%
- MNLI matched (distilbert-base-uncased-mnli)
    - datasets dataset glue, subset mnli, split validation_matched
        - Successes: 817/1000
        - Accuracy: 81.70%
- MRPC (distilbert-base-uncased-mrpc)
    - datasets dataset glue, subset mrpc, split validation
        - Successes: 350/408
        - Accuracy: 85.78%
- QNLI (distilbert-base-uncased-qnli)
    - datasets dataset glue, subset qnli, split validation
        - Successes: 860/1000
        - Accuracy: 86.00%
- Recognizing Textual Entailment (distilbert-base-uncased-rte)
    - datasets dataset glue, subset rte, split validation
        - Successes: 180/277
        - Accuracy: 64.98%
- STS-b (distilbert-base-uncased-stsb)
    - datasets dataset glue, subset stsb, split validation
        - Pearson correlation: 0.8421540899520146
        - Spearman correlation: 0.8407155030382939
- WNLI (distilbert-base-uncased-wnli)
    - datasets dataset glue, subset wnli, split validation
        - Successes: 40/71
        - Accuracy: 56.34%
roberta-base
- AG News (roberta-base-ag-news)
    - datasets dataset ag_news, split test
        - Successes: 947/1000
        - Accuracy: 94.70%
- CoLA (roberta-base-cola)
    - datasets dataset glue, subset cola, split validation
        - Successes: 857/1000
        - Accuracy: 85.70%
- IMDB (roberta-base-imdb)
    - datasets dataset imdb, split test
        - Successes: 941/1000
        - Accuracy: 94.10%
- Movie Reviews [Rotten Tomatoes] (roberta-base-mr)
    - datasets dataset rotten_tomatoes, split validation
        - Successes: 899/1000
        - Accuracy: 89.90%
    - datasets dataset rotten_tomatoes, split test
        - Successes: 883/1000
        - Accuracy: 88.30%
- MRPC (roberta-base-mrpc)
    - datasets dataset glue, subset mrpc, split validation
        - Successes: 371/408
        - Accuracy: 91.18%
- QNLI (roberta-base-qnli)
    - datasets dataset glue, subset qnli, split validation
        - Successes: 917/1000
        - Accuracy: 91.70%
- Recognizing Textual Entailment (roberta-base-rte)
    - datasets dataset glue, subset rte, split validation
        - Successes: 217/277
        - Accuracy: 78.34%
- SST-2 (roberta-base-sst2)
    - datasets dataset glue, subset sst2, split validation
        - Successes: 820/872
        - Accuracy: 94.04%
- STS-b (roberta-base-stsb)
    - datasets dataset glue, subset stsb, split validation
        - Pearson correlation: 0.906067852162708
        - Spearman correlation: 0.9025045272903051
- WNLI (roberta-base-wnli)
    - datasets dataset glue, subset wnli, split validation
        - Successes: 40/71
        - Accuracy: 56.34%
xlnet-base-cased
- CoLA (xlnet-base-cased-cola)
    - datasets dataset glue, subset cola, split validation
        - Successes: 800/1000
        - Accuracy: 80.00%
- IMDB (xlnet-base-cased-imdb)
    - datasets dataset imdb, split test
        - Successes: 957/1000
        - Accuracy: 95.70%
- Movie Reviews [Rotten Tomatoes] (xlnet-base-cased-mr)
    - datasets dataset rotten_tomatoes, split validation
        - Successes: 908/1000
        - Accuracy: 90.80%
    - datasets dataset rotten_tomatoes, split test
        - Successes: 876/1000
        - Accuracy: 87.60%
- MRPC (xlnet-base-cased-mrpc)
    - datasets dataset glue, subset mrpc, split validation
        - Successes: 363/408
        - Accuracy: 88.97%
- Recognizing Textual Entailment (xlnet-base-cased-rte)
    - datasets dataset glue, subset rte, split validation
        - Successes: 196/277
        - Accuracy: 70.76%
- STS-b (xlnet-base-cased-stsb)
    - datasets dataset glue, subset stsb, split validation
        - Pearson correlation: 0.883111673280641
        - Spearman correlation: 0.8773439961182335
- WNLI (xlnet-base-cased-wnli)
    - datasets dataset glue, subset wnli, split validation
        - Successes: 41/71
        - Accuracy: 57.75%