mirror of
https://github.com/QData/TextAttack.git
synced 2021-10-13 00:05:06 +03:00
Merge pull request #338 from QData/new_docs
Update the README table of recipes and try to fix the Read the Docs build error
README.md (212 lines changed)
@@ -105,55 +105,177 @@ We include attack recipes which implement attacks from the literature. You can l
To run an attack recipe: `textattack attack --recipe [recipe_name]`
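For example, a minimal invocation might look like the following (recipe and model names are ones listed elsewhere in this README; adjust `--num-examples` to taste):

```bash
# Run the TextFooler recipe against TextAttack's pre-trained BERT SST-2 model
# on ten examples from the model's default dataset.
textattack attack --recipe textfooler --model bert-base-uncased-sst2 --num-examples 10
```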

Attacks on classification tasks, like sentiment classification and entailment:

- **alzantot**: Genetic algorithm attack from (["Generating Natural Language Adversarial Examples" (Alzantot et al., 2018)](https://arxiv.org/abs/1804.07998)).
- **bae**: BERT masked language model transformation attack from (["BAE: BERT-based Adversarial Examples for Text Classification" (Garg & Ramakrishnan, 2020)](https://arxiv.org/abs/2004.01970)).
- **bert-attack**: BERT masked language model transformation attack with subword replacements (["BERT-ATTACK: Adversarial Attack Against BERT Using BERT" (Li et al., 2020)](https://arxiv.org/abs/2004.09984)).
- **checklist**: Invariance testing implemented in CheckList that contracts, extends, and substitutes named entities (["Beyond Accuracy: Behavioral Testing of NLP Models with CheckList" (Ribeiro et al., 2020)](https://arxiv.org/abs/2005.04118)).
- **clare (*coming soon*)**: Greedy attack with word swap, insertion, and merge transformations using a RoBERTa masked language model (["Contextualized Perturbation for Textual Adversarial Attack" (Li et al., 2020)](https://arxiv.org/abs/2009.07502)).
- **faster-alzantot**: Modified, faster version of the Alzantot et al. genetic algorithm, from (["Certified Robustness to Adversarial Word Substitutions" (Jia et al., 2019)](https://arxiv.org/abs/1909.00986)).
- **deepwordbug**: Greedy replace-1 scoring and multi-transformation character-swap attack (["Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers" (Gao et al., 2018)](https://arxiv.org/abs/1801.04354)).
- **hotflip**: Beam search and gradient-based word swap (["HotFlip: White-Box Adversarial Examples for Text Classification" (Ebrahimi et al., 2017)](https://arxiv.org/abs/1712.06751)).
- **iga**: Improved genetic algorithm attack from (["Natural Language Adversarial Attacks and Defenses in Word Level" (Wang et al., 2019)](https://arxiv.org/abs/1909.06723)).
- **input-reduction**: Reducing the input while maintaining the prediction through word importance ranking (["Pathologies of Neural Models Make Interpretation Difficult" (Feng et al., 2018)](https://arxiv.org/pdf/1804.07781.pdf)).
- **kuleshov**: Greedy search and counter-fitted embedding swap (["Adversarial Examples for Natural Language Classification Problems" (Kuleshov et al., 2018)](https://openreview.net/pdf?id=r1QZ3zbAZ)).
- **pruthi**: Character-based attack that simulates common typos (["Combating Adversarial Misspellings with Robust Word Recognition" (Pruthi et al., 2019)](https://arxiv.org/abs/1905.11268)).
- **pso**: Particle swarm optimization and HowNet synonym swap (["Word-level Textual Adversarial Attacking as Combinatorial Optimization" (Zang et al., 2020)](https://www.aclweb.org/anthology/2020.acl-main.540/)).
- **pwws**: Greedy attack with word importance ranking based on word saliency and synonym swap scores (["Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency" (Ren et al., 2019)](https://www.aclweb.org/anthology/P19-1103/)).
- **textbugger**: Greedy attack with word importance ranking and a combination of synonym and character-based swaps (["TextBugger: Generating Adversarial Text Against Real-world Applications" (Li et al., 2018)](https://arxiv.org/abs/1812.05271)).
- **textfooler**: Greedy attack with word importance ranking and counter-fitted embedding swap (["Is BERT Really Robust?" (Jin et al., 2019)](https://arxiv.org/abs/1907.11932)).

Attacks on sequence-to-sequence models:

- **morpheus**: Greedy attack that replaces words with their inflections with the goal of minimizing BLEU score (["It’s Morphin’ Time! Combating Linguistic Discrimination with Inflectional Perturbations" (Tan et al., 2020)](https://www.aclweb.org/anthology/2020.acl-main.263.pdf)).
- **seq2sick**: Greedy attack with the goal of changing every word in the output translation. Currently implemented as black-box, with plans to move to white-box as done in the paper (["Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples" (Cheng et al., 2018)](https://arxiv.org/abs/1803.01128)).
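Beyond the CLI, recipes can also be built in Python. The sketch below is a minimal example, assuming a TextAttack release where attack recipes expose a `build` classmethod and Hugging Face models are wrapped via `HuggingFaceModelWrapper` (check the API of your installed version):

```python
import transformers

from textattack.attack_recipes import TextFoolerJin2019
from textattack.models.wrappers import HuggingFaceModelWrapper

# Load a Hugging Face sequence classification model and its tokenizer.
model = transformers.AutoModelForSequenceClassification.from_pretrained(
    "textattack/bert-base-uncased-SST-2"
)
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "textattack/bert-base-uncased-SST-2"
)

# Wrap the model so TextAttack can query it, then build the recipe.
model_wrapper = HuggingFaceModelWrapper(model, tokenizer)
attack = TextFoolerJin2019.build(model_wrapper)
```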
The following table compares the attack recipes in more detail.
<table>
<thead>
<tr class="header">
<th style="text-align: left;"><strong>Attack Recipe Name</strong></th>
<th style="text-align: left;"><strong>Goal Function</strong></th>
<th style="text-align: left; width:130px"><strong>Constraints-Enforced</strong></th>
<th style="text-align: left;"><strong>Transformation</strong></th>
<th style="text-align: left;"><strong>Search Method</strong></th>
<th style="text-align: left;"><strong>Main Idea</strong></th>
</tr>
</thead>
<tbody>
<tr><td colspan="6"><strong>Attacks on classification tasks, like sentiment classification and entailment:</strong></td></tr>
<tr class="even">
<td style="text-align: left;"><code>alzantot</code> <span class="citation" data-cites="Alzantot2018GeneratingNL Jia2019CertifiedRT"></span></td>
<td style="text-align: left;"><sub>Untargeted {Classification, Entailment}</sub></td>
<td style="text-align: left;"><sub>Percentage of words perturbed, Language Model perplexity, Word embedding distance</sub></td>
<td style="text-align: left;"><sub>Counter-fitted word embedding swap</sub></td>
<td style="text-align: left;"><sub>Genetic Algorithm</sub></td>
<td><sub>From (["Generating Natural Language Adversarial Examples" (Alzantot et al., 2018)](https://arxiv.org/abs/1804.07998))</sub></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><code>bae</code> <span class="citation" data-cites="garg2020bae"></span></td>
<td style="text-align: left;"><sub>Untargeted Classification</sub></td>
<td style="text-align: left;"><sub>USE sentence encoding cosine similarity</sub></td>
<td style="text-align: left;"><sub>BERT Masked Token Prediction</sub></td>
<td style="text-align: left;"><sub>Greedy-WIR</sub></td>
<td><sub>BERT masked language model transformation attack from (["BAE: BERT-based Adversarial Examples for Text Classification" (Garg & Ramakrishnan, 2020)](https://arxiv.org/abs/2004.01970))</sub></td>
</tr>
<tr class="even">
<td style="text-align: left;"><code>bert-attack</code> <span class="citation" data-cites="li2020bertattack"></span></td>
<td style="text-align: left;"><sub>Untargeted Classification</sub></td>
<td style="text-align: left;"><sub>USE sentence encoding cosine similarity, Maximum number of words perturbed</sub></td>
<td style="text-align: left;"><sub>BERT Masked Token Prediction (with subword expansion)</sub></td>
<td style="text-align: left;"><sub>Greedy-WIR</sub></td>
<td><sub>(["BERT-ATTACK: Adversarial Attack Against BERT Using BERT" (Li et al., 2020)](https://arxiv.org/abs/2004.09984))</sub></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><code>checklist</code> <span class="citation" data-cites="Gao2018BlackBoxGO"></span></td>
<td style="text-align: left;"><sub>{Untargeted, Targeted} Classification</sub></td>
<td style="text-align: left;"><sub>checklist distance</sub></td>
<td style="text-align: left;"><sub>Contract, extend, and substitute named entities</sub></td>
<td style="text-align: left;"><sub>Greedy-WIR</sub></td>
<td><sub>Invariance testing implemented in CheckList (["Beyond Accuracy: Behavioral Testing of NLP Models with CheckList" (Ribeiro et al., 2020)](https://arxiv.org/abs/2005.04118))</sub></td>
</tr>
<tr class="even">
<td style="text-align: left;"><code>clare (*coming soon*)</code> <span class="citation" data-cites="Alzantot2018GeneratingNL Jia2019CertifiedRT"></span></td>
<td style="text-align: left;"><sub>Untargeted {Classification, Entailment}</sub></td>
<td style="text-align: left;"><sub>RoBERTa masked language model</sub></td>
<td style="text-align: left;"><sub>Word swap, insertion, and merge</sub></td>
<td style="text-align: left;"><sub>Greedy</sub></td>
<td><sub>(["Contextualized Perturbation for Textual Adversarial Attack" (Li et al., 2020)](https://arxiv.org/abs/2009.07502))</sub></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><code>deepwordbug</code> <span class="citation" data-cites="Gao2018BlackBoxGO"></span></td>
<td style="text-align: left;"><sub>{Untargeted, Targeted} Classification</sub></td>
<td style="text-align: left;"><sub>Levenshtein edit distance</sub></td>
<td style="text-align: left;"><sub>{Character Insertion, Character Deletion, Neighboring Character Swap, Character Substitution}</sub></td>
<td style="text-align: left;"><sub>Greedy-WIR</sub></td>
<td><sub>Greedy replace-1 scoring and multi-transformation character-swap attack (["Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers" (Gao et al., 2018)](https://arxiv.org/abs/1801.04354))</sub></td>
</tr>
<tr class="even">
<td style="text-align: left;"><code>faster-alzantot</code> <span class="citation" data-cites="Alzantot2018GeneratingNL Jia2019CertifiedRT"></span></td>
<td style="text-align: left;"><sub>Untargeted {Classification, Entailment}</sub></td>
<td style="text-align: left;"><sub>Percentage of words perturbed, Language Model perplexity, Word embedding distance</sub></td>
<td style="text-align: left;"><sub>Counter-fitted word embedding swap</sub></td>
<td style="text-align: left;"><sub>Genetic Algorithm</sub></td>
<td><sub>Modified, faster version of the Alzantot et al. genetic algorithm, from (["Certified Robustness to Adversarial Word Substitutions" (Jia et al., 2019)](https://arxiv.org/abs/1909.00986))</sub></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><code>hotflip</code> (word swap) <span class="citation" data-cites="Ebrahimi2017HotFlipWA"></span></td>
<td style="text-align: left;"><sub>Untargeted Classification</sub></td>
<td style="text-align: left;"><sub>Word Embedding Cosine Similarity, Part-of-speech match, Number of words perturbed</sub></td>
<td style="text-align: left;"><sub>Gradient-Based Word Swap</sub></td>
<td style="text-align: left;"><sub>Beam search</sub></td>
<td><sub>(["HotFlip: White-Box Adversarial Examples for Text Classification" (Ebrahimi et al., 2017)](https://arxiv.org/abs/1712.06751))</sub></td>
</tr>
<tr class="even">
<td style="text-align: left;"><code>iga</code> <span class="citation" data-cites="iga-wang2019natural"></span></td>
<td style="text-align: left;"><sub>Untargeted {Classification, Entailment}</sub></td>
<td style="text-align: left;"><sub>Percentage of words perturbed, Word embedding distance</sub></td>
<td style="text-align: left;"><sub>Counter-fitted word embedding swap</sub></td>
<td style="text-align: left;"><sub>Genetic Algorithm</sub></td>
<td><sub>Improved genetic algorithm-based word substitution from (["Natural Language Adversarial Attacks and Defenses in Word Level" (Wang et al., 2019)](https://arxiv.org/abs/1909.06723))</sub></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><code>input-reduction</code> <span class="citation" data-cites="feng2018pathologies"></span></td>
<td style="text-align: left;"><sub>Input Reduction</sub></td>
<td style="text-align: left;"></td>
<td style="text-align: left;"><sub>Word deletion</sub></td>
<td style="text-align: left;"><sub>Greedy-WIR</sub></td>
<td><sub>Reducing the input while maintaining the prediction through word importance ranking (["Pathologies of Neural Models Make Interpretation Difficult" (Feng et al., 2018)](https://arxiv.org/pdf/1804.07781.pdf))</sub></td>
</tr>
<tr class="even">
<td style="text-align: left;"><code>kuleshov</code> <span class="citation" data-cites="Kuleshov2018AdversarialEF"></span></td>
<td style="text-align: left;"><sub>Untargeted Classification</sub></td>
<td style="text-align: left;"><sub>Thought vector encoding cosine similarity, Language model similarity probability</sub></td>
<td style="text-align: left;"><sub>Counter-fitted word embedding swap</sub></td>
<td style="text-align: left;"><sub>Greedy word swap</sub></td>
<td><sub>(["Adversarial Examples for Natural Language Classification Problems" (Kuleshov et al., 2018)](https://openreview.net/pdf?id=r1QZ3zbAZ))</sub></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><code>pruthi</code> <span class="citation" data-cites="pruthi2019combating"></span></td>
<td style="text-align: left;"><sub>Untargeted Classification</sub></td>
<td style="text-align: left;"><sub>Minimum word length, Maximum number of words perturbed</sub></td>
<td style="text-align: left;"><sub>{Neighboring Character Swap, Character Deletion, Character Insertion, Keyboard-Based Character Swap}</sub></td>
<td style="text-align: left;"><sub>Greedy search</sub></td>
<td><sub>Simulates common typos (["Combating Adversarial Misspellings with Robust Word Recognition" (Pruthi et al., 2019)](https://arxiv.org/abs/1905.11268))</sub></td>
</tr>
<tr class="even">
<td style="text-align: left;"><code>pso</code> <span class="citation" data-cites="pso-zang-etal-2020-word"></span></td>
<td style="text-align: left;"><sub>Untargeted Classification</sub></td>
<td style="text-align: left;"></td>
<td style="text-align: left;"><sub>HowNet Word Swap</sub></td>
<td style="text-align: left;"><sub>Particle Swarm Optimization</sub></td>
<td><sub>(["Word-level Textual Adversarial Attacking as Combinatorial Optimization" (Zang et al., 2020)](https://www.aclweb.org/anthology/2020.acl-main.540/))</sub></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><code>pwws</code> <span class="citation" data-cites="pwws-ren-etal-2019-generating"></span></td>
<td style="text-align: left;"><sub>Untargeted Classification</sub></td>
<td style="text-align: left;"></td>
<td style="text-align: left;"><sub>WordNet-based synonym swap</sub></td>
<td style="text-align: left;"><sub>Greedy-WIR (saliency)</sub></td>
<td><sub>Greedy attack with word importance ranking based on word saliency and synonym swap scores (["Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency" (Ren et al., 2019)](https://www.aclweb.org/anthology/P19-1103/))</sub></td>
</tr>
<tr class="even">
<td style="text-align: left;"><code>textbugger</code> (black-box) <span class="citation" data-cites="Li2019TextBuggerGA"></span></td>
<td style="text-align: left;"><sub>Untargeted Classification</sub></td>
<td style="text-align: left;"><sub>USE sentence encoding cosine similarity</sub></td>
<td style="text-align: left;"><sub>{Character Insertion, Character Deletion, Neighboring Character Swap, Character Substitution}</sub></td>
<td style="text-align: left;"><sub>Greedy-WIR</sub></td>
<td><sub>(["TextBugger: Generating Adversarial Text Against Real-world Applications" (Li et al., 2018)](https://arxiv.org/abs/1812.05271))</sub></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><code>textfooler</code> <span class="citation" data-cites="Jin2019TextFooler"></span></td>
<td style="text-align: left;"><sub>Untargeted {Classification, Entailment}</sub></td>
<td style="text-align: left;"><sub>Word Embedding Distance, Part-of-speech match, USE sentence encoding cosine similarity</sub></td>
<td style="text-align: left;"><sub>Counter-fitted word embedding swap</sub></td>
<td style="text-align: left;"><sub>Greedy-WIR</sub></td>
<td><sub>Greedy attack with word importance ranking (["Is BERT Really Robust?" (Jin et al., 2019)](https://arxiv.org/abs/1907.11932))</sub></td>
</tr>
<tr><td colspan="6"><strong>Attacks on sequence-to-sequence models:</strong></td></tr>
<tr class="even">
<td style="text-align: left;"><code>morpheus</code> <span class="citation" data-cites="morpheus-tan-etal-2020-morphin"></span></td>
<td style="text-align: left;"><sub>Minimum BLEU Score</sub></td>
<td style="text-align: left;"></td>
<td style="text-align: left;"><sub>Inflection Word Swap</sub></td>
<td style="text-align: left;"><sub>Greedy search</sub></td>
<td><sub>Greedily replaces words with their inflections with the goal of minimizing BLEU score (["It’s Morphin’ Time! Combating Linguistic Discrimination with Inflectional Perturbations" (Tan et al., 2020)](https://www.aclweb.org/anthology/2020.acl-main.263.pdf))</sub></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><code>seq2sick</code> (black-box) <span class="citation" data-cites="cheng2018seq2sick"></span></td>
<td style="text-align: left;"><sub>Non-overlapping output</sub></td>
<td style="text-align: left;"></td>
<td style="text-align: left;"><sub>Counter-fitted word embedding swap</sub></td>
<td style="text-align: left;"><sub>Greedy-WIR</sub></td>
<td><sub>Greedy attack with the goal of changing every word in the output translation. Currently implemented as black-box with plans to change to white-box as done in the paper (["Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples" (Cheng et al., 2018)](https://arxiv.org/abs/1803.01128))</sub></td>
</tr>
</tbody>
</table>

Attacks on classification tasks:

| Attack Recipe(s) | Accessibility | Perturbation | Main Idea |
| :-------------------------------: | :-------------: | :----------: | :------------------------------------------------------------------------------------------------: |
| Alzantot Genetic Algorithm | Score | Word | Genetic algorithm-based word substitution |
| BAE* | Score | Word | BERT masked language model transformation attack |
| Faster Alzantot Genetic Algorithm | Score | Word | Genetic algorithm-based word substitution (faster version) |
| Improved Genetic Algorithm | Score | Word | Improved genetic algorithm-based word substitution |
| Input Reduction* | Gradient | Word | Reducing the input while maintaining the prediction through word importance ranking |
| Kuleshov | Score | Word | Greedy search and counter-fitted embedding swap |
| Particle Swarm Optimization | Score | Word | Particle swarm optimization-based word substitution |
| TextFooler | Score | Word | Greedy attack with word importance ranking and counter-fitted embedding swap |
| PWWS | Score | Word | Greedy attack with word importance ranking based on word saliency and synonym swap scores |
| TextBugger | Gradient, Score | Word+Char | Greedy attack with word importance ranking and a combination of synonym and character-based swaps |
| HotFlip | Gradient | Word, Char | Beam search and gradient-based word swap |
| BERT-Attack* | Score | Word, Char | BERT masked language model transformation attack with subword replacements |
| CheckList* | Score | Word, Char | Invariance testing that contracts, extends, and substitutes named entities |
| DeepWordBug | Score | Char | Greedy replace-1 scoring and multi-transformation character-swap attack |
| Pruthi | Score | Char | Character-based attack that simulates common typos |

Attacks on sequence-to-sequence models:

| Attack Recipe(s) | Accessibility | Perturbation | Main Idea |
| :--------------: | :-----------: | :----------: | :----------------------------------------------------------------------------------------------: |
| Seq2Sick | Score | Word | Greedy attack with the goal of changing every word in the output translation |
| MORPHEUS | Score | Word | Greedy attack that replaces words with their inflections with the goal of minimizing BLEU score |
#### Recipe Usage Examples
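As an illustration, two invocations consistent with the commands above (the model names are from the TextAttack model zoo; adjust to your setup):

```bash
# Character-level attack on a pre-trained LSTM sentiment model.
textattack attack --recipe deepwordbug --model lstm-mr --num-examples 20

# Word-level synonym-swap attack on a pre-trained CNN for IMDB.
textattack attack --recipe pwws --model cnn-imdb --num-examples 20
```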
@@ -29,6 +29,9 @@ We provide the following broad advice to help other future developers create use
Our modular and extensible design allows us to reuse many components to offer 15+ adversarial attack methods proposed in the literature. Our model-agnostic and dataset-agnostic design allows users to easily run adversarial attacks against their own models built using any deep learning framework. We hope that our lessons from developing TextAttack will help others create user-friendly open-source NLP libraries.
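As a sketch of what model-agnostic means in practice, here is a minimal custom wrapper, assuming the `ModelWrapper` base class that TextAttack exposes; the scikit-learn pipeline itself is hypothetical:

```python
import numpy as np

from textattack.models.wrappers import ModelWrapper


class SklearnPipelineWrapper(ModelWrapper):
    """Adapts a fitted scikit-learn text pipeline to TextAttack's interface."""

    def __init__(self, pipeline):
        # e.g., a TfidfVectorizer + LogisticRegression pipeline (hypothetical)
        self.model = pipeline

    def __call__(self, text_input_list):
        # TextAttack passes a list of strings and expects one score row per input.
        return np.array(self.model.predict_proba(text_input_list))
```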
## More Details in Reference
@@ -61,3 +61,10 @@ TextAttack has some other features that make it a pleasure to use:
- :ref:`Pre-trained Models <models>` for testing attacks and evaluating constraints
- :ref:`Visualization options <loggers>` like Weights & Biases and Visdom
- :ref:`AttackedText <attacked_text>`, a utility class for strings that includes tools for tokenizing and editing text
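A quick sketch of the last item, assuming `AttackedText` lives in `textattack.shared` and exposes word-level helpers (check the API reference for your installed version):

```python
from textattack.shared import AttackedText

text = AttackedText("the quick brown fox jumps over the lazy dog")

print(text.words)  # word-level view of the string

# Produce a perturbed copy; AttackedText instances are treated as immutable.
perturbed = text.replace_word_at_index(1, "sluggish")
print(perturbed.text)
```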

Indices and Glossary of TextAttack
-----------------------------------
* :ref:`modindex`

@@ -134,3 +134,8 @@ see some basic information about the dataset.
For example, use `textattack peek-dataset --dataset-from-huggingface glue^mrpc` to see
information about the MRPC dataset (from the GLUE set of datasets). This will
print statistics like the number of labels, average number of words, etc.

## A summary diagram of the TextAttack ecosystem


@@ -60,3 +60,7 @@ How to Cite TextAttack

## A summary diagram of the TextAttack ecosystem


@@ -224,3 +224,7 @@ You can run TextAttack tests with `pytest`. Just type `make test`.
#### This guide was heavily inspired by the awesome [transformers guide to contributing](https://github.com/huggingface/transformers/blob/master/CONTRIBUTING.md)
docs/2notebook/Example_5_Explain_BERT.ipynb (new file, 431 lines): file diff suppressed because one or more lines are too long
docs/3recipes/models.md (new file, 361 lines)
@@ -0,0 +1,361 @@

# TextAttack Model Zoo

TextAttack includes pre-trained models for common NLP tasks. This makes it easier for
users to get started with TextAttack. It also enables a fairer comparison of attacks from
the literature.

All evaluation results were obtained using `textattack eval` to evaluate models on their default
test dataset (the test set if labels are available, otherwise the eval/validation set). You can use
this command to verify the accuracies for yourself: for example, `textattack eval --model roberta-base-mr`.

The LSTM and wordCNN models' code is available in `textattack.models.helpers`. All other models are transformers
imported from the [`transformers`](https://github.com/huggingface/transformers/) package. To evaluate all
TextAttack pretrained models, invoke `textattack eval` without specifying a model: `textattack eval --num-examples 1000`.
All evaluations shown are on the full validation or test set, up to 1000 examples.
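For example, the following commands reproduce evaluations from this file (model names as listed below):

```bash
# Evaluate the pre-trained RoBERTa Movie Reviews model on its default dataset.
textattack eval --model roberta-base-mr

# Evaluate every TextAttack pre-trained model, up to 1000 examples each.
textattack eval --num-examples 1000
```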
### `LSTM`

<section>

- AG News (`lstm-ag-news`)
  - `datasets` dataset `ag_news`, split `test`
    - Successes: 914/1000
    - Accuracy: 91.40%
- IMDB (`lstm-imdb`)
  - `datasets` dataset `imdb`, split `test`
    - Successes: 883/1000
    - Accuracy: 88.30%
- Movie Reviews [Rotten Tomatoes] (`lstm-mr`)
  - `datasets` dataset `rotten_tomatoes`, split `validation`
    - Successes: 807/1000
    - Accuracy: 80.70%
  - `datasets` dataset `rotten_tomatoes`, split `test`
    - Successes: 781/1000
    - Accuracy: 78.10%
- SST-2 (`lstm-sst2`)
  - `datasets` dataset `glue`, subset `sst2`, split `validation`
    - Successes: 737/872
    - Accuracy: 84.52%
- Yelp Polarity (`lstm-yelp`)
  - `datasets` dataset `yelp_polarity`, split `test`
    - Successes: 922/1000
    - Accuracy: 92.20%

</section>

### `wordCNN`

<section>

- AG News (`cnn-ag-news`)
  - `datasets` dataset `ag_news`, split `test`
    - Successes: 910/1000
    - Accuracy: 91.00%
- IMDB (`cnn-imdb`)
  - `datasets` dataset `imdb`, split `test`
    - Successes: 863/1000
    - Accuracy: 86.30%
- Movie Reviews [Rotten Tomatoes] (`cnn-mr`)
  - `datasets` dataset `rotten_tomatoes`, split `validation`
    - Successes: 794/1000
    - Accuracy: 79.40%
  - `datasets` dataset `rotten_tomatoes`, split `test`
    - Successes: 768/1000
    - Accuracy: 76.80%
- SST-2 (`cnn-sst2`)
  - `datasets` dataset `glue`, subset `sst2`, split `validation`
    - Successes: 721/872
    - Accuracy: 82.68%
- Yelp Polarity (`cnn-yelp`)
  - `datasets` dataset `yelp_polarity`, split `test`
    - Successes: 913/1000
    - Accuracy: 91.30%

</section>

### `albert-base-v2`

<section>

- AG News (`albert-base-v2-ag-news`)
  - `datasets` dataset `ag_news`, split `test`
    - Successes: 943/1000
    - Accuracy: 94.30%
- CoLA (`albert-base-v2-cola`)
  - `datasets` dataset `glue`, subset `cola`, split `validation`
    - Successes: 829/1000
    - Accuracy: 82.90%
- IMDB (`albert-base-v2-imdb`)
  - `datasets` dataset `imdb`, split `test`
    - Successes: 913/1000
    - Accuracy: 91.30%
- Movie Reviews [Rotten Tomatoes] (`albert-base-v2-mr`)
  - `datasets` dataset `rotten_tomatoes`, split `validation`
    - Successes: 882/1000
    - Accuracy: 88.20%
  - `datasets` dataset `rotten_tomatoes`, split `test`
    - Successes: 851/1000
    - Accuracy: 85.10%
- Quora Question Pairs (`albert-base-v2-qqp`)
  - `datasets` dataset `glue`, subset `qqp`, split `validation`
    - Successes: 914/1000
    - Accuracy: 91.40%
- Recognizing Textual Entailment (`albert-base-v2-rte`)
  - `datasets` dataset `glue`, subset `rte`, split `validation`
    - Successes: 211/277
    - Accuracy: 76.17%
- SNLI (`albert-base-v2-snli`)
  - `datasets` dataset `snli`, split `test`
    - Successes: 883/1000
    - Accuracy: 88.30%
- SST-2 (`albert-base-v2-sst2`)
  - `datasets` dataset `glue`, subset `sst2`, split `validation`
    - Successes: 807/872
    - Accuracy: 92.55%
- STS-b (`albert-base-v2-stsb`)
  - `datasets` dataset `glue`, subset `stsb`, split `validation`
    - Pearson correlation: 0.9041
    - Spearman correlation: 0.8996
- WNLI (`albert-base-v2-wnli`)
  - `datasets` dataset `glue`, subset `wnli`, split `validation`
    - Successes: 42/71
    - Accuracy: 59.15%
- Yelp Polarity (`albert-base-v2-yelp`)
  - `datasets` dataset `yelp_polarity`, split `test`
    - Successes: 963/1000
    - Accuracy: 96.30%

</section>

### `bert-base-uncased`

<section>

- AG News (`bert-base-uncased-ag-news`)
  - `datasets` dataset `ag_news`, split `test`
    - Successes: 942/1000
    - Accuracy: 94.20%
- CoLA (`bert-base-uncased-cola`)
  - `datasets` dataset `glue`, subset `cola`, split `validation`
    - Successes: 812/1000
    - Accuracy: 81.20%
- IMDB (`bert-base-uncased-imdb`)
  - `datasets` dataset `imdb`, split `test`
    - Successes: 919/1000
    - Accuracy: 91.90%
- MNLI matched (`bert-base-uncased-mnli`)
  - `datasets` dataset `glue`, subset `mnli`, split `validation_matched`
    - Successes: 840/1000
    - Accuracy: 84.00%
- Movie Reviews [Rotten Tomatoes] (`bert-base-uncased-mr`)
  - `datasets` dataset `rotten_tomatoes`, split `validation`
    - Successes: 876/1000
    - Accuracy: 87.60%
  - `datasets` dataset `rotten_tomatoes`, split `test`
    - Successes: 838/1000
    - Accuracy: 83.80%
- MRPC (`bert-base-uncased-mrpc`)
  - `datasets` dataset `glue`, subset `mrpc`, split `validation`
    - Successes: 358/408
    - Accuracy: 87.75%
- QNLI (`bert-base-uncased-qnli`)
  - `datasets` dataset `glue`, subset `qnli`, split `validation`
    - Successes: 904/1000
    - Accuracy: 90.40%
- Quora Question Pairs (`bert-base-uncased-qqp`)
  - `datasets` dataset `glue`, subset `qqp`, split `validation`
    - Successes: 924/1000
    - Accuracy: 92.40%
- Recognizing Textual Entailment (`bert-base-uncased-rte`)
  - `datasets` dataset `glue`, subset `rte`, split `validation`
    - Successes: 201/277
    - Accuracy: 72.56%
- SNLI (`bert-base-uncased-snli`)
  - `datasets` dataset `snli`, split `test`
    - Successes: 894/1000
    - Accuracy: 89.40%
- SST-2 (`bert-base-uncased-sst2`)
  - `datasets` dataset `glue`, subset `sst2`, split `validation`
    - Successes: 806/872
    - Accuracy: 92.43%
- STS-b (`bert-base-uncased-stsb`)
  - `datasets` dataset `glue`, subset `stsb`, split `validation`
    - Pearson correlation: 0.8775
    - Spearman correlation: 0.8773
- WNLI (`bert-base-uncased-wnli`)
  - `datasets` dataset `glue`, subset `wnli`, split `validation`
    - Successes: 40/71
    - Accuracy: 56.34%
- Yelp Polarity (`bert-base-uncased-yelp`)
  - `datasets` dataset `yelp_polarity`, split `test`
    - Successes: 963/1000
    - Accuracy: 96.30%

</section>

### `distilbert-base-cased`

<section>

- CoLA (`distilbert-base-cased-cola`)
  - `datasets` dataset `glue`, subset `cola`, split `validation`
    - Successes: 786/1000
    - Accuracy: 78.60%
- MRPC (`distilbert-base-cased-mrpc`)
  - `datasets` dataset `glue`, subset `mrpc`, split `validation`
    - Successes: 320/408
    - Accuracy: 78.43%
- Quora Question Pairs (`distilbert-base-cased-qqp`)
  - `datasets` dataset `glue`, subset `qqp`, split `validation`
    - Successes: 908/1000
    - Accuracy: 90.80%
- SNLI (`distilbert-base-cased-snli`)
  - `datasets` dataset `snli`, split `test`
    - Successes: 861/1000
    - Accuracy: 86.10%
- SST-2 (`distilbert-base-cased-sst2`)
  - `datasets` dataset `glue`, subset `sst2`, split `validation`
    - Successes: 785/872
    - Accuracy: 90.02%
- STS-b (`distilbert-base-cased-stsb`)
  - `datasets` dataset `glue`, subset `stsb`, split `validation`
    - Pearson correlation: 0.8422
    - Spearman correlation: 0.8407

</section>

### `distilbert-base-uncased`

<section>

- AG News (`distilbert-base-uncased-ag-news`)
  - `datasets` dataset `ag_news`, split `test`
    - Successes: 944/1000
    - Accuracy: 94.40%
- CoLA (`distilbert-base-uncased-cola`)
  - `datasets` dataset `glue`, subset `cola`, split `validation`
    - Successes: 786/1000
    - Accuracy: 78.60%
- IMDB (`distilbert-base-uncased-imdb`)
  - `datasets` dataset `imdb`, split `test`
    - Successes: 903/1000
    - Accuracy: 90.30%
- MNLI matched (`distilbert-base-uncased-mnli`)
  - `datasets` dataset `glue`, subset `mnli`, split `validation_matched`
    - Successes: 817/1000
    - Accuracy: 81.70%
- MRPC (`distilbert-base-uncased-mrpc`)
  - `datasets` dataset `glue`, subset `mrpc`, split `validation`
    - Successes: 350/408
    - Accuracy: 85.78%
- QNLI (`distilbert-base-uncased-qnli`)
  - `datasets` dataset `glue`, subset `qnli`, split `validation`
    - Successes: 860/1000
    - Accuracy: 86.00%
- Recognizing Textual Entailment (`distilbert-base-uncased-rte`)
  - `datasets` dataset `glue`, subset `rte`, split `validation`
    - Successes: 180/277
    - Accuracy: 64.98%
- STS-b (`distilbert-base-uncased-stsb`)
  - `datasets` dataset `glue`, subset `stsb`, split `validation`
    - Pearson correlation: 0.8422
    - Spearman correlation: 0.8407
- WNLI (`distilbert-base-uncased-wnli`)
  - `datasets` dataset `glue`, subset `wnli`, split `validation`
    - Successes: 40/71
    - Accuracy: 56.34%

</section>

### `roberta-base`

<section>

- AG News (`roberta-base-ag-news`)
  - `datasets` dataset `ag_news`, split `test`
    - Successes: 947/1000
    - Accuracy: 94.70%
- CoLA (`roberta-base-cola`)
  - `datasets` dataset `glue`, subset `cola`, split `validation`
    - Successes: 857/1000
    - Accuracy: 85.70%
- IMDB (`roberta-base-imdb`)
  - `datasets` dataset `imdb`, split `test`
    - Successes: 941/1000
    - Accuracy: 94.10%
- Movie Reviews [Rotten Tomatoes] (`roberta-base-mr`)
  - `datasets` dataset `rotten_tomatoes`, split `validation`
    - Successes: 899/1000
    - Accuracy: 89.90%
  - `datasets` dataset `rotten_tomatoes`, split `test`
    - Successes: 883/1000
    - Accuracy: 88.30%
- MRPC (`roberta-base-mrpc`)
  - `datasets` dataset `glue`, subset `mrpc`, split `validation`
    - Successes: 371/408
    - Accuracy: 91.18%
- QNLI (`roberta-base-qnli`)
  - `datasets` dataset `glue`, subset `qnli`, split `validation`
    - Successes: 917/1000
    - Accuracy: 91.70%
- Recognizing Textual Entailment (`roberta-base-rte`)
  - `datasets` dataset `glue`, subset `rte`, split `validation`
    - Successes: 217/277
    - Accuracy: 78.34%
- SST-2 (`roberta-base-sst2`)
  - `datasets` dataset `glue`, subset `sst2`, split `validation`
    - Successes: 820/872
    - Accuracy: 94.04%
- STS-b (`roberta-base-stsb`)
  - `datasets` dataset `glue`, subset `stsb`, split `validation`
    - Pearson correlation: 0.9061
    - Spearman correlation: 0.9025
- WNLI (`roberta-base-wnli`)
  - `datasets` dataset `glue`, subset `wnli`, split `validation`
    - Successes: 40/71
    - Accuracy: 56.34%

</section>

### `xlnet-base-cased`

<section>

- CoLA (`xlnet-base-cased-cola`)
  - `datasets` dataset `glue`, subset `cola`, split `validation`
    - Successes: 800/1000
    - Accuracy: 80.00%
- IMDB (`xlnet-base-cased-imdb`)
  - `datasets` dataset `imdb`, split `test`
    - Successes: 957/1000
    - Accuracy: 95.70%
- Movie Reviews [Rotten Tomatoes] (`xlnet-base-cased-mr`)
  - `datasets` dataset `rotten_tomatoes`, split `validation`
    - Successes: 908/1000
    - Accuracy: 90.80%
  - `datasets` dataset `rotten_tomatoes`, split `test`
    - Successes: 876/1000
    - Accuracy: 87.60%
- MRPC (`xlnet-base-cased-mrpc`)
  - `datasets` dataset `glue`, subset `mrpc`, split `validation`
    - Successes: 363/408
    - Accuracy: 88.97%
- Recognizing Textual Entailment (`xlnet-base-cased-rte`)
  - `datasets` dataset `glue`, subset `rte`, split `validation`
    - Successes: 196/277
    - Accuracy: 70.76%
- STS-b (`xlnet-base-cased-stsb`)
  - `datasets` dataset `glue`, subset `stsb`, split `validation`
    - Pearson correlation: 0.8831
    - Spearman correlation: 0.8773
- WNLI (`xlnet-base-cased-wnli`)
  - `datasets` dataset `glue`, subset `wnli`, split `validation`
    - Successes: 41/71
    - Accuracy: 57.75%

</section>
docs/_static/imgs/intro/textattack_ecosystem.png (new binary file, 97 KiB; binary file not shown)
@@ -30,7 +30,8 @@ TextAttack Documentation
   Tutorial 4: Attacking scikit-learn models <2notebook/Example_1_sklearn.ipynb>
   Tutorial 5: Attacking AllenNLP models <2notebook/Example_2_allennlp.ipynb>
   Tutorial 6: Attacking multilingual models <2notebook/Example_4_CamemBERT.ipynb>
   Tutorial 7: Explaining BERT model using Captum <2notebook/Example_5_Explain_BERT.ipynb>

.. toctree::
   :maxdepth: 6

@@ -41,4 +42,7 @@ TextAttack Documentation
   1start/api-design-tips.md
   3recipes/attack_recipes
   3recipes/augmenter_recipes
   3recipes/models.md
   apidoc/textattack
@@ -2,6 +2,10 @@
Word Swap by Embedding
============================================

Based on paper: `<https://arxiv.org/abs/1603.00892>`_

Paper title: Counter-fitting Word Vectors to Linguistic Constraints

"""
import os
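For context, a hedged usage sketch of this transformation for data augmentation; the `WordSwapEmbedding` and `Augmenter` names are assumed from TextAttack's public API (verify against your installed version):

```python
from textattack.augmentation import Augmenter
from textattack.transformations import WordSwapEmbedding

# Swap words for nearest neighbors in the counter-fitted embedding space.
transformation = WordSwapEmbedding(max_candidates=5)
augmenter = Augmenter(transformation=transformation, transformations_per_example=3)

print(augmenter.augment("I enjoyed the movie a great deal"))
```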