Command-Line Usage
The easiest way to use textattack is from the command line. Installing textattack
will provide you with the handy textattack command which will allow you to do
just about anything TextAttack offers in a single bash command.
Tip: If you are for some reason unable to use the textattack command, you can access all the same functionality by prepending python -m to the command (python -m textattack ...).
To see all available commands, type textattack --help. This page explains
some of the most important functionalities of textattack: NLP data augmentation,
adversarial attacks, and training and evaluating models.
Data Augmentation with textattack augment
The easiest way to use our data augmentation tools is with textattack augment <args>. textattack augment
takes an input CSV file and the text column to augment, along with the number of words to change per augmentation
and the number of augmentations per input example. It outputs a CSV in the same format, with each augmented
example placed under the corresponding columns.
For example, given the following as examples.csv:
"text",label
"the rock is destined to be the 21st century's new conan and that he's going to make a splash even greater than arnold schwarzenegger , jean- claud van damme or steven segal.", 1
"the gorgeously elaborate continuation of 'the lord of the rings' trilogy is so huge that a column of words cannot adequately describe co-writer/director peter jackson's expanded vision of j . r . r . tolkien's middle-earth .", 1
"take care of my cat offers a refreshingly different slice of asian cinema .", 1
"a technically well-made suspenser . . . but its abrupt drop in iq points as it races to the finish line proves simply too discouraging to let slide .", 0
"it's a mystery how the movie could be released in this condition .", 0
The command:
textattack augment --csv examples.csv --input-column text --recipe embedding --num-words-to-swap 4 \
--transformations-per-example 2 --exclude-original
will augment the text column with four word swaps per augmentation and two augmentations per input example (twice as many augmented rows as original inputs), and will exclude the original inputs from the
output CSV. By default, all of this is saved to augment.csv.
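To make the effect of these flags concrete, here is a minimal Python sketch of the CSV-in, CSV-out flow. It is not how textattack augment is implemented: `augment_csv` and `toy_augment` are hypothetical names, and the toy transformation simply uppercases randomly chosen words where the real embedding recipe would swap in neighboring words.

```python
import csv
import io
import random


def toy_augment(text: str, num_words_to_swap: int, rng: random.Random) -> str:
    """Hypothetical stand-in for a real recipe: uppercases up to
    `num_words_to_swap` randomly chosen words instead of swapping them."""
    words = text.split()
    k = min(num_words_to_swap, len(words))
    for i in rng.sample(range(len(words)), k):
        words[i] = words[i].upper()
    return " ".join(words)


def augment_csv(in_text: str, input_column: str, num_words_to_swap: int,
                transformations_per_example: int, exclude_original: bool,
                seed: int = 0) -> str:
    """Read CSV text, augment one column, and return CSV text with the same columns."""
    rng = random.Random(seed)
    # skipinitialspace handles rows like `"...", 1` with a space after the comma
    reader = csv.DictReader(io.StringIO(in_text), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if not exclude_original:
            writer.writerow(row)
        for _ in range(transformations_per_example):
            new_row = dict(row)  # other columns (e.g. label) are copied unchanged
            new_row[input_column] = toy_augment(row[input_column], num_words_to_swap, rng)
            writer.writerow(new_row)
    return out.getvalue()


sample = '''"text",label
"take care of my cat offers a refreshingly different slice of asian cinema .", 1
"it's a mystery how the movie could be released in this condition .", 0
'''
print(augment_csv(sample, "text", num_words_to_swap=4,
                  transformations_per_example=2, exclude_original=True))
```

Passing exclude_original=False keeps the original rows in the output as well, which is what leaving off --exclude-original is meant to do.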
After augmentation, here are the contents of augment.csv:
text,label
"the rock is destined to be the 21st century's newest conan and that he's gonna to make a splashing even stronger than arnold schwarzenegger , jean- claud van damme or steven segal.",1
"the rock is destined to be the 21tk century's novel conan and that he's going to make a splat even greater than arnold schwarzenegger , jean- claud van damme or stevens segal.",1
the gorgeously elaborate continuation of 'the lord of the rings' trilogy is so huge that a column of expression significant adequately describe co-writer/director pedro jackson's expanded vision of j . rs . r . tolkien's middle-earth .,1
the gorgeously elaborate continuation of 'the lordy of the piercings' trilogy is so huge that a column of mots cannot adequately describe co-novelist/director peter jackson's expanded vision of j . r . r . tolkien's middle-earth .,1
take care of my cat offerings a pleasantly several slice of asia cinema .,1
taking care of my cat offers a pleasantly different slice of asiatic kino .,1
a technically good-made suspenser . . . but its abrupt drop in iq points as it races to the finish bloodline proves straightforward too disheartening to let slide .,0
a technically well-made suspenser . . . but its abrupt drop in iq dot as it races to the finish line demonstrates simply too disheartening to leave slide .,0
it's a enigma how the film wo be releases in this condition .,0
it's a enigma how the filmmaking wo be publicized in this condition .,0
The 'embedding' augmentation recipe augments data by swapping words for their nearest neighbors in a counter-fitted word embedding space.
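The nearest-neighbor idea can be sketched as a cosine-similarity lookup. The three-dimensional vectors below are hypothetical placeholders, not real counter-fitted embeddings (which are trained and much higher-dimensional), and the `min_cos_sim` cutoff is an assumed parameter chosen to echo the embedding:min_cos_sim=0.8 constraint used in the attack example on this page.

```python
import math

# Hypothetical toy vectors standing in for counter-fitted word embeddings.
EMBEDDINGS = {
    "new":    [0.90, 0.10, 0.00],
    "newest": [0.85, 0.15, 0.05],
    "novel":  [0.80, 0.20, 0.10],
    "old":    [-0.90, 0.10, 0.00],
    "movie":  [0.00, 0.90, 0.30],
    "film":   [0.05, 0.88, 0.28],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_neighbors(word, k=2, min_cos_sim=0.8):
    """Return up to k other words with cosine similarity >= min_cos_sim, most similar first."""
    if word not in EMBEDDINGS:
        return []
    target = EMBEDDINGS[word]
    scored = [(cosine(target, vec), other)
              for other, vec in EMBEDDINGS.items() if other != word]
    scored = [(s, w) for s, w in scored if s >= min_cos_sim]
    scored.sort(reverse=True)
    return [w for _, w in scored[:k]]


print(nearest_neighbors("new"))    # ['newest', 'novel']
print(nearest_neighbors("movie"))  # ['film']
```

The similarity threshold is what keeps "old" from ever being offered as a replacement for "new", even though the two words are close in plain co-occurrence embeddings; separating antonyms like this is the motivation for counter-fitting.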
Adversarial Attacks with textattack attack
The heart of textattack is running adversarial attacks on NLP models with
textattack attack. You can build an attack from the command line in several ways:
- Use an attack recipe to launch an attack from the literature:
textattack attack --recipe deepwordbug
- Build your attack from components:
textattack attack --model lstm-mr --num-examples 20 --search-method beam-search:beam_width=4 \
--transformation word-swap-embedding \
--constraints repeat stopword max-words-perturbed:max_num_words=2 embedding:min_cos_sim=0.8 part-of-speech \
--goal-function untargeted-classification
- Create a Python file that builds your attack and load it:
textattack attack --attack-from-file my_file.py:my_attack_name
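Several of the component arguments above take the form name:key=value. The sketch below shows one way such strings could be split into a component name and keyword arguments. It is a hypothetical parser (`parse_component` is not a TextAttack function), and accepting several comma-separated parameters is our assumption rather than documented behavior.

```python
import ast


def parse_component(spec: str):
    """Split a 'name:key=value,key2=value2' string into (name, kwargs).

    Hypothetical sketch of the CLI component syntax: values are parsed as
    Python literals when possible and kept as plain strings otherwise.
    """
    name, _, params = spec.partition(":")
    kwargs = {}
    if params:
        for pair in params.split(","):
            key, _, raw = pair.partition("=")
            raw = raw.strip()
            try:
                value = ast.literal_eval(raw)  # "4" -> 4, "0.8" -> 0.8
            except (ValueError, SyntaxError):
                value = raw  # e.g. a bare word such as fast
            kwargs[key.strip()] = value
    return name, kwargs


constraints = ("repeat stopword max-words-perturbed:max_num_words=2 "
               "embedding:min_cos_sim=0.8 part-of-speech")
for spec in constraints.split():
    print(parse_component(spec))
```

Parameterless components such as repeat and stopword come back with an empty keyword dictionary, so a caller could construct every component the same way.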
Training Models with textattack train
With textattack train, you can train a model on any classification or regression task
from HuggingFace's nlp library using a single command.
Available Models
TextAttack Models
TextAttack has two built-in model types: a 1-layer bidirectional LSTM with a hidden
state size of 150 (lstm), and a WordCNN with 3 window sizes
(3, 4, 5) and 100 filters per window size (cnn). Both models set dropout
to 0.3 and are built on 200-dimensional GloVe embeddings.
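As a worked example of what these hyperparameters imply, the snippet below counts the trainable parameters of each encoder, excluding the embedding table and the output layer. It assumes PyTorch-style conventions (an LSTM layer has four gates with separate input and recurrent bias vectors, and each CNN filter spans the full 200-dimensional embedding); these counts are our arithmetic under those assumptions, not figures published by TextAttack.

```python
def bilstm_params(input_dim: int, hidden: int) -> int:
    """1-layer bidirectional LSTM: per direction, 4 gates, each with input
    weights, recurrent weights, and two bias vectors (PyTorch convention)."""
    per_direction = 4 * (hidden * input_dim + hidden * hidden + 2 * hidden)
    return 2 * per_direction


def word_cnn_params(embed_dim: int, window_sizes, filters: int) -> int:
    """One convolution per window size; each filter covers k words x embed_dim plus a bias."""
    return sum(filters * (k * embed_dim + 1) for k in window_sizes)


EMBED_DIM = 200  # GloVe dimension from the description above
lstm = bilstm_params(EMBED_DIM, hidden=150)
cnn = word_cnn_params(EMBED_DIM, window_sizes=(3, 4, 5), filters=100)
print(lstm, cnn)  # 422400 240300
```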
transformers Models
Along with lstm and cnn, you can in principle fine-tune any model
from the HuggingFace transformers
repository. Just pass the model name (like bert-base-cased) and it will be loaded
automatically.
Here are some models from transformers that have worked well for us:
- bert-base-uncased and bert-base-cased
- distilbert-base-uncased and distilbert-base-cased
- albert-base-v2
- roberta-base
- xlnet-base-cased
Evaluating Models with textattack eval-model
Other Commands
Checkpoints and textattack attack-resume
Some attacks can take a very long time, either because they use a slow search method (like beam search with a high beam width) or because they attack a large number of samples. In these cases, it can be useful to save checkpoints throughout the course of the attack. Then, if the attack crashes for some reason, you can resume it without restarting from scratch.
- To save checkpoints while running an attack, add the argument --checkpoint-interval X, where X is the number of attacks you want to run between checkpoints (for example, textattack attack <args> --checkpoint-interval 5).
- To load an attack from a checkpoint, use textattack attack-resume --checkpoint-file <checkpoint-file>.
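The checkpoint-and-resume pattern can be sketched in plain Python. This is a hypothetical illustration using `pickle`: `run_attack`, `attack_all`, and the checkpoint layout (a pickled list of results) are made up for the example and do not reflect TextAttack's own checkpoint format or API.

```python
import os
import pickle
import tempfile


def run_attack(example: str) -> str:
    """Hypothetical stand-in for attacking one example."""
    return example[::-1]


def attack_all(examples, checkpoint_path, checkpoint_interval, results=None):
    """Attack every example, saving progress every `checkpoint_interval` attacks.

    Passing previously saved `results` resumes from where that run stopped.
    """
    results = list(results or [])
    for i in range(len(results), len(examples)):
        results.append(run_attack(examples[i]))
        if len(results) % checkpoint_interval == 0:
            with open(checkpoint_path, "wb") as f:
                pickle.dump(results, f)
    return results


def load_checkpoint(checkpoint_path):
    with open(checkpoint_path, "rb") as f:
        return pickle.load(f)


examples = [f"example {n}" for n in range(12)]
path = os.path.join(tempfile.mkdtemp(), "attack.ckpt")
attack_all(examples[:7], path, checkpoint_interval=5)  # simulate a crash after 7 attacks
saved = load_checkpoint(path)                           # only the first 5 were checkpointed
resumed = attack_all(examples, path, checkpoint_interval=5, results=saved)
print(len(saved), len(resumed))  # 5 12
```

After a crash the checkpoint can lag behind the work actually done (here 7 attacks finished but only 5 were saved), so a smaller interval loses less progress at the cost of more frequent writes.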
Listing features with textattack list
TextAttack has a lot of built-in features (models, search methods, constraints, etc.)
and it can get overwhelming to keep track of all the options. To list all of the
options within a given category, use textattack list followed by the category name.
For example:
- List all the built-in models:
textattack list models
- List all constraints:
textattack list constraints
- List all search methods:
textattack list search-methods
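Conceptually, a command like this is a lookup into a registry of named components grouped by category. The sketch below is a hypothetical registry populated only with names that appear on this page; the category keys and the `list_category` helper are assumptions for illustration, and the real command lists many more options.

```python
# Hypothetical registry: category -> component names (only names used on this page).
REGISTRY = {
    "models": ["lstm-mr"],
    "search-methods": ["beam-search"],
    "transformations": ["word-swap-embedding"],
    "constraints": ["repeat", "stopword", "max-words-perturbed",
                    "embedding", "part-of-speech"],
    "goal-functions": ["untargeted-classification"],
}


def list_category(category: str) -> list:
    """Return sorted component names for a category, or raise listing the valid categories."""
    if category not in REGISTRY:
        choices = ", ".join(sorted(REGISTRY))
        raise ValueError(f"unknown category {category!r}; choose from: {choices}")
    return sorted(REGISTRY[category])


for category in ("models", "constraints", "search-methods"):
    print(category, list_category(category))
```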
Examining datasets with textattack peek-dataset
It can be useful to take a quick look at whatever dataset you're working with and
compute some basic statistics. Whether you're loading a dataset of your
own from a file or one from the nlp library, you can use textattack peek-dataset to
see some basic information about the dataset.
For example, use textattack peek-dataset --dataset-from-nlp glue:mrpc to see
information about the MRPC dataset (from the GLUE set of datasets). This will
print statistics like the number of labels, average number of words, etc.
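The kind of summary this command prints can be sketched in a few lines of Python. This sketch computes only a few of the statistics mentioned (number of examples, number of distinct labels, and average words per example, counting whitespace-separated tokens) over the examples.csv format used earlier; `peek_stats` is a hypothetical name, and the real command reports more.

```python
import csv
import io
from collections import Counter


def peek_stats(csv_text: str, text_column: str = "text", label_column: str = "label") -> dict:
    """Basic dataset statistics from CSV text with a text column and a label column."""
    rows = list(csv.DictReader(io.StringIO(csv_text), skipinitialspace=True))
    word_counts = [len(row[text_column].split()) for row in rows]
    labels = Counter(row[label_column] for row in rows)
    return {
        "num_examples": len(rows),
        "num_labels": len(labels),
        "label_counts": dict(labels),
        "avg_words": sum(word_counts) / len(rows) if rows else 0.0,
    }


sample = '''"text",label
"take care of my cat offers a refreshingly different slice of asian cinema .", 1
"it's a mystery how the movie could be released in this condition .", 0
'''
print(peek_stats(sample))
```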