mirror of https://github.com/QData/TextAttack.git synced 2021-10-13 00:05:06 +03:00

fixing bugs in example folder..

This commit is contained in:
Yanjun Qi
2021-08-02 16:42:49 -04:00
parent 27cf667553
commit c57444940a
13 changed files with 74 additions and 35 deletions

3
.gitignore vendored
View File

@@ -45,4 +45,5 @@ checkpoints/
# vim
*.swp
.vscode
.vscode
*.csv

View File

@@ -71,7 +71,11 @@ or a specific command using, for example,
textattack attack --help
```
The [`examples/`](examples/) folder includes scripts showing common TextAttack usage for training models, running attacks, and augmenting a CSV file. The [documentation website](https://textattack.readthedocs.io/en/latest) contains walkthroughs explaining basic usage of TextAttack, including building a custom transformation and a custom constraint..
The [`examples/`](examples/) folder includes scripts showing common TextAttack usage for training models, running attacks, and augmenting a CSV file.
The [documentation website](https://textattack.readthedocs.io/en/latest) contains walkthroughs explaining basic usage of TextAttack, including building a custom transformation and a custom constraint.
### Running Attacks: `textattack attack --help`
@@ -323,7 +327,9 @@ For example, given the following as `examples.csv`:
"it's a mystery how the movie could be released in this condition .", 0
```
The command `textattack augment --csv examples.csv --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original`
The command:
```
textattack augment --input-csv examples.csv --output-csv output.csv --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original
```
will augment the `text` column by altering 10% of each example's words, generating twice as many augmentations as original inputs, and excluding the original inputs from the
output CSV. (If `--output-csv` is not given, results are saved to `augment.csv` by default.)
@@ -453,7 +459,7 @@ create a short file that loads them as variables `model` and `tokenizer`. The `
be able to transform string inputs to lists or tensors of IDs using a method called `encode()`. The
model must take inputs via the `__call__` method.
##### Model from a file
##### Custom Model from a file
To experiment with a model you've trained, you could create the following file
and name it `my_model.py`:
@@ -488,14 +494,12 @@ which maintains both a list of tokens and the original text, with punctuation. W
#### Dataset via Data Frames (*coming soon*)
#### Dataset loading via other mechanism, see: [here](https://textattack.readthedocs.io/en/latest/api/datasets.html)
### Attacks and how to design a new attack
The `attack_one` method in an `Attack` takes as input an `AttackedText`, and outputs either a `SuccessfulAttackResult` if it succeeds or a `FailedAttackResult` if it fails.
We formulate an attack as consisting of four components: a **goal function** which determines if the attack has succeeded, **constraints** defining which perturbations are valid, a **transformation** that generates potential modifications given an input, and a **search method** which traverses through the search space of possible perturbations. The attack attempts to perturb an input text such that the model output fulfills the goal function (i.e., indicating whether the attack is successful) and the perturbation adheres to the set of constraints (e.g., grammar constraint, semantic similarity constraint). A search method is used to find a sequence of transformations that produce a successful adversarial example.
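To make the four components concrete, here is a minimal sketch of assembling them with TextAttack's Python API; the HuggingFace model name and the `max_candidates` value are illustrative choices rather than anything this commit prescribes:
```python
import transformers

from textattack import Attack
from textattack.constraints.pre_transformation import (
    RepeatModification,
    StopwordModification,
)
from textattack.goal_functions import UntargetedClassification
from textattack.models.wrappers import HuggingFaceModelWrapper
from textattack.search_methods import GreedyWordSwapWIR
from textattack.transformations import WordSwapEmbedding

# Wrap a sequence-classification model so TextAttack can query its predictions.
model = transformers.AutoModelForSequenceClassification.from_pretrained("textattack/bert-base-uncased-imdb")
tokenizer = transformers.AutoTokenizer.from_pretrained("textattack/bert-base-uncased-imdb")
model_wrapper = HuggingFaceModelWrapper(model, tokenizer)

goal_function = UntargetedClassification(model_wrapper)       # has the attack succeeded?
constraints = [RepeatModification(), StopwordModification()]  # which perturbations are valid?
transformation = WordSwapEmbedding(max_candidates=20)         # how are candidate edits generated?
search_method = GreedyWordSwapWIR()                           # how is the perturbation space searched?

attack = Attack(goal_function, constraints, transformation, search_method)
```
Attack recipes such as `PWWSRen2019` are simply pre-built combinations of these same four pieces.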

View File

@@ -315,7 +315,7 @@ Among TextAttack's components are many easy-to-use data augmentation tools. `textattack.Aug
"it's a mystery how the movie could be released in this condition .", 0
```
Using the command `textattack augment --csv examples.csv --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original`
Using the command `textattack augment --input-csv examples.csv --output-csv output.csv --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original`
will augment the `text` column, altering 10% of the words in each example, generating twice as many augmented samples as original inputs, and excluding the original inputs from the output file. (By default, all results are saved to `augment.csv`.)
After augmentation, the contents of the `augment.csv` file are as follows:
@@ -454,8 +454,6 @@ dataset = [('Today was....', 1), ('This movie is...', 0), ...]
### What is an attack & how to design a new attack
The `attack_one` method of an `Attack` takes an `AttackedText` object as input; if the attack succeeds, it returns a `SuccessfulAttackResult`, and if it fails, a `FailedAttackResult`.
We decompose an attack into four components: a **goal function** that defines what counts as a successful attack, **constraints** that define which perturbations are valid, a **transformation** that generates candidate perturbations of the input text, and a **search method** that traverses the space of possible perturbations. Each attack tries to perturb the input text so that it satisfies the goal function (i.e., the attack succeeds) while the perturbation adheres to the constraints (e.g., grammar constraints, semantic-similarity constraints). Finally, the search method picks out high-quality adversarial examples from the candidate transformations.

View File

@@ -40,7 +40,7 @@ For example, given the following as `examples.csv`:
The command:
```
textattack augment --csv examples.csv --input-column text --recipe eda --pct-words-to-swap .1 \
textattack augment --input-csv examples.csv --output-csv output.csv --input-column text --recipe eda --pct-words-to-swap .1 \
--transformations-per-example 2 --exclude-original
```
will augment the `text` column with 10% of words edited per augmentation, twice as many augmentations as original inputs, and exclude the original inputs from the
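For reference, the same `eda` recipe is also available from Python; this is a rough sketch, assuming the `textattack.augmentation.EasyDataAugmenter` recipe class with its usual keyword arguments (the input sentence comes from `examples.csv`):
```python
from textattack.augmentation import EasyDataAugmenter

# Mirrors the CLI flags above: edit ~10% of the words and produce two augmentations per input.
augmenter = EasyDataAugmenter(pct_words_to_swap=0.1, transformations_per_example=2)
print(augmenter.augment("it's a mystery how the movie could be released in this condition ."))
```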

View File

@@ -11,6 +11,41 @@ TextAttack Extended Functions (Multilingual)
## We have built a new WebDemo for visualizing TextAttack-generated examples
- [TextAttack-WebDemo Github](https://github.com/QData/TextAttack-WebDemo)
- [TextAttack-WebDemo Github](https://github.com/QData/TextAttack-WebDemo)
## User defined custom inputs and models
### Custom Datasets: Dataset from a file
Loading a dataset from a file is very similar to loading a model from a file. A 'dataset' is any iterable of `(input, output)` pairs.
The following example would load a sentiment classification dataset from file `my_dataset.py`:
```python
dataset = [('Today was....', 1), ('This movie is...', 0), ...]
```
You can then run attacks on samples from this dataset by adding the argument `--dataset-from-file my_dataset.py`.
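If you prefer the Python API over the CLI flag, the same list can be wrapped for direct use with an `Attacker`; a minimal sketch, assuming the current `textattack.datasets.Dataset` class, which accepts an iterable of `(input, label)` pairs:
```python
from textattack.datasets import Dataset

# Wrap the raw (input, output) pairs for use with TextAttack's Python API.
dataset = Dataset([("Today was....", 1), ("This movie is...", 0)])
```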
#### Custom Model: from a file
To experiment with a model you've trained, you could create the following file
and name it `my_model.py`:
```python
model = load_your_model_with_custom_code() # replace this line with your model loading code
tokenizer = load_your_tokenizer_with_custom_code() # replace this line with your tokenizer loading code
```
Then, run an attack with the argument `--model-from-file my_model.py`. The model and tokenizer will be loaded automatically.
### Custom attack components
The [documentation website](https://textattack.readthedocs.io/en/latest) contains walkthroughs explaining basic usage of TextAttack, including building a custom transformation and a custom constraint.
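As a taste of what those walkthroughs cover, a custom transformation can be a small `WordSwap` subclass; the class name and replacement word below are made-up illustrations, assuming the documented `_get_replacement_words` hook:
```python
from textattack.transformations import WordSwap


class BananaWordSwap(WordSwap):
    """Transformation that proposes 'banana' as the replacement for every word."""

    def _get_replacement_words(self, word):
        # Return the list of candidate replacements for `word`.
        return ["banana"]
```
An instance of such a class can then be passed as the `transformation` when building an `Attack`.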

View File

@@ -38,7 +38,9 @@ and the number of augmentations per input example. It outputs a CSV in the same
"it's a mystery how the movie could be released in this condition .", 0
```
The command `textattack augment --csv examples.csv --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original`
The command:
```
textattack augment --input-csv examples.csv --output-csv output.csv --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original
```
will augment the `text` column by altering 10% of each example's words, generating twice as many augmentations as original inputs, and excluding the original inputs from the
output CSV. (If `--output-csv` is not given, results are saved to `augment.csv` by default.)
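The same augmentation can be done from Python; a short sketch using the `EmbeddingAugmenter` recipe, whose keyword arguments mirror the CLI flags above:
```python
from textattack.augmentation import EmbeddingAugmenter

# Swap ~10% of words per augmentation and generate two augmentations per input.
augmenter = EmbeddingAugmenter(pct_words_to_swap=0.1, transformations_per_example=2)
augmenter.augment("it's a mystery how the movie could be released in this condition .")
```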

View File

@@ -7,6 +7,7 @@ from transformers import AutoTokenizer, TFAutoModelForSequenceClassification, pi
from textattack.attack_recipes import PWWSRen2019
from textattack.datasets import HuggingFaceDataset
from textattack.models.wrappers import ModelWrapper
from textattack import Attacker
if "TF_CPP_MIN_LOG_LEVEL" not in os.environ:
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
@@ -20,11 +21,11 @@ class HuggingFaceSentimentAnalysisPipelineWrapper(ModelWrapper):
[[0.218262017, 0.7817379832267761]
"""
def __init__(self, pipeline):
self.pipeline = pipeline
def __init__(self, model):
self.model = model
def __call__(self, text_inputs):
raw_outputs = self.pipeline(text_inputs)
raw_outputs = self.model(text_inputs)
outputs = []
for output in raw_outputs:
score = output["score"]
@@ -55,7 +56,6 @@ recipe = PWWSRen2019.build(model_wrapper)
recipe.transformation.language = "fra"
dataset = HuggingFaceDataset("allocine", split="test")
for idx, result in enumerate(recipe.attack_dataset(dataset)):
print(("-" * 20), f"Result {idx+1}", ("-" * 20))
print(result.__str__(color_method="ansi"))
print()
attacker = Attacker(recipe, dataset)
results = attacker.attack_dataset()
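For context on the `Attacker` API the example now uses, the number of attacked examples and the logging destination can be set through `AttackArgs`; a hedged sketch (the argument values are illustrative and not part of this script):
```python
from textattack import Attacker, AttackArgs

# `recipe` and `dataset` are the objects built in the example above.
# Attack only the first 10 dataset examples and log results to a CSV file.
attack_args = AttackArgs(num_examples=10, log_to_csv="attack_results.csv")
attacker = Attacker(recipe, dataset, attack_args)
results = attacker.attack_dataset()
```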

View File

@@ -3,5 +3,5 @@
# model on the Yelp dataset.
textattack attack --attack-n --goal-function untargeted-classification \
--model bert-base-uncased-yelp --num-examples 8 --transformation word-swap-wordnet \
--constraints edit-distance^12 max-words-perturbed:max_percent=0.75 repeat stopword \
--constraints edit-distance^12 max-words-perturbed^max_percent=0.75 repeat stopword \
--search greedy

View File

@@ -1,11 +1,11 @@
text,label
"the rock is destined to be the 21st century's novel conan and that he's go to make a splash yet greater than arnold schwarzenegger , jean- claud van damme or steven segal.",1
"the rock is destined to be the 21st century's novo conan and that he's going to make a splash yet greater than arnold schwarzenegger , jean- claud van damme or stephens segal.",1
the gorgeously elaborate continuation of 'the lord of the rings' triad is so massive that a column of words cannot adequately describe co-writer/director pete jackson's expanded vision of j . r . r . tolkien's middle-earth .,1
the gorgeously elaborate continuation of 'the lordy of the rings' trilogy is so huge that a column of words cannot adequately describe co-writer/superintendent peter jackson's enlargements vision of j . r . r . tolkien's middle-earth .,1
take care of my cat offers a cheerfully different slice of asian cinema .,1
take care of my cat offers a refreshingly different slice of asian cinemas .,1
a technically well-made suspenser . . . but its abrupt fall in iq points as it races to the finish line demonstrating simply too discouraging to let slide .,0
a technologically well-made suspenser . . . but its abrupt dip in iq points as it races to the finish line proves simply too discouraging to let slide .,0
it's a mystery how the cinematography could be released in this condition .,0
it's a mystery how the movies could be released in this condition .,0
"the rock is destined to be the new conan and that he's going to make a splash even greater than arnold , jean- claud van damme or steven segal.",1
"the rock is destined to be the 21st century's new conan and that he's going to caravan make a splash even greater than arnold schwarzenegger , jean- claud van damme or steven segal.",1
the gorgeously rarify continuation of 'the lord of the rings' trilogy is so huge that a column of give-and-take cannot adequately describe co-writer/director shaft jackson's expanded vision of j . r . r . tolkien's middle-earth .,1
the gorgeously elaborate of 'the of the rings' trilogy is so huge that a column of words cannot adequately describe co-writer/director peter jackson's expanded of j . r . r . tolkien's middle-earth .,1
take care different my cat offers a refreshingly of slice of asian cinema .,1
take care of my cat offers a different slice of asian cinema .,1
a technically well-made suspenser . . . but its abrupt drop in iq points as it races to the finish IT line proves simply too discouraging to let slide .,0
a technically well-made suspenser . . . but its abrupt drop in iq points as it races to the finish demarcation proves plainly too discouraging to let slide .,0
it's pic a mystery how the movie could be released in this condition .,0
it's a mystery how the movie could in released be this condition .,0

View File

@@ -1,2 +1,2 @@
#!/bin/bash
textattack augment --csv examples.csv --input-column text --recipe eda --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original --overwrite
textattack augment --input-csv examples.csv --output-csv output.csv --input-column text --recipe eda --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original --overwrite

View File

@@ -1,2 +0,0 @@
"text",label
"it's a mystery how the movie could be released in this condition .", 0

View File

@@ -1,4 +1,4 @@
#!/bin/bash
# Trains TextAttack's LSTM model for 50 epochs. This is a basic
# demonstration of our training script and `datasets` integration.
textattack train --model-name-or-path lstm --dataset rotten_romatoes --epochs 50 --learning-rate 1e-5
textattack train --model-name-or-path lstm --dataset imdb --epochs 50 --learning-rate 1e-5

View File

@@ -360,6 +360,7 @@ class _CommandLineTrainingArgs:
# Arguments that are needed if we want to create a model to train.
parser.add_argument(
"--model-name-or-path",
"--model",
type=str,
required=True,
help='Name or path of the model we want to create. "lstm" and "cnn" will create TextAttack\'s LSTM and CNN models while'