mirror of https://github.com/QData/TextAttack.git synced 2021-10-13 00:05:06 +03:00

add custom dataset API use example in doc

Yanjun Qi
2021-10-08 10:37:32 -04:00
parent caacc1c8a7
commit 42d019262e
2 changed files with 20 additions and 7 deletions


@@ -499,17 +499,23 @@ dataset = [('Today was....', 1), ('This movie is...', 0), ...]
You can then run attacks on samples from this dataset by adding the argument `--dataset-from-file my_dataset.py`.
#### Dataset loading via other mechanisms, see: [more details here](https://textattack.readthedocs.io/en/latest/api/datasets.html)
```python
import textattack
# A list of (text, label) pairs; add more pairs as needed.
my_dataset = [("Today was....", 1), ("This movie is...", 0)]
new_dataset = textattack.datasets.Dataset(my_dataset)
```
#### Dataset via AttackedText class
To allow for word replacement after a sequence has been tokenized, we include an `AttackedText` object
which maintains both a list of tokens and the original text, with punctuation. We use this object in favor of a list of words or just raw text.
#### Dataset loading via other mechanisms, see: [here](https://textattack.readthedocs.io/en/latest/api/datasets.html)
### Attacks and how to design a new attack


@@ -110,14 +110,21 @@ You can then run attacks on samples from this dataset by adding the argument `--
#### Dataset loading via other mechanisms, see: [more details here](https://textattack.readthedocs.io/en/latest/api/datasets.html)
```python
import textattack
# A list of (text, label) pairs; add more pairs as needed.
my_dataset = [("Today was....", 1), ("This movie is...", 0)]
new_dataset = textattack.datasets.Dataset(my_dataset)
```
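As a minimal sketch of preparing the list of `(text, label)` pairs that `textattack.datasets.Dataset` takes, the snippet below reads them from CSV content using only the standard library. The helper `load_pairs` and the column names `text` and `label` are assumptions made for illustration; they are not part of TextAttack.

```python
import csv
import io


def load_pairs(csv_text, text_col="text", label_col="label"):
    """Return a list of (text, int_label) tuples from CSV content.

    Hypothetical helper: the column names are assumptions, not a TextAttack API.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row[text_col], int(row[label_col])) for row in reader]


# Inline sample data standing in for a real CSV file.
sample_csv = "text,label\nToday was great,1\nThis movie is dull,0\n"
my_dataset = load_pairs(sample_csv)

# The resulting list has the shape expected above and could then be wrapped:
#   new_dataset = textattack.datasets.Dataset(my_dataset)
```

Converting labels to `int` up front keeps every pair in the same `(str, int)` form as the hand-written list.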
#### Custom Dataset via AttackedText class
To allow for word replacement after a sequence has been tokenized, we include an `AttackedText` object
which maintains both a list of tokens and the original text, with punctuation. We use this object in favor of a list of words or just raw text.
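The idea of keeping both the token list and the original punctuated text can be illustrated with a toy stand-in. This is a sketch only: `SimpleAttackedText` is a hypothetical class written for this example and is not TextAttack's `AttackedText` implementation; the word pattern is likewise an assumption.

```python
import re


class SimpleAttackedText:
    """Toy stand-in (hypothetical): stores the original string and its words."""

    # Assumed word pattern: runs of letters, digits, and apostrophes.
    _WORD = re.compile(r"[A-Za-z0-9']+")

    def __init__(self, text):
        self.text = text
        # Character spans of each word, so punctuation between words survives.
        self._spans = [m.span() for m in self._WORD.finditer(text)]

    @property
    def words(self):
        return [self.text[start:end] for start, end in self._spans]

    def replace_word_at_index(self, index, new_word):
        # Return a new object; the original text is left untouched.
        start, end = self._spans[index]
        return SimpleAttackedText(self.text[:start] + new_word + self.text[end:])


original = SimpleAttackedText("Today was great, really!")
perturbed = original.replace_word_at_index(2, "awful")
```

Because the word spans index into the original string, a replacement swaps one word while the comma and exclamation mark stay in place, which a plain list of words could not guarantee.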
#### Custom Dataset via Data Frames or other Python data objects (*coming soon*)
### 4. Benchmarking Attacks