mirror of https://github.com/QData/TextAttack.git (synced 2021-10-13 00:05:06 +03:00)
add custom dataset API use example in doc
README.md (16 changes)

@@ -499,17 +499,23 @@ dataset = [('Today was....', 1), ('This movie is...', 0), ...]

You can then run attacks on samples from this dataset by adding the argument `--dataset-from-file my_dataset.py`.
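
For concreteness, here is a minimal sketch of such a `my_dataset.py` file, following the `dataset = [...]` convention shown in the context line above (the recipe and model names in the comment are illustrative, not part of this commit):

```python
# my_dataset.py -- exposes a module-level `dataset` of (text, label) pairs,
# in the same format as the README excerpt above.
dataset = [
    ("Today was a wonderful day.", 1),   # 1 = positive (labels are illustrative)
    ("This movie is a waste of time.", 0),
]

# Example invocation (recipe and model names are illustrative):
#   textattack attack --recipe textfooler --model bert-base-uncased-sst2 \
#       --dataset-from-file my_dataset.py
```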
#### Dataset loading via other mechanisms, see [more details here](https://textattack.readthedocs.io/en/latest/api/datasets.html)
```python
import textattack

# A list of (text, label) pairs.
my_dataset = [("Today was a wonderful day.", 1), ("This movie is a waste of time.", 0)]
new_dataset = textattack.datasets.Dataset(my_dataset)
```
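
Once wrapped in `textattack.datasets.Dataset`, the data can be handed to the rest of the library. Below is a minimal sketch, assuming TextAttack's `Attacker` class, the `TextFoolerJin2019` recipe, and the `HuggingFaceModelWrapper` (none of which are introduced by this diff; the checkpoint name is illustrative):

```python
import transformers
import textattack

# Wrap a sequence-classification model so TextAttack can query it.
model_name = "textattack/bert-base-uncased-SST-2"  # illustrative checkpoint
model = transformers.AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
model_wrapper = textattack.models.wrappers.HuggingFaceModelWrapper(model, tokenizer)

# Build an attack recipe and run it over the custom dataset defined above.
attack = textattack.attack_recipes.TextFoolerJin2019.build(model_wrapper)
attacker = textattack.Attacker(attack, new_dataset)
attacker.attack_dataset()
```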
#### Dataset via AttackedText class

To allow for word replacement after a sequence has been tokenized, we include an `AttackedText` object which maintains both a list of tokens and the original text, with punctuation. We use this object in favor of a list of words or just raw text.
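
A short sketch of how this object behaves, assuming it is exposed as `textattack.shared.AttackedText` with `words`, `text`, and `replace_word_at_index` (these details are not part of this diff):

```python
from textattack.shared import AttackedText

s = AttackedText("A person walks, quickly, to the store.")
print(s.words)   # word list used for perturbation, punctuation stripped
print(s.text)    # original string, punctuation preserved

# Word-level edits return a new AttackedText; the original is left untouched.
s2 = s.replace_word_at_index(3, "slowly")
print(s2.text)   # "A person walks, slowly, to the store."
```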
### Attacks and how to design a new attack

@@ -110,14 +110,21 @@ You can then run attacks on samples from this dataset by adding the argument `--
#### Dataset loading via other mechanisms, see [more details here](https://textattack.readthedocs.io/en/latest/api/datasets.html)
```python
import textattack

# A list of (text, label) pairs.
my_dataset = [("Today was a wonderful day.", 1), ("This movie is a waste of time.", 0)]
new_dataset = textattack.datasets.Dataset(my_dataset)
```
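
One such mechanism, shown here as a hedged illustration (see the linked datasets API documentation for the authoritative interface), is the `HuggingFaceDataset` wrapper, which loads data directly from the HuggingFace `datasets` hub:

```python
import textattack

# Load the SST-2 subset of GLUE from the HuggingFace `datasets` hub
# (dataset name, subset, and split are illustrative).
hf_dataset = textattack.datasets.HuggingFaceDataset("glue", "sst2", split="train")
```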
#### Custom Dataset via AttackedText class

To allow for word replacement after a sequence has been tokenized, we include an `AttackedText` object which maintains both a list of tokens and the original text, with punctuation. We use this object in favor of a list of words or just raw text.

#### Custom Dataset via Data Frames or other Python data objects (*coming soon*)
### 4. Benchmarking Attacks