mirror of
https://github.com/QData/TextAttack.git
synced 2021-10-13 00:05:06 +03:00
change all mentions of nlp to datasets
12
README.md
@@ -206,7 +206,7 @@ of a string or a list of strings. Here's an example of how to use the `Embedding
|
|||||||
|
|
||||||
Our model training code is available via `textattack train` to help you train LSTMs,
|
Our model training code is available via `textattack train` to help you train LSTMs,
|
||||||
CNNs, and `transformers` models using TextAttack out-of-the-box. Datasets are
|
CNNs, and `transformers` models using TextAttack out-of-the-box. Datasets are
|
||||||
automatically loaded using the `nlp` package.
|
automatically loaded using the `datasets` package.
|
||||||
|
|
||||||
#### Training Examples
|
#### Training Examples
|
||||||
*Train our default LSTM for 50 epochs on the Yelp Polarity dataset:*
|
*Train our default LSTM for 50 epochs on the Yelp Polarity dataset:*
|
||||||
@@ -227,7 +227,7 @@ textattack train --model bert-base-uncased --dataset glue^cola --batch-size 32 -
|
|||||||
|
|
||||||
### `textattack peek-dataset`
|
### `textattack peek-dataset`
|
||||||
|
|
||||||
To take a closer look at a dataset, use `textattack peek-dataset`. TextAttack will print some cursory statistics about the inputs and outputs from the dataset. For example, `textattack peek-dataset --dataset-from-nlp snli` will show information about the SNLI dataset from the NLP package.
|
To take a closer look at a dataset, use `textattack peek-dataset`. TextAttack will print some cursory statistics about the inputs and outputs from the dataset. For example, `textattack peek-dataset --dataset-from-datasets snli` will show information about the SNLI dataset from the NLP package.
|
||||||
|
|
||||||
|
|
||||||
### `textattack list`
|
### `textattack list`
|
||||||
@@ -261,18 +261,18 @@ Here's an example of using one of the built-in models (the SST-2 dataset is auto
|
|||||||
textattack attack --model roberta-base-sst2 --recipe textfooler --num-examples 10
|
textattack attack --model roberta-base-sst2 --recipe textfooler --num-examples 10
|
||||||
```
|
```
|
||||||
|
|
||||||
#### HuggingFace support: `transformers` models and `nlp` datasets
|
#### HuggingFace support: `transformers` models and `datasets` datasets
|
||||||
|
|
||||||
We also provide built-in support for [`transformers` pretrained models](https://huggingface.co/models)
|
We also provide built-in support for [`transformers` pretrained models](https://huggingface.co/models)
|
||||||
and datasets from the [`nlp` package](https://github.com/huggingface/nlp)! Here's an example of loading
|
and datasets from the [`datasets` package](https://github.com/huggingface/datasets)! Here's an example of loading
|
||||||
and attacking a pre-trained model and dataset:
|
and attacking a pre-trained model and dataset:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
textattack attack --model-from-huggingface distilbert-base-uncased-finetuned-sst-2-english --dataset-from-nlp glue^sst2 --recipe deepwordbug --num-examples 10
|
textattack attack --model-from-huggingface distilbert-base-uncased-finetuned-sst-2-english --dataset-from-datasets glue^sst2 --recipe deepwordbug --num-examples 10
|
||||||
```
|
```
|
||||||
|
|
||||||
You can explore other pre-trained models using the `--model-from-huggingface` argument, or other datasets by changing
|
You can explore other pre-trained models using the `--model-from-huggingface` argument, or other datasets by changing
|
||||||
`--dataset-from-nlp`.
|
`--dataset-from-datasets`.
|
||||||
|
|
||||||
|
|
||||||
#### Loading a model or dataset from a file
|
#### Loading a model or dataset from a file
|
||||||
|
|||||||
@@ -73,7 +73,7 @@ textattack attack --model lstm-mr --num-examples 20 --search-method beam-search^
|
|||||||
## Training Models with `textattack train`
|
## Training Models with `textattack train`
|
||||||
|
|
||||||
With textattack, you can train models on any classification or regression task
|
With textattack, you can train models on any classification or regression task
|
||||||
from [`nlp`](https://github.com/huggingface/nlp/) using a single line.
|
from [`datasets`](https://github.com/huggingface/datasets/) using a single line.
|
||||||
|
|
||||||
### Available Models
|
### Available Models
|
||||||
#### TextAttack Models
|
#### TextAttack Models
|
||||||
@@ -131,6 +131,6 @@ whatever dataset you're working with. Whether you're loading a dataset of your
|
|||||||
own from a file, or one from NLP, you can use `textattack peek-dataset` to
|
own from a file, or one from NLP, you can use `textattack peek-dataset` to
|
||||||
see some basic information about the dataset.
|
see some basic information about the dataset.
|
||||||
|
|
||||||
For example, use `textattack peek-dataset --dataset-from-nlp glue^mrpc` to see
|
For example, use `textattack peek-dataset --dataset-from-datasets glue^mrpc` to see
|
||||||
information about the MRPC dataset (from the GLUE set of datasets). This will
|
information about the MRPC dataset (from the GLUE set of datasets). This will
|
||||||
print statistics like the number of labels, average number of words, etc.
|
print statistics like the number of labels, average number of words, etc.
|
||||||
|
|||||||
@@ -1,4 +1,4 @@
|
|||||||
#!/bin/bash
|
#!/bin/bash
|
||||||
# Shows how to attack a DistilBERT model fine-tuned on SST2 dataset *from the
|
# Shows how to attack a DistilBERT model fine-tuned on SST2 dataset *from the
|
||||||
# huggingface model repository* using the DeepWordBug recipe and 10 examples.
|
# huggingface model repository* using the DeepWordBug recipe and 10 examples.
|
||||||
textattack attack --model-from-huggingface distilbert-base-uncased-finetuned-sst-2-english --dataset-from-nlp glue^sst2 --recipe deepwordbug --num-examples 10
|
textattack attack --model-from-huggingface distilbert-base-uncased-finetuned-sst-2-english --dataset-from-datasets glue^sst2 --recipe deepwordbug --num-examples 10
|
||||||
|
|||||||
@@ -1,4 +1,4 @@
|
|||||||
#!/bin/bash
|
#!/bin/bash
|
||||||
# Trains `bert-base-cased` on the STS-B task for 3 epochs. This is a basic
|
# Trains `bert-base-cased` on the STS-B task for 3 epochs. This is a basic
|
||||||
# demonstration of our training script and `nlp` integration.
|
# demonstration of our training script and `datasets` integration.
|
||||||
textattack train --model lstm --dataset rotten_tomatoes --batch-size 64 --epochs 50 --learning-rate 1e-5
|
textattack train --model lstm --dataset rotten_tomatoes --batch-size 64 --epochs 50 --learning-rate 1e-5
|
||||||
@@ -5,7 +5,7 @@ filelock
|
|||||||
language_tool_python
|
language_tool_python
|
||||||
lemminflect
|
lemminflect
|
||||||
lru-dict
|
lru-dict
|
||||||
nlp
|
datasets
|
||||||
nltk
|
nltk
|
||||||
numpy
|
numpy
|
||||||
pandas>=1.0.1
|
pandas>=1.0.1
|
||||||
|
|||||||
@@ -43,10 +43,10 @@ attack_test_params = [
|
|||||||
(
|
(
|
||||||
"textattack attack --model-from-huggingface "
|
"textattack attack --model-from-huggingface "
|
||||||
"distilbert-base-uncased-finetuned-sst-2-english "
|
"distilbert-base-uncased-finetuned-sst-2-english "
|
||||||
"--dataset-from-nlp glue^sst2^train --recipe deepwordbug --num-examples 3 "
|
"--dataset-from-datasets glue^sst2^train --recipe deepwordbug --num-examples 3 "
|
||||||
"--shuffle=False"
|
"--shuffle=False"
|
||||||
),
|
),
|
||||||
"tests/sample_outputs/run_attack_transformers_nlp.txt",
|
"tests/sample_outputs/run_attack_transformers_datasets.txt",
|
||||||
),
|
),
|
||||||
#
|
#
|
||||||
# test running an attack by loading a model and dataset from file
|
# test running an attack by loading a model and dataset from file
|
||||||
@@ -59,7 +59,7 @@ attack_test_params = [
|
|||||||
"--dataset-from-file tests/sample_inputs/sst_model_and_dataset.py "
|
"--dataset-from-file tests/sample_inputs/sst_model_and_dataset.py "
|
||||||
"--recipe deepwordbug --num-examples 3 --shuffle=False"
|
"--recipe deepwordbug --num-examples 3 --shuffle=False"
|
||||||
),
|
),
|
||||||
"tests/sample_outputs/run_attack_transformers_nlp.txt",
|
"tests/sample_outputs/run_attack_transformers_datasets.txt",
|
||||||
),
|
),
|
||||||
#
|
#
|
||||||
# test hotflip on 10 samples from LSTM MR
|
# test hotflip on 10 samples from LSTM MR
|
||||||
|
|||||||
@@ -4,7 +4,7 @@ import pytest
|
|||||||
eval_test_params = [
|
eval_test_params = [
|
||||||
(
|
(
|
||||||
"eval_model_hub_rt",
|
"eval_model_hub_rt",
|
||||||
"textattack eval --model-from-huggingface textattack/distilbert-base-uncased-rotten-tomatoes --dataset-from-nlp rotten_tomatoes --num-examples 4",
|
"textattack eval --model-from-huggingface textattack/distilbert-base-uncased-rotten-tomatoes --dataset-from-datasets rotten_tomatoes --num-examples 4",
|
||||||
"tests/sample_outputs/eval_model_hub_rt.txt",
|
"tests/sample_outputs/eval_model_hub_rt.txt",
|
||||||
),
|
),
|
||||||
(
|
(
|
||||||
|
|||||||
@@ -64,11 +64,11 @@ def add_dataset_args(parser):
|
|||||||
"""
|
"""
|
||||||
dataset_group = parser.add_mutually_exclusive_group()
|
dataset_group = parser.add_mutually_exclusive_group()
|
||||||
dataset_group.add_argument(
|
dataset_group.add_argument(
|
||||||
"--dataset-from-nlp",
|
"--dataset-from-datasets",
|
||||||
type=str,
|
type=str,
|
||||||
required=False,
|
required=False,
|
||||||
default=None,
|
default=None,
|
||||||
help="Dataset to load from `nlp` repository.",
|
help="Dataset to load from `datasets` repository.",
|
||||||
)
|
)
|
||||||
dataset_group.add_argument(
|
dataset_group.add_argument(
|
||||||
"--dataset-from-file",
|
"--dataset-from-file",
|
||||||
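The flag rename above also changes the attribute name that `argparse` exposes, which is why the later hunks rename `args.dataset_from_nlp` to `args.dataset_from_datasets`. The following is a minimal, self-contained sketch of that behaviour. The `build_parser` helper and the help text for `--dataset-from-file` are illustrative assumptions, not TextAttack code.

```python
import argparse


def build_parser():
    """Hypothetical stand-in for TextAttack's ``add_dataset_args``."""
    parser = argparse.ArgumentParser()
    # Only one of the two dataset sources may be given at a time.
    dataset_group = parser.add_mutually_exclusive_group()
    dataset_group.add_argument(
        "--dataset-from-datasets",
        type=str,
        required=False,
        default=None,
        help="Dataset to load from `datasets` repository.",
    )
    dataset_group.add_argument(
        "--dataset-from-file",
        type=str,
        required=False,
        default=None,
        help="Dataset to load from a file (assumed help text).",
    )
    return parser


# argparse derives the attribute name from the flag: dashes become underscores.
args = build_parser().parse_args(["--dataset-from-datasets", "glue^sst2"])
print(args.dataset_from_datasets)
```

Because the attribute name is derived from the flag, any code that still reads `args.dataset_from_nlp` after the rename would fail with an `AttributeError`.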
@@ -349,7 +349,7 @@ def parse_dataset_from_args(args):
|
|||||||
# Automatically detect dataset for huggingface & textattack models.
|
# Automatically detect dataset for huggingface & textattack models.
|
||||||
# This allows us to use the --model shortcut without specifying a dataset.
|
# This allows us to use the --model shortcut without specifying a dataset.
|
||||||
if args.model in HUGGINGFACE_DATASET_BY_MODEL:
|
if args.model in HUGGINGFACE_DATASET_BY_MODEL:
|
||||||
_, args.dataset_from_nlp = HUGGINGFACE_DATASET_BY_MODEL[args.model]
|
_, args.dataset_from_datasets = HUGGINGFACE_DATASET_BY_MODEL[args.model]
|
||||||
elif args.model in TEXTATTACK_DATASET_BY_MODEL:
|
elif args.model in TEXTATTACK_DATASET_BY_MODEL:
|
||||||
_, dataset = TEXTATTACK_DATASET_BY_MODEL[args.model]
|
_, dataset = TEXTATTACK_DATASET_BY_MODEL[args.model]
|
||||||
if dataset[0].startswith("textattack"):
|
if dataset[0].startswith("textattack"):
|
||||||
@@ -358,7 +358,7 @@ def parse_dataset_from_args(args):
|
|||||||
dataset = eval(f"{dataset[0]}")(*dataset[1:])
|
dataset = eval(f"{dataset[0]}")(*dataset[1:])
|
||||||
return dataset
|
return dataset
|
||||||
else:
|
else:
|
||||||
args.dataset_from_nlp = dataset
|
args.dataset_from_datasets = dataset
|
||||||
# Automatically detect dataset for models trained with textattack.
|
# Automatically detect dataset for models trained with textattack.
|
||||||
elif args.model and os.path.exists(args.model):
|
elif args.model and os.path.exists(args.model):
|
||||||
model_args_json_path = os.path.join(args.model, "train_args.json")
|
model_args_json_path = os.path.join(args.model, "train_args.json")
|
||||||
@@ -372,7 +372,7 @@ def parse_dataset_from_args(args):
|
|||||||
name, subset = model_train_args["dataset"].split(ARGS_SPLIT_TOKEN)
|
name, subset = model_train_args["dataset"].split(ARGS_SPLIT_TOKEN)
|
||||||
else:
|
else:
|
||||||
name, subset = model_train_args["dataset"], None
|
name, subset = model_train_args["dataset"], None
|
||||||
args.dataset_from_nlp = (
|
args.dataset_from_datasets = (
|
||||||
name,
|
name,
|
||||||
subset,
|
subset,
|
||||||
model_train_args["dataset_dev_split"],
|
model_train_args["dataset_dev_split"],
|
||||||
@@ -403,14 +403,14 @@ def parse_dataset_from_args(args):
|
|||||||
raise AttributeError(
|
raise AttributeError(
|
||||||
f"``dataset`` not found in module {args.dataset_from_file}"
|
f"``dataset`` not found in module {args.dataset_from_file}"
|
||||||
)
|
)
|
||||||
elif args.dataset_from_nlp:
|
elif args.dataset_from_datasets:
|
||||||
dataset_args = args.dataset_from_nlp
|
dataset_args = args.dataset_from_datasets
|
||||||
if isinstance(dataset_args, str):
|
if isinstance(dataset_args, str):
|
||||||
if ARGS_SPLIT_TOKEN in dataset_args:
|
if ARGS_SPLIT_TOKEN in dataset_args:
|
||||||
dataset_args = dataset_args.split(ARGS_SPLIT_TOKEN)
|
dataset_args = dataset_args.split(ARGS_SPLIT_TOKEN)
|
||||||
else:
|
else:
|
||||||
dataset_args = (dataset_args,)
|
dataset_args = (dataset_args,)
|
||||||
dataset = textattack.datasets.HuggingFaceNlpDataset(
|
dataset = textattack.datasets.HuggingFaceDataset(
|
||||||
*dataset_args, shuffle=args.shuffle
|
*dataset_args, shuffle=args.shuffle
|
||||||
)
|
)
|
||||||
dataset.examples = dataset.examples[args.num_examples_offset :]
|
dataset.examples = dataset.examples[args.num_examples_offset :]
|
||||||
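The branch above turns a string such as `glue^sst2^train` into positional arguments for `HuggingFaceDataset`. A minimal sketch of just that splitting step is shown here. The value of `ARGS_SPLIT_TOKEN` is an assumption inferred from the `glue^sst2` examples in this commit, and `split_dataset_arg` is a hypothetical helper name.

```python
ARGS_SPLIT_TOKEN = "^"  # assumed value, inferred from examples such as `glue^sst2`


def split_dataset_arg(dataset_arg):
    """Turn ``name^subset^split`` into a tuple of positional arguments."""
    if isinstance(dataset_arg, str):
        if ARGS_SPLIT_TOKEN in dataset_arg:
            return tuple(dataset_arg.split(ARGS_SPLIT_TOKEN))
        # A bare dataset name becomes a one-element tuple.
        return (dataset_arg,)
    # Tuples set by the auto-detection branches pass through unchanged.
    return tuple(dataset_arg)


print(split_dataset_arg("glue^sst2^train"))
print(split_dataset_arg("rotten_tomatoes"))
```

The resulting tuple can then be unpacked with `*dataset_args`, as the hunk does when it constructs the dataset.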
|
|||||||
@@ -8,8 +8,9 @@ from textattack.commands.augment import AUGMENTATION_RECIPE_NAMES
|
|||||||
logger = textattack.shared.logger
|
logger = textattack.shared.logger
|
||||||
|
|
||||||
|
|
||||||
def prepare_dataset_for_training(nlp_dataset):
|
def prepare_dataset_for_training(datasets_dataset):
|
||||||
"""Changes an `nlp` dataset into the proper format for tokenization."""
|
"""Changes a `datasets` dataset into the proper format for
|
||||||
|
tokenization."""
|
||||||
|
|
||||||
def prepare_example_dict(ex):
|
def prepare_example_dict(ex):
|
||||||
"""Returns the values in order corresponding to the data.
|
"""Returns the values in order corresponding to the data.
|
||||||
@@ -25,22 +26,22 @@ def prepare_dataset_for_training(nlp_dataset):
|
|||||||
return values[0]
|
return values[0]
|
||||||
return tuple(values)
|
return tuple(values)
|
||||||
|
|
||||||
text, outputs = zip(*((prepare_example_dict(x[0]), x[1]) for x in nlp_dataset))
|
text, outputs = zip(*((prepare_example_dict(x[0]), x[1]) for x in datasets_dataset))
|
||||||
return list(text), list(outputs)
|
return list(text), list(outputs)
|
||||||
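`prepare_dataset_for_training` relies on the `zip(*pairs)` idiom to split a sequence of `(inputs, output)` pairs into two parallel lists. The sketch below reproduces that idiom on toy data. The `len(values) == 1` condition is an assumption, since the hunk hides the lines between the two `return` statements, and the toy examples are invented for illustration.

```python
from collections import OrderedDict


def prepare_example_dict(ex):
    """Return a bare value for single-input examples, else a tuple of values."""
    values = list(ex.values())
    if len(values) == 1:  # assumed condition; not visible in the hunk
        return values[0]
    return tuple(values)


def prepare_dataset_for_training(dataset):
    """Split (inputs, output) pairs into a list of texts and a list of outputs."""
    # zip(*pairs) transposes the sequence of pairs into two tuples.
    text, outputs = zip(*((prepare_example_dict(x[0]), x[1]) for x in dataset))
    return list(text), list(outputs)


toy_dataset = [
    (OrderedDict(text="a fine film"), 1),
    (OrderedDict(text="a dull film"), 0),
]
texts, labels = prepare_dataset_for_training(toy_dataset)
print(texts, labels)
```

Note that the unpacking fails on an empty dataset, because `zip` then yields nothing to assign to `text` and `outputs`.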
|
|
||||||
|
|
||||||
def dataset_from_args(args):
|
def dataset_from_args(args):
|
||||||
"""Returns a tuple of ``HuggingFaceNlpDataset`` for the train and test
|
"""Returns a tuple of ``HuggingFaceDataset`` for the train and test
|
||||||
datasets for ``args.dataset``."""
|
datasets for ``args.dataset``."""
|
||||||
dataset_args = args.dataset.split(ARGS_SPLIT_TOKEN)
|
dataset_args = args.dataset.split(ARGS_SPLIT_TOKEN)
|
||||||
# TODO `HuggingFaceNlpDataset` -> `HuggingFaceDataset`
|
# TODO `HuggingFaceDataset` -> `HuggingFaceDataset`
|
||||||
if args.dataset_train_split:
|
if args.dataset_train_split:
|
||||||
train_dataset = textattack.datasets.HuggingFaceNlpDataset(
|
train_dataset = textattack.datasets.HuggingFaceDataset(
|
||||||
*dataset_args, split=args.dataset_train_split
|
*dataset_args, split=args.dataset_train_split
|
||||||
)
|
)
|
||||||
else:
|
else:
|
||||||
try:
|
try:
|
||||||
train_dataset = textattack.datasets.HuggingFaceNlpDataset(
|
train_dataset = textattack.datasets.HuggingFaceDataset(
|
||||||
*dataset_args, split="train"
|
*dataset_args, split="train"
|
||||||
)
|
)
|
||||||
args.dataset_train_split = "train"
|
args.dataset_train_split = "train"
|
||||||
@@ -49,31 +50,31 @@ def dataset_from_args(args):
|
|||||||
train_text, train_labels = prepare_dataset_for_training(train_dataset)
|
train_text, train_labels = prepare_dataset_for_training(train_dataset)
|
||||||
|
|
||||||
if args.dataset_dev_split:
|
if args.dataset_dev_split:
|
||||||
eval_dataset = textattack.datasets.HuggingFaceNlpDataset(
|
eval_dataset = textattack.datasets.HuggingFaceDataset(
|
||||||
*dataset_args, split=args.dataset_dev_split
|
*dataset_args, split=args.dataset_dev_split
|
||||||
)
|
)
|
||||||
else:
|
else:
|
||||||
# try common dev split names
|
# try common dev split names
|
||||||
try:
|
try:
|
||||||
eval_dataset = textattack.datasets.HuggingFaceNlpDataset(
|
eval_dataset = textattack.datasets.HuggingFaceDataset(
|
||||||
*dataset_args, split="dev"
|
*dataset_args, split="dev"
|
||||||
)
|
)
|
||||||
args.dataset_dev_split = "dev"
|
args.dataset_dev_split = "dev"
|
||||||
except KeyError:
|
except KeyError:
|
||||||
try:
|
try:
|
||||||
eval_dataset = textattack.datasets.HuggingFaceNlpDataset(
|
eval_dataset = textattack.datasets.HuggingFaceDataset(
|
||||||
*dataset_args, split="eval"
|
*dataset_args, split="eval"
|
||||||
)
|
)
|
||||||
args.dataset_dev_split = "eval"
|
args.dataset_dev_split = "eval"
|
||||||
except KeyError:
|
except KeyError:
|
||||||
try:
|
try:
|
||||||
eval_dataset = textattack.datasets.HuggingFaceNlpDataset(
|
eval_dataset = textattack.datasets.HuggingFaceDataset(
|
||||||
*dataset_args, split="validation"
|
*dataset_args, split="validation"
|
||||||
)
|
)
|
||||||
args.dataset_dev_split = "validation"
|
args.dataset_dev_split = "validation"
|
||||||
except KeyError:
|
except KeyError:
|
||||||
try:
|
try:
|
||||||
eval_dataset = textattack.datasets.HuggingFaceNlpDataset(
|
eval_dataset = textattack.datasets.HuggingFaceDataset(
|
||||||
*dataset_args, split="test"
|
*dataset_args, split="test"
|
||||||
)
|
)
|
||||||
args.dataset_dev_split = "test"
|
args.dataset_dev_split = "test"
|
||||||
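When no dev split is given, the nested `try`/`except KeyError` chain above tries the split names `dev`, `eval`, `validation`, and `test` in that order. The same search order can be written as a loop. This is a sketch of the idea rather than the project's code: `load_first_available_split` and `load_split` are hypothetical names, and raising `KeyError` when every candidate fails is an assumption, because the hunk does not show that branch.

```python
def load_first_available_split(load_split, candidates=("dev", "eval", "validation", "test")):
    """Return ``(split_name, dataset)`` for the first candidate split that loads.

    ``load_split`` is a hypothetical callable that raises ``KeyError`` for a
    missing split, mirroring the ``except KeyError`` branches in the hunk.
    """
    for split in candidates:
        try:
            return split, load_split(split)
        except KeyError:
            continue
    # Assumed behaviour when every candidate is missing.
    raise KeyError(f"none of the splits {candidates} could be loaded")


# Toy stand-in for a dataset that only has "train" and "validation" splits.
fake_splits = {"train": ["a", "b"], "validation": ["c"]}
name, data = load_first_available_split(fake_splits.__getitem__)
print(name, data)
```

A loop keeps the candidate order in one place, so adding another split name means changing a single tuple.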
@@ -189,7 +190,7 @@ def write_readme(args, best_eval_score, best_eval_score_epoch):
|
|||||||
## TextAttack Model Card
|
## TextAttack Model Card
|
||||||
|
|
||||||
This `{args.model}` model was fine-tuned for sequence classification using TextAttack
|
This `{args.model}` model was fine-tuned for sequence classification using TextAttack
|
||||||
and the {dataset_name} dataset loaded using the `nlp` library. The model was fine-tuned
|
and the {dataset_name} dataset loaded using the `datasets` library. The model was fine-tuned
|
||||||
for {args.num_train_epochs} epochs with a batch size of {args.batch_size}, a learning
|
for {args.num_train_epochs} epochs with a batch size of {args.batch_size}, a learning
|
||||||
rate of {args.learning_rate}, and a maximum sequence length of {args.max_length}.
|
rate of {args.learning_rate}, and a maximum sequence length of {args.max_length}.
|
||||||
Since this was a {task_name} task, the model was trained with a {loss_func} loss function.
|
Since this was a {task_name} task, the model was trained with a {loss_func} loss function.
|
||||||
|
|||||||
@@ -47,7 +47,7 @@ class TrainModelCommand(TextAttackCommand):
|
|||||||
required=True,
|
required=True,
|
||||||
default="yelp",
|
default="yelp",
|
||||||
help="dataset for training; will be loaded from "
|
help="dataset for training; will be loaded from "
|
||||||
"`nlp` library. if dataset has a subset, separate with a caret (`^`). "
|
"`datasets` library. if dataset has a subset, separate with a caret (`^`). "
|
||||||
" ex: `glue^sst2` or `rotten_tomatoes`",
|
" ex: `glue^sst2` or `rotten_tomatoes`",
|
||||||
)
|
)
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
|
|||||||
@@ -1,4 +1,4 @@
|
|||||||
from .dataset import TextAttackDataset
|
from .dataset import TextAttackDataset
|
||||||
from .huggingface_nlp_dataset import HuggingFaceNlpDataset
|
from .huggingface_dataset import HuggingFaceDataset
|
||||||
|
|
||||||
from . import translation
|
from . import translation
|
||||||
|
|||||||
@@ -1,7 +1,7 @@
|
|||||||
import collections
|
import collections
|
||||||
import random
|
import random
|
||||||
|
|
||||||
import nlp
|
import datasets
|
||||||
|
|
||||||
import textattack
|
import textattack
|
||||||
from textattack.datasets import TextAttackDataset
|
from textattack.datasets import TextAttackDataset
|
||||||
@@ -14,7 +14,7 @@ def _cb(s):
|
|||||||
return textattack.shared.utils.color_text(str(s), color="blue", method="ansi")
|
return textattack.shared.utils.color_text(str(s), color="blue", method="ansi")
|
||||||
|
|
||||||
|
|
||||||
def get_nlp_dataset_columns(dataset):
|
def get_datasets_dataset_columns(dataset):
|
||||||
schema = set(dataset.column_names)
|
schema = set(dataset.column_names)
|
||||||
if {"premise", "hypothesis", "label"} <= schema:
|
if {"premise", "hypothesis", "label"} <= schema:
|
||||||
input_columns = ("premise", "hypothesis")
|
input_columns = ("premise", "hypothesis")
|
||||||
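`get_datasets_dataset_columns` picks input and output columns by testing whether a set of required column names is a subset of the dataset's schema. The sketch below shows that subset test in isolation. Only the `premise`/`hypothesis`/`label` case appears in this hunk; the `sentence`/`label` case, the `output_column` value, the error branch, and the name `guess_columns` are illustrative assumptions.

```python
def guess_columns(column_names):
    """Guess (input_columns, output_column) from a dataset's column names."""
    schema = set(column_names)
    # `<=` on sets is the subset test: every required column must be present.
    if {"premise", "hypothesis", "label"} <= schema:
        input_columns = ("premise", "hypothesis")
        output_column = "label"  # assumed; the hunk does not show this line
    elif {"sentence", "label"} <= schema:  # assumed extra case for illustration
        input_columns = ("sentence",)
        output_column = "label"
    else:
        raise ValueError(f"unsupported dataset schema {sorted(schema)}")
    return input_columns, output_column


print(guess_columns(["premise", "hypothesis", "label", "idx"]))
```

Extra columns such as `idx` are ignored, because the subset test only requires the named columns to be present.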
@@ -54,15 +54,15 @@ def get_nlp_dataset_columns(dataset):
|
|||||||
return input_columns, output_column
|
return input_columns, output_column
|
||||||
|
|
||||||
|
|
||||||
class HuggingFaceNlpDataset(TextAttackDataset):
|
class HuggingFaceDataset(TextAttackDataset):
|
||||||
"""Loads a dataset from HuggingFace ``nlp`` and prepares it as a TextAttack
|
"""Loads a dataset from HuggingFace ``datasets`` and prepares it as a
|
||||||
dataset.
|
TextAttack dataset.
|
||||||
|
|
||||||
- name: the dataset name
|
- name: the dataset name
|
||||||
- subset: the subset of the main dataset. Dataset will be loaded as ``nlp.load_dataset(name, subset)``.
|
- subset: the subset of the main dataset. Dataset will be loaded as ``datasets.load_dataset(name, subset)``.
|
||||||
- label_map: Mapping if output labels should be re-mapped. Useful
|
- label_map: Mapping if output labels should be re-mapped. Useful
|
||||||
if model was trained with a different label arrangement than
|
if model was trained with a different label arrangement than
|
||||||
provided in the ``nlp`` version of the dataset.
|
provided in the ``datasets`` version of the dataset.
|
||||||
- output_scale_factor (float): Factor to divide ground-truth outputs by.
|
- output_scale_factor (float): Factor to divide ground-truth outputs by.
|
||||||
Generally, TextAttack goal functions require model outputs
|
Generally, TextAttack goal functions require model outputs
|
||||||
between 0 and 1. Some datasets test the model's correlation
|
between 0 and 1. Some datasets test the model's correlation
|
||||||
@@ -82,16 +82,16 @@ class HuggingFaceNlpDataset(TextAttackDataset):
|
|||||||
shuffle=False,
|
shuffle=False,
|
||||||
):
|
):
|
||||||
self._name = name
|
self._name = name
|
||||||
self._dataset = nlp.load_dataset(name, subset)[split]
|
self._dataset = datasets.load_dataset(name, subset)[split]
|
||||||
subset_print_str = f", subset {_cb(subset)}" if subset else ""
|
subset_print_str = f", subset {_cb(subset)}" if subset else ""
|
||||||
textattack.shared.logger.info(
|
textattack.shared.logger.info(
|
||||||
f"Loading {_cb('nlp')} dataset {_cb(name)}{subset_print_str}, split {_cb(split)}."
|
f"Loading {_cb('datasets')} dataset {_cb(name)}{subset_print_str}, split {_cb(split)}."
|
||||||
)
|
)
|
||||||
# Input/output column order, like (('premise', 'hypothesis'), 'label')
|
# Input/output column order, like (('premise', 'hypothesis'), 'label')
|
||||||
(
|
(
|
||||||
self.input_columns,
|
self.input_columns,
|
||||||
self.output_column,
|
self.output_column,
|
||||||
) = dataset_columns or get_nlp_dataset_columns(self._dataset)
|
) = dataset_columns or get_datasets_dataset_columns(self._dataset)
|
||||||
self._i = 0
|
self._i = 0
|
||||||
self.examples = list(self._dataset)
|
self.examples = list(self._dataset)
|
||||||
self.label_map = label_map
|
self.label_map = label_map
|
||||||
@@ -1,20 +1,20 @@
|
|||||||
import collections
|
import collections
|
||||||
|
|
||||||
import nlp
|
import datasets
|
||||||
import numpy as np
|
import numpy as np
|
||||||
|
|
||||||
from textattack.datasets import HuggingFaceNlpDataset
|
from textattack.datasets import HuggingFaceDataset
|
||||||
|
|
||||||
|
|
||||||
class TedMultiTranslationDataset(HuggingFaceNlpDataset):
|
class TedMultiTranslationDataset(HuggingFaceDataset):
|
||||||
"""Loads examples from the Ted Talk translation dataset using the `nlp`
|
"""Loads examples from the Ted Talk translation dataset using the
|
||||||
package.
|
`datasets` package.
|
||||||
|
|
||||||
dataset source: http://www.cs.jhu.edu/~kevinduh/a/multitarget-tedtalks/
|
dataset source: http://www.cs.jhu.edu/~kevinduh/a/multitarget-tedtalks/
|
||||||
"""
|
"""
|
||||||
|
|
||||||
def __init__(self, source_lang="en", target_lang="de", split="test"):
|
def __init__(self, source_lang="en", target_lang="de", split="test"):
|
||||||
self._dataset = nlp.load_dataset("ted_multi")[split]
|
self._dataset = datasets.load_dataset("ted_multi")[split]
|
||||||
self.examples = self._dataset["translations"]
|
self.examples = self._dataset["translations"]
|
||||||
language_options = set(self.examples[0]["language"])
|
language_options = set(self.examples[0]["language"])
|
||||||
if source_lang not in language_options:
|
if source_lang not in language_options:
|
||||||
|
|||||||
@@ -19,26 +19,26 @@ All evaluations shown are on the full validation or test set up to 1000 examples
|
|||||||
<section>
|
<section>
|
||||||
|
|
||||||
- AG News (`lstm-ag-news`)
|
- AG News (`lstm-ag-news`)
|
||||||
- nlp dataset `ag_news`, split `test`
|
- `datasets` dataset `ag_news`, split `test`
|
||||||
- Successes: 914/1000
|
- Successes: 914/1000
|
||||||
- Accuracy: 91.40%
|
- Accuracy: 91.40%
|
||||||
- IMDB (`lstm-imdb`)
|
- IMDB (`lstm-imdb`)
|
||||||
- nlp dataset `imdb`, split `test`
|
- `datasets` dataset `imdb`, split `test`
|
||||||
- Successes: 883/1000
|
- Successes: 883/1000
|
||||||
- Accuracy: 88.30%
|
- Accuracy: 88.30%
|
||||||
- Movie Reviews [Rotten Tomatoes] (`lstm-mr`)
|
- Movie Reviews [Rotten Tomatoes] (`lstm-mr`)
|
||||||
- nlp dataset `rotten_tomatoes`, split `validation`
|
- `datasets` dataset `rotten_tomatoes`, split `validation`
|
||||||
- Successes: 807/1000
|
- Successes: 807/1000
|
||||||
- Accuracy: 80.70%
|
- Accuracy: 80.70%
|
||||||
- nlp dataset `rotten_tomatoes`, split `test`
|
- `datasets` dataset `rotten_tomatoes`, split `test`
|
||||||
- Successes: 781/1000
|
- Successes: 781/1000
|
||||||
- Accuracy: 78.10%
|
- Accuracy: 78.10%
|
||||||
- SST-2 (`lstm-sst2`)
|
- SST-2 (`lstm-sst2`)
|
||||||
- nlp dataset `glue`, subset `sst2`, split `validation`
|
- `datasets` dataset `glue`, subset `sst2`, split `validation`
|
||||||
- Successes: 737/872
|
- Successes: 737/872
|
||||||
- Accuracy: 84.52%
|
- Accuracy: 84.52%
|
||||||
- Yelp Polarity (`lstm-yelp`)
|
- Yelp Polarity (`lstm-yelp`)
|
||||||
- nlp dataset `yelp_polarity`, split `test`
|
- `datasets` dataset `yelp_polarity`, split `test`
|
||||||
- Successes: 922/1000
|
- Successes: 922/1000
|
||||||
- Accuracy: 92.20%
|
- Accuracy: 92.20%
|
||||||
|
|
||||||
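Each accuracy figure in this list is the success count divided by the number of evaluated examples, expressed as a percentage with two decimals. A small check of that arithmetic, using counts taken from the list above (`accuracy_percent` is a hypothetical helper name):

```python
def accuracy_percent(successes, total):
    """Return accuracy as a percentage string with two decimal places."""
    return f"{100 * successes / total:.2f}%"


# Counts taken from the LSTM entries listed above.
print(accuracy_percent(737, 872))   # SST-2 validation
print(accuracy_percent(914, 1000))  # AG News test
```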
@@ -50,26 +50,26 @@ All evaluations shown are on the full validation or test set up to 1000 examples
|
|||||||
|
|
||||||
|
|
||||||
- AG News (`cnn-ag-news`)
|
- AG News (`cnn-ag-news`)
|
||||||
- nlp dataset `ag_news`, split `test`
|
- `datasets` dataset `ag_news`, split `test`
|
||||||
- Successes: 910/1000
|
- Successes: 910/1000
|
||||||
- Accuracy: 91.00%
|
- Accuracy: 91.00%
|
||||||
- IMDB (`cnn-imdb`)
|
- IMDB (`cnn-imdb`)
|
||||||
- nlp dataset `imdb`, split `test`
|
- `datasets` dataset `imdb`, split `test`
|
||||||
- Successes: 863/1000
|
- Successes: 863/1000
|
||||||
- Accuracy: 86.30%
|
- Accuracy: 86.30%
|
||||||
- Movie Reviews [Rotten Tomatoes] (`cnn-mr`)
|
- Movie Reviews [Rotten Tomatoes] (`cnn-mr`)
|
||||||
- nlp dataset `rotten_tomatoes`, split `validation`
|
- `datasets` dataset `rotten_tomatoes`, split `validation`
|
||||||
- Successes: 794/1000
|
- Successes: 794/1000
|
||||||
- Accuracy: 79.40%
|
- Accuracy: 79.40%
|
||||||
- nlp dataset `rotten_tomatoes`, split `test`
|
- `datasets` dataset `rotten_tomatoes`, split `test`
|
||||||
- Successes: 768/1000
|
- Successes: 768/1000
|
||||||
- Accuracy: 76.80%
|
- Accuracy: 76.80%
|
||||||
- SST-2 (`cnn-sst2`)
|
- SST-2 (`cnn-sst2`)
|
||||||
- nlp dataset `glue`, subset `sst2`, split `validation`
|
- `datasets` dataset `glue`, subset `sst2`, split `validation`
|
||||||
- Successes: 721/872
|
- Successes: 721/872
|
||||||
- Accuracy: 82.68%
|
- Accuracy: 82.68%
|
||||||
- Yelp Polarity (`cnn-yelp`)
|
- Yelp Polarity (`cnn-yelp`)
|
||||||
- nlp dataset `yelp_polarity`, split `test`
|
- `datasets` dataset `yelp_polarity`, split `test`
|
||||||
- Successes: 913/1000
|
- Successes: 913/1000
|
||||||
- Accuracy: 91.30%
|
- Accuracy: 91.30%
|
||||||
|
|
||||||
@@ -81,50 +81,50 @@ All evaluations shown are on the full validation or test set up to 1000 examples
|
|||||||
<section>
|
<section>
|
||||||
|
|
||||||
- AG News (`albert-base-v2-ag-news`)
|
- AG News (`albert-base-v2-ag-news`)
|
||||||
- nlp dataset `ag_news`, split `test`
|
- `datasets` dataset `ag_news`, split `test`
|
||||||
- Successes: 943/1000
|
- Successes: 943/1000
|
||||||
- Accuracy: 94.30%
|
- Accuracy: 94.30%
|
||||||
- CoLA (`albert-base-v2-cola`)
|
- CoLA (`albert-base-v2-cola`)
|
||||||
- nlp dataset `glue`, subset `cola`, split `validation`
|
- `datasets` dataset `glue`, subset `cola`, split `validation`
|
||||||
- Successes: 829/1000
|
- Successes: 829/1000
|
||||||
- Accuracy: 82.90%
|
- Accuracy: 82.90%
|
||||||
- IMDB (`albert-base-v2-imdb`)
|
- IMDB (`albert-base-v2-imdb`)
|
||||||
- nlp dataset `imdb`, split `test`
|
- `datasets` dataset `imdb`, split `test`
|
||||||
- Successes: 913/1000
|
- Successes: 913/1000
|
||||||
- Accuracy: 91.30%
|
- Accuracy: 91.30%
|
||||||
- Movie Reviews [Rotten Tomatoes] (`albert-base-v2-mr`)
|
- Movie Reviews [Rotten Tomatoes] (`albert-base-v2-mr`)
|
||||||
- nlp dataset `rotten_tomatoes`, split `validation`
|
- `datasets` dataset `rotten_tomatoes`, split `validation`
|
||||||
- Successes: 882/1000
|
- Successes: 882/1000
|
||||||
- Accuracy: 88.20%
|
- Accuracy: 88.20%
|
||||||
- nlp dataset `rotten_tomatoes`, split `test`
|
- `datasets` dataset `rotten_tomatoes`, split `test`
|
||||||
- Successes: 851/1000
|
- Successes: 851/1000
|
||||||
- Accuracy: 85.10%
|
- Accuracy: 85.10%
|
||||||
- Quora Question Pairs (`albert-base-v2-qqp`)
|
- Quora Question Pairs (`albert-base-v2-qqp`)
|
||||||
- nlp dataset `glue`, subset `qqp`, split `validation`
|
- `datasets` dataset `glue`, subset `qqp`, split `validation`
|
||||||
- Successes: 914/1000
|
- Successes: 914/1000
|
||||||
- Accuracy: 91.40%
|
- Accuracy: 91.40%
|
||||||
- Recognizing Textual Entailment (`albert-base-v2-rte`)
|
- Recognizing Textual Entailment (`albert-base-v2-rte`)
|
||||||
- nlp dataset `glue`, subset `rte`, split `validation`
|
- `datasets` dataset `glue`, subset `rte`, split `validation`
|
||||||
- Successes: 211/277
|
- Successes: 211/277
|
||||||
- Accuracy: 76.17%
|
- Accuracy: 76.17%
|
||||||
- SNLI (`albert-base-v2-snli`)
|
- SNLI (`albert-base-v2-snli`)
|
||||||
- nlp dataset `snli`, split `test`
|
- `datasets` dataset `snli`, split `test`
|
||||||
- Successes: 883/1000
|
- Successes: 883/1000
|
||||||
- Accuracy: 88.30%
|
- Accuracy: 88.30%
|
||||||
- SST-2 (`albert-base-v2-sst2`)
|
- SST-2 (`albert-base-v2-sst2`)
|
||||||
- nlp dataset `glue`, subset `sst2`, split `validation`
|
- `datasets` dataset `glue`, subset `sst2`, split `validation`
|
||||||
- Successes: 807/872
|
- Successes: 807/872
|
||||||
- Accuracy: 92.55%
|
- Accuracy: 92.55%
|
||||||
- STS-b (`albert-base-v2-stsb`)
|
- STS-b (`albert-base-v2-stsb`)
|
||||||
- nlp dataset `glue`, subset `stsb`, split `validation`
|
- `datasets` dataset `glue`, subset `stsb`, split `validation`
|
||||||
- Pearson correlation: 0.9041359738552746
|
- Pearson correlation: 0.9041359738552746
|
||||||
- Spearman correlation: 0.8995912861209745
|
- Spearman correlation: 0.8995912861209745
|
||||||
- WNLI (`albert-base-v2-wnli`)
|
- WNLI (`albert-base-v2-wnli`)
|
||||||
- nlp dataset `glue`, subset `wnli`, split `validation`
|
- `datasets` dataset `glue`, subset `wnli`, split `validation`
|
||||||
- Successes: 42/71
|
- Successes: 42/71
|
||||||
- Accuracy: 59.15%
|
- Accuracy: 59.15%
|
||||||
- Yelp Polarity (`albert-base-v2-yelp`)
|
- Yelp Polarity (`albert-base-v2-yelp`)
|
||||||
- nlp dataset `yelp_polarity`, split `test`
|
- `datasets` dataset `yelp_polarity`, split `test`
|
||||||
- Successes: 963/1000
|
- Successes: 963/1000
|
||||||
- Accuracy: 96.30%
|
- Accuracy: 96.30%
|
||||||
|
|
||||||
@@ -135,62 +135,62 @@ All evaluations shown are on the full validation or test set up to 1000 examples
|
|||||||
<section>
|
<section>
|
||||||
|
|
||||||
- AG News (`bert-base-uncased-ag-news`)
|
- AG News (`bert-base-uncased-ag-news`)
|
||||||
- nlp dataset `ag_news`, split `test`
|
- `datasets` dataset `ag_news`, split `test`
|
||||||
- Successes: 942/1000
|
- Successes: 942/1000
|
||||||
- Accuracy: 94.20%
|
- Accuracy: 94.20%
|
||||||
- CoLA (`bert-base-uncased-cola`)
|
- CoLA (`bert-base-uncased-cola`)
|
||||||
- nlp dataset `glue`, subset `cola`, split `validation`
|
- `datasets` dataset `glue`, subset `cola`, split `validation`
|
||||||
- Successes: 812/1000
|
- Successes: 812/1000
|
||||||
- Accuracy: 81.20%
|
- Accuracy: 81.20%
|
||||||
- IMDB (`bert-base-uncased-imdb`)
|
- IMDB (`bert-base-uncased-imdb`)
|
||||||
- nlp dataset `imdb`, split `test`
|
- `datasets` dataset `imdb`, split `test`
|
||||||
- Successes: 919/1000
|
- Successes: 919/1000
|
||||||
- Accuracy: 91.90%
|
- Accuracy: 91.90%
|
||||||
- MNLI matched (`bert-base-uncased-mnli`)
|
- MNLI matched (`bert-base-uncased-mnli`)
|
||||||
- nlp dataset `glue`, subset `mnli`, split `validation_matched`
|
- `datasets` dataset `glue`, subset `mnli`, split `validation_matched`
|
||||||
- Successes: 840/1000
|
- Successes: 840/1000
|
||||||
- Accuracy: 84.00%
|
- Accuracy: 84.00%
|
||||||
- Movie Reviews [Rotten Tomatoes] (`bert-base-uncased-mr`)
|
- Movie Reviews [Rotten Tomatoes] (`bert-base-uncased-mr`)
|
||||||
- nlp dataset `rotten_tomatoes`, split `validation`
|
- `datasets` dataset `rotten_tomatoes`, split `validation`
|
||||||
- Successes: 876/1000
|
- Successes: 876/1000
|
||||||
- Accuracy: 87.60%
|
- Accuracy: 87.60%
|
||||||
- nlp dataset `rotten_tomatoes`, split `test`
|
- `datasets` dataset `rotten_tomatoes`, split `test`
|
||||||
- Successes: 838/1000
|
- Successes: 838/1000
|
||||||
- Accuracy: 83.80%
|
- Accuracy: 83.80%
|
||||||
- MRPC (`bert-base-uncased-mrpc`)
|
- MRPC (`bert-base-uncased-mrpc`)
|
||||||
- nlp dataset `glue`, subset `mrpc`, split `validation`
|
- `datasets` dataset `glue`, subset `mrpc`, split `validation`
|
||||||
- Successes: 358/408
|
- Successes: 358/408
|
||||||
- Accuracy: 87.75%
|
- Accuracy: 87.75%
|
||||||
- QNLI (`bert-base-uncased-qnli`)
|
- QNLI (`bert-base-uncased-qnli`)
|
||||||
- nlp dataset `glue`, subset `qnli`, split `validation`
|
- `datasets` dataset `glue`, subset `qnli`, split `validation`
|
||||||
- Successes: 904/1000
|
- Successes: 904/1000
|
||||||
- Accuracy: 90.40%
|
- Accuracy: 90.40%
|
||||||
- Quora Question Pairs (`bert-base-uncased-qqp`)
|
- Quora Question Pairs (`bert-base-uncased-qqp`)
|
||||||
- nlp dataset `glue`, subset `qqp`, split `validation`
|
- `datasets` dataset `glue`, subset `qqp`, split `validation`
|
||||||
- Successes: 924/1000
|
- Successes: 924/1000
|
||||||
- Accuracy: 92.40%
|
- Accuracy: 92.40%
|
||||||
- Recognizing Textual Entailment (`bert-base-uncased-rte`)
|
- Recognizing Textual Entailment (`bert-base-uncased-rte`)
|
||||||
- nlp dataset `glue`, subset `rte`, split `validation`
|
- `datasets` dataset `glue`, subset `rte`, split `validation`
|
||||||
- Successes: 201/277
|
- Successes: 201/277
|
||||||
- Accuracy: 72.56%
|
- Accuracy: 72.56%
|
||||||
- SNLI (`bert-base-uncased-snli`)
|
- SNLI (`bert-base-uncased-snli`)
|
||||||
- nlp dataset `snli`, split `test`
|
- `datasets` dataset `snli`, split `test`
|
||||||
- Successes: 894/1000
|
- Successes: 894/1000
|
||||||
- Accuracy: 89.40%
|
- Accuracy: 89.40%
|
||||||
- SST-2 (`bert-base-uncased-sst2`)
|
- SST-2 (`bert-base-uncased-sst2`)
|
||||||
- nlp dataset `glue`, subset `sst2`, split `validation`
|
- `datasets` dataset `glue`, subset `sst2`, split `validation`
|
||||||
- Successes: 806/872
|
- Successes: 806/872
|
||||||
- Accuracy: 92.43%)
|
- Accuracy: 92.43%)
|
||||||
- STS-b (`bert-base-uncased-stsb`)
|
- STS-b (`bert-base-uncased-stsb`)
|
||||||
- nlp dataset `glue`, subset `stsb`, split `validation`
|
- `datasets` dataset `glue`, subset `stsb`, split `validation`
|
||||||
- Pearson correlation: 0.8775458937815515
|
- Pearson correlation: 0.8775458937815515
|
||||||
- Spearman correlation: 0.8773251339980935
|
- Spearman correlation: 0.8773251339980935
|
||||||
- WNLI (`bert-base-uncased-wnli`)
|
- WNLI (`bert-base-uncased-wnli`)
|
||||||
- nlp dataset `glue`, subset `wnli`, split `validation`
|
- `datasets` dataset `glue`, subset `wnli`, split `validation`
|
||||||
- Successes: 40/71
|
- Successes: 40/71
|
||||||
- Accuracy: 56.34%
|
- Accuracy: 56.34%
|
||||||
- Yelp Polarity (`bert-base-uncased-yelp`)
|
- Yelp Polarity (`bert-base-uncased-yelp`)
|
||||||
- nlp dataset `yelp_polarity`, split `test`
|
- `datasets` dataset `yelp_polarity`, split `test`
|
||||||
- Successes: 963/1000
|
- Successes: 963/1000
|
||||||
- Accuracy: 96.30%
|
- Accuracy: 96.30%
|
||||||
|
|
||||||
@@ -202,27 +202,27 @@ All evaluations shown are on the full validation or test set up to 1000 examples
|
|||||||
|
|
||||||
|
|
||||||
- CoLA (`distilbert-base-cased-cola`)
|
- CoLA (`distilbert-base-cased-cola`)
|
||||||
- nlp dataset `glue`, subset `cola`, split `validation`
|
- `datasets` dataset `glue`, subset `cola`, split `validation`
|
||||||
- Successes: 786/1000
|
- Successes: 786/1000
|
||||||
- Accuracy: 78.60%
|
- Accuracy: 78.60%
|
||||||
- MRPC (`distilbert-base-cased-mrpc`)
|
- MRPC (`distilbert-base-cased-mrpc`)
|
||||||
- nlp dataset `glue`, subset `mrpc`, split `validation`
|
- `datasets` dataset `glue`, subset `mrpc`, split `validation`
|
||||||
- Successes: 320/408
|
- Successes: 320/408
|
||||||
- Accuracy: 78.43%
|
- Accuracy: 78.43%
|
||||||
- Quora Question Pairs (`distilbert-base-cased-qqp`)
|
- Quora Question Pairs (`distilbert-base-cased-qqp`)
|
||||||
- nlp dataset `glue`, subset `qqp`, split `validation`
|
- `datasets` dataset `glue`, subset `qqp`, split `validation`
|
||||||
- Successes: 908/1000
|
- Successes: 908/1000
|
||||||
- Accuracy: 90.80%
|
- Accuracy: 90.80%
|
||||||
- SNLI (`distilbert-base-cased-snli`)
|
- SNLI (`distilbert-base-cased-snli`)
|
||||||
- nlp dataset `snli`, split `test`
|
- `datasets` dataset `snli`, split `test`
|
||||||
- Successes: 861/1000
|
- Successes: 861/1000
|
||||||
- Accuracy: 86.10%
|
- Accuracy: 86.10%
|
||||||
- SST-2 (`distilbert-base-cased-sst2`)
|
- SST-2 (`distilbert-base-cased-sst2`)
|
||||||
- nlp dataset `glue`, subset `sst2`, split `validation`
|
- `datasets` dataset `glue`, subset `sst2`, split `validation`
|
||||||
- Successes: 785/872
|
- Successes: 785/872
|
||||||
- Accuracy: 90.02%)
|
- Accuracy: 90.02%)
|
||||||
- STS-b (`distilbert-base-cased-stsb`)
|
- STS-b (`distilbert-base-cased-stsb`)
|
||||||
- nlp dataset `glue`, subset `stsb`, split `validation`
|
- `datasets` dataset `glue`, subset `stsb`, split `validation`
|
||||||
- Pearson correlation: 0.8421540899520146
|
- Pearson correlation: 0.8421540899520146
|
||||||
- Spearman correlation: 0.8407155030382939
|
- Spearman correlation: 0.8407155030382939
|
||||||
|
|
||||||
@@ -233,39 +233,39 @@ All evaluations shown are on the full validation or test set up to 1000 examples
|
|||||||
<section>
|
<section>
|
||||||
|
|
||||||
- AG News (`distilbert-base-uncased-ag-news`)
|
- AG News (`distilbert-base-uncased-ag-news`)
|
||||||
- nlp dataset `ag_news`, split `test`
|
- `datasets` dataset `ag_news`, split `test`
|
||||||
- Successes: 944/1000
|
- Successes: 944/1000
|
||||||
- Accuracy: 94.40%
|
- Accuracy: 94.40%
|
||||||
- CoLA (`distilbert-base-uncased-cola`)
|
- CoLA (`distilbert-base-uncased-cola`)
|
||||||
- nlp dataset `glue`, subset `cola`, split `validation`
|
- `datasets` dataset `glue`, subset `cola`, split `validation`
|
||||||
- Successes: 786/1000
|
- Successes: 786/1000
|
||||||
- Accuracy: 78.60%
|
- Accuracy: 78.60%
|
||||||
- IMDB (`distilbert-base-uncased-imdb`)
|
- IMDB (`distilbert-base-uncased-imdb`)
|
||||||
- nlp dataset `imdb`, split `test`
|
- `datasets` dataset `imdb`, split `test`
|
||||||
- Successes: 903/1000
|
- Successes: 903/1000
|
||||||
- Accuracy: 90.30%
|
- Accuracy: 90.30%
|
||||||
- MNLI matched (`distilbert-base-uncased-mnli`)
|
- MNLI matched (`distilbert-base-uncased-mnli`)
|
||||||
- nlp dataset `glue`, subset `mnli`, split `validation_matched`
|
- `datasets` dataset `glue`, subset `mnli`, split `validation_matched`
|
||||||
- Successes: 817/1000
|
- Successes: 817/1000
|
||||||
- Accuracy: 81.70%
|
- Accuracy: 81.70%
|
||||||
- MRPC (`distilbert-base-uncased-mrpc`)
|
- MRPC (`distilbert-base-uncased-mrpc`)
|
||||||
- nlp dataset `glue`, subset `mrpc`, split `validation`
|
- `datasets` dataset `glue`, subset `mrpc`, split `validation`
|
||||||
- Successes: 350/408
|
- Successes: 350/408
|
||||||
- Accuracy: 85.78%
|
- Accuracy: 85.78%
|
||||||
- QNLI (`distilbert-base-uncased-qnli`)
|
- QNLI (`distilbert-base-uncased-qnli`)
|
||||||
- nlp dataset `glue`, subset `qnli`, split `validation`
|
- `datasets` dataset `glue`, subset `qnli`, split `validation`
|
||||||
- Successes: 860/1000
|
- Successes: 860/1000
|
||||||
- Accuracy: 86.00%
|
- Accuracy: 86.00%
|
||||||
- Recognizing Textual Entailment (`distilbert-base-uncased-rte`)
|
- Recognizing Textual Entailment (`distilbert-base-uncased-rte`)
|
||||||
- nlp dataset `glue`, subset `rte`, split `validation`
|
- `datasets` dataset `glue`, subset `rte`, split `validation`
|
||||||
- Successes: 180/277
|
- Successes: 180/277
|
||||||
- Accuracy: 64.98%
|
- Accuracy: 64.98%
|
||||||
- STS-b (`distilbert-base-uncased-stsb`)
|
- STS-b (`distilbert-base-uncased-stsb`)
|
||||||
- nlp dataset `glue`, subset `stsb`, split `validation`
|
- `datasets` dataset `glue`, subset `stsb`, split `validation`
|
||||||
- Pearson correlation: 0.8421540899520146
|
- Pearson correlation: 0.8421540899520146
|
||||||
- Spearman correlation: 0.8407155030382939
|
- Spearman correlation: 0.8407155030382939
|
||||||
- WNLI (`distilbert-base-uncased-wnli`)
|
- WNLI (`distilbert-base-uncased-wnli`)
|
||||||
- nlp dataset `glue`, subset `wnli`, split `validation`
|
- `datasets` dataset `glue`, subset `wnli`, split `validation`
|
||||||
- Successes: 40/71
|
- Successes: 40/71
|
||||||
- Accuracy: 56.34%
|
- Accuracy: 56.34%
|
||||||
|
|
||||||
@@ -276,46 +276,46 @@ All evaluations shown are on the full validation or test set up to 1000 examples
|
|||||||
<section>
|
<section>
|
||||||
|
|
||||||
- AG News (`roberta-base-ag-news`)
|
- AG News (`roberta-base-ag-news`)
|
||||||
- nlp dataset `ag_news`, split `test`
|
- `datasets` dataset `ag_news`, split `test`
|
||||||
- Successes: 947/1000
|
- Successes: 947/1000
|
||||||
- Accuracy: 94.70%
|
- Accuracy: 94.70%
|
||||||
- CoLA (`roberta-base-cola`)
|
- CoLA (`roberta-base-cola`)
|
||||||
- nlp dataset `glue`, subset `cola`, split `validation`
|
- `datasets` dataset `glue`, subset `cola`, split `validation`
|
||||||
- Successes: 857/1000
|
- Successes: 857/1000
|
||||||
- Accuracy: 85.70%
|
- Accuracy: 85.70%
|
||||||
- IMDB (`roberta-base-imdb`)
|
- IMDB (`roberta-base-imdb`)
|
||||||
- nlp dataset `imdb`, split `test`
|
- `datasets` dataset `imdb`, split `test`
|
||||||
- Successes: 941/1000
|
- Successes: 941/1000
|
||||||
- Accuracy: 94.10%
|
- Accuracy: 94.10%
|
||||||
- Movie Reviews [Rotten Tomatoes] (`roberta-base-mr`)
|
- Movie Reviews [Rotten Tomatoes] (`roberta-base-mr`)
|
||||||
- nlp dataset `rotten_tomatoes`, split `validation`
|
- `datasets` dataset `rotten_tomatoes`, split `validation`
|
||||||
- Successes: 899/1000
|
- Successes: 899/1000
|
||||||
- Accuracy: 89.90%
|
- Accuracy: 89.90%
|
||||||
- nlp dataset `rotten_tomatoes`, split `test`
|
- `datasets` dataset `rotten_tomatoes`, split `test`
|
||||||
- Successes: 883/1000
|
- Successes: 883/1000
|
||||||
- Accuracy: 88.30%
|
- Accuracy: 88.30%
|
||||||
- MRPC (`roberta-base-mrpc`)
|
- MRPC (`roberta-base-mrpc`)
|
||||||
- nlp dataset `glue`, subset `mrpc`, split `validation`
|
- `datasets` dataset `glue`, subset `mrpc`, split `validation`
|
||||||
- Successes: 371/408
|
- Successes: 371/408
|
||||||
- Accuracy: 91.18%
|
- Accuracy: 91.18%
|
||||||
- QNLI (`roberta-base-qnli`)
|
- QNLI (`roberta-base-qnli`)
|
||||||
- nlp dataset `glue`, subset `qnli`, split `validation`
|
- `datasets` dataset `glue`, subset `qnli`, split `validation`
|
||||||
- Successes: 917/1000
|
- Successes: 917/1000
|
||||||
- Accuracy: 91.70%
|
- Accuracy: 91.70%
|
||||||
- Recognizing Textual Entailment (`roberta-base-rte`)
|
- Recognizing Textual Entailment (`roberta-base-rte`)
|
||||||
- nlp dataset `glue`, subset `rte`, split `validation`
|
- `datasets` dataset `glue`, subset `rte`, split `validation`
|
||||||
- Successes: 217/277
|
- Successes: 217/277
|
||||||
- Accuracy: 78.34%
|
- Accuracy: 78.34%
|
||||||
- SST-2 (`roberta-base-sst2`)
|
- SST-2 (`roberta-base-sst2`)
|
||||||
- nlp dataset `glue`, subset `sst2`, split `validation`
|
- `datasets` dataset `glue`, subset `sst2`, split `validation`
|
||||||
- Successes: 820/872
|
- Successes: 820/872
|
||||||
- Accuracy: 94.04%)
|
- Accuracy: 94.04%)
|
||||||
- STS-b (`roberta-base-stsb`)
|
- STS-b (`roberta-base-stsb`)
|
||||||
- nlp dataset `glue`, subset `stsb`, split `validation`
|
- `datasets` dataset `glue`, subset `stsb`, split `validation`
|
||||||
- Pearson correlation: 0.906067852162708
|
- Pearson correlation: 0.906067852162708
|
||||||
- Spearman correlation: 0.9025045272903051
|
- Spearman correlation: 0.9025045272903051
|
||||||
- WNLI (`roberta-base-wnli`)
|
- WNLI (`roberta-base-wnli`)
|
||||||
- nlp dataset `glue`, subset `wnli`, split `validation`
|
- `datasets` dataset `glue`, subset `wnli`, split `validation`
|
||||||
- Successes: 40/71
|
- Successes: 40/71
|
||||||
- Accuracy: 56.34%
|
- Accuracy: 56.34%
|
||||||
|
|
||||||
@@ -326,34 +326,34 @@ All evaluations shown are on the full validation or test set up to 1000 examples
|
|||||||
<section>
|
<section>
|
||||||
|
|
||||||
- CoLA (`xlnet-base-cased-cola`)
|
- CoLA (`xlnet-base-cased-cola`)
|
||||||
- nlp dataset `glue`, subset `cola`, split `validation`
|
- `datasets` dataset `glue`, subset `cola`, split `validation`
|
||||||
- Successes: 800/1000
|
- Successes: 800/1000
|
||||||
- Accuracy: 80.00%
|
- Accuracy: 80.00%
|
||||||
- IMDB (`xlnet-base-cased-imdb`)
|
- IMDB (`xlnet-base-cased-imdb`)
|
||||||
- nlp dataset `imdb`, split `test`
|
- `datasets` dataset `imdb`, split `test`
|
||||||
- Successes: 957/1000
|
- Successes: 957/1000
|
||||||
- Accuracy: 95.70%
|
- Accuracy: 95.70%
|
||||||
- Movie Reviews [Rotten Tomatoes] (`xlnet-base-cased-mr`)
|
- Movie Reviews [Rotten Tomatoes] (`xlnet-base-cased-mr`)
|
||||||
- nlp dataset `rotten_tomatoes`, split `validation`
|
- `datasets` dataset `rotten_tomatoes`, split `validation`
|
||||||
- Successes: 908/1000
|
- Successes: 908/1000
|
||||||
- Accuracy: 90.80%
|
- Accuracy: 90.80%
|
||||||
- nlp dataset `rotten_tomatoes`, split `test`
|
- `datasets` dataset `rotten_tomatoes`, split `test`
|
||||||
- Successes: 876/1000
|
- Successes: 876/1000
|
||||||
- Accuracy: 87.60%
|
- Accuracy: 87.60%
|
||||||
- MRPC (`xlnet-base-cased-mrpc`)
|
- MRPC (`xlnet-base-cased-mrpc`)
|
||||||
- nlp dataset `glue`, subset `mrpc`, split `validation`
|
- `datasets` dataset `glue`, subset `mrpc`, split `validation`
|
||||||
- Successes: 363/408
|
- Successes: 363/408
|
||||||
- Accuracy: 88.97%
|
- Accuracy: 88.97%
|
||||||
- Recognizing Textual Entailment (`xlnet-base-cased-rte`)
|
- Recognizing Textual Entailment (`xlnet-base-cased-rte`)
|
||||||
- nlp dataset `glue`, subset `rte`, split `validation`
|
- `datasets` dataset `glue`, subset `rte`, split `validation`
|
||||||
- Successes: 196/277
|
- Successes: 196/277
|
||||||
- Accuracy: 70.76%
|
- Accuracy: 70.76%
|
||||||
- STS-b (`xlnet-base-cased-stsb`)
|
- STS-b (`xlnet-base-cased-stsb`)
|
||||||
- nlp dataset `glue`, subset `stsb`, split `validation`
|
- `datasets` dataset `glue`, subset `stsb`, split `validation`
|
||||||
- Pearson correlation: 0.883111673280641
|
- Pearson correlation: 0.883111673280641
|
||||||
- Spearman correlation: 0.8773439961182335
|
- Spearman correlation: 0.8773439961182335
|
||||||
- WNLI (`xlnet-base-cased-wnli`)
|
- WNLI (`xlnet-base-cased-wnli`)
|
||||||
- nlp dataset `glue`, subset `wnli`, split `validation`
|
- `datasets` dataset `glue`, subset `wnli`, split `validation`
|
||||||
- Successes: 41/71
|
- Successes: 41/71
|
||||||
- Accuracy: 57.75%
|
- Accuracy: 57.75%
|
||||||
|
|
||||||
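Each entry above names a dataset, an optional subset, and a split. As a rough sketch, those three fields can be pulled out of a README line and handed to the HuggingFace `datasets` loader. The helper `parse_dataset_line` is hypothetical (it is not part of TextAttack), and the `load_dataset` call is shown only in a comment because it needs the `datasets` package and network access:

```python
import re

# Matches both the old "nlp dataset ..." and new "`datasets` dataset ..." wording.
_ENTRY = re.compile(
    r"dataset `(?P<name>[^`]+)`"
    r"(?:, subset `(?P<subset>[^`]+)`)?"
    r", split `(?P<split>[^`]+)`"
)


def parse_dataset_line(line):
    """Return (name, subset, split); subset is None when the line has none."""
    match = _ENTRY.search(line)
    if match is None:
        raise ValueError(f"not a dataset entry: {line!r}")
    return match.group("name"), match.group("subset"), match.group("split")


name, subset, split = parse_dataset_line(
    "- `datasets` dataset `glue`, subset `rte`, split `validation`"
)
# With the `datasets` package installed, the fields could be used as:
#   import datasets
#   data = datasets.load_dataset(name, subset, split=split)
```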
|
|||||||
@@ -122,7 +122,7 @@ def set_cache_dir(cache_dir):
|
|||||||
os.environ["TFHUB_CACHE_DIR"] = cache_dir
|
os.environ["TFHUB_CACHE_DIR"] = cache_dir
|
||||||
# HuggingFace `transformers` cache directory
|
# HuggingFace `transformers` cache directory
|
||||||
os.environ["PYTORCH_TRANSFORMERS_CACHE"] = cache_dir
|
os.environ["PYTORCH_TRANSFORMERS_CACHE"] = cache_dir
|
||||||
# HuggingFace `nlp` cache directory
|
# HuggingFace `datasets` cache directory
|
||||||
os.environ["HF_HOME"] = cache_dir
|
os.environ["HF_HOME"] = cache_dir
|
||||||
# Basic directory for Linux user-specific non-data files
|
# Basic directory for Linux user-specific non-data files
|
||||||
os.environ["XDG_CACHE_HOME"] = cache_dir
|
os.environ["XDG_CACHE_HOME"] = cache_dir
|
||||||
|
|||||||
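For reference, the lines visible in this hunk can be assembled into a runnable sketch of `set_cache_dir` as it reads after the change. Only the four assignments shown in the diff are included; anything the real function does before them is outside the hunk and is not reproduced here:

```python
import os


def set_cache_dir(cache_dir):
    """Point the caches named in the hunk at a single directory."""
    # TensorFlow Hub cache directory (first line visible in the hunk)
    os.environ["TFHUB_CACHE_DIR"] = cache_dir
    # HuggingFace `transformers` cache directory
    os.environ["PYTORCH_TRANSFORMERS_CACHE"] = cache_dir
    # HuggingFace `datasets` cache directory
    os.environ["HF_HOME"] = cache_dir
    # Basic directory for Linux user-specific non-data files
    os.environ["XDG_CACHE_HOME"] = cache_dir


# Example path is an assumption chosen for illustration.
set_cache_dir("/tmp/textattack-cache")
```

The commit changes only the comment above `HF_HOME`; the environment variables themselves are untouched.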