
commit 428b19a511 (parent bbd1587e5f)
Author: Jin Yong Yoo
Date:   2020-09-28 22:37:21 -04:00

    change all mentions of nlp to datasets

16 changed files with 129 additions and 128 deletions


@@ -206,7 +206,7 @@ of a string or a list of strings. Here's an example of how to use the `Embedding
 Our model training code is available via `textattack train` to help you train LSTMs,
 CNNs, and `transformers` models using TextAttack out-of-the-box. Datasets are
-automatically loaded using the `nlp` package.
+automatically loaded using the `datasets` package.
 #### Training Examples
 *Train our default LSTM for 50 epochs on the Yelp Polarity dataset:*
@@ -227,7 +227,7 @@ textattack train --model bert-base-uncased --dataset glue^cola --batch-size 32 -
 ### `textattack peek-dataset`
-To take a closer look at a dataset, use `textattack peek-dataset`. TextAttack will print some cursory statistics about the inputs and outputs from the dataset. For example, `textattack peek-dataset --dataset-from-nlp snli` will show information about the SNLI dataset from the NLP package.
+To take a closer look at a dataset, use `textattack peek-dataset`. TextAttack will print some cursory statistics about the inputs and outputs from the dataset. For example, `textattack peek-dataset --dataset-from-datasets snli` will show information about the SNLI dataset from the NLP package.
 ### `textattack list`
@@ -261,18 +261,18 @@ Here's an example of using one of the built-in models (the SST-2 dataset is auto
 textattack attack --model roberta-base-sst2 --recipe textfooler --num-examples 10
 ```
-#### HuggingFace support: `transformers` models and `nlp` datasets
+#### HuggingFace support: `transformers` models and `datasets` datasets
 We also provide built-in support for [`transformers` pretrained models](https://huggingface.co/models)
-and datasets from the [`nlp` package](https://github.com/huggingface/nlp)! Here's an example of loading
+and datasets from the [`datasets` package](https://github.com/huggingface/datasets)! Here's an example of loading
 and attacking a pre-trained model and dataset:
 ```bash
-textattack attack --model-from-huggingface distilbert-base-uncased-finetuned-sst-2-english --dataset-from-nlp glue^sst2 --recipe deepwordbug --num-examples 10
+textattack attack --model-from-huggingface distilbert-base-uncased-finetuned-sst-2-english --dataset-from-datasets glue^sst2 --recipe deepwordbug --num-examples 10
 ```
 You can explore other pre-trained models using the `--model-from-huggingface` argument, or other datasets by changing
-`--dataset-from-nlp`.
+`--dataset-from-datasets`.
 #### Loading a model or dataset from a file
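The CLI flags in the README example above map directly onto objects you can build in Python. A minimal sketch of loading the same model and dataset programmatically, assuming the post-rename `HuggingFaceDataset` class introduced by this commit (attack construction is omitted, since the recipe API varies across TextAttack versions):

```python
# Load the same model/dataset pair as the CLI example above.
# Assumes the post-rename textattack.datasets.HuggingFaceDataset.
import transformers
import textattack

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
model = transformers.AutoModelForSequenceClassification.from_pretrained(model_name)

# "glue^sst2" on the CLI: the ^ separates the dataset name from its subset.
dataset = textattack.datasets.HuggingFaceDataset("glue", "sst2", split="train")
```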


@@ -73,7 +73,7 @@ textattack attack --model lstm-mr --num-examples 20 --search-method beam-search^
 ## Training Models with `textattack train`
 With textattack, you can train models on any classification or regression task
-from [`nlp`](https://github.com/huggingface/nlp/) using a single line.
+from [`datasets`](https://github.com/huggingface/datasets/) using a single line.
 ### Available Models
 #### TextAttack Models
@@ -131,6 +131,6 @@ whatever dataset you're working with. Whether you're loading a dataset of your
 own from a file, or one from NLP, you can use `textattack peek-dataset` to
 see some basic information about the dataset.
-For example, use `textattack peek-dataset --dataset-from-nlp glue^mrpc` to see
+For example, use `textattack peek-dataset --dataset-from-datasets glue^mrpc` to see
 information about the MRPC dataset (from the GLUE set of datasets). This will
 print statistics like the number of labels, average number of words, etc.
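Those cursory statistics come straight from the `datasets` package. A rough sketch of the same inspection done directly with `datasets.load_dataset` (the statistics computed here are illustrative, not the exact ones `peek-dataset` prints):

```python
# Rough equivalent of `textattack peek-dataset --dataset-from-datasets glue^mrpc`,
# computed directly with the `datasets` package.
import datasets

dataset = datasets.load_dataset("glue", "mrpc", split="validation")

labels = set(dataset["label"])
word_counts = [
    len(s1.split()) + len(s2.split())
    for s1, s2 in zip(dataset["sentence1"], dataset["sentence2"])
]

print(f"examples: {len(dataset)}")
print(f"labels: {sorted(labels)}")
print(f"avg words per example: {sum(word_counts) / len(word_counts):.1f}")
```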


@@ -1,4 +1,4 @@
 #!/bin/bash
 # Shows how to attack a DistilBERT model fine-tuned on SST2 dataset *from the
 # huggingface model repository* using the DeepWordBug recipe and 10 examples.
-textattack attack --model-from-huggingface distilbert-base-uncased-finetuned-sst-2-english --dataset-from-nlp glue^sst2 --recipe deepwordbug --num-examples 10
+textattack attack --model-from-huggingface distilbert-base-uncased-finetuned-sst-2-english --dataset-from-datasets glue^sst2 --recipe deepwordbug --num-examples 10


@@ -1,4 +1,4 @@
 #!/bin/bash
 # Trains `bert-base-cased` on the STS-B task for 3 epochs. This is a basic
-# demonstration of our training script and `nlp` integration.
+# demonstration of our training script and `datasets` integration.
 textattack train --model lstm --dataset rotten_tomatoes --batch-size 64 --epochs 50 --learning-rate 1e-5


@@ -5,7 +5,7 @@ filelock
 language_tool_python
 lemminflect
 lru-dict
-nlp
+datasets
 nltk
 numpy
 pandas>=1.0.1


@@ -43,10 +43,10 @@ attack_test_params = [
         (
             "textattack attack --model-from-huggingface "
             "distilbert-base-uncased-finetuned-sst-2-english "
-            "--dataset-from-nlp glue^sst2^train --recipe deepwordbug --num-examples 3 "
+            "--dataset-from-datasets glue^sst2^train --recipe deepwordbug --num-examples 3 "
             "--shuffle=False"
         ),
-        "tests/sample_outputs/run_attack_transformers_nlp.txt",
+        "tests/sample_outputs/run_attack_transformers_datasets.txt",
     ),
     #
     # test running an attack by loading a model and dataset from file
@@ -59,7 +59,7 @@ attack_test_params = [
             "--dataset-from-file tests/sample_inputs/sst_model_and_dataset.py "
             "--recipe deepwordbug --num-examples 3 --shuffle=False"
         ),
-        "tests/sample_outputs/run_attack_transformers_nlp.txt",
+        "tests/sample_outputs/run_attack_transformers_datasets.txt",
     ),
     #
     # test hotflip on 10 samples from LSTM MR


@@ -4,7 +4,7 @@ import pytest
 eval_test_params = [
     (
         "eval_model_hub_rt",
-        "textattack eval --model-from-huggingface textattack/distilbert-base-uncased-rotten-tomatoes --dataset-from-nlp rotten_tomatoes --num-examples 4",
+        "textattack eval --model-from-huggingface textattack/distilbert-base-uncased-rotten-tomatoes --dataset-from-datasets rotten_tomatoes --num-examples 4",
         "tests/sample_outputs/eval_model_hub_rt.txt",
     ),
     (


@@ -64,11 +64,11 @@ def add_dataset_args(parser):
     """
     dataset_group = parser.add_mutually_exclusive_group()
     dataset_group.add_argument(
-        "--dataset-from-nlp",
+        "--dataset-from-datasets",
         type=str,
         required=False,
         default=None,
-        help="Dataset to load from `nlp` repository.",
+        help="Dataset to load from `datasets` repository.",
     )
     dataset_group.add_argument(
         "--dataset-from-file",
@@ -349,7 +349,7 @@ def parse_dataset_from_args(args):
     # Automatically detect dataset for huggingface & textattack models.
     # This allows us to use the --model shortcut without specifying a dataset.
     if args.model in HUGGINGFACE_DATASET_BY_MODEL:
-        _, args.dataset_from_nlp = HUGGINGFACE_DATASET_BY_MODEL[args.model]
+        _, args.dataset_from_datasets = HUGGINGFACE_DATASET_BY_MODEL[args.model]
     elif args.model in TEXTATTACK_DATASET_BY_MODEL:
         _, dataset = TEXTATTACK_DATASET_BY_MODEL[args.model]
         if dataset[0].startswith("textattack"):
@@ -358,7 +358,7 @@ def parse_dataset_from_args(args):
             dataset = eval(f"{dataset[0]}")(*dataset[1:])
             return dataset
         else:
-            args.dataset_from_nlp = dataset
+            args.dataset_from_datasets = dataset
     # Automatically detect dataset for models trained with textattack.
     elif args.model and os.path.exists(args.model):
         model_args_json_path = os.path.join(args.model, "train_args.json")
@@ -372,7 +372,7 @@ def parse_dataset_from_args(args):
             name, subset = model_train_args["dataset"].split(ARGS_SPLIT_TOKEN)
         else:
             name, subset = model_train_args["dataset"], None
-        args.dataset_from_nlp = (
+        args.dataset_from_datasets = (
             name,
             subset,
             model_train_args["dataset_dev_split"],
@@ -403,14 +403,14 @@ def parse_dataset_from_args(args):
         raise AttributeError(
             f"``dataset`` not found in module {args.dataset_from_file}"
         )
-    elif args.dataset_from_nlp:
-        dataset_args = args.dataset_from_nlp
+    elif args.dataset_from_datasets:
+        dataset_args = args.dataset_from_datasets
         if isinstance(dataset_args, str):
            if ARGS_SPLIT_TOKEN in dataset_args:
                 dataset_args = dataset_args.split(ARGS_SPLIT_TOKEN)
             else:
                 dataset_args = (dataset_args,)
-        dataset = textattack.datasets.HuggingFaceNlpDataset(
+        dataset = textattack.datasets.HuggingFaceDataset(
             *dataset_args, shuffle=args.shuffle
         )
         dataset.examples = dataset.examples[args.num_examples_offset :]
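For reference, `ARGS_SPLIT_TOKEN` is the `^` used in specs like `glue^sst2^train` throughout the docs. A small standalone sketch of the parsing step above (the helper function here is illustrative, not part of TextAttack):

```python
# Illustrative standalone version of the spec-parsing step above.
# ARGS_SPLIT_TOKEN is "^" in TextAttack, e.g. "glue^sst2^train".
ARGS_SPLIT_TOKEN = "^"

def split_dataset_spec(spec: str) -> tuple:
    """Split 'name^subset^split' into positional args for HuggingFaceDataset."""
    if ARGS_SPLIT_TOKEN in spec:
        return tuple(spec.split(ARGS_SPLIT_TOKEN))
    return (spec,)

assert split_dataset_spec("glue^sst2^train") == ("glue", "sst2", "train")
assert split_dataset_spec("rotten_tomatoes") == ("rotten_tomatoes",)
```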


@@ -8,8 +8,9 @@ from textattack.commands.augment import AUGMENTATION_RECIPE_NAMES
 logger = textattack.shared.logger
-def prepare_dataset_for_training(nlp_dataset):
-    """Changes an `nlp` dataset into the proper format for tokenization."""
+def prepare_dataset_for_training(datasets_dataset):
+    """Changes an `datasets` dataset into the proper format for
+    tokenization."""
     def prepare_example_dict(ex):
         """Returns the values in order corresponding to the data.
@@ -25,22 +26,22 @@ def prepare_dataset_for_training(nlp_dataset):
             return values[0]
         return tuple(values)
-    text, outputs = zip(*((prepare_example_dict(x[0]), x[1]) for x in nlp_dataset))
+    text, outputs = zip(*((prepare_example_dict(x[0]), x[1]) for x in datasets_dataset))
     return list(text), list(outputs)
 def dataset_from_args(args):
-    """Returns a tuple of ``HuggingFaceNlpDataset`` for the train and test
+    """Returns a tuple of ``HuggingFaceDataset`` for the train and test
     datasets for ``args.dataset``."""
     dataset_args = args.dataset.split(ARGS_SPLIT_TOKEN)
-    # TODO `HuggingFaceNlpDataset` -> `HuggingFaceDataset`
+    # TODO `HuggingFaceDataset` -> `HuggingFaceDataset`
     if args.dataset_train_split:
-        train_dataset = textattack.datasets.HuggingFaceNlpDataset(
+        train_dataset = textattack.datasets.HuggingFaceDataset(
             *dataset_args, split=args.dataset_train_split
         )
     else:
         try:
-            train_dataset = textattack.datasets.HuggingFaceNlpDataset(
+            train_dataset = textattack.datasets.HuggingFaceDataset(
                 *dataset_args, split="train"
             )
             args.dataset_train_split = "train"
@@ -49,31 +50,31 @@ def dataset_from_args(args):
     train_text, train_labels = prepare_dataset_for_training(train_dataset)
     if args.dataset_dev_split:
-        eval_dataset = textattack.datasets.HuggingFaceNlpDataset(
+        eval_dataset = textattack.datasets.HuggingFaceDataset(
             *dataset_args, split=args.dataset_dev_split
         )
     else:
         # try common dev split names
         try:
-            eval_dataset = textattack.datasets.HuggingFaceNlpDataset(
+            eval_dataset = textattack.datasets.HuggingFaceDataset(
                 *dataset_args, split="dev"
             )
             args.dataset_dev_split = "dev"
         except KeyError:
             try:
-                eval_dataset = textattack.datasets.HuggingFaceNlpDataset(
+                eval_dataset = textattack.datasets.HuggingFaceDataset(
                     *dataset_args, split="eval"
                 )
                 args.dataset_dev_split = "eval"
             except KeyError:
                 try:
-                    eval_dataset = textattack.datasets.HuggingFaceNlpDataset(
+                    eval_dataset = textattack.datasets.HuggingFaceDataset(
                         *dataset_args, split="validation"
                     )
                     args.dataset_dev_split = "validation"
                 except KeyError:
                     try:
-                        eval_dataset = textattack.datasets.HuggingFaceNlpDataset(
+                        eval_dataset = textattack.datasets.HuggingFaceDataset(
                             *dataset_args, split="test"
                         )
                         args.dataset_dev_split = "test"
@@ -189,7 +190,7 @@ def write_readme(args, best_eval_score, best_eval_score_epoch):
 ## TextAttack Model Card
 This `{args.model}` model was fine-tuned for sequence classification using TextAttack
-and the {dataset_name} dataset loaded using the `nlp` library. The model was fine-tuned
+and the {dataset_name} dataset loaded using the `datasets` library. The model was fine-tuned
 for {args.num_train_epochs} epochs with a batch size of {args.batch_size}, a learning
 rate of {args.learning_rate}, and a maximum sequence length of {args.max_length}.
 Since this was a {task_name} task, the model was trained with a {loss_func} loss function.
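A toy illustration of what `prepare_dataset_for_training` produces, mimicking the `prepare_example_dict` logic shown in the diff (the example dicts and column name "text" are illustrative, not from the real dataset):

```python
# Toy illustration of prepare_dataset_for_training's output shape.
# The example pairs mimic (example_dict, label) items from a `datasets`
# dataset; the "text" column name is illustrative.
examples = [
    ({"text": "a gripping movie"}, 1),
    ({"text": "a dull movie"}, 0),
]

def prepare_example_dict(ex):
    # Single input column -> bare string; multiple columns -> tuple,
    # matching the return values[0] / return tuple(values) logic above.
    values = list(ex.values())
    if len(values) == 1:
        return values[0]
    return tuple(values)

text, outputs = zip(*((prepare_example_dict(x[0]), x[1]) for x in examples))
assert list(text) == ["a gripping movie", "a dull movie"]
assert list(outputs) == [1, 0]
```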


@@ -47,7 +47,7 @@ class TrainModelCommand(TextAttackCommand):
             required=True,
             default="yelp",
             help="dataset for training; will be loaded from "
-            "`nlp` library. if dataset has a subset, separate with a colon. "
+            "`datasets` library. if dataset has a subset, separate with a colon. "
             " ex: `glue^sst2` or `rotten_tomatoes`",
         )
         parser.add_argument(


@@ -1,4 +1,4 @@
 from .dataset import TextAttackDataset
-from .huggingface_nlp_dataset import HuggingFaceNlpDataset
+from .huggingface_dataset import HuggingFaceDataset
 from . import translation


@@ -1,7 +1,7 @@
 import collections
 import random
-import nlp
+import datasets
 import textattack
 from textattack.datasets import TextAttackDataset
@@ -14,7 +14,7 @@ def _cb(s):
     return textattack.shared.utils.color_text(str(s), color="blue", method="ansi")
-def get_nlp_dataset_columns(dataset):
+def get_datasets_dataset_columns(dataset):
     schema = set(dataset.column_names)
     if {"premise", "hypothesis", "label"} <= schema:
         input_columns = ("premise", "hypothesis")
@@ -54,15 +54,15 @@ def get_nlp_dataset_columns(dataset):
     return input_columns, output_column
-class HuggingFaceNlpDataset(TextAttackDataset):
-    """Loads a dataset from HuggingFace ``nlp`` and prepares it as a TextAttack
-    dataset.
+class HuggingFaceDataset(TextAttackDataset):
+    """Loads a dataset from HuggingFace ``datasets`` and prepares it as a
+    TextAttack dataset.
     - name: the dataset name
-    - subset: the subset of the main dataset. Dataset will be loaded as ``nlp.load_dataset(name, subset)``.
+    - subset: the subset of the main dataset. Dataset will be loaded as ``datasets.load_dataset(name, subset)``.
     - label_map: Mapping if output labels should be re-mapped. Useful
       if model was trained with a different label arrangement than
-      provided in the ``nlp`` version of the dataset.
+      provided in the ``datasets`` version of the dataset.
     - output_scale_factor (float): Factor to divide ground-truth outputs by.
       Generally, TextAttack goal functions require model outputs
       between 0 and 1. Some datasets test the model's correlation
@@ -82,16 +82,16 @@ class HuggingFaceNlpDataset(TextAttackDataset):
         shuffle=False,
     ):
         self._name = name
-        self._dataset = nlp.load_dataset(name, subset)[split]
+        self._dataset = datasets.load_dataset(name, subset)[split]
         subset_print_str = f", subset {_cb(subset)}" if subset else ""
         textattack.shared.logger.info(
-            f"Loading {_cb('nlp')} dataset {_cb(name)}{subset_print_str}, split {_cb(split)}."
+            f"Loading {_cb('datasets')} dataset {_cb(name)}{subset_print_str}, split {_cb(split)}."
         )
         # Input/output column order, like (('premise', 'hypothesis'), 'label')
         (
             self.input_columns,
             self.output_column,
-        ) = dataset_columns or get_nlp_dataset_columns(self._dataset)
+        ) = dataset_columns or get_datasets_dataset_columns(self._dataset)
         self._i = 0
         self.examples = list(self._dataset)
         self.label_map = label_map
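A short usage sketch of the renamed class, assuming the constructor shown above (`name`, `subset`, `split`) and the attributes it sets; the printed column names are what column auto-detection would plausibly return for SST-2:

```python
# Usage sketch of the renamed HuggingFaceDataset (post-commit API).
# Column detection runs via get_datasets_dataset_columns when
# dataset_columns is not passed explicitly.
import textattack

dataset = textattack.datasets.HuggingFaceDataset("glue", "sst2", split="validation")

# `examples` holds raw `datasets` rows, per `self.examples = list(self._dataset)`.
first_row = dataset.examples[0]
print(dataset.input_columns, dataset.output_column)  # e.g. ('sentence',), 'label'
print(first_row["sentence"], first_row["label"])
```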


@@ -1,20 +1,20 @@
 import collections
-import nlp
+import datasets
 import numpy as np
-from textattack.datasets import HuggingFaceNlpDataset
+from textattack.datasets import HuggingFaceDataset
-class TedMultiTranslationDataset(HuggingFaceNlpDataset):
-    """Loads examples from the Ted Talk translation dataset using the `nlp`
-    package.
+class TedMultiTranslationDataset(HuggingFaceDataset):
+    """Loads examples from the Ted Talk translation dataset using the
+    `datasets` package.
     dataset source: http://www.cs.jhu.edu/~kevinduh/a/multitarget-tedtalks/
     """
     def __init__(self, source_lang="en", target_lang="de", split="test"):
-        self._dataset = nlp.load_dataset("ted_multi")[split]
+        self._dataset = datasets.load_dataset("ted_multi")[split]
         self.examples = self._dataset["translations"]
         language_options = set(self.examples[0]["language"])
         if source_lang not in language_options:
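For context, each `ted_multi` row stores its parallel translations as aligned `language` and `translation` lists. A sketch of picking out a source/target pair, assuming that published schema (the example row here is fabricated for illustration):

```python
# Sketch: extracting an en->de pair from one ted_multi "translations" entry.
# Assumes aligned "language" and "translation" lists; this row is made up.
example = {
    "language": ["en", "de", "fr"],
    "translation": ["hello", "hallo", "bonjour"],
}

def pick(example, lang):
    # Look up the translation at the index of the requested language code.
    return example["translation"][example["language"].index(lang)]

source, target = pick(example, "en"), pick(example, "de")
assert (source, target) == ("hello", "hallo")
```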


@@ -19,26 +19,26 @@ All evaluations shown are on the full validation or test set up to 1000 examples
 <section>
 - AG News (`lstm-ag-news`)
-  - nlp dataset `ag_news`, split `test`
+  - `datasets` dataset `ag_news`, split `test`
   - Successes: 914/1000
   - Accuracy: 91.4%
 - IMDB (`lstm-imdb`)
-  - nlp dataset `imdb`, split `test`
+  - `datasets` dataset `imdb`, split `test`
   - Successes: 883/1000
   - Accuracy: 88.30%
 - Movie Reviews [Rotten Tomatoes] (`lstm-mr`)
-  - nlp dataset `rotten_tomatoes`, split `validation`
+  - `datasets` dataset `rotten_tomatoes`, split `validation`
   - Successes: 807/1000
   - Accuracy: 80.70%
-  - nlp dataset `rotten_tomatoes`, split `test`
+  - `datasets` dataset `rotten_tomatoes`, split `test`
   - Successes: 781/1000
   - Accuracy: 78.10%
 - SST-2 (`lstm-sst2`)
-  - nlp dataset `glue`, subset `sst2`, split `validation`
+  - `datasets` dataset `glue`, subset `sst2`, split `validation`
   - Successes: 737/872
   - Accuracy: 84.52%
 - Yelp Polarity (`lstm-yelp`)
-  - nlp dataset `yelp_polarity`, split `test`
+  - `datasets` dataset `yelp_polarity`, split `test`
   - Successes: 922/1000
   - Accuracy: 92.20%
@@ -50,26 +50,26 @@ All evaluations shown are on the full validation or test set up to 1000 examples
 - AG News (`cnn-ag-news`)
-  - nlp dataset `ag_news`, split `test`
+  - `datasets` dataset `ag_news`, split `test`
   - Successes: 910/1000
   - Accuracy: 91.00%
 - IMDB (`cnn-imdb`)
-  - nlp dataset `imdb`, split `test`
+  - `datasets` dataset `imdb`, split `test`
   - Successes: 863/1000
   - Accuracy: 86.30%
 - Movie Reviews [Rotten Tomatoes] (`cnn-mr`)
-  - nlp dataset `rotten_tomatoes`, split `validation`
+  - `datasets` dataset `rotten_tomatoes`, split `validation`
   - Successes: 794/1000
   - Accuracy: 79.40%
-  - nlp dataset `rotten_tomatoes`, split `test`
+  - `datasets` dataset `rotten_tomatoes`, split `test`
   - Successes: 768/1000
   - Accuracy: 76.80%
 - SST-2 (`cnn-sst2`)
-  - nlp dataset `glue`, subset `sst2`, split `validation`
+  - `datasets` dataset `glue`, subset `sst2`, split `validation`
   - Successes: 721/872
   - Accuracy: 82.68%
 - Yelp Polarity (`cnn-yelp`)
-  - nlp dataset `yelp_polarity`, split `test`
+  - `datasets` dataset `yelp_polarity`, split `test`
   - Successes: 913/1000
   - Accuracy: 91.30%
@@ -81,50 +81,50 @@ All evaluations shown are on the full validation or test set up to 1000 examples
 <section>
 - AG News (`albert-base-v2-ag-news`)
-  - nlp dataset `ag_news`, split `test`
+  - `datasets` dataset `ag_news`, split `test`
   - Successes: 943/1000
   - Accuracy: 94.30%
 - CoLA (`albert-base-v2-cola`)
-  - nlp dataset `glue`, subset `cola`, split `validation`
+  - `datasets` dataset `glue`, subset `cola`, split `validation`
   - Successes: 829/1000
   - Accuracy: 82.90%
 - IMDB (`albert-base-v2-imdb`)
-  - nlp dataset `imdb`, split `test`
+  - `datasets` dataset `imdb`, split `test`
   - Successes: 913/1000
   - Accuracy: 91.30%
 - Movie Reviews [Rotten Tomatoes] (`albert-base-v2-mr`)
-  - nlp dataset `rotten_tomatoes`, split `validation`
+  - `datasets` dataset `rotten_tomatoes`, split `validation`
   - Successes: 882/1000
   - Accuracy: 88.20%
-  - nlp dataset `rotten_tomatoes`, split `test`
+  - `datasets` dataset `rotten_tomatoes`, split `test`
   - Successes: 851/1000
   - Accuracy: 85.10%
 - Quora Question Pairs (`albert-base-v2-qqp`)
-  - nlp dataset `glue`, subset `qqp`, split `validation`
+  - `datasets` dataset `glue`, subset `qqp`, split `validation`
   - Successes: 914/1000
   - Accuracy: 91.40%
 - Recognizing Textual Entailment (`albert-base-v2-rte`)
-  - nlp dataset `glue`, subset `rte`, split `validation`
+  - `datasets` dataset `glue`, subset `rte`, split `validation`
   - Successes: 211/277
   - Accuracy: 76.17%
 - SNLI (`albert-base-v2-snli`)
-  - nlp dataset `snli`, split `test`
+  - `datasets` dataset `snli`, split `test`
   - Successes: 883/1000
   - Accuracy: 88.30%
 - SST-2 (`albert-base-v2-sst2`)
-  - nlp dataset `glue`, subset `sst2`, split `validation`
+  - `datasets` dataset `glue`, subset `sst2`, split `validation`
   - Successes: 807/872
   - Accuracy: 92.55%)
 - STS-b (`albert-base-v2-stsb`)
-  - nlp dataset `glue`, subset `stsb`, split `validation`
+  - `datasets` dataset `glue`, subset `stsb`, split `validation`
   - Pearson correlation: 0.9041359738552746
   - Spearman correlation: 0.8995912861209745
 - WNLI (`albert-base-v2-wnli`)
-  - nlp dataset `glue`, subset `wnli`, split `validation`
+  - `datasets` dataset `glue`, subset `wnli`, split `validation`
   - Successes: 42/71
   - Accuracy: 59.15%
 - Yelp Polarity (`albert-base-v2-yelp`)
-  - nlp dataset `yelp_polarity`, split `test`
+  - `datasets` dataset `yelp_polarity`, split `test`
   - Successes: 963/1000
   - Accuracy: 96.30%
@@ -135,62 +135,62 @@ All evaluations shown are on the full validation or test set up to 1000 examples
 <section>
 - AG News (`bert-base-uncased-ag-news`)
-  - nlp dataset `ag_news`, split `test`
+  - `datasets` dataset `ag_news`, split `test`
   - Successes: 942/1000
   - Accuracy: 94.20%
 - CoLA (`bert-base-uncased-cola`)
-  - nlp dataset `glue`, subset `cola`, split `validation`
+  - `datasets` dataset `glue`, subset `cola`, split `validation`
   - Successes: 812/1000
   - Accuracy: 81.20%
 - IMDB (`bert-base-uncased-imdb`)
-  - nlp dataset `imdb`, split `test`
+  - `datasets` dataset `imdb`, split `test`
   - Successes: 919/1000
   - Accuracy: 91.90%
 - MNLI matched (`bert-base-uncased-mnli`)
-  - nlp dataset `glue`, subset `mnli`, split `validation_matched`
+  - `datasets` dataset `glue`, subset `mnli`, split `validation_matched`
   - Successes: 840/1000
   - Accuracy: 84.00%
 - Movie Reviews [Rotten Tomatoes] (`bert-base-uncased-mr`)
-  - nlp dataset `rotten_tomatoes`, split `validation`
+  - `datasets` dataset `rotten_tomatoes`, split `validation`
   - Successes: 876/1000
   - Accuracy: 87.60%
-  - nlp dataset `rotten_tomatoes`, split `test`
+  - `datasets` dataset `rotten_tomatoes`, split `test`
   - Successes: 838/1000
   - Accuracy: 83.80%
 - MRPC (`bert-base-uncased-mrpc`)
-  - nlp dataset `glue`, subset `mrpc`, split `validation`
+  - `datasets` dataset `glue`, subset `mrpc`, split `validation`
   - Successes: 358/408
   - Accuracy: 87.75%
 - QNLI (`bert-base-uncased-qnli`)
-  - nlp dataset `glue`, subset `qnli`, split `validation`
+  - `datasets` dataset `glue`, subset `qnli`, split `validation`
   - Successes: 904/1000
   - Accuracy: 90.40%
 - Quora Question Pairs (`bert-base-uncased-qqp`)
-  - nlp dataset `glue`, subset `qqp`, split `validation`
+  - `datasets` dataset `glue`, subset `qqp`, split `validation`
   - Successes: 924/1000
   - Accuracy: 92.40%
 - Recognizing Textual Entailment (`bert-base-uncased-rte`)
-  - nlp dataset `glue`, subset `rte`, split `validation`
+  - `datasets` dataset `glue`, subset `rte`, split `validation`
   - Successes: 201/277
   - Accuracy: 72.56%
 - SNLI (`bert-base-uncased-snli`)
-  - nlp dataset `snli`, split `test`
+  - `datasets` dataset `snli`, split `test`
   - Successes: 894/1000
   - Accuracy: 89.40%
 - SST-2 (`bert-base-uncased-sst2`)
-  - nlp dataset `glue`, subset `sst2`, split `validation`
+  - `datasets` dataset `glue`, subset `sst2`, split `validation`
   - Successes: 806/872
   - Accuracy: 92.43%)
 - STS-b (`bert-base-uncased-stsb`)
-  - nlp dataset `glue`, subset `stsb`, split `validation`
+  - `datasets` dataset `glue`, subset `stsb`, split `validation`
   - Pearson correlation: 0.8775458937815515
   - Spearman correlation: 0.8773251339980935
 - WNLI (`bert-base-uncased-wnli`)
-  - nlp dataset `glue`, subset `wnli`, split `validation`
+  - `datasets` dataset `glue`, subset `wnli`, split `validation`
   - Successes: 40/71
   - Accuracy: 56.34%
 - Yelp Polarity (`bert-base-uncased-yelp`)
-  - nlp dataset `yelp_polarity`, split `test`
+  - `datasets` dataset `yelp_polarity`, split `test`
   - Successes: 963/1000
   - Accuracy: 96.30%
@@ -202,27 +202,27 @@ All evaluations shown are on the full validation or test set up to 1000 examples
 - CoLA (`distilbert-base-cased-cola`)
-  - nlp dataset `glue`, subset `cola`, split `validation`
+  - `datasets` dataset `glue`, subset `cola`, split `validation`
   - Successes: 786/1000
   - Accuracy: 78.60%
 - MRPC (`distilbert-base-cased-mrpc`)
-  - nlp dataset `glue`, subset `mrpc`, split `validation`
+  - `datasets` dataset `glue`, subset `mrpc`, split `validation`
   - Successes: 320/408
   - Accuracy: 78.43%
 - Quora Question Pairs (`distilbert-base-cased-qqp`)
-  - nlp dataset `glue`, subset `qqp`, split `validation`
+  - `datasets` dataset `glue`, subset `qqp`, split `validation`
   - Successes: 908/1000
   - Accuracy: 90.80%
 - SNLI (`distilbert-base-cased-snli`)
-  - nlp dataset `snli`, split `test`
+  - `datasets` dataset `snli`, split `test`
   - Successes: 861/1000
   - Accuracy: 86.10%
 - SST-2 (`distilbert-base-cased-sst2`)
-  - nlp dataset `glue`, subset `sst2`, split `validation`
+  - `datasets` dataset `glue`, subset `sst2`, split `validation`
   - Successes: 785/872
   - Accuracy: 90.02%)
 - STS-b (`distilbert-base-cased-stsb`)
-  - nlp dataset `glue`, subset `stsb`, split `validation`
+  - `datasets` dataset `glue`, subset `stsb`, split `validation`
   - Pearson correlation: 0.8421540899520146
   - Spearman correlation: 0.8407155030382939
@@ -233,39 +233,39 @@ All evaluations shown are on the full validation or test set up to 1000 examples
 <section>
 - AG News (`distilbert-base-uncased-ag-news`)
-  - nlp dataset `ag_news`, split `test`
+  - `datasets` dataset `ag_news`, split `test`
   - Successes: 944/1000
   - Accuracy: 94.40%
 - CoLA (`distilbert-base-uncased-cola`)
-  - nlp dataset `glue`, subset `cola`, split `validation`
+  - `datasets` dataset `glue`, subset `cola`, split `validation`
   - Successes: 786/1000
   - Accuracy: 78.60%
 - IMDB (`distilbert-base-uncased-imdb`)
-  - nlp dataset `imdb`, split `test`
+  - `datasets` dataset `imdb`, split `test`
   - Successes: 903/1000
   - Accuracy: 90.30%
 - MNLI matched (`distilbert-base-uncased-mnli`)
-  - nlp dataset `glue`, subset `mnli`, split `validation_matched`
+  - `datasets` dataset `glue`, subset `mnli`, split `validation_matched`
   - Successes: 817/1000
   - Accuracy: 81.70%
 - MRPC (`distilbert-base-uncased-mrpc`)
-  - nlp dataset `glue`, subset `mrpc`, split `validation`
+  - `datasets` dataset `glue`, subset `mrpc`, split `validation`
   - Successes: 350/408
   - Accuracy: 85.78%
 - QNLI (`distilbert-base-uncased-qnli`)
-  - nlp dataset `glue`, subset `qnli`, split `validation`
+  - `datasets` dataset `glue`, subset `qnli`, split `validation`
   - Successes: 860/1000
   - Accuracy: 86.00%
 - Recognizing Textual Entailment (`distilbert-base-uncased-rte`)
-  - nlp dataset `glue`, subset `rte`, split `validation`
+  - `datasets` dataset `glue`, subset `rte`, split `validation`
   - Successes: 180/277
   - Accuracy: 64.98%
 - STS-b (`distilbert-base-uncased-stsb`)
-  - nlp dataset `glue`, subset `stsb`, split `validation`
+  - `datasets` dataset `glue`, subset `stsb`, split `validation`
   - Pearson correlation: 0.8421540899520146
   - Spearman correlation: 0.8407155030382939
 - WNLI (`distilbert-base-uncased-wnli`)
-  - nlp dataset `glue`, subset `wnli`, split `validation`
+  - `datasets` dataset `glue`, subset `wnli`, split `validation`
   - Successes: 40/71
   - Accuracy: 56.34%
@@ -276,46 +276,46 @@ All evaluations shown are on the full validation or test set up to 1000 examples
 <section>
 - AG News (`roberta-base-ag-news`)
-  - nlp dataset `ag_news`, split `test`
+  - `datasets` dataset `ag_news`, split `test`
   - Successes: 947/1000
   - Accuracy: 94.70%
 - CoLA (`roberta-base-cola`)
-  - nlp dataset `glue`, subset `cola`, split `validation`
+  - `datasets` dataset `glue`, subset `cola`, split `validation`
   - Successes: 857/1000
   - Accuracy: 85.70%
 - IMDB (`roberta-base-imdb`)
-  - nlp dataset `imdb`, split `test`
+  - `datasets` dataset `imdb`, split `test`
   - Successes: 941/1000
   - Accuracy: 94.10%
 - Movie Reviews [Rotten Tomatoes] (`roberta-base-mr`)
-  - nlp dataset `rotten_tomatoes`, split `validation`
+  - `datasets` dataset `rotten_tomatoes`, split `validation`
   - Successes: 899/1000
   - Accuracy: 89.90%
-  - nlp dataset `rotten_tomatoes`, split `test`
+  - `datasets` dataset `rotten_tomatoes`, split `test`
   - Successes: 883/1000
   - Accuracy: 88.30%
 - MRPC (`roberta-base-mrpc`)
-  - nlp dataset `glue`, subset `mrpc`, split `validation`
+  - `datasets` dataset `glue`, subset `mrpc`, split `validation`
   - Successes: 371/408
   - Accuracy: 91.18%
 - QNLI (`roberta-base-qnli`)
-  - nlp dataset `glue`, subset `qnli`, split `validation`
+  - `datasets` dataset `glue`, subset `qnli`, split `validation`
   - Successes: 917/1000
   - Accuracy: 91.70%
 - Recognizing Textual Entailment (`roberta-base-rte`)
-  - nlp dataset `glue`, subset `rte`, split `validation`
+  - `datasets` dataset `glue`, subset `rte`, split `validation`
   - Successes: 217/277
   - Accuracy: 78.34%
 - SST-2 (`roberta-base-sst2`)
-  - nlp dataset `glue`, subset `sst2`, split `validation`
+  - `datasets` dataset `glue`, subset `sst2`, split `validation`
   - Successes: 820/872
   - Accuracy: 94.04%)
 - STS-b (`roberta-base-stsb`)
-  - nlp dataset `glue`, subset `stsb`, split `validation`
+  - `datasets` dataset `glue`, subset `stsb`, split `validation`
   - Pearson correlation: 0.906067852162708
   - Spearman correlation: 0.9025045272903051
 - WNLI (`roberta-base-wnli`)
-  - nlp dataset `glue`, subset `wnli`, split `validation`
+  - `datasets` dataset `glue`, subset `wnli`, split `validation`
   - Successes: 40/71
   - Accuracy: 56.34%
@@ -326,34 +326,34 @@ All evaluations shown are on the full validation or test set up to 1000 examples
 <section>
 - CoLA (`xlnet-base-cased-cola`)
-  - nlp dataset `glue`, subset `cola`, split `validation`
+  - `datasets` dataset `glue`, subset `cola`, split `validation`
   - Successes: 800/1000
   - Accuracy: 80.00%
 - IMDB (`xlnet-base-cased-imdb`)
-  - nlp dataset `imdb`, split `test`
+  - `datasets` dataset `imdb`, split `test`
   - Successes: 957/1000
   - Accuracy: 95.70%
 - Movie Reviews [Rotten Tomatoes] (`xlnet-base-cased-mr`)
-  - nlp dataset `rotten_tomatoes`, split `validation`
+  - `datasets` dataset `rotten_tomatoes`, split `validation`
   - Successes: 908/1000
   - Accuracy: 90.80%
-  - nlp dataset `rotten_tomatoes`, split `test`
+  - `datasets` dataset `rotten_tomatoes`, split `test`
   - Successes: 876/1000
   - Accuracy: 87.60%
 - MRPC (`xlnet-base-cased-mrpc`)
-  - nlp dataset `glue`, subset `mrpc`, split `validation`
+  - `datasets` dataset `glue`, subset `mrpc`, split `validation`
   - Successes: 363/408
   - Accuracy: 88.97%
 - Recognizing Textual Entailment (`xlnet-base-cased-rte`)
-  - nlp dataset `glue`, subset `rte`, split `validation`
+  - `datasets` dataset `glue`, subset `rte`, split `validation`
   - Successes: 196/277
   - Accuracy: 70.76%
 - STS-b (`xlnet-base-cased-stsb`)
-  - nlp dataset `glue`, subset `stsb`, split `validation`
+  - `datasets` dataset `glue`, subset `stsb`, split `validation`
   - Pearson correlation: 0.883111673280641
   - Spearman correlation: 0.8773439961182335
 - WNLI (`xlnet-base-cased-wnli`)
-  - nlp dataset `glue`, subset `wnli`, split `validation`
+  - `datasets` dataset `glue`, subset `wnli`, split `validation`
   - Successes: 41/71
   - Accuracy: 57.75%


@@ -122,7 +122,7 @@ def set_cache_dir(cache_dir):
     os.environ["TFHUB_CACHE_DIR"] = cache_dir
     # HuggingFace `transformers` cache directory
     os.environ["PYTORCH_TRANSFORMERS_CACHE"] = cache_dir
-    # HuggingFace `nlp` cache directory
+    # HuggingFace `datasets` cache directory
     os.environ["HF_HOME"] = cache_dir
     # Basic directory for Linux user-specific non-data files
     os.environ["XDG_CACHE_HOME"] = cache_dir