diff --git a/README.md b/README.md index 27320379..70fef20e 100644 --- a/README.md +++ b/README.md @@ -206,7 +206,7 @@ of a string or a list of strings. Here's an example of how to use the `Embedding Our model training code is available via `textattack train` to help you train LSTMs, CNNs, and `transformers` models using TextAttack out-of-the-box. Datasets are -automatically loaded using the `nlp` package. +automatically loaded using the `datasets` package. #### Training Examples *Train our default LSTM for 50 epochs on the Yelp Polarity dataset:* @@ -227,7 +227,7 @@ textattack train --model bert-base-uncased --dataset glue^cola --batch-size 32 - ### `textattack peek-dataset` -To take a closer look at a dataset, use `textattack peek-dataset`. TextAttack will print some cursory statistics about the inputs and outputs from the dataset. For example, `textattack peek-dataset --dataset-from-nlp snli` will show information about the SNLI dataset from the NLP package. +To take a closer look at a dataset, use `textattack peek-dataset`. TextAttack will print some cursory statistics about the inputs and outputs from the dataset. For example, `textattack peek-dataset --dataset-from-datasets snli` will show information about the SNLI dataset from the NLP package. ### `textattack list` @@ -261,18 +261,18 @@ Here's an example of using one of the built-in models (the SST-2 dataset is auto textattack attack --model roberta-base-sst2 --recipe textfooler --num-examples 10 ``` -#### HuggingFace support: `transformers` models and `nlp` datasets +#### HuggingFace support: `transformers` models and `datasets` datasets We also provide built-in support for [`transformers` pretrained models](https://huggingface.co/models) -and datasets from the [`nlp` package](https://github.com/huggingface/nlp)! Here's an example of loading +and datasets from the [`datasets` package](https://github.com/huggingface/datasets)! 
Here's an example of loading and attacking a pre-trained model and dataset: ```bash -textattack attack --model-from-huggingface distilbert-base-uncased-finetuned-sst-2-english --dataset-from-nlp glue^sst2 --recipe deepwordbug --num-examples 10 +textattack attack --model-from-huggingface distilbert-base-uncased-finetuned-sst-2-english --dataset-from-datasets glue^sst2 --recipe deepwordbug --num-examples 10 ``` You can explore other pre-trained models using the `--model-from-huggingface` argument, or other datasets by changing -`--dataset-from-nlp`. +`--dataset-from-datasets`. #### Loading a model or dataset from a file diff --git a/docs/quickstart/command_line_usage.md b/docs/quickstart/command_line_usage.md index 5deda163..f1b336da 100644 --- a/docs/quickstart/command_line_usage.md +++ b/docs/quickstart/command_line_usage.md @@ -73,7 +73,7 @@ textattack attack --model lstm-mr --num-examples 20 --search-method beam-search^ ## Training Models with `textattack train` With textattack, you can train models on any classification or regression task -from [`nlp`](https://github.com/huggingface/nlp/) using a single line. +from [`datasets`](https://github.com/huggingface/datasets/) using a single line. ### Available Models #### TextAttack Models @@ -131,6 +131,6 @@ whatever dataset you're working with. Whether you're loading a dataset of your own from a file, or one from NLP, you can use `textattack peek-dataset` to see some basic information about the dataset. -For example, use `textattack peek-dataset --dataset-from-nlp glue^mrpc` to see +For example, use `textattack peek-dataset --dataset-from-datasets glue^mrpc` to see information about the MRPC dataset (from the GLUE set of datasets). This will print statistics like the number of labels, average number of words, etc. 
diff --git a/examples/attack/attack_huggingface_deepwordbug.sh b/examples/attack/attack_huggingface_deepwordbug.sh
index 22598107..6dc6e4ba 100755
--- a/examples/attack/attack_huggingface_deepwordbug.sh
+++ b/examples/attack/attack_huggingface_deepwordbug.sh
@@ -1,4 +1,4 @@
 #!/bin/bash
 # Shows how to attack a DistilBERT model fine-tuned on SST2 dataset *from the
 # huggingface model repository& using the DeepWordBug recipe and 10 examples.
-textattack attack --model-from-huggingface distilbert-base-uncased-finetuned-sst-2-english --dataset-from-nlp glue^sst2 --recipe deepwordbug --num-examples 10
+textattack attack --model-from-huggingface distilbert-base-uncased-finetuned-sst-2-english --dataset-from-datasets glue^sst2 --recipe deepwordbug --num-examples 10
diff --git a/examples/train/train_lstm_rotten_tomatoes_sentiment_classification.sh b/examples/train/train_lstm_rotten_tomatoes_sentiment_classification.sh
index cd6f26a3..d97d00b8 100755
--- a/examples/train/train_lstm_rotten_tomatoes_sentiment_classification.sh
+++ b/examples/train/train_lstm_rotten_tomatoes_sentiment_classification.sh
@@ -1,4 +1,4 @@
 #!/bin/bash
 # Trains `bert-base-cased` on the STS-B task for 3 epochs. This is a basic
-# demonstration of our training script and `nlp` integration.
+# demonstration of our training script and `datasets` integration.
textattack train --model lstm --dataset rotten_romatoes --batch-size 64 --epochs 50 --learning-rate 1e-5 \ No newline at end of file diff --git a/requirements.txt b/requirements.txt index a7ef0c30..9dba90ab 100644 --- a/requirements.txt +++ b/requirements.txt @@ -5,7 +5,7 @@ filelock language_tool_python lemminflect lru-dict -nlp +datasets nltk numpy pandas>=1.0.1 diff --git a/tests/sample_outputs/run_attack_transformers_nlp.txt b/tests/sample_outputs/run_attack_transformers_datasets.txt similarity index 100% rename from tests/sample_outputs/run_attack_transformers_nlp.txt rename to tests/sample_outputs/run_attack_transformers_datasets.txt diff --git a/tests/test_command_line/test_attack.py b/tests/test_command_line/test_attack.py index 9161ad46..87c19007 100644 --- a/tests/test_command_line/test_attack.py +++ b/tests/test_command_line/test_attack.py @@ -43,10 +43,10 @@ attack_test_params = [ ( "textattack attack --model-from-huggingface " "distilbert-base-uncased-finetuned-sst-2-english " - "--dataset-from-nlp glue^sst2^train --recipe deepwordbug --num-examples 3 " + "--dataset-from-datasets glue^sst2^train --recipe deepwordbug --num-examples 3 " "--shuffle=False" ), - "tests/sample_outputs/run_attack_transformers_nlp.txt", + "tests/sample_outputs/run_attack_transformers_datasets.txt", ), # # test running an attack by loading a model and dataset from file @@ -59,7 +59,7 @@ attack_test_params = [ "--dataset-from-file tests/sample_inputs/sst_model_and_dataset.py " "--recipe deepwordbug --num-examples 3 --shuffle=False" ), - "tests/sample_outputs/run_attack_transformers_nlp.txt", + "tests/sample_outputs/run_attack_transformers_datasets.txt", ), # # test hotflip on 10 samples from LSTM MR diff --git a/tests/test_command_line/test_eval.py b/tests/test_command_line/test_eval.py index ee385f77..c3212f8f 100644 --- a/tests/test_command_line/test_eval.py +++ b/tests/test_command_line/test_eval.py @@ -4,7 +4,7 @@ import pytest eval_test_params = [ ( "eval_model_hub_rt", - 
"textattack eval --model-from-huggingface textattack/distilbert-base-uncased-rotten-tomatoes --dataset-from-nlp rotten_tomatoes --num-examples 4", + "textattack eval --model-from-huggingface textattack/distilbert-base-uncased-rotten-tomatoes --dataset-from-datasets rotten_tomatoes --num-examples 4", "tests/sample_outputs/eval_model_hub_rt.txt", ), ( diff --git a/textattack/commands/attack/attack_args_helpers.py b/textattack/commands/attack/attack_args_helpers.py index e06424b2..87f1bfca 100644 --- a/textattack/commands/attack/attack_args_helpers.py +++ b/textattack/commands/attack/attack_args_helpers.py @@ -64,11 +64,11 @@ def add_dataset_args(parser): """ dataset_group = parser.add_mutually_exclusive_group() dataset_group.add_argument( - "--dataset-from-nlp", + "--dataset-from-datasets", type=str, required=False, default=None, - help="Dataset to load from `nlp` repository.", + help="Dataset to load from `datasets` repository.", ) dataset_group.add_argument( "--dataset-from-file", @@ -349,7 +349,7 @@ def parse_dataset_from_args(args): # Automatically detect dataset for huggingface & textattack models. # This allows us to use the --model shortcut without specifying a dataset. if args.model in HUGGINGFACE_DATASET_BY_MODEL: - _, args.dataset_from_nlp = HUGGINGFACE_DATASET_BY_MODEL[args.model] + _, args.dataset_from_datasets = HUGGINGFACE_DATASET_BY_MODEL[args.model] elif args.model in TEXTATTACK_DATASET_BY_MODEL: _, dataset = TEXTATTACK_DATASET_BY_MODEL[args.model] if dataset[0].startswith("textattack"): @@ -358,7 +358,7 @@ def parse_dataset_from_args(args): dataset = eval(f"{dataset[0]}")(*dataset[1:]) return dataset else: - args.dataset_from_nlp = dataset + args.dataset_from_datasets = dataset # Automatically detect dataset for models trained with textattack. 
elif args.model and os.path.exists(args.model): model_args_json_path = os.path.join(args.model, "train_args.json") @@ -372,7 +372,7 @@ def parse_dataset_from_args(args): name, subset = model_train_args["dataset"].split(ARGS_SPLIT_TOKEN) else: name, subset = model_train_args["dataset"], None - args.dataset_from_nlp = ( + args.dataset_from_datasets = ( name, subset, model_train_args["dataset_dev_split"], @@ -403,14 +403,14 @@ def parse_dataset_from_args(args): raise AttributeError( f"``dataset`` not found in module {args.dataset_from_file}" ) - elif args.dataset_from_nlp: - dataset_args = args.dataset_from_nlp + elif args.dataset_from_datasets: + dataset_args = args.dataset_from_datasets if isinstance(dataset_args, str): if ARGS_SPLIT_TOKEN in dataset_args: dataset_args = dataset_args.split(ARGS_SPLIT_TOKEN) else: dataset_args = (dataset_args,) - dataset = textattack.datasets.HuggingFaceNlpDataset( + dataset = textattack.datasets.HuggingFaceDataset( *dataset_args, shuffle=args.shuffle ) dataset.examples = dataset.examples[args.num_examples_offset :] diff --git a/textattack/commands/train_model/train_args_helpers.py b/textattack/commands/train_model/train_args_helpers.py index 4d6e9bd1..174b5544 100644 --- a/textattack/commands/train_model/train_args_helpers.py +++ b/textattack/commands/train_model/train_args_helpers.py @@ -8,8 +8,9 @@ from textattack.commands.augment import AUGMENTATION_RECIPE_NAMES logger = textattack.shared.logger -def prepare_dataset_for_training(nlp_dataset): - """Changes an `nlp` dataset into the proper format for tokenization.""" +def prepare_dataset_for_training(datasets_dataset): + """Changes an `datasets` dataset into the proper format for + tokenization.""" def prepare_example_dict(ex): """Returns the values in order corresponding to the data. 
@@ -25,22 +26,22 @@ def prepare_dataset_for_training(nlp_dataset): return values[0] return tuple(values) - text, outputs = zip(*((prepare_example_dict(x[0]), x[1]) for x in nlp_dataset)) + text, outputs = zip(*((prepare_example_dict(x[0]), x[1]) for x in datasets_dataset)) return list(text), list(outputs) def dataset_from_args(args): - """Returns a tuple of ``HuggingFaceNlpDataset`` for the train and test + """Returns a tuple of ``HuggingFaceDataset`` for the train and test datasets for ``args.dataset``.""" dataset_args = args.dataset.split(ARGS_SPLIT_TOKEN) - # TODO `HuggingFaceNlpDataset` -> `HuggingFaceDataset` + # TODO `HuggingFaceDataset` -> `HuggingFaceDataset` if args.dataset_train_split: - train_dataset = textattack.datasets.HuggingFaceNlpDataset( + train_dataset = textattack.datasets.HuggingFaceDataset( *dataset_args, split=args.dataset_train_split ) else: try: - train_dataset = textattack.datasets.HuggingFaceNlpDataset( + train_dataset = textattack.datasets.HuggingFaceDataset( *dataset_args, split="train" ) args.dataset_train_split = "train" @@ -49,31 +50,31 @@ def dataset_from_args(args): train_text, train_labels = prepare_dataset_for_training(train_dataset) if args.dataset_dev_split: - eval_dataset = textattack.datasets.HuggingFaceNlpDataset( + eval_dataset = textattack.datasets.HuggingFaceDataset( *dataset_args, split=args.dataset_dev_split ) else: # try common dev split names try: - eval_dataset = textattack.datasets.HuggingFaceNlpDataset( + eval_dataset = textattack.datasets.HuggingFaceDataset( *dataset_args, split="dev" ) args.dataset_dev_split = "dev" except KeyError: try: - eval_dataset = textattack.datasets.HuggingFaceNlpDataset( + eval_dataset = textattack.datasets.HuggingFaceDataset( *dataset_args, split="eval" ) args.dataset_dev_split = "eval" except KeyError: try: - eval_dataset = textattack.datasets.HuggingFaceNlpDataset( + eval_dataset = textattack.datasets.HuggingFaceDataset( *dataset_args, split="validation" ) args.dataset_dev_split = 
"validation" except KeyError: try: - eval_dataset = textattack.datasets.HuggingFaceNlpDataset( + eval_dataset = textattack.datasets.HuggingFaceDataset( *dataset_args, split="test" ) args.dataset_dev_split = "test" @@ -189,7 +190,7 @@ def write_readme(args, best_eval_score, best_eval_score_epoch): ## TextAttack Model Card This `{args.model}` model was fine-tuned for sequence classification using TextAttack -and the {dataset_name} dataset loaded using the `nlp` library. The model was fine-tuned +and the {dataset_name} dataset loaded using the `datasets` library. The model was fine-tuned for {args.num_train_epochs} epochs with a batch size of {args.batch_size}, a learning rate of {args.learning_rate}, and a maximum sequence length of {args.max_length}. Since this was a {task_name} task, the model was trained with a {loss_func} loss function. diff --git a/textattack/commands/train_model/train_model_command.py b/textattack/commands/train_model/train_model_command.py index 80cef347..49aba152 100644 --- a/textattack/commands/train_model/train_model_command.py +++ b/textattack/commands/train_model/train_model_command.py @@ -47,7 +47,7 @@ class TrainModelCommand(TextAttackCommand): required=True, default="yelp", help="dataset for training; will be loaded from " - "`nlp` library. if dataset has a subset, separate with a colon. " + "`datasets` library. if dataset has a subset, separate with a colon. " " ex: `glue^sst2` or `rotten_tomatoes`", ) parser.add_argument( diff --git a/textattack/datasets/__init__.py b/textattack/datasets/__init__.py index b0189d2d..a8498c6d 100644 --- a/textattack/datasets/__init__.py +++ b/textattack/datasets/__init__.py @@ -1,4 +1,4 @@ from .dataset import TextAttackDataset -from .huggingface_nlp_dataset import HuggingFaceNlpDataset +from .huggingface_dataset import HuggingFaceDataset from . 
import translation diff --git a/textattack/datasets/huggingface_nlp_dataset.py b/textattack/datasets/huggingface_dataset.py similarity index 89% rename from textattack/datasets/huggingface_nlp_dataset.py rename to textattack/datasets/huggingface_dataset.py index 3e6caf00..9df27720 100644 --- a/textattack/datasets/huggingface_nlp_dataset.py +++ b/textattack/datasets/huggingface_dataset.py @@ -1,7 +1,7 @@ import collections import random -import nlp +import datasets import textattack from textattack.datasets import TextAttackDataset @@ -14,7 +14,7 @@ def _cb(s): return textattack.shared.utils.color_text(str(s), color="blue", method="ansi") -def get_nlp_dataset_columns(dataset): +def get_datasets_dataset_columns(dataset): schema = set(dataset.column_names) if {"premise", "hypothesis", "label"} <= schema: input_columns = ("premise", "hypothesis") @@ -54,15 +54,15 @@ def get_nlp_dataset_columns(dataset): return input_columns, output_column -class HuggingFaceNlpDataset(TextAttackDataset): - """Loads a dataset from HuggingFace ``nlp`` and prepares it as a TextAttack - dataset. +class HuggingFaceDataset(TextAttackDataset): + """Loads a dataset from HuggingFace ``datasets`` and prepares it as a + TextAttack dataset. - name: the dataset name - - subset: the subset of the main dataset. Dataset will be loaded as ``nlp.load_dataset(name, subset)``. + - subset: the subset of the main dataset. Dataset will be loaded as ``datasets.load_dataset(name, subset)``. - label_map: Mapping if output labels should be re-mapped. Useful if model was trained with a different label arrangement than - provided in the ``nlp`` version of the dataset. + provided in the ``datasets`` version of the dataset. - output_scale_factor (float): Factor to divide ground-truth outputs by. Generally, TextAttack goal functions require model outputs between 0 and 1. 
Some datasets test the model's correlation @@ -82,16 +82,16 @@ class HuggingFaceNlpDataset(TextAttackDataset): shuffle=False, ): self._name = name - self._dataset = nlp.load_dataset(name, subset)[split] + self._dataset = datasets.load_dataset(name, subset)[split] subset_print_str = f", subset {_cb(subset)}" if subset else "" textattack.shared.logger.info( - f"Loading {_cb('nlp')} dataset {_cb(name)}{subset_print_str}, split {_cb(split)}." + f"Loading {_cb('datasets')} dataset {_cb(name)}{subset_print_str}, split {_cb(split)}." ) # Input/output column order, like (('premise', 'hypothesis'), 'label') ( self.input_columns, self.output_column, - ) = dataset_columns or get_nlp_dataset_columns(self._dataset) + ) = dataset_columns or get_datasets_dataset_columns(self._dataset) self._i = 0 self.examples = list(self._dataset) self.label_map = label_map diff --git a/textattack/datasets/translation/ted_multi.py b/textattack/datasets/translation/ted_multi.py index d291896e..ba8ca618 100644 --- a/textattack/datasets/translation/ted_multi.py +++ b/textattack/datasets/translation/ted_multi.py @@ -1,20 +1,20 @@ import collections -import nlp +import datasets import numpy as np -from textattack.datasets import HuggingFaceNlpDataset +from textattack.datasets import HuggingFaceDataset -class TedMultiTranslationDataset(HuggingFaceNlpDataset): - """Loads examples from the Ted Talk translation dataset using the `nlp` - package. +class TedMultiTranslationDataset(HuggingFaceDataset): + """Loads examples from the Ted Talk translation dataset using the + `datasets` package. 
dataset source: http://www.cs.jhu.edu/~kevinduh/a/multitarget-tedtalks/ """ def __init__(self, source_lang="en", target_lang="de", split="test"): - self._dataset = nlp.load_dataset("ted_multi")[split] + self._dataset = datasets.load_dataset("ted_multi")[split] self.examples = self._dataset["translations"] language_options = set(self.examples[0]["language"]) if source_lang not in language_options: diff --git a/textattack/models/README.md b/textattack/models/README.md index 0f7276e8..e673bb5a 100644 --- a/textattack/models/README.md +++ b/textattack/models/README.md @@ -19,26 +19,26 @@ All evaluations shown are on the full validation or test set up to 1000 examples
- AG News (`lstm-ag-news`) - - nlp dataset `ag_news`, split `test` + - `datasets` dataset `ag_news`, split `test` - Successes: 914/1000 - Accuracy: 91.4% - IMDB (`lstm-imdb`) - - nlp dataset `imdb`, split `test` + - `datasets` dataset `imdb`, split `test` - Successes: 883/1000 - Accuracy: 88.30% - Movie Reviews [Rotten Tomatoes] (`lstm-mr`) - - nlp dataset `rotten_tomatoes`, split `validation` + - `datasets` dataset `rotten_tomatoes`, split `validation` - Successes: 807/1000 - Accuracy: 80.70% - - nlp dataset `rotten_tomatoes`, split `test` + - `datasets` dataset `rotten_tomatoes`, split `test` - Successes: 781/1000 - Accuracy: 78.10% - SST-2 (`lstm-sst2`) - - nlp dataset `glue`, subset `sst2`, split `validation` + - `datasets` dataset `glue`, subset `sst2`, split `validation` - Successes: 737/872 - Accuracy: 84.52% - Yelp Polarity (`lstm-yelp`) - - nlp dataset `yelp_polarity`, split `test` + - `datasets` dataset `yelp_polarity`, split `test` - Successes: 922/1000 - Accuracy: 92.20% @@ -50,26 +50,26 @@ All evaluations shown are on the full validation or test set up to 1000 examples - AG News (`cnn-ag-news`) - - nlp dataset `ag_news`, split `test` + - `datasets` dataset `ag_news`, split `test` - Successes: 910/1000 - Accuracy: 91.00% - IMDB (`cnn-imdb`) - - nlp dataset `imdb`, split `test` + - `datasets` dataset `imdb`, split `test` - Successes: 863/1000 - Accuracy: 86.30% - Movie Reviews [Rotten Tomatoes] (`cnn-mr`) - - nlp dataset `rotten_tomatoes`, split `validation` + - `datasets` dataset `rotten_tomatoes`, split `validation` - Successes: 794/1000 - Accuracy: 79.40% - - nlp dataset `rotten_tomatoes`, split `test` + - `datasets` dataset `rotten_tomatoes`, split `test` - Successes: 768/1000 - Accuracy: 76.80% - SST-2 (`cnn-sst2`) - - nlp dataset `glue`, subset `sst2`, split `validation` + - `datasets` dataset `glue`, subset `sst2`, split `validation` - Successes: 721/872 - Accuracy: 82.68% - Yelp Polarity (`cnn-yelp`) - - nlp dataset `yelp_polarity`, split `test` 
+ - `datasets` dataset `yelp_polarity`, split `test` - Successes: 913/1000 - Accuracy: 91.30% @@ -81,50 +81,50 @@ All evaluations shown are on the full validation or test set up to 1000 examples
- AG News (`albert-base-v2-ag-news`) - - nlp dataset `ag_news`, split `test` + - `datasets` dataset `ag_news`, split `test` - Successes: 943/1000 - Accuracy: 94.30% - CoLA (`albert-base-v2-cola`) - - nlp dataset `glue`, subset `cola`, split `validation` + - `datasets` dataset `glue`, subset `cola`, split `validation` - Successes: 829/1000 - Accuracy: 82.90% - IMDB (`albert-base-v2-imdb`) - - nlp dataset `imdb`, split `test` + - `datasets` dataset `imdb`, split `test` - Successes: 913/1000 - Accuracy: 91.30% - Movie Reviews [Rotten Tomatoes] (`albert-base-v2-mr`) - - nlp dataset `rotten_tomatoes`, split `validation` + - `datasets` dataset `rotten_tomatoes`, split `validation` - Successes: 882/1000 - Accuracy: 88.20% - - nlp dataset `rotten_tomatoes`, split `test` + - `datasets` dataset `rotten_tomatoes`, split `test` - Successes: 851/1000 - Accuracy: 85.10% - Quora Question Pairs (`albert-base-v2-qqp`) - - nlp dataset `glue`, subset `qqp`, split `validation` + - `datasets` dataset `glue`, subset `qqp`, split `validation` - Successes: 914/1000 - Accuracy: 91.40% - Recognizing Textual Entailment (`albert-base-v2-rte`) - - nlp dataset `glue`, subset `rte`, split `validation` + - `datasets` dataset `glue`, subset `rte`, split `validation` - Successes: 211/277 - Accuracy: 76.17% - SNLI (`albert-base-v2-snli`) - - nlp dataset `snli`, split `test` + - `datasets` dataset `snli`, split `test` - Successes: 883/1000 - Accuracy: 88.30% - SST-2 (`albert-base-v2-sst2`) - - nlp dataset `glue`, subset `sst2`, split `validation` + - `datasets` dataset `glue`, subset `sst2`, split `validation` - Successes: 807/872 - Accuracy: 92.55%) - STS-b (`albert-base-v2-stsb`) - - nlp dataset `glue`, subset `stsb`, split `validation` + - `datasets` dataset `glue`, subset `stsb`, split `validation` - Pearson correlation: 0.9041359738552746 - Spearman correlation: 0.8995912861209745 - WNLI (`albert-base-v2-wnli`) - - nlp dataset `glue`, subset `wnli`, split `validation` + - `datasets` dataset 
`glue`, subset `wnli`, split `validation` - Successes: 42/71 - Accuracy: 59.15% - Yelp Polarity (`albert-base-v2-yelp`) - - nlp dataset `yelp_polarity`, split `test` + - `datasets` dataset `yelp_polarity`, split `test` - Successes: 963/1000 - Accuracy: 96.30% @@ -135,62 +135,62 @@ All evaluations shown are on the full validation or test set up to 1000 examples
- AG News (`bert-base-uncased-ag-news`) - - nlp dataset `ag_news`, split `test` + - `datasets` dataset `ag_news`, split `test` - Successes: 942/1000 - Accuracy: 94.20% - CoLA (`bert-base-uncased-cola`) - - nlp dataset `glue`, subset `cola`, split `validation` + - `datasets` dataset `glue`, subset `cola`, split `validation` - Successes: 812/1000 - Accuracy: 81.20% - IMDB (`bert-base-uncased-imdb`) - - nlp dataset `imdb`, split `test` + - `datasets` dataset `imdb`, split `test` - Successes: 919/1000 - Accuracy: 91.90% - MNLI matched (`bert-base-uncased-mnli`) - - nlp dataset `glue`, subset `mnli`, split `validation_matched` + - `datasets` dataset `glue`, subset `mnli`, split `validation_matched` - Successes: 840/1000 - Accuracy: 84.00% - Movie Reviews [Rotten Tomatoes] (`bert-base-uncased-mr`) - - nlp dataset `rotten_tomatoes`, split `validation` + - `datasets` dataset `rotten_tomatoes`, split `validation` - Successes: 876/1000 - Accuracy: 87.60% - - nlp dataset `rotten_tomatoes`, split `test` + - `datasets` dataset `rotten_tomatoes`, split `test` - Successes: 838/1000 - Accuracy: 83.80% - MRPC (`bert-base-uncased-mrpc`) - - nlp dataset `glue`, subset `mrpc`, split `validation` + - `datasets` dataset `glue`, subset `mrpc`, split `validation` - Successes: 358/408 - Accuracy: 87.75% - QNLI (`bert-base-uncased-qnli`) - - nlp dataset `glue`, subset `qnli`, split `validation` + - `datasets` dataset `glue`, subset `qnli`, split `validation` - Successes: 904/1000 - Accuracy: 90.40% - Quora Question Pairs (`bert-base-uncased-qqp`) - - nlp dataset `glue`, subset `qqp`, split `validation` + - `datasets` dataset `glue`, subset `qqp`, split `validation` - Successes: 924/1000 - Accuracy: 92.40% - Recognizing Textual Entailment (`bert-base-uncased-rte`) - - nlp dataset `glue`, subset `rte`, split `validation` + - `datasets` dataset `glue`, subset `rte`, split `validation` - Successes: 201/277 - Accuracy: 72.56% - SNLI (`bert-base-uncased-snli`) - - nlp dataset `snli`, split `test` 
+ - `datasets` dataset `snli`, split `test` - Successes: 894/1000 - Accuracy: 89.40% - SST-2 (`bert-base-uncased-sst2`) - - nlp dataset `glue`, subset `sst2`, split `validation` + - `datasets` dataset `glue`, subset `sst2`, split `validation` - Successes: 806/872 - Accuracy: 92.43%) - STS-b (`bert-base-uncased-stsb`) - - nlp dataset `glue`, subset `stsb`, split `validation` + - `datasets` dataset `glue`, subset `stsb`, split `validation` - Pearson correlation: 0.8775458937815515 - Spearman correlation: 0.8773251339980935 - WNLI (`bert-base-uncased-wnli`) - - nlp dataset `glue`, subset `wnli`, split `validation` + - `datasets` dataset `glue`, subset `wnli`, split `validation` - Successes: 40/71 - Accuracy: 56.34% - Yelp Polarity (`bert-base-uncased-yelp`) - - nlp dataset `yelp_polarity`, split `test` + - `datasets` dataset `yelp_polarity`, split `test` - Successes: 963/1000 - Accuracy: 96.30% @@ -202,27 +202,27 @@ All evaluations shown are on the full validation or test set up to 1000 examples - CoLA (`distilbert-base-cased-cola`) - - nlp dataset `glue`, subset `cola`, split `validation` + - `datasets` dataset `glue`, subset `cola`, split `validation` - Successes: 786/1000 - Accuracy: 78.60% - MRPC (`distilbert-base-cased-mrpc`) - - nlp dataset `glue`, subset `mrpc`, split `validation` + - `datasets` dataset `glue`, subset `mrpc`, split `validation` - Successes: 320/408 - Accuracy: 78.43% - Quora Question Pairs (`distilbert-base-cased-qqp`) - - nlp dataset `glue`, subset `qqp`, split `validation` + - `datasets` dataset `glue`, subset `qqp`, split `validation` - Successes: 908/1000 - Accuracy: 90.80% - SNLI (`distilbert-base-cased-snli`) - - nlp dataset `snli`, split `test` + - `datasets` dataset `snli`, split `test` - Successes: 861/1000 - Accuracy: 86.10% - SST-2 (`distilbert-base-cased-sst2`) - - nlp dataset `glue`, subset `sst2`, split `validation` + - `datasets` dataset `glue`, subset `sst2`, split `validation` - Successes: 785/872 - Accuracy: 90.02%) - STS-b 
(`distilbert-base-cased-stsb`) - - nlp dataset `glue`, subset `stsb`, split `validation` + - `datasets` dataset `glue`, subset `stsb`, split `validation` - Pearson correlation: 0.8421540899520146 - Spearman correlation: 0.8407155030382939 @@ -233,39 +233,39 @@ All evaluations shown are on the full validation or test set up to 1000 examples
- AG News (`distilbert-base-uncased-ag-news`) - - nlp dataset `ag_news`, split `test` + - `datasets` dataset `ag_news`, split `test` - Successes: 944/1000 - Accuracy: 94.40% - CoLA (`distilbert-base-uncased-cola`) - - nlp dataset `glue`, subset `cola`, split `validation` + - `datasets` dataset `glue`, subset `cola`, split `validation` - Successes: 786/1000 - Accuracy: 78.60% - IMDB (`distilbert-base-uncased-imdb`) - - nlp dataset `imdb`, split `test` + - `datasets` dataset `imdb`, split `test` - Successes: 903/1000 - Accuracy: 90.30% - MNLI matched (`distilbert-base-uncased-mnli`) - - nlp dataset `glue`, subset `mnli`, split `validation_matched` + - `datasets` dataset `glue`, subset `mnli`, split `validation_matched` - Successes: 817/1000 - Accuracy: 81.70% - MRPC (`distilbert-base-uncased-mrpc`) - - nlp dataset `glue`, subset `mrpc`, split `validation` + - `datasets` dataset `glue`, subset `mrpc`, split `validation` - Successes: 350/408 - Accuracy: 85.78% - QNLI (`distilbert-base-uncased-qnli`) - - nlp dataset `glue`, subset `qnli`, split `validation` + - `datasets` dataset `glue`, subset `qnli`, split `validation` - Successes: 860/1000 - Accuracy: 86.00% - Recognizing Textual Entailment (`distilbert-base-uncased-rte`) - - nlp dataset `glue`, subset `rte`, split `validation` + - `datasets` dataset `glue`, subset `rte`, split `validation` - Successes: 180/277 - Accuracy: 64.98% - STS-b (`distilbert-base-uncased-stsb`) - - nlp dataset `glue`, subset `stsb`, split `validation` + - `datasets` dataset `glue`, subset `stsb`, split `validation` - Pearson correlation: 0.8421540899520146 - Spearman correlation: 0.8407155030382939 - WNLI (`distilbert-base-uncased-wnli`) - - nlp dataset `glue`, subset `wnli`, split `validation` + - `datasets` dataset `glue`, subset `wnli`, split `validation` - Successes: 40/71 - Accuracy: 56.34% @@ -276,46 +276,46 @@ All evaluations shown are on the full validation or test set up to 1000 examples
- AG News (`roberta-base-ag-news`) - - nlp dataset `ag_news`, split `test` + - `datasets` dataset `ag_news`, split `test` - Successes: 947/1000 - Accuracy: 94.70% - CoLA (`roberta-base-cola`) - - nlp dataset `glue`, subset `cola`, split `validation` + - `datasets` dataset `glue`, subset `cola`, split `validation` - Successes: 857/1000 - Accuracy: 85.70% - IMDB (`roberta-base-imdb`) - - nlp dataset `imdb`, split `test` + - `datasets` dataset `imdb`, split `test` - Successes: 941/1000 - Accuracy: 94.10% - Movie Reviews [Rotten Tomatoes] (`roberta-base-mr`) - - nlp dataset `rotten_tomatoes`, split `validation` + - `datasets` dataset `rotten_tomatoes`, split `validation` - Successes: 899/1000 - Accuracy: 89.90% - - nlp dataset `rotten_tomatoes`, split `test` + - `datasets` dataset `rotten_tomatoes`, split `test` - Successes: 883/1000 - Accuracy: 88.30% - MRPC (`roberta-base-mrpc`) - - nlp dataset `glue`, subset `mrpc`, split `validation` + - `datasets` dataset `glue`, subset `mrpc`, split `validation` - Successes: 371/408 - Accuracy: 91.18% - QNLI (`roberta-base-qnli`) - - nlp dataset `glue`, subset `qnli`, split `validation` + - `datasets` dataset `glue`, subset `qnli`, split `validation` - Successes: 917/1000 - Accuracy: 91.70% - Recognizing Textual Entailment (`roberta-base-rte`) - - nlp dataset `glue`, subset `rte`, split `validation` + - `datasets` dataset `glue`, subset `rte`, split `validation` - Successes: 217/277 - Accuracy: 78.34% - SST-2 (`roberta-base-sst2`) - - nlp dataset `glue`, subset `sst2`, split `validation` + - `datasets` dataset `glue`, subset `sst2`, split `validation` - Successes: 820/872 - Accuracy: 94.04%) - STS-b (`roberta-base-stsb`) - - nlp dataset `glue`, subset `stsb`, split `validation` + - `datasets` dataset `glue`, subset `stsb`, split `validation` - Pearson correlation: 0.906067852162708 - Spearman correlation: 0.9025045272903051 - WNLI (`roberta-base-wnli`) - - nlp dataset `glue`, subset `wnli`, split `validation` + - `datasets` 
dataset `glue`, subset `wnli`, split `validation` - Successes: 40/71 - Accuracy: 56.34% @@ -326,34 +326,34 @@ All evaluations shown are on the full validation or test set up to 1000 examples
- CoLA (`xlnet-base-cased-cola`) - - nlp dataset `glue`, subset `cola`, split `validation` + - `datasets` dataset `glue`, subset `cola`, split `validation` - Successes: 800/1000 - Accuracy: 80.00% - IMDB (`xlnet-base-cased-imdb`) - - nlp dataset `imdb`, split `test` + - `datasets` dataset `imdb`, split `test` - Successes: 957/1000 - Accuracy: 95.70% - Movie Reviews [Rotten Tomatoes] (`xlnet-base-cased-mr`) - - nlp dataset `rotten_tomatoes`, split `validation` + - `datasets` dataset `rotten_tomatoes`, split `validation` - Successes: 908/1000 - Accuracy: 90.80% - - nlp dataset `rotten_tomatoes`, split `test` + - `datasets` dataset `rotten_tomatoes`, split `test` - Successes: 876/1000 - Accuracy: 87.60% - MRPC (`xlnet-base-cased-mrpc`) - - nlp dataset `glue`, subset `mrpc`, split `validation` + - `datasets` dataset `glue`, subset `mrpc`, split `validation` - Successes: 363/408 - Accuracy: 88.97% - Recognizing Textual Entailment (`xlnet-base-cased-rte`) - - nlp dataset `glue`, subset `rte`, split `validation` + - `datasets` dataset `glue`, subset `rte`, split `validation` - Successes: 196/277 - Accuracy: 70.76% - STS-b (`xlnet-base-cased-stsb`) - - nlp dataset `glue`, subset `stsb`, split `validation` + - `datasets` dataset `glue`, subset `stsb`, split `validation` - Pearson correlation: 0.883111673280641 - Spearman correlation: 0.8773439961182335 - WNLI (`xlnet-base-cased-wnli`) - - nlp dataset `glue`, subset `wnli`, split `validation` + - `datasets` dataset `glue`, subset `wnli`, split `validation` - Successes: 41/71 - Accuracy: 57.75% diff --git a/textattack/shared/utils/install.py b/textattack/shared/utils/install.py index 6e5f2cb1..d9a5c4ee 100644 --- a/textattack/shared/utils/install.py +++ b/textattack/shared/utils/install.py @@ -122,7 +122,7 @@ def set_cache_dir(cache_dir): os.environ["TFHUB_CACHE_DIR"] = cache_dir # HuggingFace `transformers` cache directory os.environ["PYTORCH_TRANSFORMERS_CACHE"] = cache_dir - # HuggingFace `nlp` cache directory + # 
HuggingFace `datasets` cache directory
     os.environ["HF_HOME"] = cache_dir
     # Basic directory for Linux user-specific non-data files
     os.environ["XDG_CACHE_HOME"] = cache_dir