mirror of https://github.com/QData/TextAttack.git synced 2021-10-13 00:05:06 +03:00

Compare commits

...

424 Commits

Author SHA1 Message Date
Yanjun Qi / Jane
907ec46556 Merge pull request #539 from QData/fix-'textattack'-has-no-attribute-'shared'-bug
Fix logger initialization bug
2021-10-12 11:37:03 -04:00
Yanjun Qi / Jane
3f0d5290be Merge pull request #543 from QData/doc-minor
add custom dataset API use example in doc
2021-10-08 10:39:03 -04:00
Yanjun Qi
42d019262e add custom dataset API use example in doc 2021-10-08 10:37:32 -04:00
Hanyu Liu
fb6a04e10b Update trainer.py 2021-10-05 19:20:54 -04:00
Hanyu Liu
ba68897975 Update trainer.py 2021-10-05 18:47:36 -04:00
Hanyu Liu
841f47580b Update trainer.py 2021-10-05 18:39:00 -04:00
Hanyu Liu
9b089994a1 Update trainer.py 2021-10-05 18:35:17 -04:00
Yanjun Qi / Jane
caacc1c8a7 Merge pull request #533 from QData/fix-dataset-split-bug
Fix dataset-split bug
2021-10-05 16:56:59 -04:00
Yanjun Qi / Jane
0f6401d08a Merge pull request #535 from QData/install-doc-minor
Update installation.md to add FAQ on installation
2021-10-01 08:16:40 -04:00
Yanjun Qi
748990a362 Update installation.md 2021-10-01 08:13:08 -04:00
Hanyu Liu
9ba7d9c1ca fix format 2021-09-30 23:42:17 -04:00
Hanyu Liu
014af17184 Update dataset_args.py 2021-09-30 23:11:04 -04:00
Yanjun Qi / Jane
7d2de08771 Merge pull request #514 from QData/metric-module
New metric module to improve flexibility and intuitiveness - moved from #475
2021-09-29 16:41:03 -04:00
Yanjun Qi
fa9817af5d fix docstring issues.. 2021-09-29 15:50:57 -04:00
Jack Morris
e7a824e6a7 Merge pull request #521 from dangne/master
Fix a bug when running textattack eval with --num-examples=-1
2021-09-24 11:29:14 -04:00
Jack Morris
cdfcba45b8 Merge pull request #509 from wenh06/master
Fix incorrect `__eq__` method of `AttackedText` in `textattack/shared/attacked_text.py`
2021-09-22 19:55:45 -04:00
Yanjun Qi
2d7daa61de black format fix 2021-09-22 14:39:50 -04:00
WEN Hao
2c5dc2c12c Merge branch 'wenh06' 2021-09-22 15:02:19 +08:00
WEN Hao
30a5ead36c fix typos in _text_index_of_word_index of AttackedText 2021-09-22 15:01:59 +08:00
WEN Hao
a857fde878 Merge branch 'master' into wenh06 2021-09-22 10:41:36 +08:00
WEN Hao
d7afffca52 fix bugs in the method _text_index_of_word_index in the class AttackedText #526 2021-09-22 10:41:17 +08:00
WEN Hao
b3530d954b Merge remote-tracking branch 'upstream/master' 2021-09-22 10:34:17 +08:00
sanchit97
f1ef471ea8 [FIX] Fix perplexity escape seq 2021-09-16 19:53:16 -04:00
sanchit97
3e2b16f344 [FIX] Fix perplexity precision 2021-09-16 18:05:01 -04:00
Yanjun Qi / Jane
983769d2ea Merge pull request #523 from QData/a2t
Add new attack recipe A2T
2021-09-12 18:21:26 -04:00
Jin Yong Yoo
53be760e0b [CODE] Add new attack recipe A2T 2021-09-11 23:07:24 -04:00
WEN Hao
630d5b55ce fix typos 2021-09-11 11:16:27 +08:00
WEN Hao
fc313f4be2 update the close method of CSVLogger
CSVLogger has no `fout` member, and no `fout` is assigned to CSVLogger instances anywhere else in TextAttack
2021-09-11 11:15:33 +08:00
WEN Hao
ac06346ff5 fix typos 2021-09-11 11:05:57 +08:00
Yanjun Qi / Jane
6e704871e8 Merge pull request #522 from QData/doc-fix
readtheDoc fix
2021-09-10 14:12:20 -04:00
Yanjun Qi
b66bb926ee Update requirements.txt 2021-09-10 14:02:07 -04:00
Yanjun Qi
7266c1899e Update index.rst 2021-09-10 11:41:51 -04:00
Yanjun Qi
9482ed0948 correct syntax error in command for optional installation 2021-09-10 11:37:26 -04:00
sanchit97
eab1cd06d3 [CODE] Fix black 2021-09-10 10:52:54 -04:00
sanchit97
d44a54c787 [CODE] Fix print 2021-09-10 10:48:04 -04:00
sanchit97
aa1ad15798 [CODE] Add new docs 2021-09-10 10:46:45 -04:00
sanchit97
b5a120987f [CODE] Add new help msg 2021-09-10 02:54:38 -04:00
sanchit97
aab7eec818 [CODE] Fix isort on use 2021-09-10 02:45:35 -04:00
sanchit97
559057e068 [CODE] Fix black on use 2021-09-10 02:38:55 -04:00
sanchit97
32c3e43adc [CODE] Fix metrics, add tests 2021-09-10 02:33:50 -04:00
Dang Minh Nguyen
19148ea9cc Fix a bug
Fix a bug when running `textattack eval` with `--num-examples -1` in CLI.
2021-09-09 01:12:50 +07:00
Yanjun Qi
1c00a8e78f Update overview.png 2021-09-03 14:33:52 -04:00
Yanjun Qi
7bfddaae65 update tensorflow-estimator version to solve read-the-doc compile fail 2021-09-03 14:27:08 -04:00
Yanjun Qi / Jane
9c7515838d Merge pull request #520 from QData/Add-textattack-highlevel-overview-diagram
update overview.png
2021-09-03 12:50:04 -04:00
Yanjun Qi
e3ccf508c7 update overview.png 2021-09-03 12:48:28 -04:00
Yanjun Qi / Jane
8181de0b2d Merge pull request #519 from QData/Add-textattack-highlevel-overview-diagram
Add a high level overview diagram to docs
2021-09-03 12:43:49 -04:00
Yanjun Qi
9e99e90d3d clean up overview pdf to png and add into two docs 2021-09-03 12:42:29 -04:00
Yanjun Qi
fb66f8efee Create overview.png
and add into proper docs
2021-09-03 12:41:52 -04:00
WEN Hao
b88fbdb08f Merge branch 'QData:master' into master 2021-09-03 14:55:50 +08:00
diegoc
0af3d6c872 Fixes to visual bug
fixes a visual bug where textboxes aren't properly outlined.
2021-09-02 23:35:57 -04:00
diegoc
d11ab3a37d Add Diagram
Added a high level overview diagram
2021-09-02 23:32:16 -04:00
sanchit97
10ee24b8da [FIX] Change structure of Metric mdl 2021-09-02 20:19:42 -04:00
WEN Hao
3a9551d633 revert changes made in huggingface_dataset.py
the download config added in the previous two commits is not needed by most users, so it is removed.
2021-09-03 08:10:30 +08:00
sanchit97
0baa5029fc [FIX] Working USE 2021-09-02 19:20:10 -04:00
Yanjun Qi / Jane
60d179f2f8 Merge pull request #517 from QData/dependabot/pip/docs/tensorflow-2.5.1
Bump tensorflow from 2.4.2 to 2.5.1 in /docs
2021-08-27 10:04:35 -04:00
sanchit97
a1b2c5bc7a [CODE] New USE metric WIP
Signed-off-by: sanchit97 <ss7mu@virginia.edu>
2021-08-27 02:36:17 -04:00
sanchit97
5e929f2498 [FIX] Import order+metrics import 2021-08-27 00:40:23 -04:00
dependabot[bot]
2150f05174 Bump tensorflow from 2.4.2 to 2.5.1 in /docs
Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.4.2 to 2.5.1.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v2.4.2...v2.5.1)

---
updated-dependencies:
- dependency-name: tensorflow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-08-25 15:19:45 +00:00
Yanjun Qi / Jane
2e2125dc62 Update README.md
minor format issue fixed
2021-08-21 14:54:08 -04:00
Yanjun Qi / Jane
a3d8742bc5 Merge pull request #515 from QData/keras-parallel-fix
[CODE] Keras parallel attack fix - Issue #499
2021-08-21 14:49:34 -04:00
Yanjun Qi
f7a2218bdc adding document on this parallel hotfix example script 2021-08-21 14:41:11 -04:00
sanchit97
1d5f2ec758 [FIX] isort errors-unused 2021-08-20 10:49:47 -04:00
sanchit97
d30b9af0ed [FIX] isort errors 2021-08-20 10:42:16 -04:00
sanchit97
35330468c2 [CODE] Keras parallel attack fix 2021-08-20 10:30:19 -04:00
sanchit97
4a94be9209 [CODE] Add command-line option for quality metrics 2021-08-20 10:11:14 -04:00
sanchit97
5a88253692 [CODE] Fixing init file 2021-08-20 02:18:43 -04:00
sanchit97
46781b1d53 [CODE] Add code for perplexity 2021-08-20 02:16:29 -04:00
sanchit97
da4ca68520 [CODE] Changing metric to resemble constraints structure 2021-08-20 01:21:29 -04:00
sanchit97
c0e2993947 [CODE] Transfer from #475 + new structure 2021-08-20 01:05:35 -04:00
WEN Hao
19e253a968 Merge branch 'wenh06' 2021-08-06 20:49:24 +08:00
WEN Hao
ac2cea8b74 fix bug in method of in 2021-08-06 20:48:15 +08:00
Yanjun Qi
6beca9166d Update huggingface_dataset.py
minor format issue
2021-08-03 16:28:08 -04:00
WEN Hao
7a7d211ea7 Merge branch 'wenh06' 2021-08-03 22:32:16 +08:00
WEN Hao
f17c297096 fix incorrect __eq__ method of AttackedText in textattack/shared/attacked_text.py 2021-08-03 22:31:44 +08:00
WEN Hao
ff397efb0f add DownloadConfig in datasets/huggingface_dataset.py 2021-08-03 22:27:12 +08:00
Yanjun Qi / Jane
f64df48beb Merge pull request #508 from QData/example_bug_fix
bug fix for codes in example folder
2021-08-02 22:29:30 -04:00
Yanjun Qi
392576e6fb small format issue 2021-08-02 21:40:40 -04:00
Yanjun Qi
3542339e49 add more extended features in doc 2021-08-02 21:37:27 -04:00
Yanjun Qi
567079bea1 Create train_lstm_rotten_tomatoes_sentiment_classification.sh 2021-08-02 20:56:37 -04:00
Yanjun Qi
7200d7f5a8 Update attack_camembert.py
formatting issue
2021-08-02 17:28:40 -04:00
Yanjun Qi
c57444940a fixing bugs in example folder.. 2021-08-02 16:42:49 -04:00
Yanjun Qi
27cf667553 distilbert-base-uncased-qqp model does not exist in textattack zoo 2021-08-02 15:33:14 -04:00
Yanjun Qi / Jane
fd7117d756 Merge pull request #506 from QData/augment-test
add more tests for augment function
2021-08-02 11:59:27 -04:00
Yanjun Qi
e98c801eca Update word_embedding_distance.py 2021-08-02 11:48:06 -04:00
Yanjun Qi
0da95d803d black formatting 2021-08-02 11:40:37 -04:00
Yanjun Qi
f4d2397714 add more tests for augment function 2021-08-02 11:28:13 -04:00
Yanjun Qi / Jane
c30728bb26 Merge pull request #505 from QData/s3-model-fix
[FixBug] Fix bug with loading pretrained lstm and cnn models
2021-08-01 07:15:31 -04:00
Jin Yong Yoo
d92203c0f3 Merge branch 'master' into s3-model-fix 2021-08-01 15:53:47 +09:00
Jin Yong Yoo
26fdf99b09 [FixBug] Fix bug with loading pretrained lstm and cnn models 2021-08-01 02:50:25 -04:00
Yanjun Qi / Jane
9d7b3b942d Merge pull request #503 from QData/multilingual-doc
adding in documentation on multilingual supports
2021-07-30 12:39:04 -04:00
Yanjun Qi
fdd232c74a adding in documentation on multilingual supports 2021-07-30 12:15:01 -04:00
Yanjun Qi / Jane
47ff63facb Merge pull request #502 from QData/Notebook-10-bug-fix
Update Example_5_Explain_BERT.ipynb
2021-07-30 10:52:33 -04:00
diegoc
a798564ee5 Update Example_5_Explain_BERT.ipynb 2021-07-30 18:23:06 +08:00
Yanjun Qi / Jane
4dc5e591cb Merge pull request #500 from QData/docstring-rework-missing
adding the missed docstrings for api-rework-modules
2021-07-29 18:56:52 -04:00
Yanjun Qi
1fa60f98c6 remove trailing whitespaces 2021-07-29 18:56:26 -04:00
Yanjun Qi
d5ef45900b adding the missed docstrings for api-rework-modules 2021-07-29 18:40:11 -04:00
Yanjun Qi
c2b5086c44 Update requirements.txt 2021-07-28 13:16:20 -04:00
Yanjun Qi
14fa300931 Update conf.py 2021-07-28 12:59:01 -04:00
Yanjun Qi / Jane
f9a7d2cfb7 Merge pull request #497 from QData/dependabot/pip/docs/tensorflow-2.4.2
Bump tensorflow from 2.3.3 to 2.4.2 in /docs
2021-07-28 12:46:54 -04:00
Yanjun Qi
ed75b2af10 Update conf.py 2021-07-28 12:39:54 -04:00
Yanjun Qi
101f83be80 Update requirements.txt 2021-07-27 22:28:59 -04:00
Yanjun Qi
ac9d4adbd8 Update requirements.txt 2021-07-27 22:17:41 -04:00
Yanjun Qi
7f01f22573 Update requirements.txt 2021-07-27 21:54:20 -04:00
Yanjun Qi
c414ccc593 Update requirements.txt 2021-07-27 21:52:38 -04:00
Yanjun Qi
4265ea1121 tensorflow 2.4.2 depends on tensorflow-estimator<2.5.0 and >=2.4.0 2021-07-27 21:23:53 -04:00
dependabot[bot]
bd4c0fe700 Bump tensorflow from 2.3.3 to 2.4.2 in /docs
Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.3.3 to 2.4.2.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v2.3.3...v2.4.2)

---
updated-dependencies:
- dependency-name: tensorflow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-07-28 00:35:28 +00:00
Yanjun Qi / Jane
d129e7b8b0 Merge pull request #496 from QData/readthedoc-fix
Update requirements.txt pillow version...
2021-07-27 20:25:57 -04:00
Yanjun Qi
90660cb04a Update requirements.txt 2021-07-27 20:25:29 -04:00
Yanjun Qi
0f63bae61d Update requirements.txt 2021-07-27 20:22:07 -04:00
Yanjun Qi / Jane
4c40fff13c Merge pull request #495 from QData/readthedoc-fix
readthedoc pip dependency errors fix
2021-07-27 20:19:59 -04:00
Yanjun Qi
af33b88cfe update requirement.txt to local-successful-built version 2021-07-27 20:05:13 -04:00
Yanjun Qi
3b0285184c readthedoc pip dependency errors fix 2021-07-27 19:45:21 -04:00
Yanjun Qi
057fe299f0 Update requirements.txt 2021-07-27 16:37:37 -04:00
Yanjun Qi
138623eee7 Update requirements.txt 2021-07-27 16:30:20 -04:00
Yanjun Qi / Jane
b3fd41e4ea Merge pull request #494 from QData/doc-fix
update  conda env export --from-history | tee environment.yml
2021-07-27 14:40:38 -04:00
Yanjun Qi
0baa1228a7 conda env export --from-history | tee environment.yml
to get a smaller conda environment.yml
2021-07-27 14:37:13 -04:00
Yanjun Qi / Jane
5cbefa4992 Merge pull request #493 from QData/doc-fix
Create environment.yml to solve readthedoc dependency issues
2021-07-27 13:44:04 -04:00
Yanjun Qi
ee37007d25 Create environment.yml 2021-07-27 13:12:49 -04:00
Yanjun Qi / Jane
00d5f837e5 Merge pull request #492 from QData/doc-fix
fix document errors and inconsistency
2021-07-27 11:24:25 -04:00
Yanjun Qi
4f66ad0ee6 fix document errors and inconsistency 2021-07-27 11:07:38 -04:00
Yanjun Qi / Jane
8f36926295 Merge pull request #491 from QData/CUDA_out_of_memory_fix
document fix regarding the new training commands
2021-07-26 22:40:01 -04:00
Yanjun Qi
17ab22c0c4 again formatting isort issues 2021-07-26 18:57:16 -04:00
Yanjun Qi
54bd03955a again formatting issue 2021-07-26 18:49:03 -04:00
Yanjun Qi
f866e5b316 black formatting 2021-07-26 18:37:34 -04:00
Yanjun Qi
f2e4f48948 update docs with the current "textattack train" CLIs 2021-07-26 18:26:02 -04:00
Yanjun Qi
b64d58efe9 try to empty cache before load models 2021-07-26 15:21:04 -04:00
Yanjun Qi / Jane
ac8872aeb7 Merge pull request #420 from QData/multilingual
Addition of optional language/model parameters to constraints (and download method adjustment)
2021-07-26 09:54:49 -04:00
Yanjun Qi / Jane
1ff364ae44 Merge pull request #490 from QData/scipy-version-plus-two-doc-updates
scipy version plus minor documentation updates
2021-07-25 12:16:26 -04:00
Yanjun Qi
92e085d09a scipy-doc 2021-07-24 23:37:03 -04:00
Yanjun Qi / Jane
776dfece2a Merge pull request #484 from QData/update-torch-version
Update torch version requirement
2021-07-22 14:27:55 -04:00
Yanjun Qi / Jane
41b16e4a59 Merge pull request #453 from matrix1001/master
fix all_words_diff problem
2021-07-22 14:26:59 -04:00
Jin Yong Yoo
b286907399 Update torch version requirement 2021-07-22 23:15:48 +09:00
Yanjun Qi
d2e5e86530 Update FAQ.md 2021-07-22 09:32:58 -04:00
Yanjun Qi
183c47c743 Update FAQ.md 2021-07-22 09:26:07 -04:00
Yanjun Qi / Jane
e29a01ab72 Merge pull request #477 from cogeid/Fix-RandomSwap-and-RandomSynonymInsertion-bug
Fix-RandomSwap-and-RandomSynonymInsertion-bug pr 368 to pass pytest
2021-07-21 14:52:04 -04:00
Yanjun Qi
79db189d6d Update recipes.py
augmented_text = list(set(augmented_text)) to remove duplicate results in EasyDataAugmenter
2021-07-21 13:44:18 -04:00
sanchit97
a34f5f841f [FIX] Possible temp fix for pytest 2021-07-19 02:47:31 -04:00
diegoc
a7a4f5f88d bug fixes to pass tests 2021-07-19 07:14:05 +08:00
Yanjun Qi / Jane
8918e8f951 Merge pull request #469 from xinzhel/allennlp_doc
fix bugs in doc for "Attacking Allennlp Models"
2021-07-16 10:49:40 -04:00
sanchit97
317a36d85e [FIX] Update download methods to pass tests 2021-07-16 10:42:58 -04:00
Chengyuan C
b3b36b49da Merge branch 'master' into allennlp_doc 2021-07-16 02:57:16 +08:00
Yanjun Qi
b40cf8f57b Update test_attack.py 2021-07-09 17:02:36 -04:00
diegoc
7860fd410a bug fix 2021-07-09 23:34:44 +08:00
diegoc
6bc8173b8f Revert "Bug fix"
This reverts commit b30d4f06cb.
2021-07-09 23:34:15 +08:00
diegoc
b30d4f06cb Bug fix 2021-07-07 22:33:21 +08:00
Yanjun Qi / Jane
2ebbab0720 Merge pull request #473 from cogeid/file-redirection-fix
File redirection fix
2021-07-02 10:14:01 -04:00
alexander.zap
bade0d7446 Merge branch 'master' of https://github.com/QData/TextAttack into multilingual
 Conflicts:
	textattack/constraints/semantics/sentence_encoders/bert/bert.py
	textattack/datasets/dataset.py
	textattack/models/helpers/utils.py
	textattack/shared/word_embeddings.py
2021-06-30 12:53:40 +02:00
Jin Yong Yoo
7855b9e474 Merge pull request #479 from QData/update-version-num
[DOC] Update package version
2021-06-25 21:47:58 +09:00
Jin Yong Yoo
9a7f142e34 [DOC] Update package version 2021-06-25 08:47:25 -04:00
Jin Yong Yoo
f855d15108 Merge pull request #478 from QData/api-doc
Update API docs and fix minor bugs with Trainer
2021-06-25 20:49:44 +09:00
Jin Yong Yoo
d93682d8bd [CODE, DOC] Fix minor bugs with tb logging and update docstrings 2021-06-25 05:50:55 -04:00
Jin Yong Yoo
9d9ecaed06 [DOCS] Update docstrings and user API doc 2021-06-25 05:50:23 -04:00
Jin Yong Yoo
8637d20c2b Merge pull request #476 from QData/api-tests
Fix bugs and update training test
2021-06-25 11:51:39 +09:00
diegoc
57169aaf5d fix sphinx format 2021-06-24 22:39:57 +08:00
diegoc
14974e83b0 bug fixes 2021-06-24 21:49:33 +08:00
diegoc
bf0fff7486 fix pr 368 to pass pytest 2021-06-24 20:29:38 +08:00
diegoc
bfd6216d81 Delete comments 2021-06-24 19:45:24 +08:00
Jin Yong Yoo
4ece786aee [TEST] Modify adv training test 2021-06-24 05:08:50 -04:00
Jin Yong Yoo
5b6663273f Revert "[TEST] Add tests for API attack and training"
This reverts commit f7a476c467.
2021-06-24 04:49:04 -04:00
Jin Yong Yoo
4d3accf290 [CODE] Fix minor bugs 2021-06-21 08:54:27 -04:00
Jin Yong Yoo
f7a476c467 [TEST] Add tests for API attack and training 2021-06-21 08:53:59 -04:00
diegoc
fe99ed4e00 Merge branch 'file-redirection-fix' of https://github.com/cogeid/TextAttack into file-redirection-fix 2021-06-21 03:28:03 +08:00
diegoc
e15a9f9691 pytest update 2021-06-21 03:27:38 +08:00
diegoc
00b8fc81c6 updated pytest, changes to loggers 2021-06-21 03:27:38 +08:00
Yanjun Qi
254c98473d make attack-recipes easier to find in doc 2021-06-18 16:13:40 -04:00
matrix1001
60a9b3213f fix black format problem 2021-06-18 08:09:23 +08:00
Yanjun Qi / Jane
501e443328 Merge pull request #474 from QData/doc-cmd
add two recipes_cmd.md files
2021-06-17 17:03:31 -04:00
Yanjun Qi
40268e9c63 Update command_line_usage.md 2021-06-17 16:50:14 -04:00
Yanjun Qi
bede5f1c50 change the examples link to absolute URL 2021-06-17 16:48:42 -04:00
Yanjun Qi
262ca05307 Update augmenter_recipes_cmd.md 2021-06-17 16:26:29 -04:00
Yanjun Qi
9540ee457f Update augmenter_recipes_cmd.md 2021-06-17 16:19:21 -04:00
Yanjun Qi
b537b1dae8 add in commandline documentations for recipes in the read-the-doc 2021-06-17 15:09:01 -04:00
diegoc
a3f94ea123 pytest update 2021-06-17 22:54:31 +08:00
diegoc
a5537e1601 updated pytest, changes to loggers 2021-06-17 22:02:49 +08:00
Matrix
44d3f370ba Update attack_log_manager.py
using words_diff_num instead of all_words_diff
2021-06-16 08:42:16 +08:00
Matrix
8fce0db7ad Update attacked_text.py
add words_diff_num
2021-06-16 08:40:28 +08:00
Yanjun Qi / Jane
9a67b9eff0 Merge pull request #428 from QData/api-rework
API Rework
2021-06-10 06:15:51 -04:00
Jin Yong Yoo
7a315ecfe2 [TEST] Update expected test output 2021-06-10 00:40:47 -04:00
Yanjun Qi
6295e633ce update two tutorials' headers 2021-06-09 22:27:45 -04:00
Jin Yong Yoo
17eb7fc846 [TEST] Update expected output 2021-06-09 21:41:08 -04:00
Jin Yong Yoo
120025d8c0 [DOC, CODE] fix documentation and minor bugs 2021-06-09 19:38:12 -04:00
Jin Yong Yoo
da8d5d06a0 [DOC] resolve merge conflicts 2021-06-09 17:20:52 -04:00
Jin Yong Yoo
819f5c0e4a [DOC] Update notebooks and docstrings 2021-06-09 17:06:57 -04:00
XinzheL
82f73a7324 fix bugs in doc for Allennlp 2021-06-08 15:17:57 +10:00
Yanjun Qi
abdf349256 Update index.rst 2021-06-04 01:50:19 -04:00
Yanjun Qi
d75a27c9d1 add in the complete API with auto-api generated rst files 2021-06-04 01:50:19 -04:00
Jin Yong Yoo
c2b11f386b Change to get_train_dataloader 2021-06-04 01:50:18 -04:00
Jin Yong Yoo
ae5ee0672c fix segfault bug with PyTorch Tensor 2021-06-04 01:50:18 -04:00
diegoc
f208318e35 Keras Fix for api rework 2021-06-04 01:50:18 -04:00
Tahmid Kazi
c9cf6d7394 Fixed Notebook Tutorial 1 2021-06-04 01:49:40 -04:00
Jin Yong Yoo
b430d76d25 WIP: update API reference 2021-06-04 01:49:40 -04:00
Jin Yong Yoo
1d09830091 fix bugs 2021-06-04 01:49:40 -04:00
Jin Yong Yoo
70362230e5 WIP: update API docs 2021-06-04 01:49:39 -04:00
Jin Yong Yoo
2c471f55f5 fix path issues, hf model predictions, etc. 2021-06-04 01:47:24 -04:00
Jin Yong Yoo
0eaa5883f8 revert changes to attack-recipes 2021-06-04 01:47:24 -04:00
Jin Yong Yoo
1d8b72b85d make trainer extendable 2021-06-04 01:47:23 -04:00
Tahmid Kazi
72409a0dd9 Newer Update to the sklearn tutorial with the rotten_tomatoes dataset 2021-06-04 01:47:23 -04:00
Tahmid Kazi
9310492974 Update Example_1_sklearn.ipynb
Made the necessary changes to enable it to work with the new API
2021-06-04 01:47:23 -04:00
Jin Yong Yoo
0f39de49f7 fix issues to pass tests 2021-06-04 01:47:22 -04:00
Jin Yong Yoo
3854f8f770 undo accidental directory-level autoreplacements 2021-06-04 01:47:00 -04:00
Jin Yong Yoo
3a375a3d8e finish trainer with cmd-line command 2021-06-04 01:47:00 -04:00
Jin Yong Yoo
4fbb09c9ac WIP: Trainer class feature complete 2021-06-04 01:47:00 -04:00
Mint Lin
429254d873 add output scale factor assignment 2021-06-04 01:45:11 -04:00
Will Peterson
1f709ff0fa Adding Tutorial 4: Tensorflow, Tutorial 8: Attacking BERT models w/ CAptum 2021-06-04 01:45:11 -04:00
Will Peterson
25d3296979 Adapted Examples: 2, 6 (AllenNLP), 7 (Multilingual) to be compatible with attack-api. Still need to finish revising code for Examples 0, 8 2021-06-04 01:33:17 -04:00
Jin Yong Yoo
21054c99c2 fix wandb logging bug 2021-06-04 01:33:17 -04:00
Jin Yong Yoo
cc5ff46226 fix formatting 2021-06-04 01:33:16 -04:00
Jin Yong Yoo
3c40ce8605 add text window for masked LM transformations 2021-06-04 01:33:16 -04:00
Jin Yong Yoo
a32263a20e fix bug with HFDataset 2021-06-04 01:33:15 -04:00
Jin Yong Yoo
fccf60dfd4 WIP: Add an example for parallel attack 2021-06-04 01:32:33 -04:00
Jin Yong Yoo
58d6ea6652 WIP: Get parallel attacks working 2021-06-04 01:32:33 -04:00
Jin Yong Yoo
de88a5ee23 WIP: finish attack args, attacker, cli commands 2021-06-04 01:32:33 -04:00
Jin Yong Yoo
d394d048ac WIP: completed attack, dataset, model, logging args 2021-06-04 01:31:09 -04:00
Jin Yong Yoo
65010cf562 WIP: refactor cmd line args 2021-06-04 01:31:09 -04:00
Jin Yong Yoo
2744fd3ca8 WIP: add datasets and attacker 2021-06-04 01:25:49 -04:00
Jin Yong Yoo
10f54a16f4 add datasets 2021-06-04 01:25:48 -04:00
Jack Morris
4015ad23b4 Merge pull request #448 from Ashiq5/Ashiq5-patch-1
Proposed solution to Issue #447
2021-05-30 10:28:28 -04:00
Md. Ishtiaq Ashiq
3f224bce50 formatting fixed 2021-05-29 13:59:27 -04:00
Jack Morris
937ba0d6b8 release v0.2.16 2021-05-23 22:18:45 -04:00
Jack Morris
d5e9048a48 Merge pull request #466 from QData/qwerty-bug
return [] when not in adj list
2021-05-23 20:50:10 -04:00
Jack Morris
5030d52997 Merge pull request #467 from QData/max-logic-fix
UntargetedClassification: Don't check self.target_max_score when it is already known to be None
2021-05-23 20:49:54 -04:00
Jack Morris
c7e15279a0 simplify (self.x or y) to just y when self.x is None 2021-05-23 20:08:05 -04:00
Jack Morris
723bf6c191 return [] when not in adj list 2021-05-23 19:59:54 -04:00
Jack Morris
6950621a95 Merge pull request #465 from QData/run-training-bug
Run training bug
2021-05-23 19:49:59 -04:00
Jack Morris
e9fde9912e Merge pull request #464 from QData/fast-alzantot-doc-fix
Rename all instances of faster-alzantot to fast-alzantot
2021-05-23 19:49:47 -04:00
Jack Morris
fe8bd660e4 Merge pull request #461 from FrancoMuniz/patch-1
Quick fix for issue #424
2021-05-23 19:45:13 -04:00
Jack Morris
facad249d8 args.global_step -> global_step 2021-05-23 19:44:35 -04:00
Jack Morris
ed303a58a1 Rename all instances of faster-alzantot to fast-alzantot 2021-05-23 19:37:59 -04:00
Franco Muñiz
b8a5830c7f Quick fix for issue #424
https://github.com/QData/TextAttack/issues/424
2021-05-10 17:53:28 -03:00
Jin Yong Yoo
10309a32de Merge pull request #452 from QData/doc0414
fix minor documentation errors
2021-04-23 23:12:20 +09:00
U-DESKTOP-99E94MF\matrix
fa4ccb2f97 fix all_words_diff problem 2021-04-22 08:54:47 +08:00
Yanjun Qi
1021c5643e Update benchmark-search.md 2021-04-21 12:42:18 -04:00
ashiq
dc45bb346d minor 2021-04-15 22:54:13 -04:00
Md. Ishtiaq Ashiq
5453e1e05f Issue #447 2021-04-14 19:16:34 -04:00
Md. Ishtiaq Ashiq
ef72936a83 Issue #447
Checking if the model_from_file argument exists or not
2021-04-14 19:14:28 -04:00
Yanjun Qi / Jane
4da3a257ab Merge pull request #446 from QData/doc0414
add in talks-visualization.md for our tutorial talks on TextAttack
2021-04-14 15:17:45 -04:00
Yanjun Qi
12b76d3c3b add in talks-visualization.md for our tutorial talks on TextAttack 2021-04-14 14:04:57 -04:00
Yanjun Qi / Jane
ae68c81411 Update README.md
update citation to EMNLP version
2021-04-09 09:05:54 -04:00
Yanjun Qi / Jane
8e18c3c850 Merge pull request #442 from P3n9W31/example_5_explain_BERT_fix
Fix errors in Example_5_Explain_BERT
2021-04-08 23:09:05 -04:00
ZhanPw
aa736d26b4 Fix errors in Example_5_Explain_BERT
calculate(input_ids, token_type_ids, attention_mask) was missing the important 'position_ids' parameter of the Hugging Face BERT model, leading to wrong results (logits and visualization). Fixed.

get_text() is never called. Deleted.

The deepcopy of the model is unnecessary. Deleted.

Colored text could not be displayed correctly in the notebook. Fixed.

Removed a lot of unused test code.

The attribution score of the perturbed text was calculated incorrectly. Fixed.

Some attribution methods that were commented out cannot be used directly and may be misleading. Deleted them and added a link to the Captum documentation.
2021-04-09 00:43:56 +08:00
Yanjun Qi / Jane
6c3359a33d Merge pull request #441 from QData/doc_configure_issues
Update requirements.txt
2021-04-05 14:11:22 -04:00
Yanjun Qi
ac306c7973 Update setup.py 2021-04-05 13:03:48 -04:00
Yanjun Qi
3b6de0b28a locally test all passed... 2021-04-05 12:56:59 -04:00
Yanjun Qi
62a99ae7de Update requirements.txt 2021-04-05 12:07:12 -04:00
Yanjun Qi
e6b3709b31 Revert "correct wrong library mentions of docs"
This reverts commit 678cf4ba93.
2021-04-05 11:55:32 -04:00
Yanjun Qi
9f84732045 Revert "Update requirements.txt"
This reverts commit d04c740f11.
2021-04-05 11:55:28 -04:00
Yanjun Qi
d04c740f11 Update requirements.txt 2021-04-05 11:40:00 -04:00
Yanjun Qi
678cf4ba93 correct wrong library mentions of docs 2021-04-05 11:22:55 -04:00
Yanjun Qi / Jane
749c023c61 Merge pull request #440 from nithvijay/cuda-installation-documentation
CUDA installation documentation
2021-04-02 18:23:38 -04:00
Yanjun Qi / Jane
9b7fb83ab4 Merge pull request #433 from cogeid/new-tutorial-custom
New tutorial on custom dataset & word embedding
2021-04-02 18:23:14 -04:00
Yanjun Qi
0c9356d316 add the custom-notebook in index.rst and correct keras notebook index 2021-04-02 17:31:39 -04:00
Nithin Vijayakumar
32c25bbb58 Merge branch 'master' of https://github.com/QData/TextAttack into cuda-installation-documentation 2021-04-02 15:46:25 -04:00
Yanjun Qi / Jane
055505955e Merge pull request #439 from QData/revert-400-specify_split
Revert "add --split to specify train/test/dev dataset"
2021-04-02 11:52:46 -04:00
Jin Yong Yoo
aa0d39bdfb Revert "add --split to specify train/test/dev dataset" 2021-04-02 23:30:52 +09:00
Yanjun Qi / Jane
77a244d5cc Merge pull request #400 from QData/specify_split
add --split to specify train/test/dev dataset
2021-04-02 10:19:19 -04:00
Nithin Vijayakumar
67e1af6c02 Merge branch 'master' into cuda-installation-documentation 2021-04-02 01:23:09 -04:00
Nithin Vijayakumar
3172bf0bb9 initial cuda installation instructions 2021-04-02 01:17:25 -04:00
Nithin Vijayakumar
022c69fb5e autobuild cli changed 2021-04-01 22:48:10 -04:00
MintForever
211abc5a67 remove unnecessary print statements 2021-03-30 21:38:19 -04:00
Yanjun Qi
1855169799 Update 4_Custom_Datasets_Word_Embedding.ipynb 2021-03-30 13:55:59 -04:00
Yanjun Qi
b1dd4dfbc7 Update 4_Custom_Datasets_Word_Embedding.ipynb 2021-03-30 11:47:49 -04:00
Yanjun Qi
96cf4e42b8 Merge branch 'new-tutorial-custom' of https://github.com/cogeid/TextAttack into pr/433 2021-03-30 11:40:56 -04:00
Yanjun Qi
db58b2f567 Revert "Add chinese readme document"
This reverts commit 153e21171b.
2021-03-30 11:35:28 -04:00
Opdoop
3a26b143b8 Add chinese readme document
Signed-off-by: Opdoop <247536381@qq.com>
2021-03-30 11:35:28 -04:00
diegoc
a1d8c4d347 New tutorial on custom dataset & word embedding
Adds a new notebook on how to use TextAttack with a custom dataset and word embedding
2021-03-30 11:35:28 -04:00
Yanjun Qi
0e7a831325 Revert "Add chinese readme document"
This reverts commit 153e21171b.
2021-03-30 11:33:47 -04:00
Opdoop
153e21171b Add chinese readme document
Signed-off-by: Opdoop <247536381@qq.com>
2021-03-30 11:29:23 -04:00
Yanjun Qi / Jane
e343f6dbfd Merge pull request #434 from sanchit97/fix-torch-version
Fix torch version to 1.7.1
2021-03-21 10:59:00 -04:00
sanchit97
93a8c5cd42 Fix torch version to 1.7.1
Signed-off-by: sanchit97 <sanchit15083@iiitd.ac.in>
2021-03-21 02:14:46 -04:00
diegoc
6fa44c1270 New tutorial on custom dataset & word embedding
Adds a new notebook on how to use TextAttack with a custom dataset and word embedding
2021-03-19 22:41:44 +08:00
Jack Morris
54cbfbcafc Update word_swap_wordnet.py 2021-03-12 10:48:12 -05:00
Jack Morris
ba8a755024 Update word_innerswap_random.py 2021-03-12 10:47:37 -05:00
alexander.zap
7a43aa049d Merge branch 'master' of https://github.com/QData/TextAttack into multilingual 2021-03-11 15:11:21 +01:00
Yanjun Qi / Jane
30e21cd85d Merge pull request #425 from QData/free-runner-space
Update run-pytest.yml
2021-02-24 12:05:16 -05:00
Jin Yong Yoo
ca8b990c40 Update run-pytest.yml 2021-02-25 00:00:57 +09:00
Yanjun Qi / Jane
9c01755aa2 delete stale.yml
the stale bot does not work well in our context, so we abandon it for now.
2021-02-22 12:12:27 -05:00
Jin Yong Yoo
eebf2071b9 Merge pull request #417 from alexander-zap/fix_transformation_masked_lm
Fixed bug where in masked_lm transformations only subwords were candidates for top_words
2021-02-16 00:32:18 +09:00
Yanjun Qi / Jane
ad2cf4a49c Merge pull request #418 from alexander-zap/word_embedding_nn_performance
Increased performance of nearest_neighbours method in GensimWordEmbeddings
2021-02-15 10:31:02 -05:00
alexander.zap
693d8dd6eb Merge remote-tracking branch 'origin/word_embedding_nn_performance' into word_embedding_nn_performance 2021-02-15 11:03:35 +01:00
alexander.zap
494caeaf59 Increased performance of nearest_neighbours method in GensimWordEmbedding
- replaced index search of word in index2word with a call to word2index
2021-02-15 11:03:25 +01:00
alexander.zap
203dba9b4d Merge remote-tracking branch 'origin/fix_transformation_masked_lm' into fix_transformation_masked_lm 2021-02-15 11:03:04 +01:00
alexander.zap
86a3d3116e Fixed bug where only subwords were candidates for top_words 2021-02-15 11:02:44 +01:00
alexander.zap
c0cb92dc9e Merge remote-tracking branch 'origin/multilingual' into multilingual 2021-02-15 10:56:10 +01:00
alexander.zap
a013da9cf6 Fixed flake8 error (undefined variable and formatting) 2021-02-15 10:55:00 +01:00
alexander-zap
bb4b76aa41 Delete language.py 2021-02-15 10:55:00 +01:00
alexander.zap
44cc852366 Multilingual adjustment for most constraints
- the used models/languages can be passed in the __init__
2021-02-15 10:55:00 +01:00
alexander.zap
19c165ef61 Multilingual adjustment for most constraints
- the used models/languages can be passed in the __init__
2021-02-15 10:55:00 +01:00
alexander.zap
2d7421c290 Multilingual adjustment for constraints
- the used models/languages can be passed in the __init__
- fixed tensorflow_text error in MultilingualUniversalSentenceEncoder (does not work with LazyLoader)
2021-02-15 10:55:00 +01:00
Jin Yong Yoo
9ea4e7ef42 introduce new download method from any url 2021-02-15 10:55:00 +01:00
Jin Yong Yoo
ab698022c0 start multi-lingual branch 2021-02-15 10:55:00 +01:00
Yanjun Qi
5780b177ef Update requirements.txt
add sphinx-markdown-tables in dependency
2021-02-13 22:46:41 -05:00
Yanjun Qi
c2bc09f528 Revert "bypass the sphinx issue on ubuntu-latest to ubuntu-18.04"
This reverts commit 91e364aa64.
2021-02-13 22:44:38 -05:00
Yanjun Qi
91e364aa64 bypass the sphinx issue on ubuntu-latest to ubuntu-18.04 2021-02-13 22:32:24 -05:00
Yanjun Qi
2884cbf65f sphinx-markdown-tables runs fine on readthedocs, but not in the GitHub workflow 2021-02-13 22:25:41 -05:00
Yanjun Qi / Jane
2009a77f7d Merge pull request #419 from tahmid-kazi/patch-4
Update Model Zoo README.md with NLP Tasks table
2021-02-13 20:50:59 -05:00
Yanjun Qi
47a03e3e40 add in sphinx-markdown-tables 2021-02-13 20:49:54 -05:00
Yanjun Qi
20f79e1af5 add more details in the model page on readthedoc 2021-02-13 20:13:15 -05:00
Yanjun Qi
e32476a8af fix minor errors 2021-02-13 20:08:31 -05:00
Yanjun Qi
29bf26f326 make all url smaller font in models/readme.md 2021-02-13 16:37:59 -05:00
Yanjun Qi
4019eb1479 reduce font of Model/readme.md 2021-02-13 16:24:33 -05:00
Yanjun Qi
aa2ccb5734 Update README.md 2021-02-13 15:51:23 -05:00
Jin Yong Yoo
261c02640f Merge pull request #421 from QData/fix-test
update test output to ignore num_queries
2021-02-13 02:07:45 +09:00
Jin Yong Yoo
50daaa91e7 update test to fix inconsistency introduced by new stanza resource 2021-02-12 11:27:54 -05:00
Jin Yong Yoo
08302ea336 update test 2021-02-12 10:01:22 -05:00
Jin Yong Yoo
4316c02110 update test output to ignore num_queries 2021-02-12 01:19:40 -05:00
alexander.zap
c27e460762 Fixed flake8 error (undefined variable and formatting) 2021-02-11 13:37:18 +01:00
alexander-zap
611496a2f6 Delete language.py 2021-02-11 12:46:07 +01:00
alexander.zap
d163b8ab40 Multilingual adjustment for most constraints
- the used models/languages can be passed in the __init__
2021-02-11 12:03:51 +01:00
alexander.zap
2e28f3c4a5 Merge branch 'master' of https://github.com/QData/TextAttack into multilingual 2021-02-11 11:57:29 +01:00
alexander.zap
030a17df40 Multilingual adjustment for most constraints
- the used models/languages can be passed in the __init__
2021-02-11 11:49:18 +01:00
Tahmid Kazi
3828c745f3 Update README.md
Added the Table comparing TextAttack models with paperswithcode.com SOTA, organized by NLP tasks
2021-02-10 23:49:05 -05:00
alexander.zap
e0633b8ec5 Multilingual adjustment for constraints
- the used models/languages can be passed in the __init__
- fixed tensorflow_text error in MultilingualUniversalSentenceEncoder (does not work with LazyLoader)
2021-02-10 14:53:53 +01:00
alexander.zap
1092c2f479 Increased performance of nearest_neighbours method in GensimWordEmbedding
- replaced index search of word in index2word with a call to word2index
2021-02-10 13:01:51 +01:00
alexander.zap
b0a9c97be9 Fixed bug where only subwords were candidates for top_words 2021-02-10 11:02:22 +01:00
Yanjun Qi / Jane
b7b036aa36 Merge pull request #403 from willyptrain/keras-custom-attacks
Running TextAttack attacks on Custom Keras Model Wrapper using Huggin…
2021-01-15 12:37:07 -05:00
Yanjun Qi
4509e9aa57 add open-in-colab link 2021-01-15 12:36:11 -05:00
Will Peterson
6713f087d8 Correcting mistype in Example_6_Keras filename to fix file-not-recognized error 2021-01-14 10:39:00 -05:00
Will Peterson
9cd54c2625 Added Example 6 - Keras Example to the toctree documentation 2021-01-14 10:22:48 -05:00
Will Peterson
3a5351a349 Added additional text alongside code to provide explanation and table of contents 2021-01-13 15:15:55 -05:00
MintForever
2ef1905ce5 format code 2021-01-08 18:09:35 -05:00
MintForever
870200cc96 modified test_attack to specify which split of dataset to use during pytest 2021-01-08 18:07:14 -05:00
MintForever
fa68f522c6 Merge branch 'specify_split' of https://github.com/QData/TextAttack into specify_split 2021-01-05 22:13:16 -05:00
MintForever
026e20c6b9 fix error 2021-01-05 21:58:35 -05:00
MintForever
fb745638c1 remove test file 2021-01-05 21:58:35 -05:00
MintForever
ff37cded04 fix argument passing 2021-01-05 21:58:35 -05:00
Hanyu Liu
06189a40f9 formatting 2021-01-05 21:58:35 -05:00
Mint Lin
0acd8e42dc fix multiple argument for split 2021-01-05 21:58:35 -05:00
MintForever
62a1a15717 add --split 2021-01-05 21:58:35 -05:00
MintForever
f6a8d264fd fix error 2021-01-05 18:25:24 -05:00
MintForever
2db7ebd533 remove test file 2021-01-02 16:04:52 -05:00
MintForever
ef42e02417 fix argument passing 2021-01-02 16:03:47 -05:00
MintForever
e618838fa8 Merge branch 'specify_split' of https://github.com/QData/TextAttack into specify_split 2021-01-02 14:51:26 -05:00
MintForever
0603d7c132 Merge remote-tracking branch 'origin/master' into specify_split 2021-01-02 14:12:53 -05:00
Hanyu Liu
e2396ab2dc formatting 2020-12-30 16:02:56 -05:00
Yanjun Qi / Jane
994a56caf2 fix the wrong anchor in Readme 2020-12-29 10:45:06 -05:00
Jin Yong Yoo
a64af14da5 introduce new download method from any url 2020-12-29 04:18:45 -05:00
Jin Yong Yoo
f757af005f start multi-lingual branch 2020-12-29 04:05:17 -05:00
Yanjun Qi / Jane
748ec1c7e7 Create stale.yml
set up an automatic workflow to mark stale issues
2020-12-28 16:59:57 -05:00
Yanjun Qi / Jane
a17f702e01 Create codeql-analysis.yml
create automated code vulnerability scanning analysis
2020-12-28 16:53:17 -05:00
MintForever
bdc29925a5 Merge branch 'specify_split' of https://github.com/QData/TextAttack into specify_split 2020-12-28 12:51:29 -05:00
Will Peterson
ebf1620a59 Running TextAttack attacks on Custom Keras Model Wrapper using HuggingFace dataset and PWWSRen2019 recipe 2020-12-28 10:28:57 -05:00
Mint Lin
eb8172a64f fix multiple argument for split 2020-12-28 10:14:17 -05:00
MintForever
83416735c4 Merge branch 'master' of https://github.com/QData/TextAttack into specify_split 2020-12-27 23:36:55 -05:00
Jin Yong Yoo
6a98570c66 Update publish-to-pypi.yml 2020-12-27 14:16:10 +09:00
Jin Yong Yoo
1adc909d74 Update version to 0.2.15 2020-12-26 23:00:02 -05:00
Jin Yong Yoo
4222393d03 Merge pull request #402 from QData/update-transformers-version
Update version requirement for 🤗 transformers
2020-12-27 12:54:53 +09:00
Jin Yong Yoo
d6cdae4091 relax transformers package requirement, remove BERTForSequenceClassification 2020-12-25 06:39:22 -05:00
Jin Yong Yoo
88fb9853bf Merge pull request #399 from alexander-zap/gensim_support_fasttext
Added support for FastText word vectors in GensimWordEmbedding class
2020-12-24 16:11:16 +09:00
MintForever
c5863a020c add --split 2020-12-23 17:03:15 -05:00
alexander.zap
b47bf6051a Reformatting 2020-12-23 14:54:21 +01:00
alexander.zap
89d47b9def Gensim word embedding tests now pass pre-loaded Word2VecKeyedVectors 2020-12-23 14:31:25 +01:00
alexander.zap
d66215d334 Added support for FastText word vectors in GensimWordEmbedding class
- __init__ now requires passing pre-loaded WordEmbeddingsKeyedVectors
- passing the path of a saved word embedding file is not supported anymore due to specific loading logic of different embedding types
2020-12-23 13:37:04 +01:00
Jin Yong Yoo
3a27cb0d36 Merge pull request #396 from QData/fix-mlm-transformations
Fix Bug in Masked LM Transformations
2020-12-22 14:04:40 +09:00
Jin Yong Yoo
15ad4298a0 fix bug that adds unsanitized tokens instead of words 2020-12-21 12:46:01 -05:00
Jin Yong Yoo
d7422ba61c change to descending argsort 2020-12-20 07:57:43 -05:00
Jack Morris
a029964dc7 Merge pull request #361 from Opdoop/patch-1
Change check_robustness dataset
2020-12-19 14:48:28 -05:00
Yanjun Qi / Jane
c40e7f83c1 Merge pull request #394 from QData/pr-template
Add PR template
2020-12-19 12:35:39 -05:00
Jin Yong Yoo
04437d7193 Update pull_request_template.md 2020-12-19 15:07:52 +09:00
Jin Yong Yoo
47c17586cd better template 2020-12-19 01:05:28 -05:00
Jin Yong Yoo
7408bf0199 rename template file and remove ex 2020-12-19 00:57:27 -05:00
Jin Yong Yoo
fb039c6b7b add PR template 2020-12-19 00:53:25 -05:00
Yanjun Qi
ffd75ce3e5 Correct the wrong mention of "True Positive" in the model_zoo.md 2020-12-18 11:23:14 -05:00
Jin Yong Yoo
86f3294125 Merge pull request #392 from QData/clare-augmentatiom
Clare augmentation recipe
2020-12-18 23:39:32 +09:00
Hanyu Liu
eb51ca31ba more fixing... 2020-12-18 01:41:14 -05:00
Hanyu Liu
b619607255 Update recipes.py 2020-12-18 01:32:40 -05:00
Hanyu Liu
cb2237e9b0 Update recipes.py 2020-12-18 00:56:47 -05:00
Hanyu Liu
1909a874ee Update recipes.py 2020-12-18 00:55:57 -05:00
Hanyu Liu
1565374241 Update recipes.py 2020-12-18 00:44:01 -05:00
Hanyu Liu
a6b3d8c16d Fix errors 2020-12-18 00:36:49 -05:00
Hanyu Liu
8ddac0eb88 Merge branch 'clare-augmentatiom' of https://github.com/QData/TextAttack into clare-augmentatiom 2020-12-17 17:36:23 -05:00
Hanyu Liu
48a04b4969 Update recipes.py 2020-12-17 17:36:05 -05:00
Hanyu-Liu-123
77ae894957 Update README.md 2020-12-17 17:35:02 -05:00
Hanyu Liu
adae04ac1a formating 2020-12-17 17:27:59 -05:00
Hanyu Liu
ca5f15ce90 add clare augmentation recipe 2020-12-17 17:20:23 -05:00
Yanjun Qi / Jane
03f64b022e Merge pull request #390 from QData/qiyanjun-issue-templates
Add issue templates
2020-12-17 12:49:40 -05:00
Jin Yong Yoo
cef0bb02e2 Update bug_report.md 2020-12-18 02:24:05 +09:00
Yanjun Qi / Jane
4bfed05499 Add issue templates 2020-12-17 11:02:26 -05:00
Jack Morris
953878dcaa Merge pull request #381 from QData/fix-issue-365
make get_grad normal method
2020-12-16 10:51:09 -05:00
Jack Morris
b4cecad98c Merge pull request #382 from QData/fix-issue-374
fix tag type used by WordSwapHowNet
2020-12-15 19:54:35 -05:00
Yanjun Qi
2773cdd1d1 adding two analysis papers' Github URLs in readme / urging users to be mindful in setting up constraints 2020-12-15 17:47:07 -05:00
Yanjun Qi / Jane
0e38e8dc9b Merge pull request #384 from QData/newnewdoc
add a new doc page on evaluating the quality of generated adversarial…
2020-12-15 17:42:19 -05:00
Yanjun Qi
8a0acc24d2 adding reevaluating analysis in Readme.md / urging researchers to pay attention to constraint thresholds and quality 2020-12-15 17:42:02 -05:00
Yanjun Qi
9fda56c685 add a new doc page on evaluating the quality of generated adversarial examples from SOTA recipes. 2020-12-15 16:34:33 -05:00
Jin Yong Yoo
c739959ff6 fix tag type used by WordSwapHowNet 2020-12-15 06:59:18 -05:00
Jin Yong Yoo
c8253ff520 make get_grad normal method 2020-12-15 06:11:55 -05:00
Jin Yong Yoo
7670b8ea24 Merge pull request #376 from lethaiq/hotfix_single_character_perturbed_homoglyph
Fix issue #371: add homoglyph characters list to words_from_text() function
2020-12-15 13:44:30 +09:00
Jin Yong Yoo
451978b88e Merge pull request #380 from QData/update-readme
Update README.md
2020-12-15 13:02:59 +09:00
leqthai
4d19b60c89 Reformat 2020-12-14 22:47:43 -05:00
Jin Yong Yoo
5f4a0eb379 Update README.md 2020-12-15 12:04:59 +09:00
Yanjun Qi / Jane
b8cf9ba646 Merge pull request #378 from QData/newnewdoc
correct mentions of BlackBox NLP
2020-12-14 11:48:53 -05:00
Yanjun Qi
e49f3c0da9 Merge branch 'master' of https://github.com/QData/TextAttack 2020-12-14 09:28:02 -05:00
Yanjun Qi
544a36c0bc correct the EMNLP BlackBoxNLP mentions. 2020-12-14 09:28:00 -05:00
leqthai
a47e90c868 Fix issue #371 function words_from_text() does not account possible Homoglyph characters 2020-12-13 23:30:41 -05:00
Yanjun Qi / Jane
82e8f51c90 Merge pull request #373 from QData/update-recipes
update recipe parameters
2020-12-12 15:12:18 -05:00
Jin Yong Yoo
d6e4dab61a update recipe parameters 2020-12-12 10:11:16 -05:00
Yanjun Qi / Jane
04b7c6f79b Merge pull request #356 from QData/clare
Add CLARE Attack
2020-12-12 09:19:38 -05:00
Jin Yong Yoo
bdbd370499 formatting 2020-12-11 10:30:29 -05:00
Jin Yong Yoo
5f978edb2a resolve bugs with LM predictions 2020-12-11 09:24:55 -05:00
Jin Yong Yoo
b120fb5159 fix documentation errors 2020-12-11 09:24:54 -05:00
Jin Yong Yoo
3426940895 fix more bugs 2020-12-11 09:24:54 -05:00
Jin Yong Yoo
68246ea202 fix bugs 2020-12-11 09:24:54 -05:00
Jin Yong Yoo
7047f44829 wip: fix word merge 2020-12-11 09:14:40 -05:00
Jin Yong Yoo
e2012aa2e4 wip 2020-12-11 09:12:10 -05:00
Jin Yong Yoo
8814393093 WIP: organize transformations 2020-12-11 09:12:10 -05:00
Hanyu Liu
d44b2ccf75 Add POS Order Constraint to Masked_Merge 2020-12-11 09:10:27 -05:00
Hanyu Liu
aef7af6c9a add word_merge_masked_lm
still need to add POS constraint!
2020-12-11 09:09:41 -05:00
Hanyu Liu
da1b22627a add word_insertion_masked_lm and command line stuff 2020-12-11 09:07:10 -05:00
Jin Yong Yoo
0dbbcd5c9b add masked-lm-replacement for clare 2020-12-11 08:42:03 -05:00
Hanyu Liu
850bc31388 fix errors! 2020-12-11 08:37:11 -05:00
Hanyu Liu
15a11c4a4d Change superclass of Insertion and Merge 2020-12-11 08:37:11 -05:00
Hanyu Liu
27c52e2cdf Add POS Order Constraint to Masked_Merge 2020-12-11 08:37:11 -05:00
Hanyu Liu
3bfb42ccdf add word_merge_masked_lm
still need to add POS constraint!
2020-12-11 08:37:11 -05:00
Hanyu Liu
7f5ea106e8 add word_insertion_masked_lm and command line stuff 2020-12-11 08:37:10 -05:00
Jin Yong Yoo
973e4d5fe6 add masked-lm-replacement for clare 2020-12-11 08:37:10 -05:00
Yanjun Qi
0d69fd2733 Update benchmark-search.md 2020-12-10 10:53:42 -05:00
Yanjun Qi
9931b90594 Update make-docs.yml 2020-12-10 10:47:10 -05:00
Yanjun Qi / Jane
cb5a123fac Merge pull request #369 from DerekChia/patch-1
Update attacks4Components.md
2020-12-10 10:16:56 -05:00
Yanjun Qi
4ab2b96d68 Update README.md 2020-12-10 10:15:36 -05:00
Derek Chia
2e8d6170c7 Update attacks4Components.md
Add missing word
2020-12-10 17:32:07 +08:00
Opdoop
94f0208b1c Change check_robustness dataset
Since adversarial training uses the training data (even though a random sample is taken here), it is more reasonable to use the eval data to check robustness.
2020-12-06 14:59:11 +08:00
Yanjun Qi
0cc049a8c7 update make-docs.yml to avoid install dependency issue 2020-12-05 08:04:51 -05:00
Yanjun Qi
4fcdc4be10 Revert "remove version for build-doc"
This reverts commit 2df9a3ba43.
2020-12-05 07:41:00 -05:00
Yanjun Qi
2df9a3ba43 remove version for build-doc 2020-12-05 07:36:06 -05:00
Yanjun Qi
7f7e10831d change the confusing word "Successes" to "TP/P" 2020-12-04 23:59:17 -05:00
288 changed files with 33303 additions and 15901 deletions

.github/ISSUE_TEMPLATE/bug_report.md (new file)

@@ -0,0 +1,31 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: ''
assignees: ''
---
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Run the following command `textattack ...`
2. Run the following code ...
3. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Screenshots or Traceback**
If applicable, add screenshots to help explain your problem. Also, copy and paste tracebacks produced by the bug.
**System Information (please complete the following information):**
- OS: [e.g. MacOS, Linux, Windows]
- Library versions (e.g. `torch==1.7.0, transformers==3.3.0`)
- Textattack version
**Additional context**
Add any other context about the problem here.

.github/ISSUE_TEMPLATE/custom.md (new file)

@@ -0,0 +1,10 @@
---
name: Custom issue template
about: Describe this issue template's purpose here.
title: ''
labels: ''
assignees: ''
---

.github/ISSUE_TEMPLATE/feature_request.md (new file)

@@ -0,0 +1,20 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''
---
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
**Additional context**
Add any other context or screenshots about the feature request here.

.github/pull_request_template.md (new file)

@@ -0,0 +1,22 @@
# What does this PR do?
## Summary
*Example: This PR adds the [CLARE](https://arxiv.org/abs/2009.07502) attack, which uses a distilled RoBERTa masked language model to perform word swaps, word insertions, and word merges (which is where we combine two adjacent words and replace them with another word) in a greedy manner.*
## Additions
- *Example: Added `clare` recipe as `textattack.attack_recipes.CLARE2020`.*
## Changes
- *Example: `WordSwapMaskedLM` has been updated to have a minimum confidence score cutoff and batching has been added for faster performance.*
## Deletions
- *Example: Remove unnecessary files under `textattack.models...`*
## Checklist
- [ ] The title of your pull request should be a summary of its contribution.
- [ ] Please write detailed description of what parts have been newly added and what parts have been modified. Please also explain why certain changes were made.
- [ ] If your pull request addresses an issue, please mention the issue number in the pull request description to make sure they are linked (and people consulting the issue know you are working on it)
- [ ] To indicate a work in progress please mark it as a draft on Github.
- [ ] Make sure existing tests pass.
- [ ] Add relevant tests. No quality testing = no merge.
- [ ] All public methods must have informative docstrings that work nicely with sphinx. For new modules/files, please add/modify the appropriate `.rst` file in `TextAttack/docs/apidoc`.

.github/workflows/codeql-analysis.yml (new file)

@@ -0,0 +1,67 @@
# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
#
# ******** NOTE ********
# We have attempted to detect the languages in your repository. Please check
# the `language` matrix defined below to confirm you have the correct set of
# supported CodeQL languages.
#
name: "CodeQL"
on:
  push:
    branches: [ master, master* ]
  pull_request:
    # The branches below must be a subset of the branches above
    branches: [ master ]
  schedule:
    - cron: '24 1 * * 0'

jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest

    strategy:
      fail-fast: false
      matrix:
        language: [ 'python' ]
        # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python' ]
        # Learn more:
        # https://docs.github.com/en/free-pro-team@latest/github/finding-security-vulnerabilities-and-errors-in-your-code/configuring-code-scanning#changing-the-languages-that-are-analyzed

    steps:
    - name: Checkout repository
      uses: actions/checkout@v2

    # Initializes the CodeQL tools for scanning.
    - name: Initialize CodeQL
      uses: github/codeql-action/init@v1
      with:
        languages: ${{ matrix.language }}
        # If you wish to specify custom queries, you can do so here or in a config file.
        # By default, queries listed here will override any specified in a config file.
        # Prefix the list here with "+" to use these queries and those in the config file.
        # queries: ./path/to/local/query, your-org/your-repo/queries@main

    # Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
    # If this step fails, then you should remove it and run the build manually (see below)
    - name: Autobuild
      uses: github/codeql-action/autobuild@v1

    # Command-line programs to run using the OS shell.
    # 📚 https://git.io/JvXDl

    # ✏️ If the Autobuild fails above, remove it and uncomment the following three lines
    #    and modify them (or add more) to build your code if your project
    #    uses a compiled language

    #- run: |
    #   make bootstrap
    #   make release

    - name: Perform CodeQL Analysis
      uses: github/codeql-action/analyze@v1

.github/workflows/make-docs.yml

@@ -31,7 +31,7 @@ jobs:
python -m pip install --upgrade pip setuptools wheel # update python
pip install ipython --upgrade # needed for Github for whatever reason
python setup.py install_egg_info # Workaround https://github.com/pypa/pip/issues/4537
pip install -e . ".[dev]" # This should install all packages for development
pip install -e .[dev]
pip install jupyter 'ipykernel<5.0.0' 'ipython<7.0.0' # ipykernel workaround: github.com/jupyter/notebook/issues/4050
- name: Build docs with Sphinx and check for errors
run: |

.github/workflows/publish-to-pypi.yml

@@ -5,7 +5,7 @@ name: Upload Python Package to PyPI
on:
release:
types: [created]
types: [published]
jobs:
deploy:

.github/workflows/run-pytest.yml

@@ -31,6 +31,14 @@ jobs:
python setup.py install_egg_info # Workaround https://github.com/pypa/pip/issues/4537
pip install -e .[dev]
pip freeze
- name: Free disk space
run: |
sudo apt-get remove mysql-client libmysqlclient-dev -y >/dev/null 2>&1
sudo apt-get remove php* -y >/dev/null 2>&1
sudo apt-get autoremove -y >/dev/null 2>&1
sudo apt-get autoclean -y >/dev/null 2>&1
sudo rm -rf /usr/local/lib/android >/dev/null 2>&1
docker rmi $(docker image ls -aq) >/dev/null 2>&1
- name: Test with pytest
run: |
pytest tests -v

.gitignore

@@ -36,10 +36,14 @@ dist/
# Weights & Biases outputs
wandb/
# Tensorboard logs
runs/
# checkpoints
checkpoints/
# vim
*.swp
.vscode
.vscode
*.csv

Makefile

@@ -20,7 +20,7 @@ docs-check: FORCE ## Builds docs using Sphinx. If there is an error, exit with a
sphinx-build -b html docs docs/_build/html -W
docs-auto: FORCE ## Build docs using Sphinx and run hotreload server using Sphinx autobuild.
sphinx-autobuild docs docs/_build/html -H 0.0.0.0 -p 8765
sphinx-autobuild docs docs/_build/html --port 8765
all: format lint docs-check test ## Format, lint, and test.

README.md

@@ -56,7 +56,9 @@ or via python module (`python -m textattack ...`).
> dataset samples, and the configuration file `config.yaml`. To change the cache path, set the
> environment variable `TA_CACHE_DIR`. (for example: `TA_CACHE_DIR=/tmp/ textattack attack ...`).
## Usage: `textattack --help`
## Usage
### Help: `textattack --help`
TextAttack's main features can all be accessed via the `textattack` command. Two very
common commands are `textattack attack <args>`, and `textattack augment <args>`. You can see more
@@ -69,13 +71,18 @@ or a specific command using, for example,
textattack attack --help
```
The [`examples/`](examples/) folder includes scripts showing common TextAttack usage for training models, running attacks, and augmenting a CSV file. The [documentation website](https://textattack.readthedocs.io/en/latest) contains walkthroughs explaining basic usage of TextAttack, including building a custom transformation and a custom constraint..
The [`examples/`](examples/) folder includes scripts showing common TextAttack usage for training models, running attacks, and augmenting a CSV file.
The [documentation website](https://textattack.readthedocs.io/en/latest) contains walkthroughs explaining basic usage of TextAttack, including building a custom transformation and a custom constraint..
### Running Attacks: `textattack attack --help`
The easiest way to try out an attack is via the command-line interface, `textattack attack`.
> **Tip:** If your machine has multiple GPUs, you can distribute the attack across them using the `--parallel` option. For some attacks, this can really help performance.
> **Tip:** If your machine has multiple GPUs, you can distribute the attack across them using the `--parallel` option. For some attacks, this can really help performance. (If you want to attack Keras models in parallel, please check out `examples/attack/attack_keras_parallel.py` instead)
Here are some concrete examples:
@@ -86,7 +93,7 @@ textattack attack --recipe textfooler --model bert-base-uncased-mr --num-example
*DeepWordBug on DistilBERT trained on the Quora Question Pairs paraphrase identification dataset*:
```bash
textattack attack --model distilbert-base-uncased-qqp --recipe deepwordbug --num-examples 100
textattack attack --model distilbert-base-uncased-cola --recipe deepwordbug --num-examples 100
```
*Beam search with beam width 4 and word embedding transformation and untargeted goal function on an LSTM*:
@@ -105,6 +112,7 @@ We include attack recipes which implement attacks from the literature. You can l
To run an attack recipe: `textattack attack --recipe [recipe_name]`
<img src="docs/_static/imgs/overview.png" alt="TextAttack Overview" style="display: block; margin: 0 auto;" />
<table style="width:100%" border="1">
<thead>
@@ -120,6 +128,15 @@ To run an attack recipe: `textattack attack --recipe [recipe_name]`
<tbody>
<tr><td style="text-align: center;" colspan="6"><strong><br>Attacks on classification tasks, like sentiment classification and entailment:<br></strong></td></tr>
<tr>
<td><code>a2t</code>
<span class="citation" data-cites="yoo2021a2t"></span></td>
<td><sub>Untargeted {Classification, Entailment}</sub></td>
<td><sub>Percentage of words perturbed, Word embedding distance, DistilBERT sentence encoding cosine similarity, part-of-speech consistency</sub></td>
<td><sub>Counter-fitted word embedding swap (or) BERT Masked Token Prediction</sub></td>
<td><sub>Greedy-WIR (gradient)</sub></td>
<td ><sub>from (["Towards Improving Adversarial Training of NLP Models" (Yoo et al., 2021)](https://arxiv.org/abs/2109.00544))</sub></td>
</tr>
<tr>
<td><code>alzantot</code> <span class="citation" data-cites="Alzantot2018GeneratingNL Jia2019CertifiedRT"></span></td>
<td><sub>Untargeted {Classification, Entailment}</sub></td>
@@ -153,10 +170,10 @@ To run an attack recipe: `textattack attack --recipe [recipe_name]`
<td ><sub>Invariance testing implemented in CheckList. (["Beyond Accuracy: Behavioral Testing of NLP models with CheckList" (Ribeiro et al., 2020)](https://arxiv.org/abs/2005.04118))</sub></td>
</tr>
<tr>
<td> <code>clare (*coming soon*)</code> <span class="citation" data-cites="Alzantot2018GeneratingNL Jia2019CertifiedRT"></span></td>
<td> <code>clare</code> <span class="citation" data-cites="Alzantot2018GeneratingNL Jia2019CertifiedRT"></span></td>
<td><sub>Untargeted {Classification, Entailment}</sub></td>
<td><sub>RoBERTa masked language model</sub></td>
<td><sub>word swap, insertion, and merge</sub></td>
<td><sub>USE sentence encoding cosine similarity</sub></td>
<td><sub>RoBERTa Masked Prediction for token swap, insert and merge</sub></td>
<td><sub>Greedy</sub></td>
<td ><sub>["Contextualized Perturbation for Textual Adversarial Attack" (Li et al., 2020)](https://arxiv.org/abs/2009.07502))</sub></td>
</tr>
@@ -294,13 +311,15 @@ textattack attack --model bert-base-uncased-sst2 --recipe textfooler --num-examp
### Augmenting Text: `textattack augment`
Many of the components of TextAttack are useful for data augmentation. The `textattack.Augmenter` class
uses a transformation and a list of constraints to augment data. We also offer five built-in recipes
uses a transformation and a list of constraints to augment data. We also offer built-in recipes
for data augmentation (a minimal Python usage sketch follows this list):
- `textattack.WordNetAugmenter` augments text by replacing words with WordNet synonyms
- `textattack.EmbeddingAugmenter` augments text by replacing words with neighbors in the counter-fitted embedding space, with a constraint to ensure their cosine similarity is at least 0.8
- `textattack.CharSwapAugmenter` augments text by substituting, deleting, inserting, and swapping adjacent characters
- `textattack.EasyDataAugmenter` augments text with a combination of word insertions, substitutions and deletions.
- `textattack.CheckListAugmenter` augments text by contraction/extension and by substituting names, locations, numbers.
- `wordnet` augments text by replacing words with WordNet synonyms
- `embedding` augments text by replacing words with neighbors in the counter-fitted embedding space, with a constraint to ensure their cosine similarity is at least 0.8
- `charswap` augments text by substituting, deleting, inserting, and swapping adjacent characters
- `eda` augments text with a combination of word insertions, substitutions and deletions.
- `checklist` augments text by contraction/extension and by substituting names, locations, numbers.
- `clare` augments text by replacing, inserting, and merging with a pre-trained masked language model.
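The same recipes are also usable from Python. The snippet below is a minimal sketch of the augmentation API for the `embedding` recipe; the import path and keyword arguments follow the `textattack.augmentation` module and may differ slightly across versions.

```python
# Minimal sketch: the `embedding` augmentation recipe used from Python.
# Assumes the textattack package is installed; argument names mirror the CLI flags below.
from textattack.augmentation import EmbeddingAugmenter

augmenter = EmbeddingAugmenter(pct_words_to_swap=0.1, transformations_per_example=2)
print(augmenter.augment("What I cannot create, I do not understand."))  # list of augmented strings
```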
#### Augmentation Command-Line Interface
The easiest way to use our data augmentation tools is with `textattack augment <args>`. `textattack augment`
@@ -319,10 +338,16 @@ For example, given the following as `examples.csv`:
"it's a mystery how the movie could be released in this condition .", 0
```
The command `textattack augment --csv examples.csv --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original`
The command
```bash
textattack augment --input-csv examples.csv --output-csv output.csv --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original
```
will augment the `text` column by altering 10% of each example's words, generating twice as many augmentations as original inputs, and excluding the original inputs from the
output CSV. (When `--output-csv` is not given, all of this is saved to `augment.csv` by default.)
> **Tip:** Just as with running attacks interactively, you can pass `--interactive` to augment samples entered by the user, which is a quick way to try out different augmentation recipes!
After augmentation, here are the contents of `augment.csv`:
```csv
text,label
@@ -374,24 +399,23 @@ automatically loaded using the `datasets` package.
#### Training Examples
*Train our default LSTM for 50 epochs on the Yelp Polarity dataset:*
```bash
textattack train --model lstm --dataset yelp_polarity --batch-size 64 --epochs 50 --learning-rate 1e-5
textattack train --model-name-or-path lstm --dataset yelp_polarity --epochs 50 --learning-rate 1e-5
```
The training process has data augmentation built-in:
```bash
textattack train --model lstm --dataset rotten_tomatoes --augment eda --pct-words-to-swap .1 --transformations-per-example 4
```
This uses the `EasyDataAugmenter` recipe to augment the `rotten_tomatoes` dataset before training.
*Fine-Tune `bert-base` on the `CoLA` dataset for 5 epochs*:
```bash
textattack train --model bert-base-uncased --dataset glue^cola --batch-size 32 --epochs 5
textattack train --model-name-or-path bert-base-uncased --dataset glue^cola --per-device-train-batch-size 8 --epochs 5
```
### To check datasets: `textattack peek-dataset`
To take a closer look at a dataset, use `textattack peek-dataset`. TextAttack will print some cursory statistics about the inputs and outputs from the dataset. For example, `textattack peek-dataset --dataset-from-huggingface snli` will show information about the SNLI dataset from the NLP package.
To take a closer look at a dataset, use `textattack peek-dataset`. TextAttack will print some cursory statistics about the inputs and outputs from the dataset. For example,
```bash
textattack peek-dataset --dataset-from-huggingface snli
```
will show information about the SNLI dataset from the NLP package.
### To list functional components: `textattack list`
@@ -447,7 +471,7 @@ create a short file that loads them as variables `model` and `tokenizer`. The `
be able to transform string inputs to lists or tensors of IDs using a method called `encode()`. The
model must take inputs via the `__call__` method.
##### Model from a file
##### Custom Model from a file
To experiment with a model you've trained, you could create the following file
and name it `my_model.py`:
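As a rough sketch (mirroring the custom-model example elsewhere in these docs), `my_model.py` just needs to assign `model` and `tokenizer` variables; here a fine-tuned HuggingFace checkpoint stands in for your own loading code:

```python
# my_model.py -- a sketch: any script that assigns `model` and `tokenizer` works.
# The checkpoint below is a stand-in; replace these lines with your own loading code.
import transformers

model = transformers.AutoModelForSequenceClassification.from_pretrained("textattack/bert-base-uncased-imdb")
tokenizer = transformers.AutoTokenizer.from_pretrained("textattack/bert-base-uncased-imdb")
```

You can then run an attack against it by adding the argument `--model-from-file my_model.py`.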
@@ -475,21 +499,25 @@ dataset = [('Today was....', 1), ('This movie is...', 0), ...]
You can then run attacks on samples from this dataset by adding the argument `--dataset-from-file my_dataset.py`.
#### Dataset loading via other mechanism, see: [more details at here](https://textattack.readthedocs.io/en/latest/api/datasets.html)
```python
import textattack
my_dataset = [("text",label),....]
new_dataset = textattack.datasets.Dataset(my_dataset)
```
#### Dataset via AttackedText class
To allow for word replacement after a sequence has been tokenized, we include an `AttackedText` object
which maintains both a list of tokens and the original text, with punctuation. We use this object in favor of a list of words or just raw text.
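For illustration, here is a small sketch of working with `AttackedText` directly (module path assumed to be `textattack.shared`; attribute names may vary across versions):

```python
# Sketch: AttackedText keeps the raw text and a word list in sync.
from textattack.shared import AttackedText

s = AttackedText("the movie was surprisingly good .")
print(s.text)    # original text, punctuation preserved
print(s.words)   # word list, e.g. ['the', 'movie', 'was', 'surprisingly', 'good']

perturbed = s.replace_word_at_index(4, "bad")  # returns a new AttackedText
print(perturbed.text)
```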
#### Dataset via Data Frames (*coming soon*)
### Attacks and how to design a new attack
The `attack_one` method in an `Attack` takes as input an `AttackedText`, and outputs either a `SuccessfulAttackResult` if it succeeds or a `FailedAttackResult` if it fails.
We formulate an attack as consisting of four components: a **goal function** which determines if the attack has succeeded, **constraints** defining which perturbations are valid, a **transformation** that generates potential modifications given an input, and a **search method** which traverses through the search space of possible perturbations. The attack attempts to perturb an input text such that the model output fulfills the goal function (i.e., indicating whether the attack is successful) and the perturbation adheres to the set of constraints (e.g., grammar constraint, semantic similarity constraint). A search method is used to find a sequence of transformations that produce a successful adversarial example.
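To make the four components concrete, below is a hedged sketch of assembling an attack in Python; it mirrors the tutorial notebook included later on this page, and the HuggingFace checkpoint name is simply the one used in the Quick Tour example.

```python
# Sketch: build an attack from a goal function, constraints, a transformation, and a search method.
import transformers
import textattack
from textattack import Attack
from textattack.constraints.pre_transformation import RepeatModification, StopwordModification
from textattack.goal_functions import UntargetedClassification
from textattack.search_methods import GreedySearch
from textattack.transformations import WordSwapEmbedding

model = transformers.AutoModelForSequenceClassification.from_pretrained("textattack/bert-base-uncased-imdb")
tokenizer = transformers.AutoTokenizer.from_pretrained("textattack/bert-base-uncased-imdb")
model_wrapper = textattack.models.wrappers.HuggingFaceModelWrapper(model, tokenizer)

goal_function = UntargetedClassification(model_wrapper)       # has the model's prediction flipped?
constraints = [RepeatModification(), StopwordModification()]  # which perturbations are allowed?
transformation = WordSwapEmbedding(max_candidates=20)         # candidate word swaps from embeddings
search_method = GreedySearch()                                # how to traverse the search space

attack = Attack(goal_function, constraints, transformation, search_method)
print(attack)
```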
@@ -518,24 +546,38 @@ A `Transformation` takes as input an `AttackedText` and returns a list of possib
A `SearchMethod` takes as input an initial `GoalFunctionResult` and returns a final `GoalFunctionResult`. The search is given access to the `get_transformations` function, which takes as input an `AttackedText` object and outputs a list of possible transformations filtered by meeting all of the attack's constraints. A search consists of successive calls to `get_transformations` until the search succeeds (determined using `get_goal_results`) or is exhausted.
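A custom `Transformation` can be as small as a `WordSwap` subclass that overrides `_get_replacement_words`; the toy example below is adapted from the tutorial notebook included later on this page.

```python
# Toy transformation from the tutorial notebook: propose "banana" for every word.
from textattack.transformations import WordSwap


class BananaWordSwap(WordSwap):
    """Return 'banana' as the only candidate replacement for any word."""

    def _get_replacement_words(self, word):
        return ["banana"]
```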
### Benchmarking Attacks
## On Benchmarking Attacks
- See our analysis paper: Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples at [EMNLP BlackNLP](https://arxiv.org/abs/2009.06368).
- See our analysis paper: Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples at [EMNLP BlackBoxNLP](https://arxiv.org/abs/2009.06368).
- As we emphasized in the above paper, we don't recommend directly comparing Attack Recipes out of the box.
- This comment is due to that attack recipes in the recent literature used different ways or thresholds in setting up their constraints. Without the constraint space held constant, an increase in attack success rate could from an improved search or transformation method or a less restrictive search space.
- This is because attack recipes in the recent literature use different ways or thresholds to set up their constraints. Without the constraint space held constant, an increase in attack success rate could come from an improved search or transformation method, or from a less restrictive search space.
- Our Github on benchmarking scripts and results: [TextAttack-Search-Benchmark Github](https://github.com/QData/TextAttack-Search-Benchmark)
## On Quality of Generated Adversarial Examples in Natural Language
- Our analysis Paper in [EMNLP Findings](https://arxiv.org/abs/2004.14174)
- We analyze the generated adversarial examples of two state-of-the-art synonym substitution attacks. We find that their perturbations often do not preserve semantics, and 38% introduce grammatical errors. Human surveys reveal that to successfully preserve semantics, we need to significantly increase the minimum cosine similarities between the embeddings of swapped words and between the sentence encodings of original and perturbed sentences. With constraints adjusted to better preserve semantics and grammaticality, the attack success rate drops by over 70 percentage points.
- Our Github on Reevaluation results: [Reevaluating-NLP-Adversarial-Examples Github](https://github.com/QData/Reevaluating-NLP-Adversarial-Examples)
- As we have emphasized in this analysis paper, we recommend that researchers and users be EXTREMELY mindful of the quality of generated adversarial examples in natural language
- We recommend the field use human-evaluation-derived thresholds for setting up constraints
## Multi-lingual Support
- see example code: [https://github.com/QData/TextAttack/blob/master/examples/attack/attack_camembert.py](https://github.com/QData/TextAttack/blob/master/examples/attack/attack_camembert.py) for using our framework to attack French-BERT.
- see tutorial notebook: [https://textattack.readthedocs.io/en/latest/2notebook/Example_4_CamemBERT.html](https://textattack.readthedocs.io/en/latest/2notebook/Example_4_CamemBERT.html) for using our framework to attack French-BERT.
- See [README_ZH.md](https://github.com/QData/TextAttack/blob/master/README_ZH.md) for our README in Chinese
## Contributing to TextAttack
We welcome suggestions and contributions! Submit an issue or pull request and we will do our best to respond in a timely manner. TextAttack is currently in an "alpha" stage in which we are working to improve its capabilities and design.
@@ -547,13 +589,12 @@ See [CONTRIBUTING.md](https://github.com/QData/TextAttack/blob/master/CONTRIBUTI
If you use TextAttack for your research, please cite [TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP](https://arxiv.org/abs/2005.05909).
```bibtex
@misc{morris2020textattack,
title={TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP},
author={John X. Morris and Eli Lifland and Jin Yong Yoo and Jake Grigsby and Di Jin and Yanjun Qi},
year={2020},
eprint={2005.05909},
archivePrefix={arXiv},
primaryClass={cs.CL}
@inproceedings{morris2020textattack,
title={TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP},
author={Morris, John and Lifland, Eli and Yoo, Jin Yong and Grigsby, Jake and Jin, Di and Qi, Yanjun},
booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
pages={119--126},
year={2020}
}
```

View File

@@ -82,7 +82,7 @@ textattack attack --recipe textfooler --model bert-base-uncased-mr --num-example
*DeepWordBug attack on a DistilBERT model trained on the Quora Question Pairs dataset*:
```bash
textattack attack --model distilbert-base-uncased-qqp --recipe deepwordbug --num-examples 100
textattack attack --model distilbert-base-uncased-cola --recipe deepwordbug --num-examples 100
```
*Untargeted attack on an LSTM trained on the MR dataset, using beam search with beam width 4 and a word-embedding transformation*:
@@ -103,6 +103,8 @@ textattack attack --model lstm-mr --num-examples 20 \
To run an attack recipe: `textattack attack --recipe [recipe_name]`
<img src="docs/_static/imgs/overview.png" alt="TextAttack Overview" style="display: block; margin: 0 auto;" />
<table>
<thead>
<tr class="header">
@@ -292,11 +294,14 @@ textattack attack --model bert-base-uncased-sst2 --recipe textfooler --num-examp
### Augmenting Text Data: `textattack augment`
Many of TextAttack's components are easy-to-use data augmentation tools. The `textattack.Augmenter` class performs data augmentation using a *transformation* and a list of *constraints*. We provide built-in data augmentation recipes:
- `textattack.WordNetAugmenter` augments text by replacing words with WordNet synonyms
- `textattack.EmbeddingAugmenter` augments text by swapping in neighboring words from the counter-fitted embedding space, with a constraint that the cosine similarity between the two words is at least 0.8
- `textattack.CharSwapAugmenter` augments text by inserting, deleting, and substituting characters, and by swapping adjacent characters
- `textattack.EasyDataAugmenter` augments text with a combination of word insertions, deletions, and substitutions
- `textattack.CheckListAugmenter` augments text by contraction/expansion and by substituting names, locations, and numbers
- `wordnet` augments text by replacing words with WordNet synonyms
- `embedding` augments text by swapping in neighboring words from the counter-fitted embedding space, with a constraint that the cosine similarity between the two words is at least 0.8
- `charswap` augments text by inserting, deleting, and substituting characters, and by swapping adjacent characters
- `eda` augments text with a combination of word insertions, deletions, and substitutions
- `checklist` augments text by contraction/expansion and by substituting names, locations, and numbers
- `clare` augments text by inserting, deleting, and substituting words using a pre-trained masked language model
#### Data Augmentation Command-Line Interface
The quickest way to use textattack for data augmentation is the `textattack augment <args>` command-line interface. `textattack augment` takes a CSV file as input; its arguments specify the text column to augment, the fraction of words allowed to change per example, and how many augmented samples to generate per input example. The output is saved as a CSV file in the same format as the input, containing the augmented samples generated for the specified text column.
@@ -312,7 +317,7 @@ TextAttack 的组件中,有很多易用的数据增强工具。`textattack.Aug
"it's a mystery how the movie could be released in this condition .", 0
```
Using the command `textattack augment --csv examples.csv --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original`
Using the command `textattack augment --input-csv examples.csv --output-csv output.csv --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original`
will augment the `text` column, modifying at most 10% of each example's words, generating twice as many samples as the original inputs, and excluding the original CSV inputs from the result file. (By default, all results are saved to `augment.csv`.)
After augmentation, here are the contents of `augment.csv`:
@@ -362,18 +367,13 @@ it's a enigma how the filmmaking wo be publicized in this condition .,0
#### Training Examples
*Train TextAttack's default LSTM on the Yelp Polarity dataset for 50 epochs*
```bash
textattack train --model lstm --dataset yelp_polarity --batch-size 64 --epochs 50 --learning-rate 1e-5
textattack train --model-name-or-path lstm --dataset yelp_polarity --epochs 50 --learning-rate 1e-5
```
The training interface also has data augmentation built in:
```bash
textattack train --model lstm --dataset rotten_tomatoes --augment eda --pct-words-to-swap .1 --transformations-per-example 4
```
The example above uses the `EasyDataAugmenter` recipe to augment the `rotten_tomatoes` dataset before training.
*Fine-tune `bert-base` on the `CoLA` dataset for 5 epochs*
```bash
textattack train --model bert-base-uncased --dataset glue^cola --batch-size 32 --epochs 5
textattack train --model-name-or-path bert-base-uncased --dataset glue^cola --per-device-train-batch-size 8 --epochs 5
```
@@ -456,8 +456,6 @@ dataset = [('Today was....', 1), ('This movie is...', 0), ...]
### What Is an Attack & How to Design a New Attack
The `attack_one` method of `Attack` takes an `AttackedText` object as input and returns a `SuccessfulAttackResult` if the attack succeeds, or a `FailedAttackResult` if it fails.
We decompose an attack into four components: a **goal function** that defines what counts as a successful attack, **constraints** that define which perturbations are valid, a **transformation** that generates a set of candidate perturbations for the input text, and a **search method** that traverses the search space of feasible perturbations. Each attack attempts to perturb the input text so that it satisfies the goal function (i.e., the check of whether the attack succeeds) while the perturbation conforms to the constraints (e.g., grammar constraints, semantic-similarity constraints). Finally, the search method selects high-quality adversarial examples from among all the feasible transformation results.
@@ -487,7 +485,7 @@ TextAttack 是不依赖具体模型的,这意味着可以对任何深度学习
### Benchmarking Attacks Fairly
- For details, see our analysis paper: Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples at [EMNLP BlackNLP](https://arxiv.org/abs/2009.06368).
- For details, see our analysis paper: Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples at [EMNLP BlackBoxNLP](https://arxiv.org/abs/2009.06368).
- As we emphasized in the paper above, we do not recommend directly comparing attack recipes without controlling for their constraints.

View File

@@ -17,10 +17,6 @@ Where should I start?
This is a great question, and one we get a lot. First of all, almost everything in TextAttack can be done in two ways: via the command-line or via the Python API. If you're looking to integrate TextAttack into an existing project, the Python API is likely for you. If you'd prefer to use built-in functionality end-to-end (training a model, running an adversarial attack, augmenting a CSV) then you can just use the command-line API.
For future developers, visit the :ref:`Installation <installation>` page for more details about installing TextAttack onto your own computer. To start making contributions, read the detailed instructions `here <https://github.com/QData/TextAttack/blob/master/CONTRIBUTING.md>`__.
TextAttack does three things very well:
1. Adversarial attacks (Python: ``textattack.shared.Attack``, Bash: ``textattack attack``)

View File

@@ -9,6 +9,13 @@ just about anything TextAttack offers in a single bash command.
> can access all the same functionality by prepending `python -m` to the command
> (`python -m textattack ...`).
> The [`examples/`](https://github.com/QData/TextAttack/tree/master/examples) folder includes scripts showing common TextAttack usage for training models, running attacks, and augmenting a CSV file.
> The [documentation website](https://textattack.readthedocs.io/en/latest) contains walkthroughs explaining basic usage of TextAttack, including building a custom transformation and a custom constraint.
To see all available commands, type `textattack --help`. This page explains
some of the most important functionalities of textattack: NLP data augmentation,
adversarial attacks, and training and evaluating models.
@@ -33,7 +40,7 @@ For example, given the following as `examples.csv`:
The command:
```
textattack augment --csv examples.csv --input-column text --recipe eda --pct-words-to-swap .1 \
textattack augment --input-csv examples.csv --output-csv output.csv --input-column text --recipe eda --pct-words-to-swap .1 \
--transformations-per-example 2 --exclude-original
```
will augment the `text` column with 10% of words edited per augmentation, twice as many augmentations as original inputs, and exclude the original inputs from the

View File

@@ -0,0 +1,80 @@
Installation
==============
To use TextAttack, you must be running Python 3.6 or above. A CUDA-compatible GPU is optional but will greatly improve speed.
We recommend installing TextAttack in a virtual environment (check out this [guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/)).
There are two ways to install TextAttack. If you want to simply use it as it is, install via `pip`. If you want to make changes and play around, install it from source.
## Install with pip
Simply run
pip install textattack
## Install from Source
To install TextAttack from source, first clone the repo by running
git clone https://github.com/QData/TextAttack.git
cd TextAttack
Then, install it using `pip`.
pip install -e .
To install TextAttack for further development, please run this instead.
pip install -e .[dev]
This installs additional dependencies required for development.
## Optional Dependencies
For quick installation, TextAttack only installs essential packages as dependencies (e.g. Transformers, PyTorch). However, you might need to install additional packages to run certain attacks or features.
For example, TensorFlow and TensorFlow Hub are required to use the TextFooler attack, which was proposed in [Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment](https://arxiv.org/abs/1907.11932) by Di Jin, Zhijing Jin, Joey Tianyi Zhou, and Peter Szolovits.
If you are attempting to use a feature that requires additional dependencies, TextAttack will let you know which ones you need to install.
However, during the installation step, you can also install them together with TextAttack.
You can install Tensorflow and its related packages by running
pip install textattack[tensorflow]
You can also install other miscellaneous optional dependencies by running
pip install textattack[optional]
To install both groups of packages, run
pip install textattack[tensorflow,optional]
## FAQ on installation
For many of the dependent library issues, the following command is the first you could try:
```bash
pip install --force-reinstall textattack
```
OR
```bash
pip install textattack[tensorflow,optional]
```
Besides, we highly recommend using a virtual environment for textattack,
see [information here](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#removing-an-environment). Here is one conda example:
```bash
conda create -n textattackenv python=3.7
conda activate textattackenv
conda env list
```
If you want to use the most up-to-date version of textattack (normally with newer bug fixes), you can run the following:
```bash
git clone https://github.com/QData/TextAttack.git
cd TextAttack
pip install .[dev]
```

View File

@@ -0,0 +1,38 @@
Quick Tour
==========================
Let us have a quick look at how TextAttack can be used to carry out an adversarial attack.
Attacking a BERT model
------------------------------
Let us attack a BERT model fine-tuned for a sentiment classification task. We are going to use a model that has already been fine-tuned on the IMDB dataset using the Transformers library.
.. code-block::
>>> import transformers
>>> model = transformers.AutoModelForSequenceClassification.from_pretrained("textattack/bert-base-uncased-imdb")
>>> tokenizer = transformers.AutoTokenizer.from_pretrained("textattack/bert-base-uncased-imdb")
TextAttack requires both the model and the tokenizer to be wrapped by a :class:`~textattack.models.wrappers.ModelWrapper` class that implements the forward pass operation given a list of input texts. For models provided by the Transformers library, we can simply use the :class:`~textattack.models.wrappers.HuggingFaceModelWrapper` class, which implements both the forward pass and tokenization.
.. code-block::
>>> import textattack
>>> model_wrapper = textattack.models.wrappers.HuggingFaceModelWrapper(model, tokenizer)
Next, let's build the attack that we want to use. TextAttack provides prebuilt attacks in the form of :class:`~textattack.attack_recipes.AttackRecipe`. For this example, we will use the TextFooler attack.
.. code-block::
>>> dataset = textattack.datasets.HuggingFaceDataset("imdb", split="test")
>>> attack = textattack.attack_recipes.TextFoolerJin2019.build(model_wrapper)
>>> # Attack 20 samples with CSV logging and checkpoint saved every 5 intervals
>>> attack_args = textattack.AttackArgs(num_examples=20, log_to_csv="log.csv", checkpoint_interval=5, checkpoint_dir="checkpoints", disable_stdout=True)
>>> attacker = textattack.Attacker(attack, dataset, attack_args)
>>> attacker.attack_dataset()
.. image:: ../_static/imgs/overview.png

View File

@@ -28,25 +28,46 @@ For help and realtime updates related to TextAttack, please [join the TextAttack
## More Concrete Questions:
### 0. For many of the dependent library issues, the following command is the first you could try:
```bash
pip install --force-reinstall textattack
```
OR
```bash
pip install textattack[tensorflow,optional]
```
Besides, we highly recommend using a virtual environment for textattack,
see [information here](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#removing-an-environment). Here is one conda example:
```bash
conda create -n textattackenv python=3.7
conda activate textattackenv
conda env list
```
If you want to use the most up-to-date version of textattack (normally with newer bug fixes), you can run the following:
```bash
git clone https://github.com/QData/TextAttack.git
cd TextAttack
pip install .[dev]
```
### 1. How to Train
For example, you can *Train our default LSTM for 50 epochs on the Yelp Polarity dataset:*
```bash
textattack train --model lstm --dataset yelp_polarity --batch-size 64 --epochs 50 --learning-rate 1e-5
textattack train --model-name-or-path lstm --dataset yelp_polarity --epochs 50 --learning-rate 1e-5
```
The training process has data augmentation built-in:
*Fine-Tune `bert-base` on the `CoLA` dataset for 5 epochs*:
```bash
textattack train --model lstm --dataset rotten_tomatoes --augment eda --pct-words-to-swap .1 --transformations-per-example 4
textattack train --model-name-or-path bert-base-uncased --dataset glue^cola --per-device-train-batch-size 8 --epochs 5
```
This uses the `EasyDataAugmenter` recipe to augment the `rotten_tomatoes` dataset before training.
*Fine-Tune `bert-base` on the `CoLA` dataset for 5 epochs*:
```bash
textattack train --model bert-base-uncased --dataset glue^cola --batch-size 32 --epochs 5
```
### 2. Use Custom Models
@@ -89,18 +110,25 @@ You can then run attacks on samples from this dataset by adding the argument `--
#### Dataset loading via other mechanism, see: [more details at here](https://textattack.readthedocs.io/en/latest/api/datasets.html)
```python
import textattack
my_dataset = [("text",label),....]
new_dataset = textattack.datasets.Dataset(my_dataset)
```
#### Custom Dataset via AttackedText class
To allow for word replacement after a sequence has been tokenized, we include an `AttackedText` object
which maintains both a list of tokens and the original text, with punctuation. We use this object in favor of a list of words or just raw text.
#### Custom Dataset via Data Frames or other Python data objects (*coming soon*)
### 4. Benchmarking Attacks
- See our analysis paper: Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples at [EMNLP BlackNLP](https://arxiv.org/abs/2009.06368).
- See our analysis paper: Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples at [EMNLP BlackBoxNLP](https://arxiv.org/abs/2009.06368).
- As we emphasized in the above paper, we don't recommend directly comparing Attack Recipes out of the box.
@@ -121,3 +149,9 @@ This modular design unifies adversarial attack methods into one system, enables
### 6. The attacking is too slow
- **Tip:** If your machine has multiple GPUs, you can distribute the attack across them using the `--parallel` option. For some attacks, this can really help performance.
- If you want to attack Keras models in parallel, please check out `examples/attack/attack_keras_parallel.py` instead. (This is a hotfix for issues caused by a recent update of Keras in TF)

View File

@@ -7,6 +7,24 @@ Lessons learned in designing TextAttack
TextAttack is an open-source Python toolkit for adversarial attacks, adversarial training, and data augmentation in NLP. TextAttack unites 15+ papers from the NLP adversarial attack literature into a single shared framework, with many components reused across attacks. This framework allows both researchers and developers to test and study the weaknesses of their NLP models.
## Presentations on TextAttack
### 2020: Jack Morris' summary tutorial talk on TextAttack
- On Jul 31, 2020, Jack Morris gave an invited talk at the Weights & Biases research salon on "TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP"
- [Youtube Talk link](https://www.youtube.com/watch?v=22Q3f7Fb110)
### 2021: Dr. Qi's summary tutorial talk on TextAttack
- On April 14, 2021, Prof. Qi gave an invited talk at the UVA Human and Machine Intelligence Seminar on "Generalizing Adversarial Examples to Natural Language Processing"
- [TalkSlide](https://qdata.github.io/qdata-page/pic/20210414-HMI-textAttack.pdf)
## Challenges in Design

View File

@@ -12,8 +12,16 @@ This modular design enables us to easily assemble attacks from the literature wh
![two-categorized-attacks](/_static/imgs/intro/01-categorized-attacks.png)
- You can create a new attack (in one line of code!) by composing members of the four components we propose, for instance:
```bash
# Shows how to build an attack from components and use it on a pre-trained model on the Yelp dataset.
textattack attack --attack-n --model bert-base-uncased-yelp --num-examples 8 \
--goal-function untargeted-classification \
--transformation word-swap-wordnet \
--constraints edit-distance^12 max-words-perturbed^max_percent=0.75 repeat stopword \
--search greedy
```
### Goal Functions
@@ -35,11 +43,11 @@ A `SearchMethod` takes as input an initial `GoalFunctionResult` and returns a fi
### On Benchmarking Attack Recipes
- Please read our analysis paper: Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples at [EMNLP BlackNLP](https://arxiv.org/abs/2009.06368).
- Please read our analysis paper: Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples at [EMNLP BlackBoxNLP](https://arxiv.org/abs/2009.06368).
- As we emphasized in the above paper, we don't recommend directly comparing Attack Recipes out of the box.
- This is due to that attack recipes in the recent literature used different ways or thresholds in setting up their constraints. Without the constraint space held constant, an increase in attack success rate could from an improved search or a better transformation method or a less restrictive search space.
- This is because attack recipes in the recent literature use different ways or thresholds to set up their constraints. Without the constraint space held constant, an increase in attack success rate could come from an improved search, a better transformation method, or a less restrictive search space.
@@ -98,12 +106,12 @@ A `SearchMethod` takes as input an initial `GoalFunctionResult` and returns a fi
<td style="text-align: left;"><sub>Greedy-WIR</sub></td>
<td ><sub>Invariance testing implemented in CheckList. (["Beyond Accuracy: Behavioral Testing of NLP models with CheckList" (Ribeiro et al., 2020)](https://arxiv.org/abs/2005.04118))</sub></td>
</tr>
<tr class="even">
<td style="text-align: left;"> <code>clare (*coming soon*)</code> <span class="citation" data-cites="Alzantot2018GeneratingNL Jia2019CertifiedRT"></span></td>
<td style="text-align: left;"><sub>Untargeted {Classification, Entailment}</sub></td>
<td style="text-align: left;"><sub>RoBERTa masked language model</sub></td>
<td style="text-align: left;"><sub>word swap, insertion, and merge</sub></td>
<td style="text-align: left;"><sub>Greedy</sub></td>
<tr>
<td> <code>clare</code> <span class="citation" data-cites="Alzantot2018GeneratingNL Jia2019CertifiedRT"></span></td>
<td><sub>Untargeted {Classification, Entailment}</sub></td>
<td><sub>USE sentence encoding cosine similarity</sub></td>
<td><sub>RoBERTa Masked Prediction for token swap, insert and merge</sub></td>
<td><sub>Greedy</sub></td>
<td ><sub>["Contextualized Perturbation for Textual Adversarial Attack" (Li et al., 2020)](https://arxiv.org/abs/2009.07502))</sub></td>
</tr>
<tr class="odd">
@@ -234,4 +242,4 @@ A `SearchMethod` takes as input an initial `GoalFunctionResult` and returns a fi
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
```

View File

@@ -2,7 +2,7 @@ Benchmarking Search Algorithms for Generating NLP Adversarial Examples
=========================================================================
*This documentation page was adapted from Our Paper in [EMNLP BlackNLP](https://arxiv.org/abs/2009.06368).*
*This documentation page was adapted from Our Paper in [EMNLP BlackBoxNLP](https://arxiv.org/abs/2009.06368).*
### Title: Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples
@@ -11,9 +11,6 @@ Benchmarking Search Algorithms for Generating NLP Adversarial Examples
- Abstract: We study the behavior of several black-box search algorithms used for generating adversarial examples for natural language processing (NLP) tasks. We perform a fine-grained analysis of three elements relevant to search: search algorithm, search space, and search budget. When new search methods are proposed in past work, the attack search space is often modified alongside the search method. Without ablation studies benchmarking the search algorithm change with the search space held constant, an increase in attack success rate could come from an improved search method or a less restrictive search space. Additionally, many previous studies fail to properly consider the search algorithms' run-time cost, which is essential for downstream tasks like adversarial training. Our experiments provide a reproducible benchmark of search algorithms across a variety of search spaces and query budgets to guide future research in adversarial NLP. Based on our experiments, we recommend greedy attacks with word importance ranking when under a time constraint or attacking long inputs, and either beam search or particle swarm optimization otherwise.
- As we emphasized in the above paper, we don't recommend directly comparing Attack Recipes out of the box, because attack recipes in the recent literature use different ways or thresholds to set up their constraints.
+ Citations:
```
@misc{yoo2020searching,
@@ -28,7 +25,7 @@ Benchmarking Search Algorithms for Generating NLP Adversarial Examples
### Our search benchmarking result Github
`TextAttack-Search-Benchmark Github <https://github.com/QData/TextAttack-Search-Benchmark>`__
TextAttack-Search-Benchmark Github [https://github.com/QData/TextAttack-Search-Benchmark](https://github.com/QData/TextAttack-Search-Benchmark)
### Our benchmarking results on comparing search methods used in the past attacks.
@@ -43,3 +40,9 @@ Benchmarking Search Algorithms for Generating NLP Adversarial Examples
![Table1](/_static/imgs/benchmark/search-table1.png)
### Benchmarking Attack Recipes
- As we emphasized in the above paper, we don't recommend directly comparing Attack Recipes out of the box.
- This is because attack recipes in the recent literature use different ways or thresholds to set up their constraints. Without the constraint space held constant, an increase in attack success rate could come from an improved search or transformation method, or from a less restrictive search space.

View File

@@ -1,20 +0,0 @@
.. _installation:
Installation
==============
To use TextAttack, you must be running Python 3.6+. Tensorflow needs to be installed for users, and Java needs to be installed for developers. A CUDA-compatible GPU is optional but will greatly improve speed. To install, simply run::
pip install textattack
You're now all set to use TextAttack! Try running an attack from the command line::
textattack attack --recipe textfooler --model bert-base-uncased-mr --num-examples 10
This will run an attack using the TextFooler_ recipe, attacking BERT fine-tuned on the MR dataset. It will attack the first 10 samples. Once everything downloads and starts running, you should see attack results print to ``stdout``.
Read on for more information on TextAttack, including how to use it from a Python script (``import textattack``).
.. _TextFooler: https://arxiv.org/abs/1907.11932

View File

@@ -0,0 +1,68 @@
TextAttack Extended Functions (Multilingual)
============================================
## TextAttack supports multiple model types besides HuggingFace models and our TextAttack models:
- Example attacking TensorFlow models @ [https://textattack.readthedocs.io/en/latest/2notebook/Example_0_tensorflow.html](https://textattack.readthedocs.io/en/latest/2notebook/Example_0_tensorflow.html)
- Example attacking scikit-learn models @ [https://textattack.readthedocs.io/en/latest/2notebook/Example_1_sklearn.html](https://textattack.readthedocs.io/en/latest/2notebook/Example_1_sklearn.html)
- Example attacking AllenNLP models @ [https://textattack.readthedocs.io/en/latest/2notebook/Example_2_allennlp.html](https://textattack.readthedocs.io/en/latest/2notebook/Example_2_allennlp.html)
- Example attacking Keras models @ [https://textattack.readthedocs.io/en/latest/2notebook/Example_3_Keras.html](https://textattack.readthedocs.io/en/latest/2notebook/Example_3_Keras.html)
## Multilingual Support
- See the tutorial notebook for using our framework to attack French-BERT: [https://textattack.readthedocs.io/en/latest/2notebook/Example_4_CamemBERT.html](https://textattack.readthedocs.io/en/latest/2notebook/Example_4_CamemBERT.html)
- See example code for using our framework to attack French-BERT: [https://github.com/QData/TextAttack/blob/master/examples/attack/attack_camembert.py](https://github.com/QData/TextAttack/blob/master/examples/attack/attack_camembert.py)
## User defined custom inputs and models
### Custom Datasets: Dataset from a file
Loading a dataset from a file is very similar to loading a model from a file. A 'dataset' is any iterable of `(input, output)` pairs.
The following example would load a sentiment classification dataset from file `my_dataset.py`:
```python
dataset = [('Today was....', 1), ('This movie is...', 0), ...]
```
You can then run attacks on samples from this dataset by adding the argument `--dataset-from-file my_dataset.py`.
#### Custom Model: from a file
To experiment with a model you've trained, you could create the following file
and name it `my_model.py`:
```python
model = load_your_model_with_custom_code() # replace this line with your model loading code
tokenizer = load_your_tokenizer_with_custom_code() # replace this line with your tokenizer loading code
```
Then, run an attack with the argument `--model-from-file my_model.py`. The model and tokenizer will be loaded automatically.
## User-defined custom attack components
The [documentation website](https://textattack.readthedocs.io/en/latest) contains walkthroughs explaining basic usage of TextAttack, including building a custom transformation and a custom constraint.
- custom transformation example @ [https://textattack.readthedocs.io/en/latest/2notebook/1_Introduction_and_Transformations.html](https://textattack.readthedocs.io/en/latest/2notebook/1_Introduction_and_Transformations.html)
- custom constraint example @ [https://textattack.readthedocs.io/en/latest/2notebook/2_Constraints.html#A-custom-constraint](https://textattack.readthedocs.io/en/latest/2notebook/2_Constraints.html#A-custom-constraint)
## Visualizing TextAttack-generated examples
- You can visualize the generated adversarial examples using the visualization approaches we provide here: [https://textattack.readthedocs.io/en/latest/2notebook/2_Constraints.html](https://textattack.readthedocs.io/en/latest/2notebook/2_Constraints.html)
- We have also built a web demo, [TextAttack-WebDemo Github](https://github.com/QData/TextAttack-WebDemo), for visualizing adversarial examples generated by textattack.

View File

@@ -0,0 +1,45 @@
On Quality of Generated Adversarial Examples and How to Set Attack Constraints
==============================================================================
### Title: Reevaluating Adversarial Examples in Natural Language
- Paper [EMNLP Findings](https://arxiv.org/abs/2004.14174)
- Abstract: State-of-the-art attacks on NLP models lack a shared definition of what constitutes a successful attack. We distill ideas from past work into a unified framework: a successful natural language adversarial example is a perturbation that fools the model and follows some linguistic constraints. We then analyze the outputs of two state-of-the-art synonym substitution attacks. We find that their perturbations often do not preserve semantics, and 38% introduce grammatical errors. Human surveys reveal that to successfully preserve semantics, we need to significantly increase the minimum cosine similarities between the embeddings of swapped words and between the sentence encodings of original and perturbed sentences. With constraints adjusted to better preserve semantics and grammaticality, the attack success rate drops by over 70 percentage points.
### Our Github on Reevaluation: [Reevaluating-NLP-Adversarial-Examples Github](https://github.com/QData/Reevaluating-NLP-Adversarial-Examples)
- Citations
```
@misc{morris2020reevaluating,
title={Reevaluating Adversarial Examples in Natural Language},
author={John X. Morris and Eli Lifland and Jack Lanchantin and Yangfeng Ji and Yanjun Qi},
year={2020},
eprint={2004.14174},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
### Some of our evaluation results on quality of two SOTA attack recipes
- As we have emphasized in this paper, we recommend that researchers and users be EXTREMELY mindful of the quality of generated adversarial examples in natural language
- We recommend the field use human-evaluation-derived thresholds for setting up constraints
![Table3](/_static/imgs/benchmark/table3.png)
![Table4](/_static/imgs/benchmark/table4.png)
### Some of our evaluation results on how to set constraints to evaluate NLP model's adversarial robustness
![Table5](/_static/imgs/benchmark/table5-main.png)
![Table7](/_static/imgs/benchmark/table7.png)
![Table9](/_static/imgs/benchmark/table9.png)

View File

@@ -45,7 +45,7 @@ How to Cite TextAttack
## Our Analysis paper: Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples
- Paper [EMNLP BlackNLP](https://arxiv.org/abs/2009.06368)
- Paper [EMNLP BlackBoxNLP](https://arxiv.org/abs/2009.06368)
- Abstract: We study the behavior of several black-box search algorithms used for generating adversarial examples for natural language processing (NLP) tasks. We perform a fine-grained analysis of three elements relevant to search: search algorithm, search space, and search budget. When new search methods are proposed in past work, the attack search space is often modified alongside the search method. Without ablation studies benchmarking the search algorithm change with the search space held constant, an increase in attack success rate could come from an improved search method or a less restrictive search space. Additionally, many previous studies fail to properly consider the search algorithms' run-time cost, which is essential for downstream tasks like adversarial training. Our experiments provide a reproducible benchmark of search algorithms across a variety of search spaces and query budgets to guide future research in adversarial NLP. Based on our experiments, we recommend greedy attacks with word importance ranking when under a time constraint or attacking long inputs, and either beam search or particle swarm optimization otherwise.

View File

@@ -70,7 +70,7 @@ TextAttack attack recipes that fall under this category: deepwordbug, hotflip, p
Some NLP models are trained to measure semantic similarity. Adversarial attacks based on the notion of semantic indistinguishability typically use another NLP model to enforce that perturbations are grammatically valid and semantically similar to the original input.
TextAttack attack recipes that fall under this category: alzantot, bae, bert-attack, faster-alzantot, iga, kuleshov, pso, pwws, textbugger\*, textfooler
TextAttack attack recipes that fall under this category: alzantot, bae, bert-attack, fast-alzantot, iga, kuleshov, pso, pwws, textbugger\*, textfooler
\*The textbugger attack generates perturbations using both typo-like character edits and synonym substitutions. It could be considered to use both definitions of indistinguishability.

File diff suppressed because it is too large

View File

@@ -2,14 +2,18 @@
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"id": "xK7B3NnYaPR6"
},
"source": [
"# The TextAttack ecosystem: search, transformations, and constraints"
]
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"id": "9rY3w9b2aPSG"
},
"source": [
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/QData/TextAttack/blob/master/docs/2notebook/1_Introduction_and_Transformations.ipynb)\n",
"\n",
@@ -18,7 +22,18 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"id": "urhoEHXJf8YK"
},
"source": [
"Installation of Attack-api branch"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HTe13zUKaPSH"
},
"source": [
"An attack in TextAttack consists of four parts.\n",
"\n",
@@ -38,7 +53,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"id": "tiXXNJO4aPSI"
},
"source": [
"### A custom transformation\n",
"\n",
@@ -53,7 +70,9 @@
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"metadata": {
"id": "8r7zviXkaPSJ"
},
"outputs": [],
"source": [
"from textattack.transformations import WordSwap\n",
@@ -76,7 +95,7 @@
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
"id": "RHGvZxenaPSJ"
},
"source": [
"### Using our transformation\n",
@@ -89,159 +108,33 @@
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"execution_count": 2,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "wREwoDkMaPSK",
"outputId": "4a8f74c7-c51a-4216-8435-be52d2165d4c"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[34;1mtextattack\u001b[0m: Goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'> compatible with model BertForSequenceClassification.\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "b537c513e8b3410eb2f7e3ec5df851fc",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Downloading', max=3939.0, style=ProgressStyle(description…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "4f3b600b1f1b4a4da538f43582846964",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Downloading', max=2486.0, style=ProgressStyle(description…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Using custom data configuration default\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Downloading and preparing dataset ag_news/default (download: 29.88 MiB, generated: 30.23 MiB, total: 60.10 MiB) to /u/edl9cy/.cache/huggingface/datasets/ag_news/default/0.0.0...\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "df8846bd027a457891dd665e3fd4156f",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Downloading', max=11045148.0, style=ProgressStyle(descrip…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "e3a3710421f6423ba77fb3276b3240af",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Downloading', max=751209.0, style=ProgressStyle(descripti…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=1.0, bar_style='info', max=1.0), HTML(value='')))"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=1.0, bar_style='info', max=1.0), HTML(value='')))"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[34;1mtextattack\u001b[0m: Loading \u001b[94mnlp\u001b[0m dataset \u001b[94mag_news\u001b[0m, split \u001b[94mtest\u001b[0m.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset ag_news downloaded and prepared to /u/edl9cy/.cache/huggingface/datasets/ag_news/default/0.0.0. Subsequent calls will reuse this data.\n"
"textattack: Unknown if model of class <class 'transformers.models.bert.modeling_bert.BertForSequenceClassification'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.\n",
"Using custom data configuration default\n",
"Reusing dataset ag_news (/p/qdata/jy2ma/.cache/textattack/datasets/ag_news/default/0.0.0/0eeeaaa5fb6dffd81458e293dfea1adba2881ffcbdc3fb56baeb5a892566c29a)\n",
"textattack: Loading \u001b[94mdatasets\u001b[0m dataset \u001b[94mag_news\u001b[0m, split \u001b[94mtest\u001b[0m.\n"
]
}
],
"source": [
"# Import the model\n",
"import transformers\n",
"from textattack.models.tokenizers import AutoTokenizer\n",
"from textattack.models.wrappers import HuggingFaceModelWrapper\n",
"\n",
"model = transformers.AutoModelForSequenceClassification.from_pretrained(\"textattack/bert-base-uncased-ag-news\")\n",
"tokenizer = AutoTokenizer(\"textattack/bert-base-uncased-ag-news\")\n",
"tokenizer = transformers.AutoTokenizer.from_pretrained(\"textattack/bert-base-uncased-ag-news\")\n",
"\n",
"model_wrapper = HuggingFaceModelWrapper(model, tokenizer)\n",
"\n",
@@ -256,7 +149,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"id": "sfGMvqcTaPSN"
},
"source": [
"### Creating the attack\n",
"Let's keep it simple: let's use a greedy search method, and let's not use any constraints for now. "
@@ -264,13 +159,15 @@
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"execution_count": 3,
"metadata": {
"id": "nSAHSoI_aPSO"
},
"outputs": [],
"source": [
"from textattack.search_methods import GreedySearch\n",
"from textattack.constraints.pre_transformation import RepeatModification, StopwordModification\n",
"from textattack.shared import Attack\n",
"from textattack import Attack\n",
"\n",
"# We're going to use our Banana word swap class as the attack transformation.\n",
"transformation = BananaWordSwap() \n",
@@ -285,15 +182,23 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"id": "PqrHaZOaaPSO"
},
"source": [
"Let's print our attack to see all the parameters:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "d2qYOr0maPSP",
"outputId": "7266dc40-fc6c-4c78-90a8-8150e8fb5d8e"
},
"outputs": [
{
"name": "stdout",
@@ -315,9 +220,34 @@
"print(attack)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "m97uyJxDh1wq",
"outputId": "87ca8836-9781-4c5d-85f2-7ffbf4a7ef80"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(OrderedDict([('text', \"Fears for T N pension after talks Unions representing workers at Turner Newall say they are 'disappointed' after talks with stricken parent firm Federal Mogul.\")]), 2)\n"
]
}
],
"source": [
"print(dataset[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"id": "GYKoVFuXaPSP"
},
"source": [
"### Using the attack\n",
"\n",
@@ -326,23 +256,263 @@
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"execution_count": 6,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "LyokhnFtaPSQ",
"outputId": "d8a43c4f-1551-40c9-d031-a42b429ed33d"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" 0%| | 0/10 [00:00<?, ?it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 of 10 successes complete.\n",
"2 of 10 successes complete.\n",
"3 of 10 successes complete.\n",
"4 of 10 successes complete.\n",
"5 of 10 successes complete.\n",
"6 of 10 successes complete.\n",
"7 of 10 successes complete.\n",
"8 of 10 successes complete.\n",
"9 of 10 successes complete.\n",
"10 of 10 successes complete.\n"
"Attack(\n",
" (search_method): GreedySearch\n",
" (goal_function): UntargetedClassification\n",
" (transformation): BananaWordSwap\n",
" (constraints): \n",
" (0): RepeatModification\n",
" (1): StopwordModification\n",
" (is_black_box): True\n",
") \n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 1 / 0 / 0 / 1: 10%|█ | 1/10 [00:01<00:14, 1.57s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 1 ---------------------------------------------\n",
"\u001b[94mBusiness (100%)\u001b[0m --> \u001b[91mWorld (89%)\u001b[0m\n",
"\n",
"Fears for T N \u001b[94mpension\u001b[0m after \u001b[94mtalks\u001b[0m \u001b[94mUnions\u001b[0m representing \u001b[94mworkers\u001b[0m at Turner Newall say they are '\u001b[94mdisappointed'\u001b[0m after talks with stricken parent firm Federal \u001b[94mMogul\u001b[0m.\n",
"\n",
"Fears for T N \u001b[91mbanana\u001b[0m after \u001b[91mbanana\u001b[0m \u001b[91mbanana\u001b[0m representing \u001b[91mbanana\u001b[0m at Turner Newall say they are '\u001b[91mbanana\u001b[0m after talks with stricken parent firm Federal \u001b[91mbanana\u001b[0m.\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 2 / 0 / 0 / 2: 20%|██ | 2/10 [00:13<00:53, 6.68s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 2 ---------------------------------------------\n",
"\u001b[35mSci/tech (100%)\u001b[0m --> \u001b[91mWorld (64%)\u001b[0m\n",
"\n",
"The Race is On: Second Private \u001b[35mTeam\u001b[0m Sets Launch \u001b[35mDate\u001b[0m for \u001b[35mHuman\u001b[0m \u001b[35mSpaceflight\u001b[0m (\u001b[35mSPACE\u001b[0m.\u001b[35mcom\u001b[0m) \u001b[35mSPACE\u001b[0m.\u001b[35mcom\u001b[0m - \u001b[35mTORONTO\u001b[0m, \u001b[35mCanada\u001b[0m -- \u001b[35mA\u001b[0m \u001b[35msecond\u001b[0m\\\u001b[35mteam\u001b[0m of rocketeers \u001b[35mcompeting\u001b[0m for the #36;10 million Ansari X \u001b[35mPrize\u001b[0m, a \u001b[35mcontest\u001b[0m for\\\u001b[35mprivately\u001b[0m funded \u001b[35msuborbital\u001b[0m \u001b[35mspace\u001b[0m \u001b[35mflight\u001b[0m, has officially \u001b[35mannounced\u001b[0m the first\\\u001b[35mlaunch\u001b[0m date for its \u001b[35mmanned\u001b[0m rocket.\n",
"\n",
"The Race is On: Second Private \u001b[91mbanana\u001b[0m Sets Launch \u001b[91mbanana\u001b[0m for \u001b[91mbanana\u001b[0m \u001b[91mbanana\u001b[0m (\u001b[91mbanana\u001b[0m.\u001b[91mbanana\u001b[0m) \u001b[91mbanana\u001b[0m.\u001b[91mbanana\u001b[0m - \u001b[91mbanana\u001b[0m, \u001b[91mbanana\u001b[0m -- \u001b[91mbanana\u001b[0m \u001b[91mbanana\u001b[0m\\\u001b[91mbanana\u001b[0m of rocketeers \u001b[91mbanana\u001b[0m for the #36;10 million Ansari X \u001b[91mbanana\u001b[0m, a \u001b[91mbanana\u001b[0m for\\\u001b[91mbanana\u001b[0m funded \u001b[91mbanana\u001b[0m \u001b[91mbanana\u001b[0m \u001b[91mbanana\u001b[0m, has officially \u001b[91mbanana\u001b[0m the first\\\u001b[91mbanana\u001b[0m date for its \u001b[91mbanana\u001b[0m rocket.\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 3 / 0 / 0 / 3: 30%|███ | 3/10 [00:18<00:42, 6.06s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 3 ---------------------------------------------\n",
"\u001b[35mSci/tech (100%)\u001b[0m --> \u001b[94mBusiness (77%)\u001b[0m\n",
"\n",
"Ky. Company Wins Grant to \u001b[35mStudy\u001b[0m \u001b[35mPeptides\u001b[0m (\u001b[35mAP\u001b[0m) \u001b[35mAP\u001b[0m - \u001b[35mA\u001b[0m company \u001b[35mfounded\u001b[0m by a \u001b[35mchemistry\u001b[0m \u001b[35mresearcher\u001b[0m at the \u001b[35mUniversity\u001b[0m of Louisville won a grant to develop a method of producing better \u001b[35mpeptides\u001b[0m, which are short chains of \u001b[35mamino\u001b[0m \u001b[35macids\u001b[0m, the building blocks of \u001b[35mproteins\u001b[0m.\n",
"\n",
"Ky. Company Wins Grant to \u001b[94mbanana\u001b[0m \u001b[94mbanana\u001b[0m (\u001b[94mbanana\u001b[0m) \u001b[94mbanana\u001b[0m - \u001b[94mbanana\u001b[0m company \u001b[94mbanana\u001b[0m by a \u001b[94mbanana\u001b[0m \u001b[94mbanana\u001b[0m at the \u001b[94mbanana\u001b[0m of Louisville won a grant to develop a method of producing better \u001b[94mbanana\u001b[0m, which are short chains of \u001b[94mbanana\u001b[0m \u001b[94mbanana\u001b[0m, the building blocks of \u001b[94mbanana\u001b[0m.\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 4 / 0 / 0 / 4: 40%|████ | 4/10 [00:20<00:30, 5.11s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 4 ---------------------------------------------\n",
"\u001b[35mSci/tech (100%)\u001b[0m --> \u001b[91mWorld (65%)\u001b[0m\n",
"\n",
"\u001b[35mPrediction\u001b[0m Unit Helps \u001b[35mForecast\u001b[0m Wildfires (AP) \u001b[35mAP\u001b[0m - It's barely dawn when Mike Fitzpatrick \u001b[35mstarts\u001b[0m his shift with a blur of colorful maps, figures and endless charts, but already he knows what the day will bring. Lightning will strike in places he expects. Winds will pick up, moist places will dry and flames will roar.\n",
"\n",
"\u001b[91mbanana\u001b[0m Unit Helps \u001b[91mbanana\u001b[0m Wildfires (AP) \u001b[91mbanana\u001b[0m - It's barely dawn when Mike Fitzpatrick \u001b[91mbanana\u001b[0m his shift with a blur of colorful maps, figures and endless charts, but already he knows what the day will bring. Lightning will strike in places he expects. Winds will pick up, moist places will dry and flames will roar.\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 5 / 0 / 0 / 5: 50%|█████ | 5/10 [00:22<00:22, 4.42s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 5 ---------------------------------------------\n",
"\u001b[35mSci/tech (100%)\u001b[0m --> \u001b[91mWorld (62%)\u001b[0m\n",
"\n",
"Calif. Aims to Limit Farm-Related \u001b[35mSmog\u001b[0m (AP) AP - Southern California's \u001b[35msmog-fighting\u001b[0m agency went after \u001b[35memissions\u001b[0m of the \u001b[35mbovine\u001b[0m variety Friday, adopting the nation's first rules to reduce air pollution from dairy cow manure.\n",
"\n",
"Calif. Aims to Limit Farm-Related \u001b[91mbanana\u001b[0m (AP) AP - Southern California's \u001b[91mbanana\u001b[0m agency went after \u001b[91mbanana\u001b[0m of the \u001b[91mbanana\u001b[0m variety Friday, adopting the nation's first rules to reduce air pollution from dairy cow manure.\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 6 / 0 / 0 / 6: 60%|██████ | 6/10 [00:54<00:36, 9.07s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 6 ---------------------------------------------\n",
"\u001b[35mSci/tech (100%)\u001b[0m --> \u001b[91mWorld (53%)\u001b[0m\n",
"\n",
"Open \u001b[35mLetter\u001b[0m Against \u001b[35mBritish\u001b[0m \u001b[35mCopyright\u001b[0m Indoctrination in Schools The \u001b[35mBritish\u001b[0m Department for Education and Skills (DfES) \u001b[35mrecently\u001b[0m \u001b[35mlaunched\u001b[0m a \"\u001b[35mMusic\u001b[0m \u001b[35mManifesto\u001b[0m\" campaign, with the ostensible \u001b[35mintention\u001b[0m of \u001b[35meducating\u001b[0m the \u001b[35mnext\u001b[0m \u001b[35mgeneration\u001b[0m of \u001b[35mBritish\u001b[0m \u001b[35mmusicians\u001b[0m. \u001b[35mUnfortunately\u001b[0m, they also teamed up with the \u001b[35mmusic\u001b[0m industry (\u001b[35mEMI\u001b[0m, and \u001b[35mvarious\u001b[0m \u001b[35martists\u001b[0m) to make this popular. \u001b[35mEMI\u001b[0m has \u001b[35mapparently\u001b[0m \u001b[35mnegotiated\u001b[0m their end well, so that \u001b[35mchildren\u001b[0m in our schools will now be indoctrinated about the illegality of \u001b[35mdownloading\u001b[0m music.The ignorance and audacity of this got to me a little, so I wrote an open letter to the DfES about it. Unfortunately, it's pedantic, as I suppose you have to be when writing to goverment representatives. But I hope you find it useful, and perhaps feel inspired to do something similar, if or when the same thing has happened in your area.\n",
"\n",
"Open \u001b[91mbanana\u001b[0m Against \u001b[91mbanana\u001b[0m \u001b[91mbanana\u001b[0m Indoctrination in Schools The \u001b[91mbanana\u001b[0m Department for Education and Skills (DfES) \u001b[91mbanana\u001b[0m \u001b[91mbanana\u001b[0m a \"\u001b[91mbanana\u001b[0m \u001b[91mbanana\u001b[0m\" campaign, with the ostensible \u001b[91mbanana\u001b[0m of \u001b[91mbanana\u001b[0m the \u001b[91mbanana\u001b[0m \u001b[91mbanana\u001b[0m of \u001b[91mbanana\u001b[0m \u001b[91mbanana\u001b[0m. \u001b[91mbanana\u001b[0m, they also teamed up with the \u001b[91mbanana\u001b[0m industry (\u001b[91mbanana\u001b[0m, and \u001b[91mbanana\u001b[0m \u001b[91mbanana\u001b[0m) to make this popular. \u001b[91mbanana\u001b[0m has \u001b[91mbanana\u001b[0m \u001b[91mbanana\u001b[0m their end well, so that \u001b[91mbanana\u001b[0m in our schools will now be indoctrinated about the illegality of \u001b[91mbanana\u001b[0m music.The ignorance and audacity of this got to me a little, so I wrote an open letter to the DfES about it. Unfortunately, it's pedantic, as I suppose you have to be when writing to goverment representatives. But I hope you find it useful, and perhaps feel inspired to do something similar, if or when the same thing has happened in your area.\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 6 / 1 / 0 / 7: 70%|███████ | 7/10 [01:47<00:46, 15.36s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 7 ---------------------------------------------\n",
"\u001b[35mSci/tech (100%)\u001b[0m --> \u001b[91m[FAILED]\u001b[0m\n",
"\n",
"Loosing the War on Terrorism \\\\\"Sven Jaschan, self-confessed author of the Netsky and Sasser viruses, is\\responsible for 70 percent of virus infections in 2004, according to a six-month\\virus roundup published Wednesday by antivirus company Sophos.\"\\\\\"The 18-year-old Jaschan was taken into custody in Germany in May by police who\\said he had admitted programming both the Netsky and Sasser worms, something\\experts at Microsoft confirmed. (A Microsoft antivirus reward program led to the\\teenager's arrest.) During the five months preceding Jaschan's capture, there\\were at least 25 variants of Netsky and one of the port-scanning network worm\\Sasser.\"\\\\\"Graham Cluley, senior technology consultant at Sophos, said it was staggeri ...\\\\\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 6 / 2 / 0 / 8: 80%|████████ | 8/10 [02:55<00:43, 21.96s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 8 ---------------------------------------------\n",
"\u001b[35mSci/tech (100%)\u001b[0m --> \u001b[91m[FAILED]\u001b[0m\n",
"\n",
"FOAFKey: FOAF, PGP, Key Distribution, and Bloom Filters \\\\FOAF/LOAF and bloom filters have a lot of interesting properties for social\\network and whitelist distribution.\\\\I think we can go one level higher though and include GPG/OpenPGP key\\fingerpring distribution in the FOAF file for simple web-of-trust based key\\distribution.\\\\What if we used FOAF and included the PGP key fingerprint(s) for identities?\\This could mean a lot. You include the PGP key fingerprints within the FOAF\\file of your direct friends and then include a bloom filter of the PGP key\\fingerprints of your entire whitelist (the source FOAF file would of course need\\to be encrypted ).\\\\Your whitelist would be populated from the social network as your client\\discovered new identit ...\\\\\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 7 / 2 / 0 / 9: 90%|█████████ | 9/10 [02:56<00:19, 19.57s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 9 ---------------------------------------------\n",
"\u001b[35mSci/tech (98%)\u001b[0m --> \u001b[91mWorld (100%)\u001b[0m\n",
"\n",
"\u001b[35mE-mail\u001b[0m scam targets police chief Wiltshire Police warns about \"\u001b[35mphishing\u001b[0m\" after its fraud squad chief was targeted.\n",
"\n",
"\u001b[91mbanana\u001b[0m scam targets police chief Wiltshire Police warns about \"\u001b[91mbanana\u001b[0m\" after its fraud squad chief was targeted.\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 8 / 2 / 0 / 10: 100%|██████████| 10/10 [02:56<00:00, 17.66s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 10 ---------------------------------------------\n",
"\u001b[35mSci/tech (98%)\u001b[0m --> \u001b[91mWorld (77%)\u001b[0m\n",
"\n",
"Card fraud unit nets 36,000 cards In its first two years, the UK's dedicated \u001b[35mcard\u001b[0m fraud unit, has recovered 36,000 stolen cards and 171 arrests - and estimates it saved 65m.\n",
"\n",
"Card fraud unit nets 36,000 cards In its first two years, the UK's dedicated \u001b[91mbanana\u001b[0m fraud unit, has recovered 36,000 stolen cards and 171 arrests - and estimates it saved 65m.\n",
"\n",
"\n",
"\n",
"+-------------------------------+--------+\n",
"| Attack Results | |\n",
"+-------------------------------+--------+\n",
"| Number of successful attacks: | 8 |\n",
"| Number of failed attacks: | 2 |\n",
"| Number of skipped attacks: | 0 |\n",
"| Original accuracy: | 100.0% |\n",
"| Accuracy under attack: | 20.0% |\n",
"| Attack success rate: | 80.0% |\n",
"| Average perturbed word %: | 18.71% |\n",
"| Average num. words per input: | 63.0 |\n",
"| Avg num queries: | 934.0 |\n",
"+-------------------------------+--------+\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
@@ -350,23 +520,38 @@
"from tqdm import tqdm # tqdm provides us a nice progress bar.\n",
"from textattack.loggers import CSVLogger # tracks a dataframe for us.\n",
"from textattack.attack_results import SuccessfulAttackResult\n",
"from textattack import Attacker\n",
"from textattack import AttackArgs\n",
"from textattack.datasets import Dataset\n",
"\n",
"results_iterable = attack.attack_dataset(dataset)\n",
"attack_args = AttackArgs(num_examples=10)\n",
"\n",
"logger = CSVLogger(color_method='html')\n",
"attacker = Attacker(attack, dataset, attack_args)\n",
"\n",
"num_successes = 0\n",
"while num_successes < 10:\n",
" result = next(results_iterable)\n",
" if isinstance(result, SuccessfulAttackResult):\n",
" logger.log_attack_result(result)\n",
" num_successes += 1\n",
" print(f'{num_successes} of 10 successes complete.')"
"attack_results = attacker.attack_dataset()\n",
"\n",
"#The following legacy tutorial code shows how the Attack API works in detail.\n",
"\n",
"#logger = CSVLogger(color_method='html')\n",
"\n",
"#num_successes = 0\n",
"#i = 0\n",
"#while num_successes < 10:\n",
" #result = next(results_iterable)\n",
"# example, ground_truth_output = dataset[i]\n",
"# i += 1\n",
"# result = attack.attack(example, ground_truth_output)\n",
"# if isinstance(result, SuccessfulAttackResult):\n",
"# logger.log_attack_result(result)\n",
"# num_successes += 1\n",
"# print(f'{num_successes} of 10 successes complete.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"id": "oRRkNXYmaPSQ"
},
"source": [
"### Visualizing attack results\n",
"\n",
@@ -375,9 +560,23 @@
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"execution_count": 7,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "JafXMELLaPSR",
"outputId": "48178d1c-5ba9-45f9-b1be-dc6533462c95"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"textattack: Logging to CSV at path results.csv\n"
]
},
{
"data": {
"text/html": [
@@ -422,24 +621,24 @@
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td><font color = purple>Loosing</font> the <font color = purple>War</font> on <font color = purple>Terrorism</font> \\\\\"<font color = purple>Sven</font> <font color = purple>Jaschan</font>, <font color = purple>self-confessed</font> <font color = purple>author</font> of the <font color = purple>Netsky</font> and <font color = purple>Sasser</font> <font color = purple>viruses</font>, is\\<font color = purple>responsible</font> for <font color = purple>70</font> <font color = purple>percent</font> of <font color = purple>virus</font> <font color = purple>infections</font> in <font color = purple>2004</font>, <font color = purple>according</font> to a <font color = purple>six-month</font>\\<font color = purple>virus</font> <font color = purple>roundup</font> <font color = purple>published</font> <font color = purple>Wednesday</font> by <font color = purple>antivirus</font> <font color = purple>company</font> <font color = purple>Sophos</font>.\"\\\\\"<font color = purple>The</font> <font color = purple>18-year-old</font> <font color = purple>Jaschan</font> was <font color = purple>taken</font> into <font color = purple>custody</font> in <font color = purple>Germany</font> in <font color = purple>May</font> by <font color = purple>police</font> who\\<font color = purple>said</font> he had <font color = purple>admitted</font> <font color = purple>programming</font> both the <font color = purple>Netsky</font> and <font color = purple>Sasser</font> <font color = purple>worms</font>, <font color = purple>something</font>\\<font color = purple>experts</font> at <font color = purple>Microsoft</font> <font color = purple>confirmed</font>. (<font color = purple>A</font> <font color = purple>Microsoft</font> <font color = purple>antivirus</font> <font color = purple>reward</font> <font color = purple>program</font> <font color = purple>led</font> to the\\<font color = purple>teenager's</font> <font color = purple>arrest</font>.) <font color = purple>During</font> the <font color = purple>five</font> <font color = purple>months</font> <font color = purple>preceding</font> <font color = purple>Jaschan's</font> <font color = purple>capture</font>, there\\were at <font color = purple>least</font> <font color = purple>25</font> <font color = purple>variants</font> of <font color = purple>Netsky</font> and <font color = purple>one</font> of the <font color = purple>port-scanning</font> <font color = purple>network</font> <font color = purple>worm</font>\\<font color = purple>Sasser</font>.\"\\\\\"<font color = purple>Graham</font> <font color = purple>Cluley</font>, <font color = purple>senior</font> <font color = purple>technology</font> <font color = purple>consultant</font> at <font color = purple>Sophos</font>, <font color = purple>said</font> it was <font color = purple>staggeri</font> ...\\\\</td>\n",
" <td><font color = purple>banana</font> the <font color = purple>banana</font> on <font color = purple>banana</font> \\\\\"<font color = purple>banana</font> <font color = purple>banana</font>, <font color = purple>banana</font> <font color = purple>banana</font> of the <font color = purple>banana</font> and <font color = purple>banana</font> <font color = purple>banana</font>, is\\<font color = purple>banana</font> for <font color = purple>banana</font> <font color = purple>banana</font> of <font color = purple>banana</font> <font color = purple>banana</font> in <font color = purple>banana</font>, <font color = purple>banana</font> to a <font color = purple>banana</font>\\<font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> by <font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font>.\"\\\\\"<font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> was <font color = purple>banana</font> into <font color = purple>banana</font> in <font color = purple>banana</font> in <font color = purple>banana</font> by <font color = purple>banana</font> who\\<font color = purple>banana</font> he had <font color = purple>banana</font> <font color = purple>banana</font> both the <font color = purple>banana</font> and <font color = purple>banana</font> <font color = purple>banana</font>, <font color = purple>banana</font>\\<font color = purple>banana</font> at <font color = purple>banana</font> <font color = purple>banana</font>. (<font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> to the\\<font color = purple>banana</font> <font color = purple>banana</font>.) <font color = purple>banana</font> the <font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font>, there\\were at <font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> of <font color = purple>banana</font> and <font color = purple>banana</font> of the <font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font>\\<font color = purple>banana</font>.\"\\\\\"<font color = purple>banana</font> <font color = purple>banana</font>, <font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> at <font color = purple>banana</font>, <font color = purple>banana</font> it was <font color = purple>banana</font> ...\\\\</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td><font color = purple>FOAFKey</font>: <font color = purple>FOAF</font>, <font color = purple>PGP</font>, <font color = purple>Key</font> <font color = purple>Distribution</font>, and <font color = purple>Bloom</font> <font color = purple>Filters</font> \\\\<font color = purple>FOAF</font>/<font color = purple>LOAF</font> and <font color = purple>bloom</font> <font color = purple>filters</font> have a <font color = purple>lot</font> of <font color = purple>interesting</font> <font color = purple>properties</font> for <font color = purple>social</font>\\<font color = purple>network</font> and <font color = purple>whitelist</font> <font color = purple>distribution</font>.\\\\<font color = purple>I</font> <font color = purple>think</font> we can <font color = purple>go</font> <font color = purple>one</font> <font color = purple>level</font> <font color = purple>higher</font> <font color = purple>though</font> and <font color = purple>include</font> <font color = purple>GPG</font>/<font color = purple>OpenPGP</font> <font color = purple>key</font>\\<font color = purple>fingerpring</font> <font color = purple>distribution</font> in the <font color = purple>FOAF</font> <font color = purple>file</font> for <font color = purple>simple</font> <font color = purple>web-of-trust</font> <font color = purple>based</font> <font color = purple>key</font>\\<font color = purple>distribution</font>.\\\\<font color = purple>What</font> if we <font color = purple>used</font> <font color = purple>FOAF</font> and <font color = purple>included</font> the <font color = purple>PGP</font> <font color = purple>key</font> <font color = purple>fingerprint</font>(s) for <font color = purple>identities</font>?\\<font color = purple>This</font> <font color = purple>could</font> <font color = purple>mean</font> a <font color = purple>lot</font>. <font color = purple>You</font> <font color = purple>include</font> the <font color = purple>PGP</font> <font color = purple>key</font> <font color = purple>fingerprints</font> <font color = purple>within</font> the <font color = purple>FOAF</font>\\<font color = purple>file</font> of your <font color = purple>direct</font> <font color = purple>friends</font> and then <font color = purple>include</font> a <font color = purple>bloom</font> <font color = purple>filter</font> of the <font color = purple>PGP</font> <font color = purple>key</font>\\<font color = purple>fingerprints</font> of your <font color = purple>entire</font> <font color = purple>whitelist</font> (the <font color = purple>source</font> <font color = purple>FOAF</font> <font color = purple>file</font> <font color = purple>would</font> of <font color = purple>course</font> <font color = purple>need</font>\\to be <font color = purple>encrypted</font> ).\\\\<font color = purple>Your</font> <font color = purple>whitelist</font> <font color = purple>would</font> be <font color = purple>populated</font> from the <font color = purple>social</font> <font color = purple>network</font> as your <font color = purple>client</font>\\<font color = purple>discovered</font> <font color = purple>new</font> <font color = purple>identit</font> ...\\\\</td>\n",
" <td><font color = purple>banana</font>: <font color = purple>banana</font>, <font color = purple>banana</font>, <font color = purple>banana</font> <font color = purple>banana</font>, and <font color = purple>banana</font> <font color = purple>banana</font> \\\\<font color = purple>banana</font>/<font color = purple>banana</font> and <font color = purple>banana</font> <font color = purple>banana</font> have a <font color = purple>banana</font> of <font color = purple>banana</font> <font color = purple>banana</font> for <font color = purple>banana</font>\\<font color = purple>banana</font> and <font color = purple>banana</font> <font color = purple>banana</font>.\\\\<font color = purple>banana</font> <font color = purple>banana</font> we can <font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> and <font color = purple>banana</font> <font color = purple>banana</font>/<font color = purple>banana</font> <font color = purple>banana</font>\\<font color = purple>banana</font> <font color = purple>banana</font> in the <font color = purple>banana</font> <font color = purple>banana</font> for <font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font>\\<font color = purple>banana</font>.\\\\<font color = purple>banana</font> if we <font color = purple>banana</font> <font color = purple>banana</font> and <font color = purple>banana</font> the <font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font>(s) for <font color = purple>banana</font>?\\<font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> a <font color = purple>banana</font>. <font color = purple>banana</font> <font color = purple>banana</font> the <font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> the <font color = purple>banana</font>\\<font color = purple>banana</font> of your <font color = purple>banana</font> <font color = purple>banana</font> and then <font color = purple>banana</font> a <font color = purple>banana</font> <font color = purple>banana</font> of the <font color = purple>banana</font> <font color = purple>banana</font>\\<font color = purple>banana</font> of your <font color = purple>banana</font> <font color = purple>banana</font> (the <font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> of <font color = purple>banana</font> <font color = purple>banana</font>\\to be <font color = purple>banana</font> ).\\\\<font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> be <font color = purple>banana</font> from the <font color = purple>banana</font> <font color = purple>banana</font> as your <font color = purple>banana</font>\\<font color = purple>banana</font> <font color = purple>banana</font> <font color = purple>banana</font> ...\\\\</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td><font color = purple>E-mail</font> scam targets police chief Wiltshire Police warns about \"<font color = purple>phishing</font>\" after its fraud squad chief was targeted.</td>\n",
" <td><font color = red>banana</font> scam targets police chief Wiltshire Police warns about \"<font color = red>banana</font>\" after its fraud squad chief was targeted.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <th>9</th>\n",
" <td>Card fraud unit nets 36,000 cards In its first two years, the UK's dedicated <font color = purple>card</font> fraud unit, has recovered 36,000 stolen cards and 171 arrests - and estimates it saved 65m.</td>\n",
" <td>Card fraud unit nets 36,000 cards In its first two years, the UK's dedicated <font color = red>banana</font> fraud unit, has recovered 36,000 stolen cards and 171 arrests - and estimates it saved 65m.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Group to Propose <font color = purple>New</font> High-Speed <font color = purple>Wireless</font> <font color = purple>Format</font> LOS ANGELES (<font color = purple>Reuters</font>) - A <font color = purple>group</font> of <font color = purple>technology</font> companies including Texas <font color = purple>Instruments</font> <font color = purple>Inc</font>. &lt;<font color = purple>TXN</font>.N&gt;, STMicroelectronics &lt;STM.PA&gt; and Broadcom Corp. &lt;<font color = purple>BRCM</font>.O&<font color = purple>gt</font>;, on Thursday said they will <font color = purple>propose</font> a new <font color = purple>wireless</font> <font color = purple>networking</font> standard up to 10 <font color = purple>times</font> the <font color = purple>speed</font> of the current generation.</td>\n",
" <td>Group to Propose <font color = blue>banana</font> High-Speed <font color = blue>banana</font> <font color = blue>banana</font> LOS ANGELES (<font color = blue>banana</font>) - A <font color = blue>banana</font> of <font color = blue>banana</font> companies including Texas <font color = blue>banana</font> <font color = blue>banana</font>. &lt;<font color = blue>banana</font>.N&gt;, STMicroelectronics &lt;STM.PA&gt; and Broadcom Corp. &lt;<font color = blue>banana</font>.O&<font color = blue>banana</font>;, on Thursday said they will <font color = blue>banana</font> a new <font color = blue>banana</font> <font color = blue>banana</font> standard up to 10 <font color = blue>banana</font> the <font color = blue>banana</font> of the current generation.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Apple Launches <font color = purple>Graphics</font> <font color = purple>Software</font>, <font color = purple>Video</font> <font color = purple>Bundle</font> LOS ANGELES (<font color = purple>Reuters</font>) - Apple <font color = purple>Computer</font> Inc.&<font color = purple>lt</font>;AAPL.O&<font color = purple>gt</font>; on Tuesday <font color = purple>began</font> shipping a new program designed to let <font color = purple>users</font> create <font color = purple>real-time</font> <font color = purple>motion</font> <font color = purple>graphics</font> and <font color = purple>unveiled</font> a discount <font color = purple>video-editing</font> <font color = purple>software</font> <font color = purple>bundle</font> featuring its flagship <font color = purple>Final</font> Cut Pro <font color = purple>software</font>.</td>\n",
" <td>Apple Launches <font color = blue>banana</font> <font color = blue>banana</font>, <font color = blue>banana</font> <font color = blue>banana</font> LOS ANGELES (<font color = blue>banana</font>) - Apple <font color = blue>banana</font> Inc.&<font color = blue>banana</font>;AAPL.O&<font color = blue>banana</font>; on Tuesday <font color = blue>banana</font> shipping a new program designed to let <font color = blue>banana</font> create <font color = blue>banana</font> <font color = blue>banana</font> <font color = blue>banana</font> and <font color = blue>banana</font> a discount <font color = blue>banana</font> <font color = blue>banana</font> <font color = blue>banana</font> featuring its flagship <font color = blue>banana</font> Cut Pro <font color = blue>banana</font>.</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>"
],
@@ -455,6 +654,11 @@
"import pandas as pd\n",
"pd.options.display.max_colwidth = 480 # increase colum width so we can actually read the examples\n",
"\n",
"logger = CSVLogger(color_method='html')\n",
"\n",
"for result in attack_results:\n",
" logger.log_attack_result(result)\n",
"\n",
"from IPython.core.display import display, HTML\n",
"display(HTML(logger.df[['original_text', 'perturbed_text']].to_html(escape=False)))"
]
@@ -462,7 +666,7 @@
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
"id": "yMMF1Vx1aPSR"
},
"source": [
"### Conclusion\n",
@@ -471,7 +675,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"id": "y4MTwyTpaPSR"
},
"source": [
"### Bonus: Attacking Custom Samples\n",
"\n",
@@ -480,14 +686,138 @@
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"execution_count": 8,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "L2Po7C8EaPSS",
"outputId": "d634f038-79e2-4bef-a11e-686a880ce8a7"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[34;1mtextattack\u001b[0m: CSVLogger exiting without calling flush().\n"
" 0%| | 0/4 [00:00<?, ?it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Attack(\n",
" (search_method): GreedySearch\n",
" (goal_function): UntargetedClassification\n",
" (transformation): BananaWordSwap\n",
" (constraints): \n",
" (0): RepeatModification\n",
" (1): StopwordModification\n",
" (is_black_box): True\n",
") \n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 1 / 0 / 0 / 1: 25%|██▌ | 1/4 [00:00<00:00, 7.13it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 1 ---------------------------------------------\n",
"\u001b[91m0 (96%)\u001b[0m --> \u001b[35m3 (80%)\u001b[0m\n",
"\n",
"Malaria \u001b[91mdeaths\u001b[0m in Africa fall by 5% from last year\n",
"\n",
"Malaria \u001b[35mbanana\u001b[0m in Africa fall by 5% from last year\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 2 / 0 / 0 / 2: 50%|█████ | 2/4 [00:00<00:00, 3.79it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 2 ---------------------------------------------\n",
"\u001b[92m1 (98%)\u001b[0m --> \u001b[35m3 (87%)\u001b[0m\n",
"\n",
"\u001b[92mWashington\u001b[0m \u001b[92mNationals\u001b[0m \u001b[92mdefeat\u001b[0m the Houston Astros to win the World Series\n",
"\n",
"\u001b[35mbanana\u001b[0m \u001b[35mbanana\u001b[0m \u001b[35mbanana\u001b[0m the Houston Astros to win the World Series\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 4 / 0 / 0 / 4: 100%|██████████| 4/4 [00:00<00:00, 4.31it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 3 ---------------------------------------------\n",
"\u001b[94m2 (99%)\u001b[0m --> \u001b[35m3 (94%)\u001b[0m\n",
"\n",
"\u001b[94mExxon\u001b[0m \u001b[94mMobil\u001b[0m \u001b[94mhires\u001b[0m a new \u001b[94mCEO\u001b[0m\n",
"\n",
"\u001b[35mbanana\u001b[0m \u001b[35mbanana\u001b[0m \u001b[35mbanana\u001b[0m a new \u001b[35mbanana\u001b[0m\n",
"\n",
"\n",
"--------------------------------------------- Result 4 ---------------------------------------------\n",
"\u001b[35m3 (93%)\u001b[0m --> \u001b[94m2 (100%)\u001b[0m\n",
"\n",
"\u001b[35mMicrosoft\u001b[0m invests $1 billion in OpenAI\n",
"\n",
"\u001b[94mbanana\u001b[0m invests $1 billion in OpenAI\n",
"\n",
"\n",
"\n",
"+-------------------------------+--------+\n",
"| Attack Results | |\n",
"+-------------------------------+--------+\n",
"| Number of successful attacks: | 4 |\n",
"| Number of failed attacks: | 0 |\n",
"| Number of skipped attacks: | 0 |\n",
"| Original accuracy: | 100.0% |\n",
"| Accuracy under attack: | 0.0% |\n",
"| Attack success rate: | 100.0% |\n",
"| Average perturbed word %: | 30.15% |\n",
"| Average num. words per input: | 8.25 |\n",
"| Avg num queries: | 12.75 |\n",
"+-------------------------------+--------+"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n",
"textattack: Logging to CSV at path results.csv\n",
"textattack: CSVLogger exiting without calling flush().\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
@@ -543,18 +873,33 @@
" ('Microsoft invests $1 billion in OpenAI', 3),\n",
"]\n",
"\n",
"results_iterable = attack.attack_dataset(custom_dataset)\n",
"attack_args = AttackArgs(num_examples=4)\n",
"\n",
"dataset = Dataset(custom_dataset)\n",
"\n",
"attacker = Attacker(attack, dataset, attack_args)\n",
"\n",
"results_iterable = attacker.attack_dataset()\n",
"\n",
"logger = CSVLogger(color_method='html')\n",
"\n",
"for result in results_iterable:\n",
" logger.log_attack_result(result)\n",
"\n",
"from IPython.core.display import display, HTML\n",
" \n",
"display(HTML(logger.df[['original_text', 'perturbed_text']].to_html(escape=False)))"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"collapsed_sections": [],
"name": "1_Introduction_and_Transformations.ipynb",
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
@@ -570,9 +915,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}

File diff suppressed because it is too large


@@ -1,267 +1,269 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
},
"colab": {
"name": "Augmentation with TextAttack.ipynb",
"provenance": []
}
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "m83IiqVREJ96"
},
"source": [
"# TextAttack Augmentation"
]
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "m83IiqVREJ96"
},
"source": [
"# TextAttack Augmentation"
]
{
"cell_type": "markdown",
"metadata": {
"id": "6UZ0d84hEJ98"
},
"source": [
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/QData/TextAttack/blob/master/docs/2notebook/3_Augmentations.ipynb)\n",
"\n",
"[![View Source on GitHub](https://img.shields.io/badge/github-view%20source-black.svg)](https://github.com/QData/TextAttack/blob/master/docs/2notebook/3_Augmentations.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qZ5xnoevEJ99"
},
"source": [
"Augmenting a dataset using TextAttack requries only a few lines of code when it is done right. The `Augmenter` class is created for this purpose to generate augmentations of a string or a list of strings. Augmentation could be done in either python script or command line.\n",
"\n",
"### Creating an Augmenter\n",
"\n",
"The **Augmenter** class is essensial for performing data augmentation using TextAttack. It takes in four paramerters in the following order:\n",
"\n",
"\n",
"1. **transformation**: all [transformations](https://textattack.readthedocs.io/en/latest/apidoc/textattack.transformations.html) implemented by TextAttack can be used to create an `Augmenter`. Note here that if we want to apply multiple transformations in the same time, they first need to be incooporated into a `CompositeTransformation` class.\n",
"2. **constraints**: [constraints](https://textattack.readthedocs.io/en/latest/apidoc/textattack.constraints.html#) determine whether or not a given augmentation is valid, consequently enhancing the quality of the augmentations. The default augmenter does not have any constraints but contraints can be supplied as a list to the Augmenter.\n",
"3. **pct_words_to_swap**: percentage of words to swap per augmented example. The default is set to 0.1 (10%).\n",
"4. **transformations_per_example** maximum number of augmentations per input. The default is set to 1 (one augmented sentence given one original input)\n",
"\n",
"An example of creating one's own augmenter is shown below. In this case, we are creating an augmenter with **RandomCharacterDeletion** and **WordSwapQWERTY** transformations, **RepeatModification** and **StopWordModification** constraints. A maximum of **50%** of the words could be purturbed, and 10 augmentations will be generated from each input sentence.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "5AXyxiLD4X93"
},
"outputs": [],
"source": [
"# import transformations, contraints, and the Augmenter\n",
"from textattack.transformations import WordSwapRandomCharacterDeletion\n",
"from textattack.transformations import WordSwapQWERTY\n",
"from textattack.transformations import CompositeTransformation\n",
"\n",
"from textattack.constraints.pre_transformation import RepeatModification\n",
"from textattack.constraints.pre_transformation import StopwordModification\n",
"\n",
"from textattack.augmentation import Augmenter"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "wFeXF_OL-vyw",
"outputId": "c041e77e-accd-4a58-88be-9b140dd0cd56"
},
"outputs": [
{
"cell_type": "markdown",
"metadata": {
"id": "6UZ0d84hEJ98"
},
"source": [
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/QData/TextAttack/blob/master/docs/2notebook/3_Augmentations.ipynb)\n",
"\n",
"[![View Source on GitHub](https://img.shields.io/badge/github-view%20source-black.svg)](https://github.com/QData/TextAttack/blob/master/docs/2notebook/3_Augmentations.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qZ5xnoevEJ99"
},
"source": [
"Augmenting a dataset using TextAttack requries only a few lines of code when it is done right. The `Augmenter` class is created for this purpose to generate augmentations of a string or a list of strings. Augmentation could be done in either python script or command line.\n",
"\n",
"### Creating an Augmenter\n",
"\n",
"The **Augmenter** class is essensial for performing data augmentation using TextAttack. It takes in four paramerters in the following order:\n",
"\n",
"\n",
"1. **transformation**: all [transformations](https://textattack.readthedocs.io/en/latest/apidoc/textattack.transformations.html) implemented by TextAttack can be used to create an `Augmenter`. Note here that if we want to apply multiple transformations in the same time, they first need to be incooporated into a `CompositeTransformation` class.\n",
"2. **constraints**: [constraints](https://textattack.readthedocs.io/en/latest/apidoc/textattack.constraints.html#) determine whether or not a given augmentation is valid, consequently enhancing the quality of the augmentations. The default augmenter does not have any constraints but contraints can be supplied as a list to the Augmenter.\n",
"3. **pct_words_to_swap**: percentage of words to swap per augmented example. The default is set to 0.1 (10%).\n",
"4. **transformations_per_example** maximum number of augmentations per input. The default is set to 1 (one augmented sentence given one original input)\n",
"\n",
"An example of creating one's own augmenter is shown below. In this case, we are creating an augmenter with **RandomCharacterDeletion** and **WordSwapQWERTY** transformations, **RepeatModification** and **StopWordModification** constraints. A maximum of **50%** of the words could be purturbed, and 10 augmentations will be generated from each input sentence.\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "5AXyxiLD4X93"
},
"source": [
"!pip install textattack\n",
"!pip install torch==1.6\n",
"\n",
"# import transformations, contraints, and the Augmenter\n",
"from textattack.transformations import WordSwapRandomCharacterDeletion\n",
"from textattack.transformations import WordSwapQWERTY\n",
"from textattack.transformations import CompositeTransformation\n",
"\n",
"from textattack.constraints.pre_transformation import RepeatModification\n",
"from textattack.constraints.pre_transformation import StopwordModification\n",
"\n",
"from textattack.augmentation import Augmenter"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "wFeXF_OL-vyw",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "c041e77e-accd-4a58-88be-9b140dd0cd56"
},
"source": [
"# Set up transformation using CompositeTransformation()\n",
"transformation = CompositeTransformation([WordSwapRandomCharacterDeletion(), WordSwapQWERTY()])\n",
"# Set up constraints\n",
"constraints = [RepeatModification(), StopwordModification()]\n",
"# Create augmenter with specified parameters\n",
"augmenter = Augmenter(transformation=transformation, constraints=constraints, pct_words_to_swap=0.5, transformations_per_example=10)\n",
"s = 'What I cannot create, I do not understand.'\n",
"# Augment!\n",
"augmenter.augment(s)"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"['Wat I cannog crexte, I do not ubderstand.',\n",
" 'Wat I cnnot creae, I do not understanr.',\n",
" 'Wha I canno ceeate, I do not ubderstand.',\n",
" 'Whaf I camnot creatr, I do not understsnd.',\n",
" 'Wht I cannt crete, I do not undrstand.',\n",
" 'Wht I cnnot crewte, I do not undersyand.',\n",
" 'Whzt I cannlt creare, I do not understajd.',\n",
" 'Wuat I cannof cfeate, I do not undefstand.',\n",
" 'Wuat I cannoy ceate, I do not ubderstand.',\n",
" 'hat I annot cfeate, I do not undestand.']"
]
},
"metadata": {
"tags": []
},
"execution_count": 19
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "b7020KtvEJ9-"
},
"source": [
"### Pre-built Augmentation Recipes\n",
"\n",
"In addition to creating our own augmenter, we could also use pre-built augmentation recipes to perturb datasets. These recipes are implemented from publishded papers and are very convenient to use. The list of available recipes can be found [here](https://textattack.readthedocs.io/en/latest/3recipes/augmenter_recipes.html).\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pkBqK5wYQKZu"
},
"source": [
"In the following example, we will use the `CheckListAugmenter` to showcase our augmentation recipes. The `CheckListAugmenter` augments words by using the transformation methods provided by CheckList INV testing, which combines **Name Replacement**, **Location Replacement**, **Number Alteration**, and **Contraction/Extension**. The original paper can be found here: [\"Beyond Accuracy: Behavioral Testing of NLP models with CheckList\" (Ribeiro et al., 2020)](https://arxiv.org/abs/2005.04118)"
]
},
{
"cell_type": "code",
"metadata": {
"id": "WkYiVH6lQedu",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "cd5ffc65-ca80-45cd-b3bb-d023bcad09a4"
},
"source": [
"# import the CheckListAugmenter\n",
"from textattack.augmentation import CheckListAugmenter\n",
"# Alter default values if desired\n",
"augmenter = CheckListAugmenter(pct_words_to_swap=0.2, transformations_per_example=5)\n",
"s = \"I'd love to go to Japan but the tickets are 500 dollars\"\n",
"# Augment\n",
"augmenter.augment(s)"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"['I would love to go to Central African Republic but the tickets are 500 dollars',\n",
" 'I would love to go to Japan but the tickets are 707 dollars',\n",
" 'I would love to go to Kosovo but the tickets are 500 dollars',\n",
" \"I'd love to go to Dominica but the tickets are 437 dollars\",\n",
" \"I'd love to go to New Caledonia but the tickets are 697 dollars\"]"
]
},
"metadata": {
"tags": []
},
"execution_count": 5
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5vn22xrLST0H"
},
"source": [
"Note that the previous snippet of code is equivalent of running\n",
"\n",
"```\n",
"textattack augment --recipe checklist --pct-words-to-swap .1 --transformations-per-example 5 --exclude-original --interactive\n",
"```\n",
"in command line.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VqfmCKz0XY-Y"
},
"source": [
"\n",
"\n",
"\n",
"Here's another example of using `WordNetAugmenter`:\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "l2b-4scuXvkA",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "72a78a95-ffc0-4d2a-b98c-b456d338807d"
},
"source": [
"from textattack.augmentation import WordNetAugmenter\n",
"augmenter = WordNetAugmenter(pct_words_to_swap=0.2, transformations_per_example=5)\n",
"s = \"I'd love to go to Japan but the tickets are 500 dollars\"\n",
"augmenter.augment(s)"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[\"I'd hump to go to Japan but the slate are 500 dollars\",\n",
" \"I'd love to go to Nippon but the tickets are 500 buck\",\n",
" \"I'd love to go to japan but the tickets are 500 dollar\",\n",
" \"I'd love to perish to Japan but the fine are 500 dollars\",\n",
" \"I'd love to start to Japan but the tickets are 500 buck\"]"
]
},
"metadata": {
"tags": []
},
"execution_count": 7
}
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true,
"id": "whvwbHLVEJ-S"
},
"source": [
"### Conclusion\n",
"We have now went through the basics in running `Augmenter` by either creating a new augmenter from scratch or using a pre-built augmenter. This could be done in as few as 4 lines of code so please give it a try if you haven't already! 🐙"
"data": {
"text/plain": [
"['Ahat I camnot reate, I do not unerstand.',\n",
" 'Ahat I cwnnot crewte, I do not undefstand.',\n",
" 'Wat I camnot vreate, I do not undefstand.',\n",
" 'Wha I annot crate, I do not unerstand.',\n",
" 'Whaf I canno creatr, I do not ynderstand.',\n",
" 'Wtat I cannor dreate, I do not understwnd.',\n",
" 'Wuat I canno ceate, I do not unferstand.',\n",
" 'hat I cnnot ceate, I do not undersand.',\n",
" 'hat I cnnot cfeate, I do not undfrstand.',\n",
" 'hat I cwnnot crfate, I do not ujderstand.']"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
]
}
],
"source": [
"# Set up transformation using CompositeTransformation()\n",
"transformation = CompositeTransformation([WordSwapRandomCharacterDeletion(), WordSwapQWERTY()])\n",
"# Set up constraints\n",
"constraints = [RepeatModification(), StopwordModification()]\n",
"# Create augmenter with specified parameters\n",
"augmenter = Augmenter(transformation=transformation, constraints=constraints, pct_words_to_swap=0.5, transformations_per_example=10)\n",
"s = 'What I cannot create, I do not understand.'\n",
"# Augment!\n",
"augmenter.augment(s)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "b7020KtvEJ9-"
},
"source": [
"### Pre-built Augmentation Recipes\n",
"\n",
"In addition to creating our own augmenter, we could also use pre-built augmentation recipes to perturb datasets. These recipes are implemented from publishded papers and are very convenient to use. The list of available recipes can be found [here](https://textattack.readthedocs.io/en/latest/3recipes/augmenter_recipes.html).\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pkBqK5wYQKZu"
},
"source": [
"In the following example, we will use the `CheckListAugmenter` to showcase our augmentation recipes. The `CheckListAugmenter` augments words by using the transformation methods provided by CheckList INV testing, which combines **Name Replacement**, **Location Replacement**, **Number Alteration**, and **Contraction/Extension**. The original paper can be found here: [\"Beyond Accuracy: Behavioral Testing of NLP models with CheckList\" (Ribeiro et al., 2020)](https://arxiv.org/abs/2005.04118)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "WkYiVH6lQedu",
"outputId": "cd5ffc65-ca80-45cd-b3bb-d023bcad09a4"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2021-06-09 16:58:41,816 --------------------------------------------------------------------------------\n",
"2021-06-09 16:58:41,817 The model key 'ner' now maps to 'https://huggingface.co/flair/ner-english' on the HuggingFace ModelHub\n",
"2021-06-09 16:58:41,817 - The most current version of the model is automatically downloaded from there.\n",
"2021-06-09 16:58:41,818 - (you can alternatively manually download the original model at https://nlp.informatik.hu-berlin.de/resources/models/ner/en-ner-conll03-v0.4.pt)\n",
"2021-06-09 16:58:41,818 --------------------------------------------------------------------------------\n",
"2021-06-09 16:58:41,906 loading file /u/lab/jy2ma/.flair/models/ner-english/4f4cdab26f24cb98b732b389e6cebc646c36f54cfd6e0b7d3b90b25656e4262f.8baa8ae8795f4df80b28e7f7b61d788ecbb057d1dc85aacb316f1bd02837a4a4\n"
]
},
{
"data": {
"text/plain": [
"['I would love to go to Chile but the tickets are 500 dollars',\n",
" 'I would love to go to Japan but the tickets are 500 dollars',\n",
" 'I would love to go to Japan but the tickets are 75 dollars',\n",
" \"I'd love to go to Oman but the tickets are 373 dollars\",\n",
" \"I'd love to go to Vietnam but the tickets are 613 dollars\"]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# import the CheckListAugmenter\n",
"from textattack.augmentation import CheckListAugmenter\n",
"# Alter default values if desired\n",
"augmenter = CheckListAugmenter(pct_words_to_swap=0.2, transformations_per_example=5)\n",
"s = \"I'd love to go to Japan but the tickets are 500 dollars\"\n",
"# Augment\n",
"augmenter.augment(s)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5vn22xrLST0H"
},
"source": [
"Note that the previous snippet of code is equivalent of running\n",
"\n",
"```\n",
"textattack augment --recipe checklist --pct-words-to-swap .1 --transformations-per-example 5 --exclude-original --interactive\n",
"```\n",
"in command line.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VqfmCKz0XY-Y"
},
"source": [
"\n",
"\n",
"\n",
"Here's another example of using `WordNetAugmenter`:\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "l2b-4scuXvkA",
"outputId": "72a78a95-ffc0-4d2a-b98c-b456d338807d"
},
"outputs": [
{
"data": {
"text/plain": [
"[\"I'd fuck to fit to Japan but the tickets are 500 dollars\",\n",
" \"I'd know to cristal to Japan but the tickets are 500 dollars\",\n",
" \"I'd love to depart to Japan but the tickets are D dollars\",\n",
" \"I'd love to get to Nihon but the tickets are 500 dollars\",\n",
" \"I'd love to work to Japan but the tickets are 500 buck\"]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from textattack.augmentation import WordNetAugmenter\n",
"augmenter = WordNetAugmenter(pct_words_to_swap=0.2, transformations_per_example=5)\n",
"s = \"I'd love to go to Japan but the tickets are 500 dollars\"\n",
"augmenter.augment(s)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "whvwbHLVEJ-S"
},
"source": [
"### Conclusion\n",
"We have now went through the basics in running `Augmenter` by either creating a new augmenter from scratch or using a pre-built augmenter. This could be done in as few as 4 lines of code so please give it a try if you haven't already! 🐙"
]
}
],
"metadata": {
"colab": {
"name": "Augmentation with TextAttack.ipynb",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,448 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# TextAttack with Custom Dataset and Word Embedding. This tutorial will show you how to use textattack with any dataset and word embedding you may want to use\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/QData/TextAttack/blob/master/docs/2notebook/4_Custom_Datasets_Word_Embedding.ipynb)\n",
"\n",
"[![View Source on GitHub](https://img.shields.io/badge/github-view%20source-black.svg)](https://github.com/QData/TextAttack/blob/master/docs/2notebook/4_Custom_Datasets_Word_Embedding.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_WVki6Bvbjur"
},
"source": [
"## **Importing the Model**\n",
"\n",
"We start by choosing a pretrained model we want to attack. In this example we will use the albert base v2 model from HuggingFace. This model was trained with data from imbd, a set of movie reviews with either positive or negative labels."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 585,
"referenced_widgets": [
"1905ff29aaa242a88dc93f3247065364",
"917713cc9b1344c7a7801144f04252bc",
"b65d55c5b9f445a6bfd585f6237d22ca",
"38b56a89b2ae4a8ca93c03182db26983",
"26082a081d1c49bd907043a925cf88df",
"1c3edce071ad4a2a99bf3e34ea40242c",
"f9c265a003444a03bde78e18ed3f5a7e",
"3cb9eb594c8640ffbfd4a0b1139d571a",
"7d29511ba83a4eaeb4a2e5cd89ca1990",
"136f44f7b8fa433ebff6d0a534c0588b",
"2658e486ee77468a99ab4edc7b5191d8",
"39bfd8c439b847e4bdfeee6e66ae86f3",
"7ca4ce3d902d42758eb1fc02b9b211d3",
"222cacceca11402db10ff88a92a2d31d",
"108d2b83dff244edbebf4f8909dce789",
"c06317aaf0064cb9b6d86d032821a8e2",
"c18ac12f8c6148b9aa2d69885351fbcb",
"b11ad31ee69441df8f0447a4ae62ce75",
"a7e846fdbda740a38644e28e11a67707",
"b38d5158e5584461bfe0b2f8ed3b0dc2",
"3bdef9b4157e41f3a01f25b07e8efa48",
"69e19afa8e2c49fbab0e910a5929200f",
"2627a092f0c041c0a5f67451b1bd8b2b",
"1780cb5670714c0a9b7a94b92ffc1819",
"1ac87e683d2e4951ac94e25e8fe88d69",
"02daee23726349a69d4473814ede81c3",
"1fac551ad9d840f38b540ea5c364af70",
"1027e6f245924195a930aca8c3844f44",
"5b863870023e4c438ed75d830c13c5ac",
"9ec55c6e2c4e40daa284596372728213",
"5e2d17ed769d496db38d053cc69a914c",
"dedaafae3bcc47f59b7d9b025b31fd0c",
"8c2f5cda0ae9472fa7ec2b864d0bdc0e",
"2a35d22dd2604950bae55c7c51f4af2c",
"4c23ca1540fd48b1ac90d9365c9c6427",
"3e4881a27c36472ab4c24167da6817cf",
"af32025d22534f9da9e769b02f5e6422",
"7af34c47299f458789e03987026c3519",
"ed0ab8c7456a42618d6cbf6fd496b7b3",
"25fc5fdac77247f9b029ada61af630fd"
]
},
"id": "4ZEnCFoYv-y7",
"outputId": "c6c57cb9-6d6e-4efd-988f-c794356d4719"
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "09e503d73c1042dfbc48e0148cfc9699",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Downloading', max=727.0, style=ProgressStyle(description_…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "79a819e8b3614fe280209cbc93614ce3",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Downloading', max=46747112.0, style=ProgressStyle(descrip…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "8ac83c6df8b746c3af829996193292cf",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Downloading', max=760289.0, style=ProgressStyle(descripti…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "edb863a582ac4ee6a0f0ac064c335843",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Downloading', max=156.0, style=ProgressStyle(description_…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "ccaf4ac6d7e24cc5b5e320f128a11b68",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Downloading', max=25.0, style=ProgressStyle(description_w…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"import transformers\n",
"from textattack.models.wrappers import HuggingFaceModelWrapper\n",
"\n",
"# https://huggingface.co/textattack\n",
"model = transformers.AutoModelForSequenceClassification.from_pretrained(\"textattack/albert-base-v2-imdb\")\n",
"tokenizer = transformers.AutoTokenizer.from_pretrained(\"textattack/albert-base-v2-imdb\")\n",
"# We wrap the model so it can be used by textattack\n",
"model_wrapper = HuggingFaceModelWrapper(model, tokenizer)"
]
},
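{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before building an attack, it helps to sanity-check the wrapper. The cell below simply calls `model_wrapper` on two short reviews; it should return one row of class scores (logits) per input, though the exact values and tensor type depend on the model and library versions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Quick sanity check: the wrapper maps a list of strings to the model's raw class scores.\n",
"sample_reviews = [\"I hate this movie\", \"I love this movie\"]\n",
"logits = model_wrapper(sample_reviews)\n",
"print(logits) # expect one row of scores per review"
]
},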
{
"cell_type": "markdown",
"metadata": {
"id": "D61VLa8FexyK"
},
"source": [
"## **Creating A Custom Dataset**\n",
"\n",
"Textattack takes in dataset in the form of a list of tuples. The tuple can be in the form of (\"string\", label) or (\"string\", label, label). In this case we will use former one, since we want to create a custom movie review dataset with label 0 representing a positive review, and label 1 representing a negative review.\n",
"\n",
"For simplicity, I created a dataset consisting of 4 reviews, the 1st and 4th review have \"correct\" labels, while the 2nd and 3rd review have \"incorrect\" labels. We will see how this impacts perturbation later in this tutorial.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "nk_MUu5Duf1V"
},
"outputs": [],
"source": [
"# dataset: An iterable of (text, ground_truth_output) pairs.\n",
"#0 means the review is negative\n",
"#1 means the review is positive\n",
"custom_dataset = [\n",
" ('I hate this movie', 0), #A negative comment, with a negative label\n",
" ('I hate this movie', 1), #A negative comment, with a positive label\n",
" ('I love this movie', 0), #A positive comment, with a negative label\n",
" ('I love this movie', 1), #A positive comment, with a positive label\n",
"]"
]
},
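{
"cell_type": "markdown",
"metadata": {},
"source": [
"TextAttack can consume this plain list directly, as we do below, but utilities such as `Attacker` expect a dataset object. The following sketch assumes a TextAttack release (0.3+) that exposes `textattack.datasets.Dataset`, which wraps a list of (text, label) pairs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from textattack.datasets import Dataset\n",
"\n",
"# Wrap the list of (text, label) pairs so other TextAttack utilities can iterate over it.\n",
"wrapped_dataset = Dataset(custom_dataset)\n",
"print(len(wrapped_dataset), \"examples; first example:\", wrapped_dataset[0])"
]
},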
{
"cell_type": "markdown",
"metadata": {
"id": "ijVmi6PbiUYZ"
},
"source": [
"## **Creating An Attack**"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "-iEH_hf6iMEw",
"outputId": "0c836c5b-ddd5-414d-f73d-da04067054d8"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"textattack: Unknown if model of class <class 'transformers.models.albert.modeling_albert.AlbertForSequenceClassification'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.\n"
]
}
],
"source": [
"from textattack import Attack\n",
"from textattack.search_methods import GreedySearch\n",
"from textattack.constraints.pre_transformation import RepeatModification, StopwordModification\n",
"from textattack.goal_functions import UntargetedClassification\n",
"from textattack.transformations import WordSwapEmbedding\n",
"from textattack.constraints.pre_transformation import RepeatModification\n",
"from textattack.constraints.pre_transformation import StopwordModification\n",
"\n",
"# We'll use untargeted classification as the goal function.\n",
"goal_function = UntargetedClassification(model_wrapper)\n",
"# We'll to use our WordSwapEmbedding as the attack transformation.\n",
"transformation = WordSwapEmbedding() \n",
"# We'll constrain modification of already modified indices and stopwords\n",
"constraints = [RepeatModification(),\n",
" StopwordModification()]\n",
"# We'll use the Greedy search method\n",
"search_method = GreedySearch()\n",
"# Now, let's make the attack from the 4 components:\n",
"attack = Attack(goal_function, constraints, transformation, search_method)"
]
},
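{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once assembled, the attack object can be printed to check its components. The cell below just prints `attack`; the resulting summary of search method, goal function, transformation, and constraints mirrors the tree TextAttack logs when an attack runs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Printing the attack shows its search method, goal function, transformation, and constraints.\n",
"print(attack)"
]
},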
{
"cell_type": "markdown",
"metadata": {
"id": "4hUA8ntnfJzH"
},
"source": [
"## **Attack Results With Custom Dataset**\n",
"\n",
"As you can see, the attack fools the model by changing a few words in the 1st and 4th review.\n",
"\n",
"The attack skipped the 2nd and and 3rd review because since it they were labeled incorrectly, they managed to fool the model without any modifications."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "-ivoHEOXfIfN",
"outputId": "9ec660b6-44fc-4354-9dd1-1641b6f4c986"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[91m0 (99%)\u001b[0m --> \u001b[92m1 (81%)\u001b[0m\n",
"\n",
"\u001b[91mI\u001b[0m \u001b[91mhate\u001b[0m this \u001b[91mmovie\u001b[0m\n",
"\n",
"\u001b[92mdid\u001b[0m \u001b[92mhateful\u001b[0m this \u001b[92mfootage\u001b[0m\n",
"\u001b[91m0 (99%)\u001b[0m --> \u001b[37m[SKIPPED]\u001b[0m\n",
"\n",
"I hate this movie\n",
"\u001b[92m1 (96%)\u001b[0m --> \u001b[37m[SKIPPED]\u001b[0m\n",
"\n",
"I love this movie\n",
"\u001b[92m1 (96%)\u001b[0m --> \u001b[91m0 (99%)\u001b[0m\n",
"\n",
"I \u001b[92mlove\u001b[0m this movie\n",
"\n",
"I \u001b[91miove\u001b[0m this movie\n"
]
}
],
"source": [
"for example, label in custom_dataset:\n",
" result = attack.attack(example, label)\n",
" print(result.__str__(color_method='ansi'))"
]
},
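{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above calls `attack.attack` one example at a time. As an alternative, sketched here under the assumption that your TextAttack version (0.3+) provides `Attacker` and `AttackArgs`, you can hand a wrapped dataset to an `Attacker`, which adds progress logging and a summary table."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from textattack import Attacker, AttackArgs\n",
"from textattack.datasets import Dataset\n",
"\n",
"# Attack every example in the custom dataset and print a summary at the end.\n",
"attack_args = AttackArgs(num_examples=-1)\n",
"attacker = Attacker(attack, Dataset(custom_dataset), attack_args)\n",
"results = attacker.attack_dataset()"
]
},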
{
"cell_type": "markdown",
"metadata": {
"id": "foFZmk8vY5z0"
},
"source": [
"## **Creating A Custom Word Embedding**\n",
"\n",
"In textattack, a pre-trained word embedding is necessary in transformation in order to find synonym replacements, and in constraints to check the semantic validity of the transformation. To use custom pre-trained word embeddings, you can either create a new class that inherits the AbstractWordEmbedding class, or use the WordEmbedding class which takes in 4 parameters."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"id": "owj_jMHRxEF5"
},
"outputs": [],
"source": [
"from textattack.shared import WordEmbedding\n",
"\n",
"embedding_matrix = [[1.0], [2.0], [3.0], [4.0]] #2-D array of shape N x D where N represents size of vocab and D is the dimension of embedding vectors.\n",
"word2index = {\"hate\":0, \"despise\":1, \"like\":2, \"love\":3} #dictionary that maps word to its index with in the embedding matrix.\n",
"index2word = {0:\"hate\", 1: \"despise\", 2:\"like\", 3:\"love\"} #dictionary that maps index to its word.\n",
"nn_matrix = [[0, 1, 2, 3], [1, 0, 2, 3], [2, 1, 3, 0], [3, 2, 1, 0]] #2-D integer array of shape N x K where N represents size of vocab and K is the top-K nearest neighbours.\n",
"\n",
"embedding = WordEmbedding(embedding_matrix, word2index, index2word, nn_matrix)"
]
},
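{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before plugging this embedding into a transformation, we can peek at the nearest-neighbour table we just defined. The snippet below only uses the dictionaries from the previous cell, so it illustrates the data layout rather than TextAttack's own lookup code; each row of `nn_matrix` is the pool of candidates a transformation like `WordSwapEmbedding` can draw from."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Each row of nn_matrix lists neighbour indices for a word (the first entry is the word itself).\n",
"query = \"hate\"\n",
"neighbour_ids = nn_matrix[word2index[query]]\n",
"neighbours = [index2word[i] for i in neighbour_ids if index2word[i] != query]\n",
"print(f\"Candidate replacements for '{query}':\", neighbours)"
]
},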
{
"cell_type": "markdown",
"metadata": {
"id": "s9ZEV_ykhmBn"
},
"source": [
"## **Attack Results With Custom Dataset and Word Embedding**\n",
"\n",
"Now if we run the attack again with the custom word embedding, you will notice the modifications are limited to the vocab provided by our custom word embedding."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "gZ98UZ6I5sIn",
"outputId": "59a653cb-85cb-46b5-d81b-c1a05ebe8a3e"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[91m0 (99%)\u001b[0m --> \u001b[92m1 (98%)\u001b[0m\n",
"\n",
"I \u001b[91mhate\u001b[0m this movie\n",
"\n",
"I \u001b[92mlike\u001b[0m this movie\n",
"\u001b[91m0 (99%)\u001b[0m --> \u001b[37m[SKIPPED]\u001b[0m\n",
"\n",
"I hate this movie\n",
"\u001b[92m1 (96%)\u001b[0m --> \u001b[37m[SKIPPED]\u001b[0m\n",
"\n",
"I love this movie\n",
"\u001b[92m1 (96%)\u001b[0m --> \u001b[91m0 (99%)\u001b[0m\n",
"\n",
"I \u001b[92mlove\u001b[0m this movie\n",
"\n",
"I \u001b[91mdespise\u001b[0m this movie\n"
]
}
],
"source": [
"from textattack.attack_results import SuccessfulAttackResult\n",
"\n",
"transformation = WordSwapEmbedding(3, embedding) \n",
"\n",
"attack = Attack(goal_function, constraints, transformation, search_method)\n",
"\n",
"for example, label in custom_dataset:\n",
" result = attack.attack(example, label)\n",
" print(result.__str__(color_method='ansi'))"
]
}
],
"metadata": {
"colab": {
"name": "Custom Data and Embedding with TextAttack.ipynb",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -3,7 +3,6 @@
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "JPVBc5ndpFIX"
},
"source": [
@@ -13,7 +12,7 @@
"\n",
"In a few lines of code, we load a sentiment analysis model trained on the Stanford Sentiment Treebank and configure it with a TextAttack model wrapper. Then, we initialize the TextBugger attack and run the attack on a few samples from the SST-2 train set.\n",
"\n",
"For more information on AllenNLP pre-trained models: https://docs.allennlp.org/v1.0.0rc3/tutorials/getting_started/using_pretrained_models/\n",
"For more information on AllenNLP pre-trained models: https://docs.allennlp.org/models/main/\n",
"\n",
"For more information about the TextBugger attack: https://arxiv.org/abs/1812.05271"
]
@@ -21,7 +20,6 @@
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "AyPMGcz0qLfK"
},
"source": [
@@ -32,37 +30,22 @@
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "gNhZmYq-ek-2"
},
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"!pip install allennlp allennlp_models textattack"
"!pip install allennlp allennlp_models > /dev/null"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 2,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "RzOEn-6Shfxu"
},
"outputs": [],
"source": [
"!pip install datasets pyarrow transformers --upgrade"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "_br6Xvsif9SA"
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "_br6Xvsif9SA",
"outputId": "224cc851-0e9d-4454-931c-64bd3b7af400"
},
"outputs": [],
"source": [
@@ -74,11 +57,13 @@
"class AllenNLPModel(textattack.models.wrappers.ModelWrapper):\n",
" def __init__(self):\n",
" self.predictor = Predictor.from_path(\"https://storage.googleapis.com/allennlp-public-models/basic_stanford_sentiment_treebank-2020.06.09.tar.gz\")\n",
" self.model = self.predictor._model\n",
" self.tokenizer = self.predictor._dataset_reader._tokenizer\n",
"\n",
" def __call__(self, text_input_list):\n",
" outputs = []\n",
" for text_input in text_input_list:\n",
" outputs.append(self.predictor.predict(sentence=text_input))\n",
" outputs.append(self.model.predict(sentence=text_input))\n",
" # For each output, outputs['logits'] contains the logits where\n",
" # index 0 corresponds to the positive and index 1 corresponds \n",
" # to the negative score. We reverse the outputs (by reverse slicing,\n",
@@ -90,32 +75,78 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
"base_uri": "https://localhost:8080/"
},
"colab_type": "code",
"id": "_vt74Gd2hqA6",
"outputId": "c317d64d-9499-449a-ef93-f28be0c0d7a2"
"id": "MDRWI5Psb85g",
"outputId": "db7f8f94-0d78-45ea-a7ac-e12167c28365"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[34;1mtextattack\u001b[0m: Loading \u001b[94mnlp\u001b[0m dataset \u001b[94mglue\u001b[0m, subset \u001b[94msst2\u001b[0m, split \u001b[94mtrain\u001b[0m.\n",
"\u001b[34;1mtextattack\u001b[0m: Unknown if model of class <class '__main__.AllenNLPModel'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.\n",
"/usr/local/lib/python3.6/dist-packages/textattack/constraints/semantics/sentence_encoders/sentence_encoder.py:149: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).\n",
" embeddings[len(transformed_texts) :]\n"
"Reusing dataset glue (/p/qdata/jy2ma/.cache/textattack/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n",
"textattack: Loading \u001b[94mdatasets\u001b[0m dataset \u001b[94mglue\u001b[0m, subset \u001b[94msst2\u001b[0m, split \u001b[94mtrain\u001b[0m.\n",
"textattack: Unknown if model of class <class 'allennlp.predictors.text_classifier.TextClassifierPredictor'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.\n",
" 0%| | 0/10 [00:00<?, ?it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Result 0:\n",
"Attack(\n",
" (search_method): GreedyWordSwapWIR(\n",
" (wir_method): delete\n",
" )\n",
" (goal_function): UntargetedClassification\n",
" (transformation): CompositeTransformation(\n",
" (0): WordSwapRandomCharacterInsertion(\n",
" (random_one): True\n",
" )\n",
" (1): WordSwapRandomCharacterDeletion(\n",
" (random_one): True\n",
" )\n",
" (2): WordSwapNeighboringCharacterSwap(\n",
" (random_one): True\n",
" )\n",
" (3): WordSwapHomoglyphSwap\n",
" (4): WordSwapEmbedding(\n",
" (max_candidates): 5\n",
" (embedding): WordEmbedding\n",
" )\n",
" )\n",
" (constraints): \n",
" (0): UniversalSentenceEncoder(\n",
" (metric): angular\n",
" (threshold): 0.8\n",
" (window_size): inf\n",
" (skip_text_shorter_than_window): False\n",
" (compare_against_original): True\n",
" )\n",
" (1): RepeatModification\n",
" (2): StopwordModification\n",
" (is_black_box): True\n",
") \n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Using /p/qdata/jy2ma/.cache/textattack to cache modules.\n",
"[Succeeded / Failed / Skipped / Total] 1 / 1 / 0 / 2: 20%|██ | 2/10 [00:06<00:27, 3.46s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 1 ---------------------------------------------\n",
"\u001b[91mNegative (95%)\u001b[0m --> \u001b[92mPositive (93%)\u001b[0m\n",
"\n",
"\u001b[91mhide\u001b[0m new secretions from the parental units \n",
@@ -123,186 +154,198 @@
"\u001b[92mconcealing\u001b[0m new secretions from the parental units \n",
"\n",
"\n",
"\n",
"Result 1:\n",
"--------------------------------------------- Result 2 ---------------------------------------------\n",
"\u001b[91mNegative (96%)\u001b[0m --> \u001b[91m[FAILED]\u001b[0m\n",
"\n",
"contains no wit , only labored gags \n",
"\n",
"\n",
"\n",
"Result 2:\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 1 / 2 / 1 / 4: 40%|████ | 4/10 [00:07<00:10, 1.80s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 3 ---------------------------------------------\n",
"\u001b[92mPositive (100%)\u001b[0m --> \u001b[91m[FAILED]\u001b[0m\n",
"\n",
"that loves its characters and communicates something rather beautiful about human nature \n",
"\n",
"\n",
"\n",
"Result 3:\n",
"--------------------------------------------- Result 4 ---------------------------------------------\n",
"\u001b[92mPositive (82%)\u001b[0m --> \u001b[37m[SKIPPED]\u001b[0m\n",
"\n",
"remains utterly satisfied to remain the same throughout \n",
"\n",
"\n",
"\n",
"Result 4:\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 2 / 2 / 1 / 5: 50%|█████ | 5/10 [00:07<00:07, 1.52s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 5 ---------------------------------------------\n",
"\u001b[91mNegative (98%)\u001b[0m --> \u001b[92mPositive (52%)\u001b[0m\n",
"\n",
"on the \u001b[91mworst\u001b[0m \u001b[91mrevenge-of-the-nerds\u001b[0m clichés the filmmakers could \u001b[91mdredge\u001b[0m up \n",
"\n",
"on the \u001b[92mpire\u001b[0m \u001b[92mrеvenge-of-the-nerds\u001b[0m clichés the filmmakers could \u001b[92mdragging\u001b[0m up \n",
"on the \u001b[92mpire\u001b[0m \u001b[92mreveng-of-the-nerds\u001b[0m clichés the filmmakers could \u001b[92mdragging\u001b[0m up \n",
"\n",
"\n",
"\n",
"Result 5:\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 2 / 3 / 1 / 6: 60%|██████ | 6/10 [00:07<00:05, 1.32s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 6 ---------------------------------------------\n",
"\u001b[91mNegative (99%)\u001b[0m --> \u001b[91m[FAILED]\u001b[0m\n",
"\n",
"that 's far too tragic to merit such superficial treatment \n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 3 / 4 / 1 / 8: 80%|████████ | 8/10 [00:09<00:02, 1.13s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 7 ---------------------------------------------\n",
"\u001b[92mPositive (98%)\u001b[0m --> \u001b[91mNegative (62%)\u001b[0m\n",
"\n",
"\u001b[92mdemonstrates\u001b[0m that the \u001b[92mdirector\u001b[0m of such \u001b[92mhollywood\u001b[0m blockbusters as patriot games can still \u001b[92mturn\u001b[0m out a \u001b[92msmall\u001b[0m , personal \u001b[92mfilm\u001b[0m with an emotional \u001b[92mwallop\u001b[0m . \n",
"\n",
"\u001b[91mshows\u001b[0m that the \u001b[91mdirectors\u001b[0m of such \u001b[91mtinseltown\u001b[0m blockbusters as patriot games can still \u001b[91mturning\u001b[0m out a \u001b[91mtiny\u001b[0m , personal \u001b[91mmovies\u001b[0m with an emotional \u001b[91mbatting\u001b[0m . \n",
"\n",
"\n",
"Result 6:\n",
"\u001b[92mPositive (98%)\u001b[0m --> \u001b[91mNegative (50%)\u001b[0m\n",
"\n",
"demonstrates that the \u001b[92mdirector\u001b[0m of such \u001b[92mhollywood\u001b[0m blockbusters as patriot \u001b[92mgames\u001b[0m can still turn out a \u001b[92msmall\u001b[0m , personal \u001b[92mfilm\u001b[0m with an \u001b[92memotional\u001b[0m \u001b[92mwallop\u001b[0m . \n",
"\n",
"demonstrates that the \u001b[91mdirectors\u001b[0m of such \u001b[91mtinseltown\u001b[0m blockbusters as patriot \u001b[91mgame\u001b[0m can still turn out a \u001b[91mtiny\u001b[0m , personal \u001b[91mmovie\u001b[0m with an \u001b[91msentimental\u001b[0m \u001b[91mbatting\u001b[0m . \n",
"\n",
"\n",
"\n",
"Result 7:\n",
"--------------------------------------------- Result 8 ---------------------------------------------\n",
"\u001b[92mPositive (90%)\u001b[0m --> \u001b[91m[FAILED]\u001b[0m\n",
"\n",
"of saucy \n",
"\n",
"\n",
"\n",
"Result 8:\n",
"\u001b[91mNegative (99%)\u001b[0m --> \u001b[92mPositive (83%)\u001b[0m\n",
"\n",
"a \u001b[91mdepressed\u001b[0m \u001b[91mfifteen-year-old\u001b[0m 's suicidal poetry \n",
"\n",
"a \u001b[92mdepr\u001b[0m \u001b[92messed\u001b[0m \u001b[92mfifteeny-ear-old\u001b[0m 's suicidal poetry \n",
"\n",
"\n",
"\n",
"Result 9:\n",
"\u001b[92mPositive (79%)\u001b[0m --> \u001b[91m[FAILED]\u001b[0m\n",
"\n",
"are more deeply thought through than in most ` right-thinking ' films \n",
"\n",
"\n",
"\n",
"Result 10:\n",
"\u001b[91mNegative (97%)\u001b[0m --> \u001b[91m[FAILED]\u001b[0m\n",
"\n",
"goes to absurd lengths \n",
"\n",
"\n",
"\n",
"Result 11:\n",
"\u001b[91mNegative (94%)\u001b[0m --> \u001b[92mPositive (51%)\u001b[0m\n",
"\n",
"for those \u001b[91mmoviegoers\u001b[0m who \u001b[91mcomplain\u001b[0m that ` they do \u001b[91mn't\u001b[0m make movies like they used to anymore \n",
"\n",
"for those \u001b[92mmovieg\u001b[0m \u001b[92moers\u001b[0m who \u001b[92mcompl\u001b[0m \u001b[92main\u001b[0m that ` they do \u001b[92mnt\u001b[0m make movies like they used to anymore \n",
"\n",
"\n",
"\n",
"Result 12:\n",
"\u001b[91mNegative (92%)\u001b[0m --> \u001b[92mPositive (85%)\u001b[0m\n",
"\n",
"the part where \u001b[91mnothing\u001b[0m 's happening , \n",
"\n",
"the part where \u001b[92mnothin\u001b[0m 's happening , \n",
"\n",
"\n",
"\n",
"Result 13:\n",
"\u001b[91mNegative (97%)\u001b[0m --> \u001b[92mPositive (90%)\u001b[0m\n",
"\n",
"saw how \u001b[91mbad\u001b[0m this movie was \n",
"\n",
"saw how \u001b[92minclement\u001b[0m this movie was \n",
"\n",
"\n",
"\n",
"Result 14:\n",
"\u001b[91mNegative (73%)\u001b[0m --> \u001b[92mPositive (84%)\u001b[0m\n",
"\n",
"lend some dignity to a \u001b[91mdumb\u001b[0m story \n",
"\n",
"lend some dignity to a \u001b[92mdaft\u001b[0m story \n",
"\n",
"\n",
"\n",
"Result 15:\n",
"\u001b[92mPositive (99%)\u001b[0m --> \u001b[91m[FAILED]\u001b[0m\n",
"\n",
"the greatest musicians \n",
"\n",
"\n",
"\n",
"Result 16:\n",
"\u001b[91mNegative (98%)\u001b[0m --> \u001b[92mPositive (99%)\u001b[0m\n",
"\n",
"\u001b[91mcold\u001b[0m movie \n",
"\n",
"\u001b[92mcolder\u001b[0m movie \n",
"\n",
"\n",
"\n",
"Result 17:\n",
"\u001b[92mPositive (87%)\u001b[0m --> \u001b[91m[FAILED]\u001b[0m\n",
"\n",
"with his usual intelligence and subtlety \n",
"\n",
"\n",
"\n",
"Result 18:\n",
"\u001b[91mNegative (99%)\u001b[0m --> \u001b[91m[FAILED]\u001b[0m\n",
"\n",
"redundant concept \n",
"\n",
"\n",
"\n",
"Result 19:\n",
"\u001b[92mPositive (93%)\u001b[0m --> \u001b[91m[FAILED]\u001b[0m\n",
"\n",
"swimming is above all about a young woman 's face , and by casting an actress whose face projects that woman 's doubts and yearnings , it succeeds . \n",
"\n",
"\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 4 / 5 / 1 / 10: 100%|██████████| 10/10 [00:09<00:00, 1.06it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 9 ---------------------------------------------\n",
"\u001b[91mNegative (99%)\u001b[0m --> \u001b[91m[FAILED]\u001b[0m\n",
"\n",
"a depressed fifteen-year-old 's suicidal poetry \n",
"\n",
"\n",
"--------------------------------------------- Result 10 ---------------------------------------------\n",
"\u001b[92mPositive (79%)\u001b[0m --> \u001b[91mNegative (65%)\u001b[0m\n",
"\n",
"are more \u001b[92mdeeply\u001b[0m thought through than in most ` right-thinking ' films \n",
"\n",
"are more \u001b[91mseriously\u001b[0m thought through than in most ` right-thinking ' films \n",
"\n",
"\n",
"\n",
"+-------------------------------+--------+\n",
"| Attack Results | |\n",
"+-------------------------------+--------+\n",
"| Number of successful attacks: | 4 |\n",
"| Number of failed attacks: | 5 |\n",
"| Number of skipped attacks: | 1 |\n",
"| Original accuracy: | 90.0% |\n",
"| Accuracy under attack: | 50.0% |\n",
"| Attack success rate: | 44.44% |\n",
"| Average perturbed word %: | 20.95% |\n",
"| Average num. words per input: | 9.5 |\n",
"| Avg num queries: | 34.67 |\n",
"+-------------------------------+--------+\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"text/plain": [
"[<textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7fb68d0028b0>,\n",
" <textattack.attack_results.failed_attack_result.FailedAttackResult at 0x7fb685f0dbb0>,\n",
" <textattack.attack_results.failed_attack_result.FailedAttackResult at 0x7fb689188040>,\n",
" <textattack.attack_results.skipped_attack_result.SkippedAttackResult at 0x7fb695031250>,\n",
" <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7fb695031760>,\n",
" <textattack.attack_results.failed_attack_result.FailedAttackResult at 0x7fb694b7abb0>,\n",
" <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7fb67cd36df0>,\n",
" <textattack.attack_results.failed_attack_result.FailedAttackResult at 0x7fb694b7a880>,\n",
" <textattack.attack_results.failed_attack_result.FailedAttackResult at 0x7fb694b7a790>,\n",
" <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7fb689ab1be0>]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from textattack.datasets import HuggingFaceDataset\n",
"from textattack.attack_recipes import TextBuggerLi2018\n",
"from textattack.attacker import Attacker\n",
"\n",
"\n",
"dataset = HuggingFaceDataset(\"glue\", \"sst2\", \"train\")\n",
"attack = TextBuggerLi2018(model_wrapper)\n",
"attack = TextBuggerLi2018.build(model_wrapper)\n",
"\n",
"results = list(attack.attack_dataset(dataset, indices=range(20)))\n",
"for idx, result in enumerate(results):\n",
" print(f'Result {idx}:')\n",
" print(result.__str__(color_method='ansi'))\n",
" print('\\n')\n",
"print()"
"attacker = Attacker(attack, dataset)\n",
"attacker.attack_dataset()"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"collapsed_sections": [],
"name": "[TextAttack] Model Example: AllenNLP",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
"name": "python379jvsc74a57bd00aa23297d40f12761ebb1c384bf2965d5ecbdef2f9c005ee7346b9ec0bcc5588",
"display_name": "Python 3.7.9 64-bit ('pytorch-gpu': pyenv)"
},
"language_info": {
"codemirror_mode": {
@@ -314,9 +357,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 1
"nbformat_minor": 4
}

File diff suppressed because it is too large


@@ -2,7 +2,9 @@
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"id": "koVcufVBD9uv"
},
"source": [
"# Multi-language attacks\n",
"\n",
@@ -19,7 +21,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"id": "Abd2C3zJD9u4"
},
"source": [
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/QData/TextAttack/blob/master/docs/2notebook/Example_4_CamemBERT.ipynb)\n",
"\n",
@@ -29,13 +33,16 @@
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"metadata": {
"id": "-fnSUl8ND9u5"
},
"outputs": [],
"source": [
"from textattack.attack_recipes import PWWSRen2019\n",
"from textattack.datasets import HuggingFaceDataset\n",
"from textattack.models.wrappers import ModelWrapper\n",
"from transformers import AutoTokenizer, TFAutoModelForSequenceClassification, pipeline\n",
"from textattack import Attacker\n",
"\n",
"import numpy as np\n",
"\n",
@@ -55,10 +62,10 @@
" \n",
" [[0.218262017, 0.7817379832267761]\n",
" \"\"\"\n",
" def __init__(self, pipeline):\n",
" self.pipeline = pipeline\n",
" def __init__(self, model):\n",
" self.model = model#pipeline = pipeline\n",
" def __call__(self, text_inputs):\n",
" raw_outputs = self.pipeline(text_inputs)\n",
" raw_outputs = self.model(text_inputs)\n",
" outputs = []\n",
" for output in raw_outputs:\n",
" score = output['score']\n",
@@ -71,111 +78,500 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 2,
"metadata": {
"scrolled": true
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "i2WPtwO9D9u6",
"outputId": "2f5e8fab-1047-417d-c90c-b9238b2886a4",
"scrolled": true,
"tags": []
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "99f2f220b210403eaaf82004365bb30b",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Downloading', max=445132512.0, style=ProgressStyle(descri…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"All model checkpoint weights were used when initializing TFCamembertForSequenceClassification.\n",
"All model checkpoint layers were used when initializing TFCamembertForSequenceClassification.\n",
"\n",
"All the weights of TFCamembertForSequenceClassification were initialized from the model checkpoint at tblard/tf-allocine.\n",
"If your task is similar to the task the model of the ckeckpoint was trained on, you can already use TFCamembertForSequenceClassification for predictions without further training.\n",
"\u001b[34;1mtextattack\u001b[0m: Unknown if model of class <class '__main__.HuggingFaceSentimentAnalysisPipelineWrapper'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.\n",
"\u001b[34;1mtextattack\u001b[0m: Loading \u001b[94mnlp\u001b[0m dataset \u001b[94mallocine\u001b[0m, split \u001b[94mtest\u001b[0m.\n"
"All the layers of TFCamembertForSequenceClassification were initialized from the model checkpoint at tblard/tf-allocine.\n",
"If your task is similar to the task the model of the checkpoint was trained on, you can already use TFCamembertForSequenceClassification for predictions without further training.\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "584bb087e19b46c3a97a69f7bdd25c8d",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Downloading', max=810912.0, style=ProgressStyle(descripti…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "8ea1879230924bf985f07737c7979d8a",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Downloading', max=210.0, style=ProgressStyle(description_…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "d27e420e82004ebd8628adbc5ed4e883",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Downloading', max=2.0, style=ProgressStyle(description_wi…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"textattack: Unknown if model of class <class 'transformers.pipelines.text_classification.TextClassificationPipeline'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "c01c2a4b2ef949018c400cfbbd8ab96c",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Downloading', max=1167.0, style=ProgressStyle(description…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "1606f0a088f444b48e36a7c12156aa12",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Downloading', max=813.0, style=ProgressStyle(description_…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Downloading and preparing dataset allocine_dataset/allocine (download: 63.54 MiB, generated: 109.12 MiB, post-processed: Unknown size, total: 172.66 MiB) to /p/qdata/jy2ma/.cache/textattack/datasets/allocine_dataset/allocine/1.0.0/d7a2c05d4ab7254d411130aa8b47ae2a094af074e120fc8d46ec0beed909e896...\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "b90894c25b9841fc9e3e458b6a82ddd9",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Downloading', max=66625305.0, style=ProgressStyle(descrip…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=1.0, bar_style='info', layout=Layout(width='20px'), max=1.0), HTML(value=''…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=1.0, bar_style='info', layout=Layout(width='20px'), max=1.0), HTML(value=''…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"HBox(children=(FloatProgress(value=1.0, bar_style='info', layout=Layout(width='20px'), max=1.0), HTML(value=''…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"textattack: Loading \u001b[94mdatasets\u001b[0m dataset \u001b[94mallocine\u001b[0m, split \u001b[94mtest\u001b[0m.\n",
" 0%| | 0/10 [00:00<?, ?it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"xxxxxxxxxxxxxxxxxxxx Result 1 xxxxxxxxxxxxxxxxxxxx\n",
"Dataset allocine_dataset downloaded and prepared to /p/qdata/jy2ma/.cache/textattack/datasets/allocine_dataset/allocine/1.0.0/d7a2c05d4ab7254d411130aa8b47ae2a094af074e120fc8d46ec0beed909e896. Subsequent calls will reuse this data.\n",
"Attack(\n",
" (search_method): GreedyWordSwapWIR(\n",
" (wir_method): weighted-saliency\n",
" )\n",
" (goal_function): UntargetedClassification\n",
" (transformation): WordSwapWordNet\n",
" (constraints): \n",
" (0): RepeatModification\n",
" (1): StopwordModification\n",
" (is_black_box): True\n",
") \n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 1 / 0 / 0 / 1: 10%|█ | 1/10 [00:18<02:42, 18.01s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 1 ---------------------------------------------\n",
"\u001b[92mPositive (100%)\u001b[0m --> \u001b[91mNegative (53%)\u001b[0m\n",
"\n",
"\u001b[92mMagnifique\u001b[0m épopée, une \u001b[92mbelle\u001b[0m \u001b[92mhistoire\u001b[0m, touchante avec des acteurs \u001b[92mqui\u001b[0m interprètent \u001b[92mtrès\u001b[0m \u001b[92mbien\u001b[0m leur rôles (Mel Gibson, Heath Ledger, Jason Isaacs...), le genre \u001b[92mde\u001b[0m \u001b[92mfilm\u001b[0m \u001b[92mqui\u001b[0m \u001b[92mse\u001b[0m savoure \u001b[92men\u001b[0m \u001b[92mfamille\u001b[0m! :)\n",
"\n",
"\u001b[91mbonnard\u001b[0m épopée, une \u001b[91mbeau\u001b[0m \u001b[91mbobard\u001b[0m, touchante avec des acteurs \u001b[91mlequel\u001b[0m interprètent \u001b[91mmême\u001b[0m \u001b[91macceptablement\u001b[0m leur rôles (Mel Gibson, Heath Ledger, Jason Isaacs...), le genre \u001b[91mgale\u001b[0m \u001b[91mpellicule\u001b[0m \u001b[91mOMS\u001b[0m \u001b[91mConcepteur\u001b[0m savoure \u001b[91mun\u001b[0m \u001b[91msyndicat\u001b[0m! :)\n",
"\n",
"xxxxxxxxxxxxxxxxxxxx Result 2 xxxxxxxxxxxxxxxxxxxx\n",
"\u001b[91mNegative (94%)\u001b[0m --> \u001b[92mPositive (91%)\u001b[0m\n",
"\n",
"Je n'ai pas aimé mais pourtant je lui mets \u001b[91m2\u001b[0m étoiles car l'expérience est louable. Rien de conventionnel ici. Une visite E.T. mais jonchée d'idées /- originales. Le soucis, tout ceci avait-il vraiment sa place dans un film de S.F. tirant sur l'horreur ? Voici un film qui, à l'inverse de tant d'autres qui y ont droit, mériterait peut-être un remake.\n",
"\n",
"Je n'ai pas aimé mais pourtant je lui mets \u001b[92m4\u001b[0m étoiles car l'expérience est louable. Rien de conventionnel ici. Une visite E.T. mais jonchée d'idées /- originales. Le soucis, tout ceci avait-il vraiment sa place dans un film de S.F. tirant sur l'horreur ? Voici un film qui, à l'inverse de tant d'autres qui y ont droit, mériterait peut-être un remake.\n",
"\n",
"xxxxxxxxxxxxxxxxxxxx Result 3 xxxxxxxxxxxxxxxxxxxx\n",
"\u001b[92mPositive (85%)\u001b[0m --> \u001b[91mNegative (91%)\u001b[0m\n",
"\n",
"Un \u001b[92mdessin\u001b[0m animé qui brille par sa féerie et ses chansons.\n",
"\n",
"Un \u001b[91mbrouillon\u001b[0m animé qui brille par sa féerie et ses chansons.\n",
"\n",
"xxxxxxxxxxxxxxxxxxxx Result 4 xxxxxxxxxxxxxxxxxxxx\n",
"\u001b[91mNegative (100%)\u001b[0m --> \u001b[92mPositive (80%)\u001b[0m\n",
"\n",
"\u001b[91mSi\u001b[0m c'est là le renouveau du cinéma français, c'est tout \u001b[91mde\u001b[0m même foutrement chiant. \u001b[91mSi\u001b[0m l'objet est \u001b[91mtrès\u001b[0m stylisé et la tension palpable, le film paraît \u001b[91mplutôt\u001b[0m \u001b[91mcreux\u001b[0m.\n",
"\n",
"\u001b[92maussi\u001b[0m c'est là le renouveau du cinéma français, c'est tout \u001b[92mabolir\u001b[0m même foutrement chiant. \u001b[92mtellement\u001b[0m l'objet est \u001b[92mprodigieusement\u001b[0m stylisé et la tension palpable, le film paraît \u001b[92mpeu\u001b[0m \u001b[92mtrou\u001b[0m.\n",
"\n",
"xxxxxxxxxxxxxxxxxxxx Result 5 xxxxxxxxxxxxxxxxxxxx\n",
"\u001b[91mNegative (100%)\u001b[0m --> \u001b[92mPositive (51%)\u001b[0m\n",
"\n",
"Et \u001b[91mpourtant\u001b[0m on s\u001b[91men\u001b[0m Doutait !\u001b[91mSecond\u001b[0m \u001b[91mvolet\u001b[0m \u001b[91mtrès\u001b[0m \u001b[91mmauvais\u001b[0m, sans \u001b[91mfraîcheur\u001b[0m et particulièrement lourdingue. Quel \u001b[91mdommage\u001b[0m.\n",
"\n",
"Et \u001b[92mfin\u001b[0m on s\u001b[92mpostérieurement\u001b[0m Doutait !\u001b[92mmoment\u001b[0m \u001b[92mchapitre\u001b[0m \u001b[92mincroyablement\u001b[0m \u001b[92mdifficile\u001b[0m, sans \u001b[92mimpudence\u001b[0m et particulièrement lourdingue. Quel \u001b[92mprix\u001b[0m.\n",
"\n",
"xxxxxxxxxxxxxxxxxxxx Result 6 xxxxxxxxxxxxxxxxxxxx\n",
"\u001b[92mPositive (100%)\u001b[0m --> \u001b[91mNegative (50%)\u001b[0m\n",
"\n",
"Vous reprendrez bien un peu d'été ? Ce film je le voyais comme un mélange de Rohmer et de Rozier, un film de vacances, j'adore ça, un truc beau et pur qui dit des choses sur la vie, l'amour, les filles, les vacances. Un film qui se regarde en sirotant une boisson fraîche en écoutant les grillons ! Sauf qu'en fait \u001b[92mnon\u001b[0m ! On a un film foutraque au \u001b[92mpossible\u001b[0m qui reprend les codes justement de Rohmer voir Godard, enfin la Nouvelle Vague en général dans sa première partie (jusqu'à même finir sur une partie qui ressemblerait à du Kusturica), mais en beaucoup plus léger et décalé. Le film n'en a rien à foutre de rien, il ose tout, n'a peur de rien et ça c'est \u001b[92mbon\u001b[0m. C'est sans doute le film le plus \u001b[92mdrôle\u001b[0m de 2013, mais tout \u001b[92msimplement\u001b[0m l'un des meilleurs tout \u001b[92mcourt\u001b[0m. Le film qui nous sort des dialogues qui pourraient sortir d'un mauvais Godard (oxymore) sur un ton what the fuckesque… raconte des anecdotes débiles au souhait face caméra… et pourtant, il y a quelque chose dans ce film survolté. Il y a du beau. Ces scènes dans la neige, c'est tendre, c'est beau, ça tranche avec le reste et ça donne du coeur à l'amourette, ça aide à le faire paraître comme une évidence. Et puis on a cette scène que je trouve sublime qui m'a profondément émue, cette scène où le docteur Placenta devient tout à coup sérieux et parle de cette date où chaque année il repense à cette fille et au fait qu'une année de plus le sépare d'elle. C'est horrible comme concept et pourtant tellement vrai et sincère. C'est vraiment \u001b[92mtroublant\u001b[0m. Et encore une fois la scène d'avant est très drôle et là, un petit moment de douceur avant de repartir sur le train effréné ! Et il y a ces fesses… Et le plus beau c'est qu'à la fin Vimala Pons a un petit air d'Anna Karina ! Film fout, étonnant, percutant, drôle, beau, triste ! C'est foutrement cool !\n",
"\n",
"Vous reprendrez bien un peu d'été ? Ce film je le voyais comme un mélange de Rohmer et de Rozier, un film de vacances, j'adore ça, un truc beau et pur qui dit des choses sur la vie, l'amour, les filles, les vacances. Un film qui se regarde en sirotant une boisson fraîche en écoutant les grillons ! Sauf qu'en fait \u001b[91mniet\u001b[0m ! On a un film foutraque au \u001b[91mexécutable\u001b[0m qui reprend les codes justement de Rohmer voir Godard, enfin la Nouvelle Vague en général dans sa première partie (jusqu'à même finir sur une partie qui ressemblerait à du Kusturica), mais en beaucoup plus léger et décalé. Le film n'en a rien à foutre de rien, il ose tout, n'a peur de rien et ça c'est \u001b[91mlisse\u001b[0m. C'est sans doute le film le plus \u001b[91mridicule\u001b[0m de 2013, mais tout \u001b[91msauf\u001b[0m l'un des meilleurs tout \u001b[91minsuffisant\u001b[0m. Le film qui nous sort des dialogues qui pourraient sortir d'un mauvais Godard (oxymore) sur un ton what the fuckesque… raconte des anecdotes débiles au souhait face caméra… et pourtant, il y a quelque chose dans ce film survolté. Il y a du beau. Ces scènes dans la neige, c'est tendre, c'est beau, ça tranche avec le reste et ça donne du coeur à l'amourette, ça aide à le faire paraître comme une évidence. Et puis on a cette scène que je trouve sublime qui m'a profondément émue, cette scène où le docteur Placenta devient tout à coup sérieux et parle de cette date où chaque année il repense à cette fille et au fait qu'une année de plus le sépare d'elle. C'est horrible comme concept et pourtant tellement vrai et sincère. C'est vraiment \u001b[91mennuyeux\u001b[0m. Et encore une fois la scène d'avant est très drôle et là, un petit moment de douceur avant de repartir sur le train effréné ! Et il y a ces fesses… Et le plus beau c'est qu'à la fin Vimala Pons a un petit air d'Anna Karina ! Film fout, étonnant, percutant, drôle, beau, triste ! C'est foutrement cool !\n",
"\n",
"xxxxxxxxxxxxxxxxxxxx Result 7 xxxxxxxxxxxxxxxxxxxx\n",
"\u001b[92mPositive (55%)\u001b[0m --> \u001b[91mNegative (88%)\u001b[0m\n",
"\n",
"Bon c'est \u001b[92mpas\u001b[0m un grand film mais on passe un bon moment avec ses ado à la recherche de l'orgasme. Y'a que les Allemands pour faire des films aussi barge ! :-)\n",
"\n",
"Bon c'est \u001b[91mniet\u001b[0m un grand film mais on passe un bon moment avec ses ado à la recherche de l'orgasme. Y'a que les Allemands pour faire des films aussi barge ! :-)\n",
"\n",
"xxxxxxxxxxxxxxxxxxxx Result 8 xxxxxxxxxxxxxxxxxxxx\n",
"\u001b[92mPositive (100%)\u001b[0m --> \u001b[91mNegative (97%)\u001b[0m\n",
"\n",
"\u001b[92mTerrible\u001b[0m histoire que ces êtres sans amour, ces êtres lisses et frustres qui passent à côté de leur vie. Quelle leçon Monsieur Brizé! Vous avez tout dit, tout filmé jusqu'au moindre détail. \u001b[92mtout\u001b[0m est beau et terrifiant jusqu'à la scène finale qui nous liquéfie, un Vincent Lindon regardant la vie fixement sans oser la toucher ni la prendre dans ses bras, une Hélène Vincent qui attend, qui attend... Mon Dieu Monsieur Brizé, continuez....\n",
"\n",
"\u001b[91mméprisable\u001b[0m histoire que ces êtres sans amour, ces êtres lisses et frustres qui passent à côté de leur vie. Quelle leçon Monsieur Brizé! Vous avez tout dit, tout filmé jusqu'au moindre détail. \u001b[91mrien\u001b[0m est beau et terrifiant jusqu'à la scène finale qui nous liquéfie, un Vincent Lindon regardant la vie fixement sans oser la toucher ni la prendre dans ses bras, une Hélène Vincent qui attend, qui attend... Mon Dieu Monsieur Brizé, continuez....\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 2 / 0 / 0 / 2: 20%|██ | 2/10 [00:57<03:50, 28.86s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"xxxxxxxxxxxxxxxxxxxx Result 9 xxxxxxxxxxxxxxxxxxxx\n",
"--------------------------------------------- Result 2 ---------------------------------------------\n",
"\u001b[91mNegative (94%)\u001b[0m --> \u001b[92mPositive (91%)\u001b[0m\n",
"\n",
"Je n'ai pas aimé mais pourtant je lui mets \u001b[91m2\u001b[0m étoiles car l'expérience est louable. Rien de conventionnel ici. Une visite E.T. mais jonchée d'idées /- originales. Le soucis, tout ceci avait-il vraiment sa place dans un film de S.F. tirant sur l'horreur ? Voici un film qui, à l'inverse de tant d'autres qui y ont droit, mériterait peut-être un remake.\n",
"\n",
"Je n'ai pas aimé mais pourtant je lui mets \u001b[92m4\u001b[0m étoiles car l'expérience est louable. Rien de conventionnel ici. Une visite E.T. mais jonchée d'idées /- originales. Le soucis, tout ceci avait-il vraiment sa place dans un film de S.F. tirant sur l'horreur ? Voici un film qui, à l'inverse de tant d'autres qui y ont droit, mériterait peut-être un remake.\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 3 / 0 / 0 / 3: 30%|███ | 3/10 [00:59<02:18, 19.74s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 3 ---------------------------------------------\n",
"\u001b[92mPositive (85%)\u001b[0m --> \u001b[91mNegative (91%)\u001b[0m\n",
"\n",
"Un \u001b[92mdessin\u001b[0m animé qui brille par sa féerie et ses chansons.\n",
"\n",
"Un \u001b[91mbrouillon\u001b[0m animé qui brille par sa féerie et ses chansons.\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 4 / 0 / 0 / 4: 40%|████ | 4/10 [01:09<01:44, 17.37s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 4 ---------------------------------------------\n",
"\u001b[91mNegative (100%)\u001b[0m --> \u001b[92mPositive (80%)\u001b[0m\n",
"\n",
"\u001b[91mSi\u001b[0m c'est là le renouveau du cinéma français, c'est tout \u001b[91mde\u001b[0m même foutrement chiant. \u001b[91mSi\u001b[0m l'objet est \u001b[91mtrès\u001b[0m stylisé et la tension palpable, le film paraît \u001b[91mplutôt\u001b[0m \u001b[91mcreux\u001b[0m.\n",
"\n",
"\u001b[92maussi\u001b[0m c'est là le renouveau du cinéma français, c'est tout \u001b[92mabolir\u001b[0m même foutrement chiant. \u001b[92mtellement\u001b[0m l'objet est \u001b[92mprodigieusement\u001b[0m stylisé et la tension palpable, le film paraît \u001b[92mpeu\u001b[0m \u001b[92mtrou\u001b[0m.\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 5 / 0 / 0 / 5: 50%|█████ | 5/10 [01:15<01:15, 15.03s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 5 ---------------------------------------------\n",
"\u001b[91mNegative (100%)\u001b[0m --> \u001b[92mPositive (51%)\u001b[0m\n",
"\n",
"Et \u001b[91mpourtant\u001b[0m on s\u001b[91men\u001b[0m Doutait !\u001b[91mSecond\u001b[0m \u001b[91mvolet\u001b[0m \u001b[91mtrès\u001b[0m \u001b[91mmauvais\u001b[0m, sans \u001b[91mfraîcheur\u001b[0m et particulièrement lourdingue. Quel \u001b[91mdommage\u001b[0m.\n",
"\n",
"Et \u001b[92mfin\u001b[0m on s\u001b[92mpostérieurement\u001b[0m Doutait !\u001b[92mmoment\u001b[0m \u001b[92mchapitre\u001b[0m \u001b[92mincroyablement\u001b[0m \u001b[92mdifficile\u001b[0m, sans \u001b[92mimpudence\u001b[0m et particulièrement lourdingue. Quel \u001b[92mprix\u001b[0m.\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 6 / 0 / 0 / 6: 60%|██████ | 6/10 [23:02<15:21, 230.43s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 6 ---------------------------------------------\n",
"\u001b[92mPositive (100%)\u001b[0m --> \u001b[91mNegative (50%)\u001b[0m\n",
"\n",
"Vous reprendrez bien un peu d'été ? Ce film je le voyais comme un mélange de Rohmer et de Rozier, un film de vacances, j'adore ça, un truc beau et pur qui dit des choses sur la vie, l'amour, les filles, les vacances. Un film qui se regarde en sirotant une boisson fraîche en écoutant les grillons ! Sauf qu'en fait \u001b[92mnon\u001b[0m ! On a un film foutraque au \u001b[92mpossible\u001b[0m qui reprend les codes justement de Rohmer voir Godard, enfin la Nouvelle Vague en général dans sa première partie (jusqu'à même finir sur une partie qui ressemblerait à du Kusturica), mais en beaucoup plus léger et décalé. Le film n'en a rien à foutre de rien, il ose tout, n'a peur de rien et ça c'est \u001b[92mbon\u001b[0m. C'est sans doute le film le plus \u001b[92mdrôle\u001b[0m de 2013, mais tout \u001b[92msimplement\u001b[0m l'un des meilleurs tout \u001b[92mcourt\u001b[0m. Le film qui nous sort des dialogues qui pourraient sortir d'un mauvais Godard (oxymore) sur un ton what the fuckesque… raconte des anecdotes débiles au souhait face caméra… et pourtant, il y a quelque chose dans ce film survolté. Il y a du beau. Ces scènes dans la neige, c'est tendre, c'est beau, ça tranche avec le reste et ça donne du coeur à l'amourette, ça aide à le faire paraître comme une évidence. Et puis on a cette scène que je trouve sublime qui m'a profondément émue, cette scène où le docteur Placenta devient tout à coup sérieux et parle de cette date où chaque année il repense à cette fille et au fait qu'une année de plus le sépare d'elle. C'est horrible comme concept et pourtant tellement vrai et sincère. C'est vraiment \u001b[92mtroublant\u001b[0m. Et encore une fois la scène d'avant est très drôle et là, un petit moment de douceur avant de repartir sur le train effréné ! Et il y a ces fesses… Et le plus beau c'est qu'à la fin Vimala Pons a un petit air d'Anna Karina ! Film fout, étonnant, percutant, drôle, beau, triste ! C'est foutrement cool !\n",
"\n",
"Vous reprendrez bien un peu d'été ? Ce film je le voyais comme un mélange de Rohmer et de Rozier, un film de vacances, j'adore ça, un truc beau et pur qui dit des choses sur la vie, l'amour, les filles, les vacances. Un film qui se regarde en sirotant une boisson fraîche en écoutant les grillons ! Sauf qu'en fait \u001b[91mniet\u001b[0m ! On a un film foutraque au \u001b[91mexécutable\u001b[0m qui reprend les codes justement de Rohmer voir Godard, enfin la Nouvelle Vague en général dans sa première partie (jusqu'à même finir sur une partie qui ressemblerait à du Kusturica), mais en beaucoup plus léger et décalé. Le film n'en a rien à foutre de rien, il ose tout, n'a peur de rien et ça c'est \u001b[91mlisse\u001b[0m. C'est sans doute le film le plus \u001b[91mridicule\u001b[0m de 2013, mais tout \u001b[91msauf\u001b[0m l'un des meilleurs tout \u001b[91minsuffisant\u001b[0m. Le film qui nous sort des dialogues qui pourraient sortir d'un mauvais Godard (oxymore) sur un ton what the fuckesque… raconte des anecdotes débiles au souhait face caméra… et pourtant, il y a quelque chose dans ce film survolté. Il y a du beau. Ces scènes dans la neige, c'est tendre, c'est beau, ça tranche avec le reste et ça donne du coeur à l'amourette, ça aide à le faire paraître comme une évidence. Et puis on a cette scène que je trouve sublime qui m'a profondément émue, cette scène où le docteur Placenta devient tout à coup sérieux et parle de cette date où chaque année il repense à cette fille et au fait qu'une année de plus le sépare d'elle. C'est horrible comme concept et pourtant tellement vrai et sincère. C'est vraiment \u001b[91mennuyeux\u001b[0m. Et encore une fois la scène d'avant est très drôle et là, un petit moment de douceur avant de repartir sur le train effréné ! Et il y a ces fesses… Et le plus beau c'est qu'à la fin Vimala Pons a un petit air d'Anna Karina ! Film fout, étonnant, percutant, drôle, beau, triste ! C'est foutrement cool !\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 7 / 0 / 0 / 7: 70%|███████ | 7/10 [23:19<09:59, 199.87s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 7 ---------------------------------------------\n",
"\u001b[92mPositive (55%)\u001b[0m --> \u001b[91mNegative (88%)\u001b[0m\n",
"\n",
"Bon c'est \u001b[92mpas\u001b[0m un grand film mais on passe un bon moment avec ses ado à la recherche de l'orgasme. Y'a que les Allemands pour faire des films aussi barge ! :-)\n",
"\n",
"Bon c'est \u001b[91mniet\u001b[0m un grand film mais on passe un bon moment avec ses ado à la recherche de l'orgasme. Y'a que les Allemands pour faire des films aussi barge ! :-)\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 8 / 0 / 0 / 8: 80%|████████ | 8/10 [24:03<06:00, 180.39s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 8 ---------------------------------------------\n",
"\u001b[92mPositive (100%)\u001b[0m --> \u001b[91mNegative (97%)\u001b[0m\n",
"\n",
"\u001b[92mTerrible\u001b[0m histoire que ces êtres sans amour, ces êtres lisses et frustres qui passent à côté de leur vie. Quelle leçon Monsieur Brizé! Vous avez tout dit, tout filmé jusqu'au moindre détail. \u001b[92mtout\u001b[0m est beau et terrifiant jusqu'à la scène finale qui nous liquéfie, un Vincent Lindon regardant la vie fixement sans oser la toucher ni la prendre dans ses bras, une Hélène Vincent qui attend, qui attend... Mon Dieu Monsieur Brizé, continuez....\n",
"\n",
"\u001b[91mméprisable\u001b[0m histoire que ces êtres sans amour, ces êtres lisses et frustres qui passent à côté de leur vie. Quelle leçon Monsieur Brizé! Vous avez tout dit, tout filmé jusqu'au moindre détail. \u001b[91mrien\u001b[0m est beau et terrifiant jusqu'à la scène finale qui nous liquéfie, un Vincent Lindon regardant la vie fixement sans oser la toucher ni la prendre dans ses bras, une Hélène Vincent qui attend, qui attend... Mon Dieu Monsieur Brizé, continuez....\n",
"\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 9 / 0 / 0 / 9: 90%|█████████ | 9/10 [24:13<02:41, 161.53s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 9 ---------------------------------------------\n",
"\u001b[92mPositive (100%)\u001b[0m --> \u001b[91mNegative (54%)\u001b[0m\n",
"\n",
"Un \u001b[92mtrès\u001b[0m joli \u001b[92mfilm\u001b[0m, qui ressemble à un téléfilm mais qui a le mérite d'être émouvant et proche de ses personnages. Magimel est \u001b[92mvraiment\u001b[0m très \u001b[92mbon\u001b[0m et l'histoire est touchante\n",
"\n",
"Un \u001b[91mplus\u001b[0m joli \u001b[91mfeuil\u001b[0m, qui ressemble à un téléfilm mais qui a le mérite d'être émouvant et proche de ses personnages. Magimel est \u001b[91mabsolument\u001b[0m très \u001b[91mlisse\u001b[0m et l'histoire est touchante\n",
"\n",
"xxxxxxxxxxxxxxxxxxxx Result 10 xxxxxxxxxxxxxxxxxxxx\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Succeeded / Failed / Skipped / Total] 10 / 0 / 0 / 10: 100%|██████████| 10/10 [28:30<00:00, 171.04s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------- Result 10 ---------------------------------------------\n",
"\u001b[91mNegative (100%)\u001b[0m --> \u001b[92mPositive (51%)\u001b[0m\n",
"\n",
"Mais comment certaines personnes ont pus lui \u001b[91mmettre\u001b[0m 5/5 et \u001b[91mdonc\u001b[0m dire indirectement \u001b[91mque\u001b[0m c'est un chef-d'œuvre ??? Et comment a-t-il fait pour sortir au cinéma et non en DTV ??? C'est pas un film que l'on regarde dans une salle obscur ça, pour moi ça ressemble plus à un téléfilm que l'on visionne un dimanche pluvieux \u001b[91mpour\u001b[0m que les enfants arrête de nous casser les pieds ! \u001b[91mEt\u001b[0m puis, le \u001b[91mscénario\u001b[0m avec le chien que devient le meilleur ami du gosse, c'est du vu et revu (un cliché) ! L'acteur principal est quant à lui aussi agaçant que son personnage ! Les suites ont l'air \u001b[91maussi\u001b[0m mauvaises que Buddy Star des Paniers étant donné que l'histoire est quasiment la même (pour moi ça c'est pas des suites, c'est \u001b[91mplutôt\u001b[0m une succession \u001b[91mde\u001b[0m petits reboots inutiles). \u001b[91mReste\u001b[0m regardable pour les moins de 10 ans (et encore, même moi à 6 ans, je n'aurais pas aimé).\n",
"\n",
"Mais comment certaines personnes ont pus lui \u001b[92mformuler\u001b[0m 5/5 et \u001b[92md'où\u001b[0m dire indirectement \u001b[92mcar\u001b[0m c'est un chef-d'œuvre ??? Et comment a-t-il fait pour sortir au cinéma et non en DTV ??? C'est pas un film que l'on regarde dans une salle obscur ça, pour moi ça ressemble plus à un téléfilm que l'on visionne un dimanche pluvieux \u001b[92mat\u001b[0m que les enfants arrête de nous casser les pieds ! \u001b[92mpoids\u001b[0m puis, le \u001b[92mfigure\u001b[0m avec le chien que devient le meilleur ami du gosse, c'est du vu et revu (un cliché) ! L'acteur principal est quant à lui aussi agaçant que son personnage ! Les suites ont l'air \u001b[92mmaintenant\u001b[0m mauvaises que Buddy Star des Paniers étant donné que l'histoire est quasiment la même (pour moi ça c'est pas des suites, c'est \u001b[92mpeu\u001b[0m une succession \u001b[92mdu\u001b[0m petits reboots inutiles). \u001b[92mrelique\u001b[0m regardable pour les moins de 10 ans (et encore, même moi à 6 ans, je n'aurais pas aimé).\n",
"\n",
"xxxxxxxxxxxxxxxxxxxx Result 11 xxxxxxxxxxxxxxxxxxxx\n",
"\u001b[92mPositive (100%)\u001b[0m --> \u001b[91mNegative (53%)\u001b[0m\n",
"\n",
"LE film de mon enfance , il a un peu vieilli maintenant , mais l'ours reste toujours impressionnant, il est bien réel contrairement au film 'the Revenant\" . Ce n'est surement pas un chef-d'œuvre mais je le trouve bien réalise , captivant , beaux et accompagné d'une superbe musique. Le gros points noir c'est la facilité qu'ils ont a créer des peaux , des pièges , et rester longtemps sans manger....mais on oublie assez vite ces erreurs grâce a un casting sympathique et aux décors naturels. Un \u001b[92mvieux\u001b[0m film mais qui reste \u001b[92mtoujours\u001b[0m un \u001b[92mbon\u001b[0m \u001b[92mfilm\u001b[0m.\n",
"\n",
"LE film de mon enfance , il a un peu vieilli maintenant , mais l'ours reste toujours impressionnant, il est bien réel contrairement au film 'the Revenant\" . Ce n'est surement pas un chef-d'œuvre mais je le trouve bien réalise , captivant , beaux et accompagné d'une superbe musique. Le gros points noir c'est la facilité qu'ils ont a créer des peaux , des pièges , et rester longtemps sans manger....mais on oublie assez vite ces erreurs grâce a un casting sympathique et aux décors naturels. Un \u001b[91mbancal\u001b[0m film mais qui reste \u001b[91mdéfinitivement\u001b[0m un \u001b[91mpassable\u001b[0m \u001b[91mpellicule\u001b[0m.\n",
"+-------------------------------+--------+\n",
"| Attack Results | |\n",
"+-------------------------------+--------+\n",
"| Number of successful attacks: | 10 |\n",
"| Number of failed attacks: | 0 |\n",
"| Number of skipped attacks: | 0 |\n",
"| Original accuracy: | 100.0% |\n",
"| Accuracy under attack: | 0.0% |\n",
"| Attack success rate: | 100.0% |\n",
"| Average perturbed word %: | 14.73% |\n",
"| Average num. words per input: | 76.4 |\n",
"| Avg num queries: | 904.4 |\n",
"+-------------------------------+--------+\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"text/plain": [
"[<textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7f9d3cb55b80>,\n",
" <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7f9d43fc5d90>,\n",
" <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7f9d39840df0>,\n",
" <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7f9d3241a160>,\n",
" <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7f9d398405b0>,\n",
" <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7f9d47ce17f0>,\n",
" <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7f9d3db79040>,\n",
" <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7f9d3f8e3730>,\n",
" <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7f9d33152f10>,\n",
" <textattack.attack_results.successful_attack_result.SuccessfulAttackResult at 0x7f9e5d43aeb0>]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
@@ -198,18 +594,23 @@
"recipe.transformation.language = 'fra'\n",
"\n",
"dataset = HuggingFaceDataset('allocine', split='test')\n",
"for idx, result in enumerate(recipe.attack_dataset(dataset, indices=range(11))):\n",
" print(('x' * 20), f'Result {idx+1}', ('x' * 20))\n",
" print(result.__str__(color_method='ansi'))\n",
" print()\n"
"\n",
"attacker = Attacker(recipe, dataset)\n",
"attacker.attack_dataset()\n"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"collapsed_sections": [],
"name": "Example_4_CamemBERT.ipynb",
"provenance": []
},
"kernelspec": {
"display_name": "torch",
"display_name": "Python 3",
"language": "python",
"name": "build_central"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -221,7 +622,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
"version": "3.8.8"
}
},
"nbformat": 4,

File diff suppressed because one or more lines are too long


@@ -1,5 +1,5 @@
Attack Recipes
===============
Attack Recipes API
==================
We provide a number of pre-built attack recipes, which correspond to attacks from the literature. To run an attack recipe from the command line, run::


@@ -0,0 +1,242 @@
# Attack Recipes CommandLine Use
We provide a number of pre-built attack recipes, which correspond to attacks from the literature.
## Help: `textattack --help`
TextAttack's main features can all be accessed via the `textattack` command. Two very
common commands are `textattack attack <args>`, and `textattack augment <args>`. You can see more
information about all commands using
```bash
textattack --help
```
or a specific command using, for example,
```bash
textattack attack --help
```
The [`examples/`](https://github.com/QData/TextAttack/tree/master/examples) folder includes scripts showing common TextAttack usage for training models, running attacks, and augmenting a CSV file.
The [documentation website](https://textattack.readthedocs.io/en/latest) contains walkthroughs explaining basic usage of TextAttack, including building a custom transformation and a custom constraint.
## Running Attacks: `textattack attack --help`
The easiest way to try out an attack is via the command-line interface, `textattack attack`.
> **Tip:** If your machine has multiple GPUs, you can distribute the attack across them using the `--parallel` option. For some attacks, this can really help performance.
Here are some concrete examples:
*TextFooler on BERT trained on the MR sentiment classification dataset*:
```bash
textattack attack --recipe textfooler --model bert-base-uncased-mr --num-examples 100
```
*DeepWordBug on DistilBERT trained on the Quora Question Pairs paraphrase identification dataset*:
```bash
textattack attack --model distilbert-base-uncased-cola --recipe deepwordbug --num-examples 100
```
*Beam search with beam width 4 and word embedding transformation and untargeted goal function on an LSTM*:
```bash
textattack attack --model lstm-mr --num-examples 20 \
--search-method beam-search^beam_width=4 --transformation word-swap-embedding \
--constraints repeat stopword max-words-perturbed^max_num_words=2 embedding^min_cos_sim=0.8 part-of-speech \
--goal-function untargeted-classification
```
> **Tip:** Instead of specifying a dataset and number of examples, you can pass `--interactive` to attack samples inputted by the user.
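
The same attacks can also be launched from the Python API; the CLI options above map onto `AttackArgs`. A minimal sketch (model and dataset names are taken from examples in this document; `parallel=True` is the Python-side counterpart of `--parallel`, and the exact argument names should be checked against your installed version):
```python
import transformers

from textattack import Attacker, AttackArgs
from textattack.attack_recipes import TextFoolerJin2019
from textattack.datasets import HuggingFaceDataset
from textattack.models.wrappers import HuggingFaceModelWrapper

# Load a fine-tuned model plus its tokenizer and wrap them for TextAttack.
name = "textattack/bert-base-uncased-rotten-tomatoes"
model = transformers.AutoModelForSequenceClassification.from_pretrained(name)
tokenizer = transformers.AutoTokenizer.from_pretrained(name)
model_wrapper = HuggingFaceModelWrapper(model, tokenizer)

# Roughly equivalent to:
#   textattack attack --recipe textfooler --model bert-base-uncased-mr --num-examples 100
attack = TextFoolerJin2019.build(model_wrapper)
dataset = HuggingFaceDataset("rotten_tomatoes", split="test")
attack_args = AttackArgs(num_examples=100)  # add parallel=True to spread the attack across GPUs
Attacker(attack, dataset, attack_args).attack_dataset()
```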
## Attacks and Papers Implemented ("Attack Recipes"): `textattack attack --recipe [recipe_name]`
We include attack recipes which implement attacks from the literature. You can list attack recipes using `textattack list attack-recipes`.
To run an attack recipe: `textattack attack --recipe [recipe_name]`
<table style="width:100%" border="1">
<thead>
<tr class="header">
<th><strong>Attack Recipe Name</strong></th>
<th><strong>Goal Function</strong></th>
<th><strong>Constraints Enforced</strong></th>
<th><strong>Transformation</strong></th>
<th><strong>Search Method</strong></th>
<th><strong>Main Idea</strong></th>
</tr>
</thead>
<tbody>
<tr><td style="text-align: center;" colspan="6"><strong><br>Attacks on classification tasks, like sentiment classification and entailment:<br></strong></td></tr>
<tr>
<td><code>alzantot</code> <span class="citation" data-cites="Alzantot2018GeneratingNL Jia2019CertifiedRT"></span></td>
<td><sub>Untargeted {Classification, Entailment}</sub></td>
<td><sub>Percentage of words perturbed, Language Model perplexity, Word embedding distance</sub></td>
<td><sub>Counter-fitted word embedding swap</sub></td>
<td><sub>Genetic Algorithm</sub></td>
<td ><sub>from (["Generating Natural Language Adversarial Examples" (Alzantot et al., 2018)](https://arxiv.org/abs/1804.07998))</sub></td>
</tr>
<tr>
<td><code>bae</code> <span class="citation" data-cites="garg2020bae"></span></td>
<td><sub>Untargeted Classification</sub></td>
<td><sub>USE sentence encoding cosine similarity</sub></td>
<td><sub>BERT Masked Token Prediction</sub></td>
<td><sub>Greedy-WIR</sub></td>
<td ><sub>BERT masked language model transformation attack from (["BAE: BERT-based Adversarial Examples for Text Classification" (Garg & Ramakrishnan, 2019)](https://arxiv.org/abs/2004.01970)).</sub></td>
</tr>
<tr>
<td><code>bert-attack</code> <span class="citation" data-cites="li2020bertattack"></span></td>
<td><sub>Untargeted Classification</sub></td>
<td><sub>USE sentence encoding cosine similarity, Maximum number of words perturbed</sub></td>
<td><sub>BERT Masked Token Prediction (with subword expansion)</sub></td>
<td><sub>Greedy-WIR</sub></td>
<td ><sub> (["BERT-ATTACK: Adversarial Attack Against BERT Using BERT" (Li et al., 2020)](https://arxiv.org/abs/2004.09984))</sub></td>
</tr>
<tr>
<td><code>checklist</code> <span class="citation" data-cites="Gao2018BlackBoxGO"></span></td>
<td><sub>{Untargeted, Targeted} Classification</sub></td>
<td><sub>checklist distance</sub></td>
<td><sub>Contract, extend, and substitute named entities</sub></td>
<td><sub>Greedy-WIR</sub></td>
<td ><sub>Invariance testing implemented in CheckList (["Beyond Accuracy: Behavioral Testing of NLP models with CheckList" (Ribeiro et al., 2020)](https://arxiv.org/abs/2005.04118))</sub></td>
</tr>
<tr>
<td> <code>clare</code> <span class="citation" data-cites="Alzantot2018GeneratingNL Jia2019CertifiedRT"></span></td>
<td><sub>Untargeted {Classification, Entailment}</sub></td>
<td><sub>USE sentence encoding cosine similarity</sub></td>
<td><sub>RoBERTa Masked Prediction for token swap, insert and merge</sub></td>
<td><sub>Greedy</sub></td>
<td ><sub>["Contextualized Perturbation for Textual Adversarial Attack" (Li et al., 2020)](https://arxiv.org/abs/2009.07502))</sub></td>
</tr>
<tr>
<td><code>deepwordbug</code> <span class="citation" data-cites="Gao2018BlackBoxGO"></span></td>
<td><sub>{Untargeted, Targeted} Classification</sub></td>
<td><sub>Levenshtein edit distance</sub></td>
<td><sub>{Character Insertion, Character Deletion, Neighboring Character Swap, Character Substitution}</sub></td>
<td><sub>Greedy-WIR</sub></td>
<td ><sub>Greedy replace-1 scoring and multi-transformation character-swap attack (["Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers" (Gao et al., 2018)](https://arxiv.org/abs/1801.04354))</sub></td>
</tr>
<tr>
<td> <code>fast-alzantot</code> <span class="citation" data-cites="Alzantot2018GeneratingNL Jia2019CertifiedRT"></span></td>
<td><sub>Untargeted {Classification, Entailment}</sub></td>
<td><sub>Percentage of words perturbed, Language Model perplexity, Word embedding distance</sub></td>
<td><sub>Counter-fitted word embedding swap</sub></td>
<td><sub>Genetic Algorithm</sub></td>
<td ><sub>Modified, faster version of the Alzantot et al. genetic algorithm, from (["Certified Robustness to Adversarial Word Substitutions" (Jia et al., 2019)](https://arxiv.org/abs/1909.00986))</sub></td>
</tr>
<tr>
<td><code>hotflip</code> (word swap) <span class="citation" data-cites="Ebrahimi2017HotFlipWA"></span></td>
<td><sub>Untargeted Classification</sub></td>
<td><sub>Word Embedding Cosine Similarity, Part-of-speech match, Number of words perturbed</sub></td>
<td><sub>Gradient-Based Word Swap</sub></td>
<td><sub>Beam search</sub></td>
<td ><sub> (["HotFlip: White-Box Adversarial Examples for Text Classification" (Ebrahimi et al., 2017)](https://arxiv.org/abs/1712.06751))</sub></td>
</tr>
<tr>
<td><code>iga</code> <span class="citation" data-cites="iga-wang2019natural"></span></td>
<td><sub>Untargeted {Classification, Entailment}</sub></td>
<td><sub>Percentage of words perturbed, Word embedding distance</sub></td>
<td><sub>Counter-fitted word embedding swap</sub></td>
<td><sub>Genetic Algorithm</sub></td>
<td ><sub>Improved genetic algorithm-based word substitution from (["Natural Language Adversarial Attacks and Defenses in Word Level" (Wang et al., 2019)](https://arxiv.org/abs/1909.06723))</sub></td>
</tr>
<tr>
<td><code>input-reduction</code> <span class="citation" data-cites="feng2018pathologies"></span></td>
<td><sub>Input Reduction</sub></td>
<td></td>
<td><sub>Word deletion</sub></td>
<td><sub>Greedy-WIR</sub></td>
<td ><sub>Greedy attack with word importance ranking: reduces the input while maintaining the prediction (["Pathologies of Neural Models Make Interpretation Difficult" (Feng et al., 2018)](https://arxiv.org/pdf/1804.07781.pdf))</sub></td>
</tr>
<tr>
<td><code>kuleshov</code> <span class="citation" data-cites="Kuleshov2018AdversarialEF"></span></td>
<td><sub>Untargeted Classification</sub></td>
<td><sub>Thought vector encoding cosine similarity, Language model similarity probability</sub></td>
<td><sub>Counter-fitted word embedding swap</sub></td>
<td><sub>Greedy word swap</sub></td>
<td ><sub>(["Adversarial Examples for Natural Language Classification Problems" (Kuleshov et al., 2018)](https://openreview.net/pdf?id=r1QZ3zbAZ)) </sub></td>
</tr>
<tr>
<td><code>pruthi</code> <span class="citation" data-cites="pruthi2019combating"></span></td>
<td><sub>Untargeted Classification</sub></td>
<td><sub>Minimum word length, Maximum number of words perturbed</sub></td>
<td><sub>{Neighboring Character Swap, Character Deletion, Character Insertion, Keyboard-Based Character Swap}</sub></td>
<td><sub>Greedy search</sub></td>
<td ><sub>Simulates common typos (["Combating Adversarial Misspellings with Robust Word Recognition" (Pruthi et al., 2019)](https://arxiv.org/abs/1905.11268))</sub></td>
</tr>
<tr>
<td><code>pso</code> <span class="citation" data-cites="pso-zang-etal-2020-word"></span></td>
<td><sub>Untargeted Classification</sub></td>
<td></td>
<td><sub>HowNet Word Swap</sub></td>
<td><sub>Particle Swarm Optimization</sub></td>
<td ><sub>(["Word-level Textual Adversarial Attacking as Combinatorial Optimization" (Zang et al., 2020)](https://www.aclweb.org/anthology/2020.acl-main.540/)) </sub></td>
</tr>
<tr>
<td><code>pwws</code> <span class="citation" data-cites="pwws-ren-etal-2019-generating"></span></td>
<td><sub>Untargeted Classification</sub></td>
<td></td>
<td><sub>WordNet-based synonym swap</sub></td>
<td><sub>Greedy-WIR (saliency)</sub></td>
<td ><sub>Greedy attack with word importance ranking based on word saliency and synonym swap scores (["Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency" (Ren et al., 2019)](https://www.aclweb.org/anthology/P19-1103/))</sub> </td>
</tr>
<tr>
<td><code>textbugger</code> (black-box) <span class="citation" data-cites="Li2019TextBuggerGA"></span></td>
<td><sub>Untargeted Classification</sub></td>
<td><sub>USE sentence encoding cosine similarity</sub></td>
<td><sub>{Character Insertion, Character Deletion, Neighboring Character Swap, Character Substitution}</sub></td>
<td><sub>Greedy-WIR</sub></td>
<td ><sub>(["TextBugger: Generating Adversarial Text Against Real-world Applications" (Li et al., 2018)](https://arxiv.org/abs/1812.05271)).</sub></td>
</tr>
<tr>
<td><code>textfooler</code> <span class="citation" data-cites="Jin2019TextFooler"></span></td>
<td><sub>Untargeted {Classification, Entailment}</sub></td>
<td><sub>Word Embedding Distance, Part-of-speech match, USE sentence encoding cosine similarity</sub></td>
<td><sub>Counter-fitted word embedding swap</sub></td>
<td><sub>Greedy-WIR</sub></td>
<td ><sub>Greedy attack with word importance ranking (["Is Bert Really Robust?" (Jin et al., 2019)](https://arxiv.org/abs/1907.11932))</sub> </td>
</tr>
<tr><td style="text-align: center;" colspan="6"><strong><br>Attacks on sequence-to-sequence models: <br></strong></td></tr>
<tr>
<td><code>morpheus</code> <span class="citation" data-cites="morpheus-tan-etal-2020-morphin"></span></td>
<td><sub>Minimum BLEU Score</sub> </td>
<td></td>
<td><sub>Inflection Word Swap</sub> </td>
<td><sub>Greedy search</sub> </td>
<td ><sub>Greedily replaces words with their inflections with the goal of minimizing BLEU score (["It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations"](https://www.aclweb.org/anthology/2020.acl-main.263.pdf))</sub> </td>
</tr>
<tr>
<td><code>seq2sick</code> (black-box) <span class="citation" data-cites="cheng2018seq2sick"></span></td>
<td><sub>Non-overlapping output</sub> </td>
<td></td>
<td><sub>Counter-fitted word embedding swap</sub> </td>
<td><sub>Greedy-WIR</sub></td>
<td ><sub>Greedy attack with goal of changing every word in the output translation. Currently implemented as black-box with plans to change to white-box as done in paper (["Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples" (Cheng et al., 2018)](https://arxiv.org/abs/1803.01128)) </sub> </td>
</tr>
</tbody>
</table>
## Recipe Usage Examples
Here are some examples of testing attacks from the literature from the command-line:
*TextFooler against BERT fine-tuned on SST-2:*
```bash
textattack attack --model bert-base-uncased-sst2 --recipe textfooler --num-examples 10
```
*seq2sick (black-box) against T5 fine-tuned for English-German translation:*
```bash
textattack attack --model t5-en-de --recipe seq2sick --num-examples 100
```
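
Most recipes can also be built and then tweaked from Python before running them; for example, TextAttack's CamemBERT example notebook adjusts the transformation of a WordNet-based recipe to French. A hedged sketch of that pattern (`french_model_wrapper` is a placeholder for a wrapper around a French sentiment model, and whether a recipe's transformation exposes a `language` attribute depends on the recipe):
```python
from textattack import Attacker
from textattack.attack_recipes import PWWSRen2019
from textattack.datasets import HuggingFaceDataset

# Build the recipe, then adjust one of its components before attacking.
# `french_model_wrapper` is assumed to wrap a French sentiment model (e.g. a
# CamemBERT fine-tuned on Allocine), built the same way as the English wrapper
# in the sketch above.
recipe = PWWSRen2019.build(french_model_wrapper)
recipe.transformation.language = "fra"  # switch the WordNet synonym swap to French

dataset = HuggingFaceDataset("allocine", split="test")
Attacker(recipe, dataset).attack_dataset()
```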


@@ -1,8 +1,36 @@
======================
Augmenter Recipes
======================
Augmenter Recipes API
=====================
Transformations and constraints can be used for simple NLP data augmentations. Here is a list of recipes for NLP data augmentations
Summary: Transformations and constraints can be used for simple NLP data augmentations.
In addition to the command-line interface, you can augment text dynamically by importing the
`Augmenter` in your own code. All `Augmenter` objects implement `augment` and `augment_many` to generate augmentations
of a string or a list of strings. Here's an example of how to use the `EmbeddingAugmenter` in a python script:
.. code-block:: python
>>> from textattack.augmentation import EmbeddingAugmenter
>>> augmenter = EmbeddingAugmenter()
>>> s = 'What I cannot create, I do not understand.'
>>> augmenter.augment(s)
['What I notable create, I do not understand.', 'What I significant create, I do not understand.', 'What I cannot engender, I do not understand.', 'What I cannot creating, I do not understand.', 'What I cannot creations, I do not understand.', 'What I cannot create, I do not comprehend.', 'What I cannot create, I do not fathom.', 'What I cannot create, I do not understanding.', 'What I cannot create, I do not understands.', 'What I cannot create, I do not understood.', 'What I cannot create, I do not realise.']
You can also create your own augmenter from scratch by importing transformations/constraints from `textattack.transformations` and `textattack.constraints`. Here's an example that generates augmentations of a string using `WordSwapRandomCharacterDeletion`:
.. code-block:: python
>>> from textattack.transformations import WordSwapRandomCharacterDeletion
>>> from textattack.transformations import CompositeTransformation
>>> from textattack.augmentation import Augmenter
>>> transformation = CompositeTransformation([WordSwapRandomCharacterDeletion()])
>>> augmenter = Augmenter(transformation=transformation, transformations_per_example=5)
>>> s = 'What I cannot create, I do not understand.'
>>> augmenter.augment(s)
['What I cannot creae, I do not understand.', 'What I cannot creat, I do not understand.', 'What I cannot create, I do not nderstand.', 'What I cannot create, I do nt understand.', 'Wht I cannot create, I do not understand.']
Here is a list of recipes for NLP data augmentations
.. automodule:: textattack.augmentation.recipes
:members:


@@ -0,0 +1,67 @@
# Augmenter Recipes CommandLine Use
Transformations and constraints can be used for simple NLP data augmentations.
The [`examples/`](https://github.com/QData/TextAttack/tree/master/examples) folder includes scripts showing common TextAttack usage for training models, running attacks, and augmenting a CSV file.
The [documentation website](https://textattack.readthedocs.io/en/latest) contains walkthroughs explaining basic usage of TextAttack, including building a custom transformation and a custom constraint.
### Augmenting Text: `textattack augment`
Many of the components of TextAttack are useful for data augmentation. The `textattack.Augmenter` class
uses a transformation and a list of constraints to augment data. We also offer built-in recipes
for data augmentation (each recipe is also usable from Python, as sketched after this list):
- `wordnet` augments text by replacing words with WordNet synonyms
- `embedding` augments text by replacing words with neighbors in the counter-fitted embedding space, with a constraint to ensure their cosine similarity is at least 0.8
- `charswap` augments text by substituting, deleting, inserting, and swapping adjacent characters
- `eda` augments text with a combination of word insertions, substitutions and deletions.
- `checklist` augments text by contraction/extension and by substituting names, locations, numbers.
- `clare` augments text by replacing, inserting, and merging with a pre-trained masked language model.
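
The recipes above are also exposed as `Augmenter` classes in `textattack.augmentation`, so the same augmentations can be produced from Python. A short sketch (class names other than `EmbeddingAugmenter` are inferred from the recipe names, and the keyword argument mirrors the CLI flag; double-check both against your installed version):
```python
from textattack.augmentation import EmbeddingAugmenter, WordNetAugmenter

text = "the rock is destined to be the 21st century's new conan"

# `embedding` recipe: swap words for counter-fitted embedding neighbors.
print(EmbeddingAugmenter(transformations_per_example=2).augment(text))

# `wordnet` recipe: swap words for WordNet synonyms.
print(WordNetAugmenter(transformations_per_example=2).augment(text))
```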
### Augmentation Command-Line Interface
The easiest way to use our data augmentation tools is with `textattack augment <args>`.
`textattack augment`
takes an input CSV file, the "text" column to augment, along with the number of words to change per augmentation
and the number of augmentations per input example. It outputs a CSV in the same format with all the augmented examples in the proper columns.
> For instance, when given the following as `examples.csv`:
```
"text",label
"the rock is destined to be the 21st century's new conan and that he's going to make a splash even greater than arnold schwarzenegger , jean- claud van damme or steven segal.", 1
"the gorgeously elaborate continuation of 'the lord of the rings' trilogy is so huge that a column of words cannot adequately describe co-writer/director peter jackson's expanded vision of j . r . r . tolkien's middle-earth .", 1
"take care of my cat offers a refreshingly different slice of asian cinema .", 1
"a technically well-made suspenser . . . but its abrupt drop in iq points as it races to the finish line proves simply too discouraging to let slide .", 0
"it's a mystery how the movie could be released in this condition .", 0
```
The command
```
textattack augment --input-csv examples.csv --output-csv output.csv --input-column text --recipe embedding --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original
```
will augment the `text` column by altering 10% of each example's words, generating two augmentations per original input, and excluding the original inputs from the
output CSV. (If no `--output-csv` is given, the results are saved to `augment.csv` by default.)
> **Tip:** Just as with running attacks interactively, you can pass `--interactive` to augment samples entered by the user and quickly try out different augmentation recipes!
After augmentation, here are the contents of `output.csv`:
```
text,label
"the rock is destined to be the 21st century's newest conan and that he's gonna to make a splashing even stronger than arnold schwarzenegger , jean- claud van damme or steven segal.",1
"the rock is destined to be the 21tk century's novel conan and that he's going to make a splat even greater than arnold schwarzenegger , jean- claud van damme or stevens segal.",1
the gorgeously elaborate continuation of 'the lord of the rings' trilogy is so huge that a column of expression significant adequately describe co-writer/director pedro jackson's expanded vision of j . rs . r . tolkien's middle-earth .,1
the gorgeously elaborate continuation of 'the lordy of the piercings' trilogy is so huge that a column of mots cannot adequately describe co-novelist/director peter jackson's expanded vision of j . r . r . tolkien's middle-earth .,1
take care of my cat offerings a pleasantly several slice of asia cinema .,1
taking care of my cat offers a pleasantly different slice of asiatic kino .,1
a technically good-made suspenser . . . but its abrupt drop in iq points as it races to the finish bloodline proves straightforward too disheartening to let slide .,0
a technically well-made suspenser . . . but its abrupt drop in iq dot as it races to the finish line demonstrates simply too disheartening to leave slide .,0
it's a enigma how the film wo be releases in this condition .,0
it's a enigma how the filmmaking wo be publicized in this condition .,0
```
The 'embedding' augmentation recipe uses counterfitted embedding nearest-neighbors to augment data.
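
For CSV workflows outside the CLI, the same recipe can be driven from a short script. A rough Python equivalent of the command above (the augmenter keyword names mirror the CLI flags and are assumptions if your version differs):
```python
import csv

from textattack.augmentation import EmbeddingAugmenter

# 10% of words swapped, two augmentations per row, originals excluded.
augmenter = EmbeddingAugmenter(pct_words_to_swap=0.1, transformations_per_example=2)

with open("examples.csv", newline="") as f_in, open("output.csv", "w", newline="") as f_out:
    reader = csv.DictReader(f_in)
    writer = csv.DictWriter(f_out, fieldnames=["text", "label"])
    writer.writeheader()
    for row in reader:
        for augmented in augmenter.augment(row["text"]):
            writer.writerow({"text": augmented, "label": row["label"]})
```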


@@ -50,26 +50,26 @@ All evaluations shown are on the full validation or test set up to 1000 examples
- AG News (`lstm-ag-news`)
- `datasets` dataset `ag_news`, split `test`
- Successes: 914/1000
- Correct/Whole: 914/1000
- Accuracy: 91.4%
- IMDB (`lstm-imdb`)
- `datasets` dataset `imdb`, split `test`
- Successes: 883/1000
- Correct/Whole: 883/1000
- Accuracy: 88.30%
- Movie Reviews [Rotten Tomatoes] (`lstm-mr`)
- `datasets` dataset `rotten_tomatoes`, split `validation`
- Successes: 807/1000
- Correct/Whole: 807/1000
- Accuracy: 80.70%
- `datasets` dataset `rotten_tomatoes`, split `test`
- Successes: 781/1000
- Correct/Whole: 781/1000
- Accuracy: 78.10%
- SST-2 (`lstm-sst2`)
- `datasets` dataset `glue`, subset `sst2`, split `validation`
- Successes: 737/872
- Correct/Whole: 737/872
- Accuracy: 84.52%
- Yelp Polarity (`lstm-yelp`)
- `datasets` dataset `yelp_polarity`, split `test`
- Successes: 922/1000
- Correct/Whole: 922/1000
- Accuracy: 92.20%
</section>
@@ -81,26 +81,26 @@ All evaluations shown are on the full validation or test set up to 1000 examples
- AG News (`cnn-ag-news`)
- `datasets` dataset `ag_news`, split `test`
- Successes: 910/1000
- Correct/Whole: 910/1000
- Accuracy: 91.00%
- IMDB (`cnn-imdb`)
- `datasets` dataset `imdb`, split `test`
- Successes: 863/1000
- Correct/Whole: 863/1000
- Accuracy: 86.30%
- Movie Reviews [Rotten Tomatoes] (`cnn-mr`)
- `datasets` dataset `rotten_tomatoes`, split `validation`
- Successes: 794/1000
- Correct/Whole: 794/1000
- Accuracy: 79.40%
- `datasets` dataset `rotten_tomatoes`, split `test`
- Successes: 768/1000
- Correct/Whole: 768/1000
- Accuracy: 76.80%
- SST-2 (`cnn-sst2`)
- `datasets` dataset `glue`, subset `sst2`, split `validation`
- Successes: 721/872
- Correct/Whole: 721/872
- Accuracy: 82.68%
- Yelp Polarity (`cnn-yelp`)
- `datasets` dataset `yelp_polarity`, split `test`
- Successes: 913/1000
- Correct/Whole: 913/1000
- Accuracy: 91.30%
</section>
@@ -112,38 +112,38 @@ All evaluations shown are on the full validation or test set up to 1000 examples
- AG News (`albert-base-v2-ag-news`)
- `datasets` dataset `ag_news`, split `test`
- Successes: 943/1000
- Correct/Whole: 943/1000
- Accuracy: 94.30%
- CoLA (`albert-base-v2-cola`)
- `datasets` dataset `glue`, subset `cola`, split `validation`
- Successes: 829/1000
- Correct/Whole: 829/1000
- Accuracy: 82.90%
- IMDB (`albert-base-v2-imdb`)
- `datasets` dataset `imdb`, split `test`
- Successes: 913/1000
- Correct/Whole: 913/1000
- Accuracy: 91.30%
- Movie Reviews [Rotten Tomatoes] (`albert-base-v2-mr`)
- `datasets` dataset `rotten_tomatoes`, split `validation`
- Successes: 882/1000
- Correct/Whole: 882/1000
- Accuracy: 88.20%
- `datasets` dataset `rotten_tomatoes`, split `test`
- Successes: 851/1000
- Correct/Whole: 851/1000
- Accuracy: 85.10%
- Quora Question Pairs (`albert-base-v2-qqp`)
- `datasets` dataset `glue`, subset `qqp`, split `validation`
- Successes: 914/1000
- Correct/Whole: 914/1000
- Accuracy: 91.40%
- Recognizing Textual Entailment (`albert-base-v2-rte`)
- `datasets` dataset `glue`, subset `rte`, split `validation`
- Successes: 211/277
- Correct/Whole: 211/277
- Accuracy: 76.17%
- SNLI (`albert-base-v2-snli`)
- `datasets` dataset `snli`, split `test`
- Successes: 883/1000
- Correct/Whole: 883/1000
- Accuracy: 88.30%
- SST-2 (`albert-base-v2-sst2`)
- `datasets` dataset `glue`, subset `sst2`, split `validation`
- Successes: 807/872
- Correct/Whole: 807/872
- Accuracy: 92.55%
- STS-b (`albert-base-v2-stsb`)
- `datasets` dataset `glue`, subset `stsb`, split `validation`
@@ -151,11 +151,11 @@ All evaluations shown are on the full validation or test set up to 1000 examples
- Spearman correlation: 0.8995912861209745
- WNLI (`albert-base-v2-wnli`)
- `datasets` dataset `glue`, subset `wnli`, split `validation`
- Successes: 42/71
- Correct/Whole: 42/71
- Accuracy: 59.15%
- Yelp Polarity (`albert-base-v2-yelp`)
- `datasets` dataset `yelp_polarity`, split `test`
- Successes: 963/1000
- Correct/Whole: 963/1000
- Accuracy: 96.30%
</section>
@@ -166,50 +166,50 @@ All evaluations shown are on the full validation or test set up to 1000 examples
- AG News (`bert-base-uncased-ag-news`)
- `datasets` dataset `ag_news`, split `test`
- Successes: 942/1000
- Correct/Whole: 942/1000
- Accuracy: 94.20%
- CoLA (`bert-base-uncased-cola`)
- `datasets` dataset `glue`, subset `cola`, split `validation`
- Successes: 812/1000
- Correct/Whole: 812/1000
- Accuracy: 81.20%
- IMDB (`bert-base-uncased-imdb`)
- `datasets` dataset `imdb`, split `test`
- Successes: 919/1000
- Correct/Whole: 919/1000
- Accuracy: 91.90%
- MNLI matched (`bert-base-uncased-mnli`)
- `datasets` dataset `glue`, subset `mnli`, split `validation_matched`
- Successes: 840/1000
- Correct/Whole: 840/1000
- Accuracy: 84.00%
- Movie Reviews [Rotten Tomatoes] (`bert-base-uncased-mr`)
- `datasets` dataset `rotten_tomatoes`, split `validation`
- Successes: 876/1000
- Correct/Whole: 876/1000
- Accuracy: 87.60%
- `datasets` dataset `rotten_tomatoes`, split `test`
- Successes: 838/1000
- Correct/Whole: 838/1000
- Accuracy: 83.80%
- MRPC (`bert-base-uncased-mrpc`)
- `datasets` dataset `glue`, subset `mrpc`, split `validation`
- Successes: 358/408
- Correct/Whole: 358/408
- Accuracy: 87.75%
- QNLI (`bert-base-uncased-qnli`)
- `datasets` dataset `glue`, subset `qnli`, split `validation`
- Successes: 904/1000
- Correct/Whole: 904/1000
- Accuracy: 90.40%
- Quora Question Pairs (`bert-base-uncased-qqp`)
- `datasets` dataset `glue`, subset `qqp`, split `validation`
- Successes: 924/1000
- Correct/Whole: 924/1000
- Accuracy: 92.40%
- Recognizing Textual Entailment (`bert-base-uncased-rte`)
- `datasets` dataset `glue`, subset `rte`, split `validation`
- Successes: 201/277
- Correct/Whole: 201/277
- Accuracy: 72.56%
- SNLI (`bert-base-uncased-snli`)
- `datasets` dataset `snli`, split `test`
- Successes: 894/1000
- Correct/Whole: 894/1000
- Accuracy: 89.40%
- SST-2 (`bert-base-uncased-sst2`)
- `datasets` dataset `glue`, subset `sst2`, split `validation`
- Successes: 806/872
- Correct/Whole: 806/872
- Accuracy: 92.43%
- STS-b (`bert-base-uncased-stsb`)
- `datasets` dataset `glue`, subset `stsb`, split `validation`
@@ -217,11 +217,11 @@ All evaluations shown are on the full validation or test set up to 1000 examples
- Spearman correlation: 0.8773251339980935
- WNLI (`bert-base-uncased-wnli`)
- `datasets` dataset `glue`, subset `wnli`, split `validation`
- Successes: 40/71
- Correct/Whole: 40/71
- Accuracy: 56.34%
- Yelp Polarity (`bert-base-uncased-yelp`)
- `datasets` dataset `yelp_polarity`, split `test`
- Successes: 963/1000
- Correct/Whole: 963/1000
- Accuracy: 96.30%
</section>
@@ -233,23 +233,23 @@ All evaluations shown are on the full validation or test set up to 1000 examples
- CoLA (`distilbert-base-cased-cola`)
- `datasets` dataset `glue`, subset `cola`, split `validation`
- Successes: 786/1000
- Correct/Whole: 786/1000
- Accuracy: 78.60%
- MRPC (`distilbert-base-cased-mrpc`)
- `datasets` dataset `glue`, subset `mrpc`, split `validation`
- Successes: 320/408
- Correct/Whole: 320/408
- Accuracy: 78.43%
- Quora Question Pairs (`distilbert-base-cased-qqp`)
- `datasets` dataset `glue`, subset `qqp`, split `validation`
- Successes: 908/1000
- Correct/Whole: 908/1000
- Accuracy: 90.80%
- SNLI (`distilbert-base-cased-snli`)
- `datasets` dataset `snli`, split `test`
- Successes: 861/1000
- Correct/Whole: 861/1000
- Accuracy: 86.10%
- SST-2 (`distilbert-base-cased-sst2`)
- `datasets` dataset `glue`, subset `sst2`, split `validation`
- Successes: 785/872
- Correct/Whole: 785/872
- Accuracy: 90.02%
- STS-b (`distilbert-base-cased-stsb`)
- `datasets` dataset `glue`, subset `stsb`, split `validation`
@@ -264,31 +264,31 @@ All evaluations shown are on the full validation or test set up to 1000 examples
- AG News (`distilbert-base-uncased-ag-news`)
- `datasets` dataset `ag_news`, split `test`
- Successes: 944/1000
- Correct/Whole: 944/1000
- Accuracy: 94.40%
- CoLA (`distilbert-base-uncased-cola`)
- `datasets` dataset `glue`, subset `cola`, split `validation`
- Successes: 786/1000
- Correct/Whole: 786/1000
- Accuracy: 78.60%
- IMDB (`distilbert-base-uncased-imdb`)
- `datasets` dataset `imdb`, split `test`
- Successes: 903/1000
- Correct/Whole: 903/1000
- Accuracy: 90.30%
- MNLI matched (`distilbert-base-uncased-mnli`)
- `datasets` dataset `glue`, subset `mnli`, split `validation_matched`
- Successes: 817/1000
- Correct/Whole: 817/1000
- Accuracy: 81.70%
- MRPC (`distilbert-base-uncased-mrpc`)
- `datasets` dataset `glue`, subset `mrpc`, split `validation`
- Successes: 350/408
- Correct/Whole: 350/408
- Accuracy: 85.78%
- QNLI (`distilbert-base-uncased-qnli`)
- `datasets` dataset `glue`, subset `qnli`, split `validation`
- Successes: 860/1000
- Correct/Whole: 860/1000
- Accuracy: 86.00%
- Recognizing Textual Entailment (`distilbert-base-uncased-rte`)
- `datasets` dataset `glue`, subset `rte`, split `validation`
- Successes: 180/277
- Correct/Whole: 180/277
- Accuracy: 64.98%
- STS-b (`distilbert-base-uncased-stsb`)
- `datasets` dataset `glue`, subset `stsb`, split `validation`
@@ -296,7 +296,7 @@ All evaluations shown are on the full validation or test set up to 1000 examples
- Spearman correlation: 0.8407155030382939
- WNLI (`distilbert-base-uncased-wnli`)
- `datasets` dataset `glue`, subset `wnli`, split `validation`
- Successes: 40/71
- Correct/Whole: 40/71
- Accuracy: 56.34%
</section>
@@ -307,38 +307,38 @@ All evaluations shown are on the full validation or test set up to 1000 examples
- AG News (`roberta-base-ag-news`)
- `datasets` dataset `ag_news`, split `test`
- Successes: 947/1000
- Correct/Whole: 947/1000
- Accuracy: 94.70%
- CoLA (`roberta-base-cola`)
- `datasets` dataset `glue`, subset `cola`, split `validation`
- Successes: 857/1000
- Correct/Whole: 857/1000
- Accuracy: 85.70%
- IMDB (`roberta-base-imdb`)
- `datasets` dataset `imdb`, split `test`
- Successes: 941/1000
- Correct/Whole: 941/1000
- Accuracy: 94.10%
- Movie Reviews [Rotten Tomatoes] (`roberta-base-mr`)
- `datasets` dataset `rotten_tomatoes`, split `validation`
- Successes: 899/1000
- Correct/Whole: 899/1000
- Accuracy: 89.90%
- `datasets` dataset `rotten_tomatoes`, split `test`
- Successes: 883/1000
- Correct/Whole: 883/1000
- Accuracy: 88.30%
- MRPC (`roberta-base-mrpc`)
- `datasets` dataset `glue`, subset `mrpc`, split `validation`
- Successes: 371/408
- Correct/Whole: 371/408
- Accuracy: 91.18%
- QNLI (`roberta-base-qnli`)
- `datasets` dataset `glue`, subset `qnli`, split `validation`
- Successes: 917/1000
- Correct/Whole: 917/1000
- Accuracy: 91.70%
- Recognizing Textual Entailment (`roberta-base-rte`)
- `datasets` dataset `glue`, subset `rte`, split `validation`
- Successes: 217/277
- Correct/Whole: 217/277
- Accuracy: 78.34%
- SST-2 (`roberta-base-sst2`)
- `datasets` dataset `glue`, subset `sst2`, split `validation`
- Successes: 820/872
- Correct/Whole: 820/872
- Accuracy: 94.04%
- STS-b (`roberta-base-stsb`)
- `datasets` dataset `glue`, subset `stsb`, split `validation`
@@ -346,7 +346,7 @@ All evaluations shown are on the full validation or test set up to 1000 examples
- Spearman correlation: 0.9025045272903051
- WNLI (`roberta-base-wnli`)
- `datasets` dataset `glue`, subset `wnli`, split `validation`
- Successes: 40/71
- Correct/Whole: 40/71
- Accuracy: 56.34%
</section>
@@ -357,26 +357,26 @@ All evaluations shown are on the full validation or test set up to 1000 examples
- CoLA (`xlnet-base-cased-cola`)
- `datasets` dataset `glue`, subset `cola`, split `validation`
- Successes: 800/1000
- Correct/Whole: 800/1000
- Accuracy: 80.00%
- IMDB (`xlnet-base-cased-imdb`)
- `datasets` dataset `imdb`, split `test`
- Successes: 957/1000
- Correct/Whole: 957/1000
- Accuracy: 95.70%
- Movie Reviews [Rotten Tomatoes] (`xlnet-base-cased-mr`)
- `datasets` dataset `rotten_tomatoes`, split `validation`
- Successes: 908/1000
- Correct/Whole: 908/1000
- Accuracy: 90.80%
- `datasets` dataset `rotten_tomatoes`, split `test`
- Successes: 876/1000
- Correct/Whole: 876/1000
- Accuracy: 87.60%
- MRPC (`xlnet-base-cased-mrpc`)
- `datasets` dataset `glue`, subset `mrpc`, split `validation`
- Successes: 363/408
- Correct/Whole: 363/408
- Accuracy: 88.97%
- Recognizing Textual Entailment (`xlnet-base-cased-rte`)
- `datasets` dataset `glue`, subset `rte`, split `validation`
- Successes: 196/277
- Correct/Whole: 196/277
- Accuracy: 70.76%
- STS-b (`xlnet-base-cased-stsb`)
- `datasets` dataset `glue`, subset `stsb`, split `validation`
@@ -384,7 +384,7 @@ All evaluations shown are on the full validation or test set up to 1000 examples
- Spearman correlation: 0.8773439961182335
- WNLI (`xlnet-base-cased-wnli`)
- `datasets` dataset `glue`, subset `wnli`, split `validation`
- Successes: 41/71
- Correct/Whole: 41/71
- Accuracy: 57.75%
</section>
@@ -400,7 +400,7 @@ All evaluations shown are on the full validation or test set up to 1000 examples
- We host all TextAttack Models at huggingface Model Hub: [https://huggingface.co/textattack](https://huggingface.co/textattack)
### Training details for each TextAttack Model
## Training details for each TextAttack Model
All of our models have model cards on the HuggingFace model hub. So for now, the easiest way to find the training details for a given model is as follows:
@@ -416,3 +416,81 @@ All of our models have model cards on the HuggingFace model hub. So for now, the
## More details on TextAttack fine-tuned NLP models (details on target NLP task, input type, output type, SOTA results on paperswithcode; model card on huggingface):
Fine-tuned Model | NLP Task | Input type | Output Type | paperswithcode.com SOTA | huggingface.co Model Card
--------------|-----------------|--------------------|--------------------|--------------------------|-------------------------------
albert-base-v2-CoLA | linguistic acceptability | single sentences | binary (1=acceptable/ 0=unacceptable) | <sub><sup>https://paperswithcode.com/sota/linguistic-acceptability-on-cola </sub></sup> | <sub><sup>https://huggingface.co/textattack/albert-base-v2-CoLA </sub></sup>
bert-base-uncased-CoLA | linguistic acceptability | single sentences | binary (1=acceptable/ 0=unacceptable) | none yet | <sub><sup>https://huggingface.co/textattack/bert-base-uncased-CoLA </sub></sup>
distilbert-base-cased-CoLA | linguistic acceptability | single sentences | binary (1=acceptable/ 0=unacceptable) | <sub><sup> https://paperswithcode.com/sota/linguistic-acceptability-on-cola </sub></sup> | <sub><sup>https://huggingface.co/textattack/distilbert-base-cased-CoLA </sub></sup>
distilbert-base-uncased-CoLA | linguistic acceptability | single sentences | binary (1=acceptable/ 0=unacceptable) | <sub><sup> https://paperswithcode.com/sota/linguistic-acceptability-on-cola </sub></sup> | <sub><sup>https://huggingface.co/textattack/distilbert-base-uncased-CoLA </sub></sup>
roberta-base-CoLA | linguistic acceptability | single sentences | binary (1=acceptable/ 0=unacceptable) | <sub><sup> https://paperswithcode.com/sota/linguistic-acceptability-on-cola </sub></sup> | <sub><sup> https://huggingface.co/textattack/roberta-base-CoLA </sub></sup>
xlnet-base-cased-CoLA | linguistic acceptability | single sentences | binary (1=acceptable/ 0=unacceptable) | <sub><sup> https://paperswithcode.com/sota/linguistic-acceptability-on-cola </sub></sup> | <sub><sup>https://huggingface.co/textattack/xlnet-base-cased-CoLA </sub></sup>
albert-base-v2-RTE | natural language inference | sentence pairs (1 premise and 1 hypothesis) | binary(0=entailed/1=not entailed) | <sub><sup> https://paperswithcode.com/sota/natural-language-inference-on-rte </sub></sup> | <sub><sup> https://huggingface.co/textattack/albert-base-v2-RTE </sub></sup>
albert-base-v2-snli | natural language inference | sentence pairs | accuracy (0=entailment, 1=neutral,2=contradiction) | none yet | <sub><sup> https://huggingface.co/textattack/albert-base-v2-snli </sub></sup>
albert-base-v2-WNLI | natural language inference | sentence pairs | binary | <sub><sup> https://paperswithcode.com/sota/natural-language-inference-on-wnli </sub></sup> | <sub><sup> https://huggingface.co/textattack/albert-base-v2-WNLI</sub></sup>
bert-base-uncased-MNLI | natural language inference | sentence pairs (1 premise and 1 hypothesis) | accuracy (0=entailment, 1=neutral,2=contradiction) | none yet | <sub><sup> https://huggingface.co/textattack/bert-base-uncased-MNLI </sub></sup>
bert-base-uncased-QNLI | natural language inference | question/answer pairs | binary (1=unanswerable/ 0=answerable) | none yet |<sub><sup> https://huggingface.co/textattack/bert-base-uncased-QNLI </sub></sup>
bert-base-uncased-RTE | natural language inference | sentence pairs (1 premise and 1 hypothesis) | binary(0=entailed/1=not entailed) | none yet | <sub><sup> https://huggingface.co/textattack/bert-base-uncased-RTE </sub></sup>
bert-base-uncased-snli | natural language inference | sentence pairs | accuracy (0=entailment, 1=neutral,2=contradiction) | none yet | <sub><sup> https://huggingface.co/textattack/bert-base-uncased-snli </sub></sup>
bert-base-uncased-WNLI | natural language inference | sentence pairs | binary | none yet | <sub><sup> https://huggingface.co/textattack/bert-base-uncased-WNLI </sub></sup>
distilbert-base-cased-snli | natural language inference | sentence pairs | accuracy (0=entailment, 1=neutral,2=contradiction) | none yet | <sub><sup> https://huggingface.co/textattack/distilbert-base-cased-snli </sub></sup>
distilbert-base-uncased-MNLI | natural language inference | sentence pairs (1 premise and 1 hypothesis) | accuracy (0=entailment,1=neutral, 2=contradiction) | none yet | <sub><sup> https://huggingface.co/textattack/distilbert-base-uncased-MNLI </sub></sup>
distilbert-base-uncased-RTE | natural language inference | sentence pairs (1 premise and 1 hypothesis) | binary(0=entailed/1=not entailed) | <sub><sup> https://paperswithcode.com/sota/natural-language-inference-on-rte </sub></sup> | <sub><sup> https://huggingface.co/textattack/distilbert-base-uncased-RTE</sub></sup>
distilbert-base-uncased-WNLI | natural language inference | sentence pairs | binary | <sub><sup> https://paperswithcode.com/sota/natural-language-inference-on-wnli </sub></sup> | <sub><sup> https://huggingface.co/textattack/distilbert-base-uncased-WNLI </sub></sup>
roberta-base-QNLI | natural language inference | question/answer pairs | binary (1=unanswerable/ 0=answerable) | <sub><sup> https://paperswithcode.com/sota/natural-language-inference-on-qnli </sub></sup> | <sub><sup> https://huggingface.co/textattack/roberta-base-QNLI </sub></sup>
roberta-base-RTE | natural language inference | sentence pairs (1 premise and 1 hypothesis) | binary(0=entailed/1=not entailed) | <sub><sup> https://paperswithcode.com/sota/natural-language-inference-on-rte </sub></sup> | <sub><sup> https://huggingface.co/textattack/roberta-base-RTE</sub></sup>
roberta-base-WNLI | natural language inference | sentence pairs | binary | <sub><sup> https://paperswithcode.com/sota/natural-language-inference-on-wnli </sub></sup> | <sub><sup> https://huggingface.co/textattack/roberta-base-WNLI </sub></sup>
xlnet-base-cased-RTE | natural language inference | sentence pairs (1 premise and 1 hypothesis) | binary(0=entailed/1=not entailed) | <sub><sup> https://paperswithcode.com/sota/natural-language-inference-on-rte </sub></sup> | <sub><sup> https://huggingface.co/textattack/xlnet-base-cased-RTE </sub></sup>
xlnet-base-cased-WNLI | natural language inference | sentence pairs | binary | none yet | <sub><sup> https://huggingface.co/textattack/xlnet-base-cased-WNLI </sub></sup>
albert-base-v2-QQP | paraphrase similarity | question pairs | binary (1=similar/0=not similar) | <sub><sup> https://paperswithcode.com/sota/question-answering-on-quora-question-pairs </sub></sup> | <sub><sup> https://huggingface.co/textattack/albert-base-v2-QQP</sub></sup>
bert-base-uncased-QQP | paraphrase similarity | question pairs | binary (1=similar/0=not similar) | <sub><sup> https://paperswithcode.com/sota/question-answering-on-quora-question-pairs </sub></sup> | <sub><sup> https://huggingface.co/textattack/bert-base-uncased-QQP </sub></sup>
distilbert-base-uncased-QNLI | question answering/natural language inference | question/answer pairs | binary (1=unanswerable/ 0=answerable) | <sub><sup> https://paperswithcode.com/sota/natural-language-inference-on-qnli </sub></sup> | <sub><sup> https://huggingface.co/textattack/distilbert-base-uncased-QNLI </sub></sup>
distilbert-base-cased-QQP | question answering/paraphrase similarity | question pairs | binary (1=similar/ 0=not similar) | <sub><sup> https://paperswithcode.com/sota/question-answering-on-quora-question-pairs </sub></sup> | <sub><sup> https://huggingface.co/textattack/distilbert-base-cased-QQP </sub></sup>
albert-base-v2-STS-B | semantic textual similarity | sentence pairs | similarity (0.0 to 5.0) | <sub><sup> https://paperswithcode.com/sota/semantic-textual-similarity-on-sts-benchmark </sub></sup> | <sub><sup> https://huggingface.co/textattack/albert-base-v2-STS-B </sub></sup>
bert-base-uncased-MRPC | semantic textual similarity | sentence pairs | binary (1=similar/0=not similar) | none yet | <sub><sup> https://huggingface.co/textattack/bert-base-uncased-MRPC </sub></sup>
bert-base-uncased-STS-B | semantic textual similarity | sentence pairs | similarity (0.0 to 5.0) | none yet | <sub><sup> https://huggingface.co/textattack/bert-base-uncased-STS-B </sub></sup>
distilbert-base-cased-MRPC | semantic textual similarity | sentence pairs | binary (1=similar/0=not similar) | <sub><sup> https://paperswithcode.com/sota/semantic-textual-similarity-on-mrpc </sub></sup> | <sub><sup> https://huggingface.co/textattack/distilbert-base-cased-MRPC </sub></sup>
distilbert-base-cased-STS-B | semantic textual similarity | sentence pairs | similarity (0.0 to 5.0) | <sub><sup> https://paperswithcode.com/sota/semantic-textual-similarity-on-sts-benchmark </sub></sup> | <sub><sup> https://huggingface.co/textattack/distilbert-base-cased-STS-B </sub></sup>
distilbert-base-uncased-MRPC | semantic textual similarity | sentence pairs | binary (1=similar/0=not similar) | <sub><sup> https://paperswithcode.com/sota/semantic-textual-similarity-on-mrpc </sub></sup> | <sub><sup> https://huggingface.co/textattack/distilbert-base-uncased-MRPC</sub></sup>
roberta-base-MRPC | semantic textual similarity | sentence pairs | binary (1=similar/0=not similar) | <sub><sup> https://paperswithcode.com/sota/semantic-textual-similarity-on-mrpc </sub></sup> | <sub><sup> https://huggingface.co/textattack/roberta-base-MRPC </sub></sup>
roberta-base-STS-B | semantic textual similarity | sentence pairs | similarity (0.0 to 5.0) | <sub><sup> https://paperswithcode.com/sota/semantic-textual-similarity-on-sts-benchmark </sub></sup> | <sub><sup> https://huggingface.co/textattack/roberta-base-STS-B </sub></sup>
xlnet-base-cased-MRPC | semantic textual similarity | sentence pairs | binary (1=similar/0=not similar) | <sub><sup> https://paperswithcode.com/sota/semantic-textual-similarity-on-mrpc </sub></sup> | <sub><sup> https://huggingface.co/textattack/xlnet-base-cased-MRPC </sub></sup>
xlnet-base-cased-STS-B | semantic textual similarity | sentence pairs | similarity (0.0 to 5.0) | <sub><sup> https://paperswithcode.com/sota/semantic-textual-similarity-on-sts-benchmark </sub></sup> | <sub><sup> https://huggingface.co/textattack/xlnet-base-cased-STS-B </sub></sup>
albert-base-v2-imdb | sentiment analysis | movie reviews | binary (1=good/0=bad) | none yet | <sub><sup> https://huggingface.co/textattack/albert-base-v2-imdb </sub></sup>
albert-base-v2-rotten-tomatoes | sentiment analysis | movie reviews | binary (1=good/0=bad) | none yet | <sub><sup> https://huggingface.co/textattack/albert-base-v2-rotten-tomatoes </sub></sup>
albert-base-v2-SST-2 | sentiment analysis | phrases | accuracy (0.0000 to 1.0000) | <sub><sup> https://paperswithcode.com/sota/sentiment-analysis-on-sst-2-binary </sub></sup> | <sub><sup> https://huggingface.co/textattack/albert-base-v2-SST-2 </sub></sup>
albert-base-v2-yelp-polarity | sentiment analysis | yelp reviews | binary (1=good/0=bad) | none yet | <sub><sup> https://huggingface.co/textattack/albert-base-v2-yelp-polarity </sub></sup>
bert-base-uncased-imdb | sentiment analysis | movie reviews | binary (1=good/0=bad) | none yet | <sub><sup> https://huggingface.co/textattack/bert-base-uncased-imdb </sub></sup>
bert-base-uncased-rotten-tomatoes | sentiment analysis | movie reviews | binary (1=good/0=bad) | none yet | <sub><sup> https://huggingface.co/textattack/bert-base-uncased-rotten-tomatoes </sub></sup>
bert-base-uncased-SST-2 | sentiment analysis | phrases | accuracy (0.0000 to 1.0000) | <sub><sup> https://paperswithcode.com/sota/sentiment-analysis-on-sst-2-binary </sub></sup> | <sub><sup> https://huggingface.co/textattack/bert-base-uncased-SST-2 </sub></sup>
bert-base-uncased-yelp-polarity | sentiment analysis | yelp reviews | binary (1=good/0=bad) | <sub><sup> https://paperswithcode.com/sota/sentiment-analysis-on-yelp-binary </sub></sup> | <sub><sup> https://huggingface.co/textattack/bert-base-uncased-yelp-polarity </sub></sup>
cnn-imdb | sentiment analysis | movie reviews | binary (1=good/0=bad) | <sub><sup> https://paperswithcode.com/sota/sentiment-analysis-on-imdb </sub></sup> | none
cnn-mr | sentiment analysis | movie reviews | binary (1=good/0=bad) | none yet | none
cnn-sst2 | sentiment analysis | phrases | accuracy (0.0000 to 1.0000) | <sub><sup> https://paperswithcode.com/sota/sentiment-analysis-on-sst-2-binary </sub></sup> | none
cnn-yelp | sentiment analysis | yelp reviews | binary (1=good/0=bad) | <sub><sup> https://paperswithcode.com/sota/sentiment-analysis-on-yelp-binary </sub></sup> | none
distilbert-base-cased-SST-2 | sentiment analysis | phrases | accuracy (0.0000 to 1.0000) | <sub><sup> https://paperswithcode.com/sota/sentiment-analysis-on-sst-2-binary </sub></sup> | <sub><sup> https://huggingface.co/textattack/distilbert-base-cased-SST-2 </sub></sup>
distilbert-base-uncased-imdb | sentiment analysis | movie reviews | binary (1=good/0=bad) | <sub><sup> https://paperswithcode.com/sota/sentiment-analysis-on-imdb</sub></sup> | <sub><sup> https://huggingface.co/textattack/distilbert-base-uncased-imdb </sub></sup>
distilbert-base-uncased-rotten-tomatoes | sentiment analysis | movie reviews | binary (1=good/0=bad) | none yet | <sub><sup> https://huggingface.co/textattack/distilbert-base-uncased-rotten-tomatoes </sub></sup>
lstm-imdb | sentiment analysis | movie reviews | binary (1=good/0=bad) | <sub><sup> https://paperswithcode.com/sota/sentiment-analysis-on-imdb </sub></sup> | none
lstm-mr | sentiment analysis | movie reviews | binary (1=good/0=bad) | none yet | none
lstm-sst2 | sentiment analysis | phrases | accuracy (0.0000 to 1.0000) | none yet | none
lstm-yelp | sentiment analysis | yelp reviews | binary (1=good/0=bad) | none yet | none
roberta-base-imdb | sentiment analysis | movie reviews | binary (1=good/0=bad) | none yet | <sub><sup> https://huggingface.co/textattack/roberta-base-imdb </sub></sup>
roberta-base-rotten-tomatoes | sentiment analysis | movie reviews | binary (1=good/0=bad) | none yet | <sub><sup> https://huggingface.co/textattack/roberta-base-rotten-tomatoes </sub></sup>
roberta-base-SST-2 | sentiment analysis | phrases | accuracy (0.0000 to 1.0000) | <sub><sup> https://paperswithcode.com/sota/sentiment-analysis-on-sst-2-binary </sub></sup> | <sub><sup> https://huggingface.co/textattack/roberta-base-SST-2 </sub></sup>
xlnet-base-cased-imdb | sentiment analysis | movie reviews | binary (1=good/0=bad) | none yet | <sub><sup> https://huggingface.co/textattack/xlnet-base-cased-imdb </sub></sup>
xlnet-base-cased-rotten-tomatoes | sentiment analysis | movie reviews | binary (1=good/0=bad) | none yet | <sub><sup> https://huggingface.co/textattack/xlnet-base-cased-rotten-tomatoes </sub></sup>
albert-base-v2-ag-news | text classification | news articles | news category | none yet | <sub><sup> https://huggingface.co/textattack/albert-base-v2-ag-news </sub></sup>
bert-base-uncased-ag-news | text classification | news articles | news category | none yet | <sub><sup> https://huggingface.co/textattack/bert-base-uncased-ag-news </sub></sup>
cnn-ag-news | text classification | news articles | news category | <sub><sup> https://paperswithcode.com/sota/text-classification-on-ag-news </sub></sup> | none
distilbert-base-uncased-ag-news | text classification | news articles | news category | none yet | <sub><sup> https://huggingface.co/textattack/distilbert-base-uncased-ag-news </sub></sup>
lstm-ag-news | text classification | news articles | news category | <sub><sup> https://paperswithcode.com/sota/text-classification-on-ag-news </sub></sup> | none
roberta-base-ag-news | text classification | news articles | news category | none yet | <sub><sup> https://huggingface.co/textattack/roberta-base-ag-news </sub></sup>
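
Each model card above corresponds to a checkpoint on the Hugging Face Hub, so a listed model can be sanity-checked directly with `transformers` before plugging it into TextAttack. A quick sketch (output label names follow the conventions of the individual checkpoint):
```python
import transformers

# Quick sanity check of one of the listed checkpoints straight from the Hub.
classifier = transformers.pipeline(
    "text-classification", model="textattack/roberta-base-SST-2"
)
print(classifier("a gorgeous, witty, seductive movie."))
```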

New binary image files added in this change (contents not shown in the diff):
- docs/_static/imgs/benchmark/table3.png (92 KiB)
- docs/_static/imgs/benchmark/table4.png (200 KiB)
- one additional benchmark image whose path is not shown in this view (158 KiB)
- docs/_static/imgs/benchmark/table7.png (109 KiB)
- docs/_static/imgs/benchmark/table9.png (502 KiB)
- docs/_static/imgs/overview.png (220 KiB)

26
docs/api/attack.rst Normal file

@@ -0,0 +1,26 @@
Attack API Reference
=======================
Attack
------------
Attack is composed of four components:
- `Goal Functions <../attacks/goal_function.html>`__ stipulate the goal of the attack, like to change the prediction score of a classification model, or to change all of the words in a translation output.
- `Constraints <../attacks/constraint.html>`__ determine if a potential perturbation is valid with respect to the original input.
- `Transformations <../attacks/transformation.html>`__ take a text input and transform it by inserting and deleting characters, words, and/or phrases.
- `Search Methods <../attacks/search_method.html>`__ explore the space of possible **transformations** within the defined **constraints** and attempt to find a successful perturbation which satisfies the **goal function**.
The :class:`~textattack.Attack` class represents an adversarial attack composed of a goal function, search method, transformation, and constraints.
.. autoclass:: textattack.Attack
:members:
AttackRecipe
-------------
An attack recipe is a subclass of the :class:`~textattack.Attack` class with a special :meth:`build` method that
returns a pre-built :class:`~textattack.Attack` corresponding to an attack from the literature.
.. autoclass:: textattack.attack_recipes.AttackRecipe
:members:
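
As a rough illustration of how the four components fit together (a sketch, not the only possible combination; ``model_wrapper`` stands for any TextAttack model wrapper):

.. code-block:: python

    from textattack import Attack
    from textattack.constraints.pre_transformation import (
        RepeatModification,
        StopwordModification,
    )
    from textattack.goal_functions import UntargetedClassification
    from textattack.search_methods import GreedyWordSwapWIR
    from textattack.transformations import WordSwapEmbedding

    goal_function = UntargetedClassification(model_wrapper)
    constraints = [RepeatModification(), StopwordModification()]
    transformation = WordSwapEmbedding(max_candidates=50)
    search_method = GreedyWordSwapWIR(wir_method="delete")

    # Bundle the four components into a single attack object.
    attack = Attack(goal_function, constraints, transformation, search_method)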


@@ -0,0 +1,27 @@
Attack Result API Reference
============================
AttackResult
-------------
.. autoclass:: textattack.attack_results.AttackResult
:members:
SuccessfulAttackResult
-----------------------
.. autoclass:: textattack.attack_results.SuccessfulAttackResult
:members:
FailedAttackResult
-----------------------
.. autoclass:: textattack.attack_results.FailedAttackResult
:members:
SkippedAttackResult
-----------------------
.. autoclass:: textattack.attack_results.SkippedAttackResult
:members:
MaximizedAttackResult
-----------------------
.. autoclass:: textattack.attack_results.MaximizedAttackResult
:members:
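
A small usage sketch, assuming ``attacker`` is a :class:`~textattack.Attacker` as described on the Attacker API page:

.. code-block:: python

    from textattack.attack_results import (
        FailedAttackResult,
        SkippedAttackResult,
        SuccessfulAttackResult,
    )

    results = attacker.attack_dataset()  # returns a list of AttackResult objects
    num_success = sum(isinstance(r, SuccessfulAttackResult) for r in results)
    num_failed = sum(isinstance(r, FailedAttackResult) for r in results)
    num_skipped = sum(isinstance(r, SkippedAttackResult) for r in results)
    print(f"{num_success} succeeded, {num_failed} failed, {num_skipped} skipped")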

19
docs/api/attacker.rst Normal file

@@ -0,0 +1,19 @@
Attacker API Reference
=======================
Attacker
-------------
While :class:`~textattack.Attack` is the main class used to carry out an adversarial attack, it only attacks one example at a time.
It lacks features for attacking multiple samples in parallel (e.g. across multiple GPUs), saving checkpoints, or logging results to a text file, a CSV file, or Weights & Biases (wandb).
:class:`~textattack.Attacker` provides these features in an easy-to-use API.
.. autoclass:: textattack.Attacker
:members:
AttackArgs
-------------
:class:`~textattack.AttackArgs` represents the arguments passed to :class:`~textattack.Attacker`, such as the number of examples to attack, the interval at which to save checkpoints, and logging details.
.. autoclass:: textattack.AttackArgs
:members:
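
A hedged sketch of the two classes working together (``attack`` and ``dataset`` are as on the Attack and Datasets API pages; the argument names shown are a subset of what :class:`~textattack.AttackArgs` accepts):

.. code-block:: python

    from textattack import Attacker, AttackArgs

    attack_args = AttackArgs(
        num_examples=20,           # number of dataset examples to attack
        log_to_csv="results.csv",  # also log per-example results to a CSV file
        parallel=False,            # set True to spread the attack across GPUs
    )
    attacker = Attacker(attack, dataset, attack_args)
    results = attacker.attack_dataset()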

13
docs/api/constraints.rst Normal file

@@ -0,0 +1,13 @@
Constraints API Reference
============================
Constraint
------------
.. autoclass:: textattack.constraints.Constraint
:members:
PreTransformationConstraint
-----------------------------
.. autoclass:: textattack.constraints.PreTransformationConstraint
:members:
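For instance, a custom constraint only needs to override :meth:`_check_constraint`. The class below is a hypothetical example written for illustration, not part of the library:

.. code-block:: python

   from textattack.constraints import Constraint


   class MaxWordCountDelta(Constraint):
       """Hypothetical constraint: reject candidates whose word count differs
       from the reference text by more than ``max_delta`` words."""

       def __init__(self, max_delta=0, compare_against_original=True):
           super().__init__(compare_against_original)
           self.max_delta = max_delta

       def _check_constraint(self, transformed_text, reference_text):
           # Both arguments are `AttackedText` objects.
           return abs(transformed_text.num_words - reference_text.num_words) <= self.max_delta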

16
docs/api/datasets.rst Normal file
View File

@@ -0,0 +1,16 @@
Datasets API Reference
=============================
Dataset classes define the dataset objects used for carrying out attacks, augmentation, and training.
The :class:`~textattack.datasets.Dataset` class is the most basic class and can be used to wrap a list of input and output pairs.
To load datasets from raw text, CSV, or JSON files, we recommend using the 🤗 Datasets library to first
load them as a :obj:`datasets.Dataset` object and then pass that object to TextAttack's :class:`~textattack.datasets.HuggingFaceDataset` class.
Dataset
----------
.. autoclass:: textattack.datasets.Dataset
:members: __getitem__, __len__
HuggingFaceDataset
-------------------
.. autoclass:: textattack.datasets.HuggingFaceDataset
:members: __getitem__, __len__
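Both paths in a short sketch (the example texts and dataset name are illustrative):

.. code-block:: python

   import datasets
   import textattack

   # Wrap an in-memory list of (input, label) pairs.
   custom_dataset = textattack.datasets.Dataset(
       [("I enjoyed this movie a lot.", 1), ("A dull, lifeless film.", 0)],
       input_columns=["text"],
   )

   # Or load with 🤗 Datasets first, then wrap it.
   hf_split = datasets.load_dataset("rotten_tomatoes", split="test")
   dataset = textattack.datasets.HuggingFaceDataset(hf_split)

   print(len(dataset), dataset[0])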

View File

@@ -0,0 +1,46 @@
Goal Functions API Reference
============================
:class:`~textattack.goal_functions.GoalFunction` determines both the conditions under which the attack is successful (in terms of the model outputs)
and the heuristic score that we want to maximize when searching for the solution.
GoalFunction
------------
.. autoclass:: textattack.goal_functions.GoalFunction
:members:
ClassificationGoalFunction
--------------------------
.. autoclass:: textattack.goal_functions.ClassificationGoalFunction
:members:
TargetedClassification
----------------------
.. autoclass:: textattack.goal_functions.TargetedClassification
:members:
UntargetedClassification
------------------------
.. autoclass:: textattack.goal_functions.UntargetedClassification
:members:
InputReduction
--------------
.. autoclass:: textattack.goal_functions.InputReduction
:members:
TextToTextGoalFunction
-----------------------
.. autoclass:: textattack.goal_functions.TextToTextGoalFunction
:members:
MinimizeBleu
-------------
.. autoclass:: textattack.goal_functions.MinimizeBleu
:members:
NonOverlappingOutput
----------------------
.. autoclass:: textattack.goal_functions.NonOverlappingOutput
:members:
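For example, assuming ``model_wrapper`` was built as in the Attack section (the target class index is illustrative):

.. code-block:: python

   import textattack

   # Succeed as soon as the predicted label changes.
   untargeted = textattack.goal_functions.UntargetedClassification(model_wrapper)

   # Succeed only when the model is pushed to a specific class.
   targeted = textattack.goal_functions.TargetedClassification(model_wrapper, target_class=2)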

View File

@@ -0,0 +1,46 @@
Search Methods API Reference
============================
:class:`~textattack.search_methods.SearchMethod` attempts to find the optimal set of perturbations that will produce an adversarial example.
Finding such optimal perturbations becomes a combinatorial optimization problem, and search methods are typically heuristic search algorithms designed
to solve the underlying combinatorial problem.
A more in-depth study of search algorithms for NLP adversarial attacks can be found in the following work:
`Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples <https://arxiv.org/abs/2009.06368>`_
by Jin Yong Yoo, John X. Morris, Eli Lifland, and Yanjun Qi.
SearchMethod
------------
.. autoclass:: textattack.search_methods.SearchMethod
:members:
BeamSearch
------------
.. autoclass:: textattack.search_methods.BeamSearch
:members:
GreedySearch
------------
.. autoclass:: textattack.search_methods.GreedySearch
:members:
GreedyWordSwapWIR
------------------
.. autoclass:: textattack.search_methods.GreedyWordSwapWIR
:members:
AlzantotGeneticAlgorithm
-------------------------
.. autoclass:: textattack.search_methods.AlzantotGeneticAlgorithm
:members:
ImprovedGeneticAlgorithm
-------------------------
.. autoclass:: textattack.search_methods.ImprovedGeneticAlgorithm
:members:
ParticleSwarmOptimization
--------------------------
.. autoclass:: textattack.search_methods.ParticleSwarmOptimization
:members:
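A couple of the methods above, instantiated (the parameter values are illustrative):

.. code-block:: python

   import textattack

   # Greedy search over word-importance-ranked positions ("delete" ranking, as in TextFooler).
   greedy_wir = textattack.search_methods.GreedyWordSwapWIR(wir_method="delete")

   # Beam search that keeps the 8 best candidates at every step.
   beam = textattack.search_methods.BeamSearch(beam_width=8)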

60
docs/api/trainer.rst Normal file
View File

@@ -0,0 +1,60 @@
Training API Reference
==========================
Trainer
------------
The :class:`~textattack.Trainer` class provides an API for adversarial training, with built-in features for standard use cases.
It is designed to be similar to the :obj:`Trainer` class provided by the 🤗 Transformers library.
Custom behaviors can be added by subclassing :class:`~textattack.Trainer` and overriding these methods:
- :meth:`training_step`: Perform a single training step. Override this for a custom forward pass or custom loss.
- :meth:`evaluate_step`: Perform a single evaluation step. Override this for a custom forward pass.
- :meth:`get_train_dataloader`: Creates the PyTorch DataLoader for training. Override this for a custom batch setup.
- :meth:`get_eval_dataloader`: Creates the PyTorch DataLoader for evaluation. Override this for a custom batch setup.
- :meth:`get_optimizer_and_scheduler`: Creates the optimizer and scheduler for training. Override this for a custom optimizer and scheduler.
The pseudocode for how training is done:

.. code-block::

   train_preds = []
   train_targets = []
   for batch in train_dataloader:
       loss, preds, targets = training_step(model, tokenizer, batch)
       train_preds.append(preds)
       train_targets.append(targets)

       # clear gradients
       optimizer.zero_grad()

       # backward
       loss.backward()

       # update parameters
       optimizer.step()
       if scheduler:
           scheduler.step()

   # Calculate training accuracy using `train_preds` and `train_targets`

   eval_preds = []
   eval_targets = []
   for batch in eval_dataloader:
       loss, preds, targets = evaluate_step(model, tokenizer, batch)
       eval_preds.append(preds)
       eval_targets.append(targets)

   # Calculate eval accuracy using `eval_preds` and `eval_targets`
.. autoclass:: textattack.Trainer
:members:
TrainingArgs
-------------
Training arguments to be passed to the :class:`~textattack.Trainer` class.
.. autoclass:: textattack.TrainingArgs
:members:
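A rough end-to-end sketch, assuming ``model_wrapper``, ``attack``, ``train_dataset``, and ``eval_dataset`` have been built as in the previous sections (argument values are illustrative, not recommended settings):

.. code-block:: python

   import textattack

   training_args = textattack.TrainingArgs(
       num_epochs=3,
       num_clean_epochs=1,            # standard training epochs before adversarial training
       num_train_adv_examples=1000,   # adversarial examples generated per attack epoch
       learning_rate=5e-5,
       per_device_train_batch_size=8,
       output_dir="./outputs",
   )
   trainer = textattack.Trainer(
       model_wrapper,
       task_type="classification",
       attack=attack,
       train_dataset=train_dataset,
       eval_dataset=eval_dataset,
       training_args=training_args,
   )
   trainer.train()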

View File

@@ -7,6 +7,7 @@ textattack.attack\_recipes package
:show-inheritance:
.. automodule:: textattack.attack_recipes.attack_recipe
:members:
:undoc-members:
@@ -31,6 +32,12 @@ textattack.attack\_recipes package
:show-inheritance:
.. automodule:: textattack.attack_recipes.clare_li_2020
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.attack_recipes.deepwordbug_gao_2018
:members:
:undoc-members:

View File

@@ -1,44 +0,0 @@
textattack.commands.attack package
==================================
.. automodule:: textattack.commands.attack
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.commands.attack.attack_args
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.commands.attack.attack_args_helpers
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.commands.attack.attack_command
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.commands.attack.attack_resume_command
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.commands.attack.run_attack_parallel
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.commands.attack.run_attack_single_threaded
:members:
:undoc-members:
:show-inheritance:

View File

@@ -1,14 +0,0 @@
textattack.commands.eval\_model package
=======================================
.. automodule:: textattack.commands.eval_model
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.commands.eval_model.eval_model_command
:members:
:undoc-members:
:show-inheritance:

View File

@@ -6,37 +6,45 @@ textattack.commands package
:undoc-members:
:show-inheritance:
Subpackages
-----------
.. toctree::
:maxdepth: 6
textattack.commands.attack
textattack.commands.eval_model
textattack.commands.train_model
.. automodule:: textattack.commands.augment
.. automodule:: textattack.commands.attack_command
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.commands.benchmark_recipe
.. automodule:: textattack.commands.attack_resume_command
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.commands.list_things
.. automodule:: textattack.commands.augment_command
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.commands.peek_dataset
.. automodule:: textattack.commands.benchmark_recipe_command
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.commands.eval_model_command
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.commands.list_things_command
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.commands.peek_dataset_command
:members:
:undoc-members:
:show-inheritance:
@@ -52,3 +60,9 @@ Subpackages
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.commands.train_model_command
:members:
:undoc-members:
:show-inheritance:

View File

@@ -1,26 +0,0 @@
textattack.commands.train\_model package
========================================
.. automodule:: textattack.commands.train_model
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.commands.train_model.run_training
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.commands.train_model.train_args_helpers
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.commands.train_model.train_model_command
:members:
:undoc-members:
:show-inheritance:

View File

@@ -6,8 +6,7 @@ textattack.constraints.grammaticality.language\_models package
:undoc-members:
:show-inheritance:
Subpackages
-----------
.. toctree::
:maxdepth: 6

View File

@@ -6,8 +6,7 @@ textattack.constraints.grammaticality package
:undoc-members:
:show-inheritance:
Subpackages
-----------
.. toctree::
:maxdepth: 6

View File

@@ -14,6 +14,12 @@ textattack.constraints.pre\_transformation package
:show-inheritance:
.. automodule:: textattack.constraints.pre_transformation.max_modification_rate
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.constraints.pre_transformation.max_word_index_modification
:members:
:undoc-members:

View File

@@ -6,8 +6,7 @@ textattack.constraints package
:undoc-members:
:show-inheritance:
Subpackages
-----------
.. toctree::
:maxdepth: 6

View File

@@ -6,8 +6,7 @@ textattack.constraints.semantics package
:undoc-members:
:show-inheritance:
Subpackages
-----------
.. toctree::
:maxdepth: 6

View File

@@ -6,8 +6,7 @@ textattack.constraints.semantics.sentence\_encoders package
:undoc-members:
:show-inheritance:
Subpackages
-----------
.. toctree::
:maxdepth: 6

View File

@@ -0,0 +1,14 @@
textattack.datasets.helpers package
===================================
.. automodule:: textattack.datasets.helpers
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.datasets.helpers.ted_multi
:members:
:undoc-members:
:show-inheritance:

View File

@@ -6,13 +6,12 @@ textattack.datasets package
:undoc-members:
:show-inheritance:
Subpackages
-----------
.. toctree::
:maxdepth: 6
textattack.datasets.translation
textattack.datasets.helpers

View File

@@ -1,14 +0,0 @@
textattack.datasets.translation package
=======================================
.. automodule:: textattack.datasets.translation
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.datasets.translation.ted_multi
:members:
:undoc-members:
:show-inheritance:

View File

@@ -6,8 +6,7 @@ textattack.goal\_functions package
:undoc-members:
:show-inheritance:
Subpackages
-----------
.. toctree::
:maxdepth: 6

View File

@@ -0,0 +1,26 @@
textattack.metrics.attack\_metrics package
==========================================
.. automodule:: textattack.metrics.attack_metrics
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.metrics.attack_metrics.attack_queries
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.metrics.attack_metrics.attack_success_rate
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.metrics.attack_metrics.words_perturbed
:members:
:undoc-members:
:show-inheritance:

View File

@@ -0,0 +1,20 @@
textattack.metrics.quality\_metrics package
===========================================
.. automodule:: textattack.metrics.quality_metrics
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.metrics.quality_metrics.perplexity
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.metrics.quality_metrics.use
:members:
:undoc-members:
:show-inheritance:

View File

@@ -0,0 +1,22 @@
textattack.metrics package
==========================
.. automodule:: textattack.metrics
:members:
:undoc-members:
:show-inheritance:
.. toctree::
:maxdepth: 6
textattack.metrics.attack_metrics
textattack.metrics.quality_metrics
.. automodule:: textattack.metrics.metric
:members:
:undoc-members:
:show-inheritance:

View File

@@ -7,11 +7,6 @@ textattack.models.helpers package
:show-inheritance:
.. automodule:: textattack.models.helpers.bert_for_classification
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.models.helpers.glove_embedding_layer
:members:

View File

@@ -6,8 +6,7 @@ textattack.models package
:undoc-members:
:show-inheritance:
Subpackages
-----------
.. toctree::
:maxdepth: 6

View File

@@ -8,12 +8,6 @@ textattack.models.tokenizers package
.. automodule:: textattack.models.tokenizers.auto_tokenizer
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.models.tokenizers.glove_tokenizer
:members:
:undoc-members:

View File

@@ -1,5 +1,5 @@
Complete API Reference
========================
textattack package
==================
.. automodule:: textattack
:members:
@@ -19,7 +19,57 @@ Complete API Reference
textattack.goal_function_results
textattack.goal_functions
textattack.loggers
textattack.metrics
textattack.models
textattack.search_methods
textattack.shared
textattack.transformations
.. automodule:: textattack.attack
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.attack_args
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.attacker
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.augment_args
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.dataset_args
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.model_args
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.trainer
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.training_args
:members:
:undoc-members:
:show-inheritance:

View File

@@ -6,8 +6,7 @@ textattack.shared package
:undoc-members:
:show-inheritance:
Subpackages
-----------
.. toctree::
:maxdepth: 6
@@ -16,12 +15,6 @@ Subpackages
.. automodule:: textattack.shared.attack
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.shared.attacked_text
:members:
:undoc-members:
@@ -46,7 +39,7 @@ Subpackages
:show-inheritance:
.. automodule:: textattack.shared.word_embedding
.. automodule:: textattack.shared.word_embeddings
:members:
:undoc-members:
:show-inheritance:

View File

@@ -8,6 +8,12 @@ textattack.shared.utils package
.. automodule:: textattack.shared.utils.importing
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.shared.utils.install
:members:
:undoc-members:

View File

@@ -8,18 +8,21 @@ textattack.transformations package
.. toctree::
:maxdepth: 6
textattack.transformations.word_insertions
textattack.transformations.word_merges
textattack.transformations.word_swaps
.. automodule:: textattack.transformations.composite_transformation
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.random_synonym_insertion
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.transformation
:members:
:undoc-members:
@@ -32,115 +35,7 @@ textattack.transformations package
:show-inheritance:
.. automodule:: textattack.transformations.word_swap
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swap_change_location
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swap_change_name
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swap_change_number
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swap_contract
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swap_embedding
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swap_extend
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swap_gradient_based
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swap_homoglyph_swap
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swap_hownet
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swap_inflections
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swap_masked_lm
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swap_neighboring_character_swap
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swap_qwerty
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swap_random_character_deletion
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swap_random_character_insertion
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swap_random_character_substitution
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swap_random_word
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swap_wordnet
.. automodule:: textattack.transformations.word_innerswap_random
:members:
:undoc-members:
:show-inheritance:

View File

@@ -0,0 +1,26 @@
textattack.transformations.word\_insertions package
===================================================
.. automodule:: textattack.transformations.word_insertions
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_insertions.word_insertion
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_insertions.word_insertion_masked_lm
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_insertions.word_insertion_random_synonym
:members:
:undoc-members:
:show-inheritance:

View File

@@ -0,0 +1,20 @@
textattack.transformations.word\_merges package
===============================================
.. automodule:: textattack.transformations.word_merges
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_merges.word_merge
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_merges.word_merge_masked_lm
:members:
:undoc-members:
:show-inheritance:

View File

@@ -0,0 +1,116 @@
textattack.transformations.word\_swaps package
==============================================
.. automodule:: textattack.transformations.word_swaps
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swaps.word_swap
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swaps.word_swap_change_location
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swaps.word_swap_change_name
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swaps.word_swap_change_number
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swaps.word_swap_contract
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swaps.word_swap_embedding
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swaps.word_swap_extend
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swaps.word_swap_gradient_based
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swaps.word_swap_homoglyph_swap
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swaps.word_swap_hownet
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swaps.word_swap_inflections
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swaps.word_swap_masked_lm
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swaps.word_swap_neighboring_character_swap
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swaps.word_swap_qwerty
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swaps.word_swap_random_character_deletion
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swaps.word_swap_random_character_insertion
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swaps.word_swap_random_character_substitution
:members:
:undoc-members:
:show-inheritance:
.. automodule:: textattack.transformations.word_swaps.word_swap_wordnet
:members:
:undoc-members:
:show-inheritance:

View File

@@ -21,7 +21,7 @@ copyright = "2020, UVA QData Lab"
author = "UVA QData Lab"
# The full version, including alpha/beta/rc tags
release = "0.2.14"
release = "0.3.3"
# Set master doc to `index.rst`.
master_doc = "index"
@@ -43,6 +43,8 @@ extensions = [
"nbsphinx",
# Enable .md doc files
"recommonmark",
"sphinx_markdown_tables",
"IPython.sphinxext.ipython_console_highlighting",
]
autosummary_generate = True
@@ -80,6 +82,10 @@ html_css_files = [
"css/custom.css",
]
html_sidebars = {
"**": ["globaltoc.html", "relations.html", "sourcelink.html", "searchbox.html"]
}
# Path to favicon.
html_favicon = "favicon.png"

5
docs/environment.yml Normal file
View File

@@ -0,0 +1,5 @@
name: textattackenv
channels:
- defaults
dependencies:
- python=3.7

View File

@@ -4,23 +4,37 @@ TextAttack Documentation
.. toctree::
:maxdepth: 6
:caption: About
:caption: Get Started
Basic-Introduction <0_get_started/basic-Intro.rst>
Installation <0_get_started/installation.md>
Command-Line Usage <0_get_started/command_line_usage.md>
Quick API Usage <0_get_started/quick_api_tour.rst>
FAQ <1start/FAQ.md>
.. toctree::
:maxdepth: 6
:caption: Recipes
3recipes/attack_recipes_cmd.md
3recipes/attack_recipes.rst
3recipes/augmenter_recipes_cmd.md
3recipes/augmenter_recipes.rst
3recipes/models.md
.. toctree::
:maxdepth: 6
:caption: Using TextAttack
1start/basic-Intro.rst
1start/what_is_an_adversarial_attack.md
1start/references.md
1start/attacks4Components.md
1start/benchmark-search.md
3recipes/models.md
1start/FAQ.md
.. toctree::
:maxdepth: 6
:caption: Get Started
Installation <1start/installation>
Command-Line Usage <1start/command_line_usage.md>
1start/quality-SOTA-recipes.md
1start/api-design-tips.md
1start/multilingual-visualization.md
1start/support.md
.. toctree::
:maxdepth: 6
@@ -30,22 +44,31 @@ TextAttack Documentation
Tutorial 1: Transformations <2notebook/1_Introduction_and_Transformations.ipynb>
Tutorial 2: Constraints <2notebook/2_Constraints.ipynb>
Tutorial 3: Augmentation <2notebook/3_Augmentations.ipynb>
Tutorial 4: Attacking TensorFlow models <2notebook/Example_0_tensorflow.ipynb>
Tutorial 5: Attacking scikit-learn models <2notebook/Example_1_sklearn.ipynb>
Tutorial 6: Attacking AllenNLP models <2notebook/Example_2_allennlp.ipynb>
Tutorial 7: Attacking multilingual models <2notebook/Example_4_CamemBERT.ipynb>
Tutorial 8: Explaining Attacking BERT model using Captum <2notebook/Example_5_Explain_BERT.ipynb>
Tutorial 4: Custom Word Embeddings <2notebook/4_Custom_Datasets_Word_Embedding.ipynb>
Tutorial 5: Attacking TensorFlow models <2notebook/Example_0_tensorflow.ipynb>
Tutorial 6: Attacking scikit-learn models <2notebook/Example_1_sklearn.ipynb>
Tutorial 7: Attacking AllenNLP models <2notebook/Example_2_allennlp.ipynb>
Tutorial 8: Attacking Keras models <2notebook/Example_3_Keras.ipynb>
Tutorial 9: Attacking multilingual models <2notebook/Example_4_CamemBERT.ipynb>
Tutorial 10: Explaining attacks on a BERT model using Captum <2notebook/Example_5_Explain_BERT.ipynb>
.. toctree::
:maxdepth: 6
:caption: API User Guide
Attack <api/attack.rst>
Attacker <api/attacker.rst>
AttackResult <api/attack_results.rst>
Trainer <api/trainer.rst>
Datasets <api/datasets.rst>
GoalFunction <api/goal_functions.rst>
Constraints <api/constraints.rst>
SearchMethod <api/search_methods.rst>
.. toctree::
:maxdepth: 6
:glob:
:caption: Developer Guide
1start/support.md
1start/api-design-tips.md
3recipes/attack_recipes
3recipes/augmenter_recipes
:caption: Full Reference
apidoc/textattack

View File

@@ -1,4 +1,35 @@
recommonmark
nbsphinx
sphinx-autobuild
sphinx-rtd-theme
recommonmark==0.7.1
Sphinx==4.1.2
sphinx-autobuild==2021.3.14
sphinx-markdown-tables==0.0.15
sphinx-rtd-theme==0.5.2
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.0
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
nbclient==0.5.4
nbconvert==6.1.0
nbformat==5.1.3
nbsphinx==0.8.7
widgetsnbextension==3.5.1
ipykernel==6.4.1
ipython==7.27.0
ipython-genutils==0.2.0
ipywidgets==7.6.4
scipy==1.7.1
tensorboard==2.6.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.0
tensorboardX==2.4
tensorflow==2.6.0
tensorflow-estimator==2.6.0
tensorflow-hub==0.12.0
tensorflow-text==2.6.0
sentence-transformers==2.0.0
transformers==4.10.1
textattack==0.3.3
sqlitedict==1.7.0
stanza==1.2.3
Cython==0.29.24

View File

@@ -4,6 +4,7 @@ import os
import numpy as np
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification, pipeline
from textattack import Attacker
from textattack.attack_recipes import PWWSRen2019
from textattack.datasets import HuggingFaceDataset
from textattack.models.wrappers import ModelWrapper
@@ -20,11 +21,11 @@ class HuggingFaceSentimentAnalysisPipelineWrapper(ModelWrapper):
[[0.218262017, 0.7817379832267761]
"""
def __init__(self, pipeline):
self.pipeline = pipeline
def __init__(self, model):
self.model = model
def __call__(self, text_inputs):
raw_outputs = self.pipeline(text_inputs)
raw_outputs = self.model(text_inputs)
outputs = []
for output in raw_outputs:
score = output["score"]
@@ -55,7 +56,6 @@ recipe = PWWSRen2019.build(model_wrapper)
recipe.transformation.language = "fra"
dataset = HuggingFaceDataset("allocine", split="test")
for idx, result in enumerate(recipe.attack_dataset(dataset)):
print(("-" * 20), f"Result {idx+1}", ("-" * 20))
print(result.__str__(color_method="ansi"))
print()
attacker = Attacker(recipe, dataset)
results = attacker.attack_dataset()

View File

@@ -3,5 +3,5 @@
# model on the Yelp dataset.
textattack attack --attack-n --goal-function untargeted-classification \
--model bert-base-uncased-yelp --num-examples 8 --transformation word-swap-wordnet \
--constraints edit-distance^12 max-words-perturbed:max_percent=0.75 repeat stopword \
--constraints edit-distance^12 max-words-perturbed^max_percent=0.75 repeat stopword \
--search greedy

View File

@@ -0,0 +1,142 @@
"""
In recent TF releases (2.5+), Keras has been moved to tf.keras.
This has resulted in certain exceptions when Keras models are attacked in parallel.
This script fixes that behavior by applying the official hotfix for this situation, detailed here:
https://github.com/tensorflow/tensorflow/issues/34697
All models/datasets are similar to the Keras attack tutorial at:
https://textattack.readthedocs.io/en/latest/2notebook/Example_3_Keras.html#
NOTE: This fix might be deprecated in future TF releases.
NOTE: This script is not designed to run in a Jupyter notebook due to conflicting namespace issues.
We recommend running it as a script only.
"""
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.utils import to_categorical
from tensorflow.python.keras.layers import deserialize, serialize
from tensorflow.python.keras.saving import saving_utils
import torch
from textattack import AttackArgs, Attacker
from textattack.attack_recipes import PWWSRen2019
from textattack.datasets import HuggingFaceDataset
from textattack.models.wrappers import ModelWrapper
NUM_WORDS = 1000
def unpack(model, training_config, weights):
restored_model = deserialize(model)
if training_config is not None:
restored_model.compile(
**saving_utils.compile_args_from_training_config(training_config)
)
restored_model.set_weights(weights)
return restored_model
# Hotfix function
def make_keras_picklable():
def __reduce__(self):
model_metadata = saving_utils.model_metadata(self)
training_config = model_metadata.get("training_config", None)
model = serialize(self)
weights = self.get_weights()
return (unpack, (model, training_config, weights))
cls = Model
cls.__reduce__ = __reduce__
# Run the function
make_keras_picklable()
def transform(x):
x_transform = []
for i, word_indices in enumerate(x):
BoW_array = np.zeros((NUM_WORDS,))
for index in word_indices:
if index < len(BoW_array):
BoW_array[index] += 1
x_transform.append(BoW_array)
return np.array(x_transform)
class CustomKerasModelWrapper(ModelWrapper):
def __init__(self, model):
self.model = model
def __call__(self, text_input_list):
x_transform = []
for i, review in enumerate(text_input_list):
tokens = [x.strip(",") for x in review.split()]
BoW_array = np.zeros((NUM_WORDS,))
for word in tokens:
if word in vocabulary:
if vocabulary[word] < len(BoW_array):
BoW_array[vocabulary[word]] += 1
x_transform.append(BoW_array)
x_transform = np.array(x_transform)
prediction = self.model.predict(x_transform)
return prediction
model = Sequential()
model.add(Dense(512, activation="relu", input_dim=NUM_WORDS))
model.add(Dropout(0.3))
model.add(Dense(100, activation="relu"))
model.add(Dense(2, activation="sigmoid"))
opt = tf.keras.optimizers.Adam(learning_rate=0.00001)
model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["accuracy"])
(x_train_tokens, y_train), (x_test_tokens, y_test) = tf.keras.datasets.imdb.load_data(
path="imdb.npz",
num_words=NUM_WORDS,
skip_top=0,
maxlen=None,
seed=113,
start_char=1,
oov_char=2,
index_from=3,
)
index = int(0.9 * len(x_train_tokens))
x_train = transform(x_train_tokens)[:index]
x_test = transform(x_test_tokens)[index:]
y_train = np.array(y_train[:index])
y_test = np.array(y_test[index:])
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
vocabulary = tf.keras.datasets.imdb.get_word_index(path="imdb_word_index.json")
results = model.fit(
x_train, y_train, epochs=1, batch_size=512, validation_data=(x_test, y_test)
)
if __name__ == "__main__":
torch.multiprocessing.freeze_support()
model_wrapper = CustomKerasModelWrapper(model)
dataset = HuggingFaceDataset("rotten_tomatoes", None, "test", shuffle=True)
attack = PWWSRen2019.build(model_wrapper)
attack_args = AttackArgs(
num_examples=10,
checkpoint_dir="checkpoints",
parallel=True,
num_workers_per_device=2,
)
attacker = Attacker(attack, dataset, attack_args)
attacker.attack_dataset()

View File

@@ -1,11 +1,11 @@
text,label
"the rock is destined to be the 21st century's novel conan and that he's go to make a splash yet greater than arnold schwarzenegger , jean- claud van damme or steven segal.",1
"the rock is destined to be the 21st century's novo conan and that he's going to make a splash yet greater than arnold schwarzenegger , jean- claud van damme or stephens segal.",1
the gorgeously elaborate continuation of 'the lord of the rings' triad is so massive that a column of words cannot adequately describe co-writer/director pete jackson's expanded vision of j . r . r . tolkien's middle-earth .,1
the gorgeously elaborate continuation of 'the lordy of the rings' trilogy is so huge that a column of words cannot adequately describe co-writer/superintendent peter jackson's enlargements vision of j . r . r . tolkien's middle-earth .,1
take care of my cat offers a cheerfully different slice of asian cinema .,1
take care of my cat offers a refreshingly different slice of asian cinemas .,1
a technically well-made suspenser . . . but its abrupt fall in iq points as it races to the finish line demonstrating simply too discouraging to let slide .,0
a technologically well-made suspenser . . . but its abrupt dip in iq points as it races to the finish line proves simply too discouraging to let slide .,0
it's a mystery how the cinematography could be released in this condition .,0
it's a mystery how the movies could be released in this condition .,0
"the rock is destined to be the new conan and that he's going to make a splash even greater than arnold , jean- claud van damme or steven segal.",1
"the rock is destined to be the 21st century's new conan and that he's going to caravan make a splash even greater than arnold schwarzenegger , jean- claud van damme or steven segal.",1
the gorgeously rarify continuation of 'the lord of the rings' trilogy is so huge that a column of give-and-take cannot adequately describe co-writer/director shaft jackson's expanded vision of j . r . r . tolkien's middle-earth .,1
the gorgeously elaborate of 'the of the rings' trilogy is so huge that a column of words cannot adequately describe co-writer/director peter jackson's expanded of j . r . r . tolkien's middle-earth .,1
take care different my cat offers a refreshingly of slice of asian cinema .,1
take care of my cat offers a different slice of asian cinema .,1
a technically well-made suspenser . . . but its abrupt drop in iq points as it races to the finish IT line proves simply too discouraging to let slide .,0
a technically well-made suspenser . . . but its abrupt drop in iq points as it races to the finish demarcation proves plainly too discouraging to let slide .,0
it's pic a mystery how the movie could be released in this condition .,0
it's a mystery how the movie could in released be this condition .,0

View File

@@ -1,2 +1,2 @@
#!/bin/bash
textattack augment --csv examples.csv --input-column text --recipe eda --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original --overwrite
textattack augment --input-csv examples.csv --output-csv output.csv --input-column text --recipe eda --pct-words-to-swap .1 --transformations-per-example 2 --exclude-original --overwrite
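For reference, a rough Python-API sketch of the same EDA augmentation, assuming TextAttack's EasyDataAugmenter recipe (the input sentence is taken from examples.csv above):

    from textattack.augmentation import EasyDataAugmenter

    # Mirrors the CLI flags above: 10% of words swapped, 2 augmentations per example.
    augmenter = EasyDataAugmenter(pct_words_to_swap=0.1, transformations_per_example=2)
    print(augmenter.augment("it's a mystery how the movie could be released in this condition ."))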

View File

@@ -1,2 +0,0 @@
"text",label
"it's a mystery how the movie could be released in this condition .", 0

View File

@@ -2,4 +2,4 @@
# Trains `albert-base-v2` on the SNLI dataset for 5 epochs. This is a
# demonstration of how our training script can handle different `transformers`
# models and customize for different datasets.
textattack train --model albert-base-v2 --dataset snli --batch-size 128 --epochs 5 --max-length 128 --learning-rate 1e-5 --allowed-labels 0 1 2
textattack train --model-name-or-path albert-base-v2 --dataset snli --per-device-train-batch-size 8 --epochs 5 --learning-rate 1e-5

View File

@@ -1,4 +1,4 @@
#!/bin/bash
# Trains `bert-base-cased` on the STS-B task for 3 epochs. This is a demonstration
# of how our training script handles regression.
textattack train --model bert-base-cased --dataset glue^stsb --batch-size 128 --epochs 3 --max-length 128 --learning-rate 1e-5
textattack train --model-name-or-path bert-base-cased --dataset glue^stsb --epochs 3 --learning-rate 1e-5

View File

@@ -0,0 +1,4 @@
#!/bin/bash
# Trains an LSTM on the IMDB dataset for 50 epochs. This is a basic
# demonstration of our training script and `datasets` integration.
textattack train --model-name-or-path lstm --dataset imdb --epochs 50 --learning-rate 1e-5

View File

@@ -1,4 +1,4 @@
#!/bin/bash
# Trains an LSTM on the Rotten Tomatoes dataset for 50 epochs. This is a basic
# demonstration of our training script and `datasets` integration.
textattack train --model lstm --dataset rotten_romatoes --batch-size 64 --epochs 50 --learning-rate 1e-5
textattack train --model-name-or-path lstm --dataset rotten_tomatoes --epochs 50 --learning-rate 1e-5

View File

@@ -1,19 +1,18 @@
bert-score>=0.3.5
editdistance
flair==0.6.1.post1
flair
filelock
language_tool_python
lemminflect
lru-dict
datasets
nltk
numpy<1.19.0 #TF 2.0 requires this
numpy>=1.19.2
pandas>=1.0.1
scipy==1.4.1
torch
transformers==3.3.0
scipy>=1.4.1
torch>=1.7.0,!=1.8
transformers>=3.3.0
terminaltables
tokenizers==0.8.1-rc2
tqdm>=4.27,<4.50.0
word2number
num2words

View File

@@ -8,7 +8,13 @@ with open("README.md", "r", encoding="utf8") as fh:
extras = {}
# Packages required for installing docs.
extras["docs"] = ["recommonmark", "nbsphinx", "sphinx-autobuild", "sphinx-rtd-theme"]
extras["docs"] = [
"recommonmark",
"nbsphinx",
"sphinx-autobuild",
"sphinx-rtd-theme",
"sphinx-markdown-tables",
]
# Packages required for formatting code & running tests.
extras["test"] = [
"black==20.8b1",

View File

@@ -6,6 +6,4 @@ def Attack(model):
search_method = textattack.search_methods.GreedyWordSwapWIR()
transformation = textattack.transformations.WordSwapRandomCharacterSubstitution()
constraints = []
return textattack.shared.Attack(
goal_function, constraints, transformation, search_method
)
return textattack.Attack(goal_function, constraints, transformation, search_method)

Some files were not shown because too many files have changed in this diff.