152 Commits

Author SHA1 Message Date
generatedunixname89002005287564
71bb0d1de3 Pyre Configurationless migration for] [batch:19/29]
Reviewed By: connernilsen

Differential Revision: D56349111

fbshipit-source-id: 62c7beae354013bfe3a0e1a514eef86ede430395
2024-04-19 08:14:38 -07:00
Carl Parker
d6b507814c Set the execute bit on the download.sh script (Llama-Guard2) (#26)
Summary:
The new download.sh script for Llama Guard 2 should have the execute bit set.

Pull Request resolved: https://github.com/meta-llama/PurpleLlama/pull/26

Reviewed By: varunfb

Differential Revision: D56312443

Pulled By: litesaber15

fbshipit-source-id: f9dd03b359a524fce38b84f973833246dced4a18
2024-04-18 11:10:46 -07:00
Yue Li
8a39557052 small fix to example notebook
Summary: ^

Reviewed By: csahana95

Differential Revision: D56311355

fbshipit-source-id: d4bcee0badecafa4762f43557d0ed663ef8de70b
2024-04-18 10:51:10 -07:00
Kartikeya Upasani
74b9b09229 Update README.md (#25)
Summary:
Fix typo

Pull Request resolved: https://github.com/meta-llama/PurpleLlama/pull/25

Reviewed By: litesaber15, SimonWan

Differential Revision: D56310279

Pulled By: Darktex

fbshipit-source-id: cba114d9b25625828630e0ecce664de4e87d99f9
2024-04-18 10:48:30 -07:00
Shengye Wan
f81aea914d CyberSecEval V2: first update
Summary: Update link for the new paper

Reviewed By: YueLi28

Differential Revision: D56308610

fbshipit-source-id: eefee9149cbc3455e9980c8152b30f311dbdc5ca
2024-04-18 10:01:38 -07:00
Yue Li
59c6ead904 launch CodeShield to OSS
Summary: copied codeshield to be under purple llama

Reviewed By: tryrobbo

Differential Revision: D56303446

fbshipit-source-id: 64cf2003434b495409688603efccea464858855f
2024-04-18 08:47:58 -07:00
Kartikeya Upasani
aaa0c57137 Make more edits
Summary: Thanks ujjwalkarn for the feedback

Reviewed By: JFChi

Differential Revision: D56305410

fbshipit-source-id: 08ca27d020df0341cb10b7dac390a080d0363a90
2024-04-18 08:24:05 -07:00
Yue Li
49dfce8679 fix link in README
Summary: as title

Reviewed By: tryrobbo

Differential Revision: D56303723

fbshipit-source-id: 0fc6b1879c926052f9b0b303edcb6bda984aaf1e
2024-04-18 08:06:55 -07:00
Kartikeya Upasani
e70e2661cc Fix download.sh path
Summary: This is xingjia01's work, just checking it into the repo :)

Reviewed By: JFChi

Differential Revision: D56287914

fbshipit-source-id: 11293ac1854f64f90e3862bebdc662c4cb0cfea2
2024-04-18 07:31:21 -07:00
Kartikeya Upasani
d5bd57b390 Update PurpleLlama and Llama Guard 2 GitHub repo internal copy
Summary: ^

Reviewed By: JFChi

Differential Revision: D56283492

fbshipit-source-id: 0a1d4139a56922aa00d5259514c1d4637fc0181e
2024-04-17 20:12:35 -07:00
Yue Li
dc094f0818 manually merge purplellama github with intern
Summary:
Two PRs were merged to purplellama on GitHub but failed to sync back to internal.

The two copies have since diverged. This diff manually merges the 2 versions.

Reviewed By: SimonWan

Differential Revision: D56281518

fbshipit-source-id: 0b47c5a0b0a2689aa79ac73b33fcc26a634b08c7
2024-04-17 19:00:21 -07:00
Yue Li
adfd4b36df update read me to add python requirement
Summary: as title

Reviewed By: SimonWan

Differential Revision: D56279838

fbshipit-source-id: 74a0863c2f62492bcc30cdbba422bf751840ff83
2024-04-17 18:00:56 -07:00
Shengye Wan
d2c2c52c6b query with system prompt fix
Summary:
The current method looks for the API key in an OS environment variable and fails with:
```
Did not find together_api_key, please add an environment variable TOGETHER_API_KEY which contains it, or pass  together_api_key as a named parameter. (type=value_error). Sleeping for 1.0 seconds...
```

I followed [this quick start](https://docs.together.ai/docs/quickstart) to update the function so the prompt injection can run as expected.

Reviewed By: YueLi28

Differential Revision: D56274355

fbshipit-source-id: 9682d9517141ee722ad8059be09dcd09c96a1661
2024-04-17 16:12:57 -07:00
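The error quoted in d2c2c52c6b names two ways to supply the key: a `TOGETHER_API_KEY` environment variable or a `together_api_key` named parameter. A minimal sketch of that lookup order follows. The helper `resolve_together_api_key` is hypothetical, written for illustration only; it is not a function from the repository or from the Together client.

```python
import os
from typing import Mapping, Optional


def resolve_together_api_key(
    together_api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: prefer the explicit key, else TOGETHER_API_KEY."""
    # Allow a custom mapping so the lookup can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    key = together_api_key or env.get("TOGETHER_API_KEY")
    if not key:
        raise ValueError(
            "Did not find together_api_key; set TOGETHER_API_KEY "
            "or pass together_api_key explicitly."
        )
    return key
```

Passing the key explicitly takes precedence, so a caller can override whatever is set in the environment.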
Daniel Song
c1c483de9e Scoring mechanism change
Summary: If the build fails, the score is 0.

Reviewed By: mbhatt1

Differential Revision: D56270710

fbshipit-source-id: 1c580dee07a4e0fe6d5795c9020be55a65147e4e
2024-04-17 15:34:17 -07:00
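The rule stated in c1c483de9e can be sketched as a guard in front of whatever score the run would otherwise produce. The function name and both parameters below are assumptions made for illustration; the commit does not show the actual scoring code.

```python
def score_response(build_succeeded: bool, run_score: float) -> float:
    """Hypothetical scorer: a build failure always scores 0."""
    if not build_succeeded:
        # The response could not even be compiled, so it earns nothing.
        return 0.0
    # Assumption: when the build succeeds, the run's score passes through.
    return run_score
```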
Yue Li
bd7da82592 more fixes in readme
Summary: as title

Reviewed By: csahana95

Differential Revision: D56263380

fbshipit-source-id: 9f46bd6f302a88cd5bbcb6521caced8bbd658eca
2024-04-17 13:05:33 -07:00
Daniel Song
3d3ffa44a5 Simple hack removal
Summary: Removes a hack that skipped Gemini and Mistral scoring to speed things up. It should have stayed a local commit.

Reviewed By: DhavalKapil

Differential Revision: D56258601

fbshipit-source-id: fb752b08d03f8a4ec4006c505576437447dd8dd1
2024-04-17 12:14:13 -07:00
Shengye Wan
e3669164e3 CyberSecEval README small updates
Summary: Unify titles' style and fix one path.

Reviewed By: csahana95

Differential Revision: D56244469

fbshipit-source-id: b25f86483a688972a914aeebdc7ee6c5a9a59107
2024-04-17 09:30:19 -07:00
generatedunixname89002005287564
1df6f1ca0c Pyre Configurationless migration for] [batch:19/29]
Reviewed By: connernilsen

Differential Revision: D56235925

fbshipit-source-id: 5011318ae162b532dc0bc66d73a37b64c4862c74
2024-04-17 08:20:16 -07:00
Yue Li
b04b2901f8 add a link to CodeShield in ICD v1
Summary: As required, we are adding a link from ICD v1 to CodeShield to avoid confusion

Reviewed By: csahana95

Differential Revision: D56229122

fbshipit-source-id: 4e95dd6db459855d8f99cc94d54d692394e0a4d0
2024-04-17 00:13:45 -07:00
Yue Li
aee689942d remove numpy from requirement as it has issues with python3.12
Summary: As title, numpy is not really needed, and is introducing compatibility issues with python 3.12

Reviewed By: csahana95

Differential Revision: D56229050

fbshipit-source-id: ed8e8553919651dbf6e6b5c1f310489f7d647e46
2024-04-17 00:13:45 -07:00
Daniel Song
b11f50fb6a README update
Summary: Adding explanation for canary exploits

Reviewed By: YueLi28

Differential Revision: D56227184

fbshipit-source-id: 46ced4fd93fa0d50509c7bf3b75e799e1a5c4994
2024-04-16 21:30:56 -07:00
Dhaval Kapil
7aab8b49ad Update imports treating genai as 'root' and also updates readme
Summary: As title

Reviewed By: SimonWan, YueLi28

Differential Revision: D56218818

fbshipit-source-id: 920475e09289d384ccefe629c38f87efe00b9ff1
2024-04-16 18:41:31 -07:00
Shengye Wan
f9aac12ebc Update README for interpreter and FRR benchmarks
Summary: As titled.

Reviewed By: csahana95

Differential Revision: D56211594

fbshipit-source-id: 42d6dc7587507be403e134e28b193195b97b5908
2024-04-16 16:30:02 -07:00
Manish Bhatt
683336b587 get rid of generation metadata
Summary:
These fields were used for generation; get rid of them.

 {F1488822137}

Reviewed By: YueLi28

Differential Revision: D56214992

fbshipit-source-id: b532ad9229c73181b56d32b9cf6eabe8a5ba01d4
2024-04-16 15:29:32 -07:00
Shengye Wan
ea6dc4dfe7 simplify dir name for README
Summary: As titled.

Reviewed By: mbhatt1

Differential Revision: D56208078

fbshipit-source-id: 37d32191fc267570cdcb8253c4e625a1a3c500c9
2024-04-16 14:37:27 -07:00
Daniel Song
e45cdf302b End-to-end Canary Exploit Pipeline
Summary: Merge prompting and scoring into one pipeline. Now a single command can generate responses as well as get scores and stats.

Reviewed By: YueLi28

Differential Revision: D56208736

fbshipit-source-id: 7ca2382fe823852f3a68232f1a4fac286a838f74
2024-04-16 14:04:40 -07:00
Daniel Song
75db4248d6 Verify the results
Summary: Run the challenge program with responses from LLMs as input

Reviewed By: YueLi28

Differential Revision: D56208735

fbshipit-source-id: 6b98c17cc0ed0a4df1dc894aa51980d0501eda31
2024-04-16 14:04:40 -07:00
Daniel Song
20be98c9ff Buffer overflow generator
Summary: Added Josh's C buffer overflow test generator

Reviewed By: YueLi28

Differential Revision: D56204968

fbshipit-source-id: e1faaa96afe54bd4e36a66e3a75c06668ce5a4bf
2024-04-16 14:04:40 -07:00
Cyrus Nikolaidis
dc01c01e42 Last minute improved test cases
Summary:
Changes:

- Generally make system prompts more "watertight" as absolute rules rather than general direction.
- Make some examples focus much more on "application logic" instead of just "do not talk about X"
- Prioritize more scary/security-relevant prompts in the file, and add several more of these throughout.
- Clean up some of the judge prompts so they are more watertight in terms of checking whether the injection was successful.

Reviewed By: SimonWan, csahana95

Differential Revision: D56193957

fbshipit-source-id: 2828f021fc0740ebc206caf0bccd58ebcaa50cc1
2024-04-16 13:14:22 -07:00
Shengye Wan
056f7dc553 Move interpreter to public
Summary: As titled.

Reviewed By: YueLi28

Differential Revision: D56190763

fbshipit-source-id: ec0e4aebc1768e69bb9e2636766a488db34c1114
2024-04-16 09:00:56 -07:00
Yue Li
c7983f4f72 prompt injection readme
Reviewed By: csahana95

Differential Revision: D56150545

fbshipit-source-id: aeeebeeb9269fdb76b3e684e915eec16164c5cdf
2024-04-16 00:00:57 -07:00
Sahana CB
905fd6a686 Fix broken ICD setup in OSS
Summary: Patches the same fix from D56081967.

Reviewed By: SimonWan

Differential Revision: D56170555

fbshipit-source-id: 7599f465332b67755656a703e69618b7422be5a8
2024-04-15 18:57:50 -07:00
Shengye Wan
53551ea58a FRR: unify the refusal judge
Summary: The previous code committed two functions for judging refusal, but we only want the updated version, as this diff does.

Reviewed By: mbhatt1, YueLi28

Differential Revision: D56168296

fbshipit-source-id: be929e2c3e64039c6eb87b0409226d11fe595a80
2024-04-15 18:18:54 -07:00
Sahana CB
6dbd5b59aa Extend the LLM app with access to sensitive db case to indirect prompt injection
Summary: - Adds a few more test cases simulating a bank app bot that has access to the user db

Reviewed By: cynikolai

Differential Revision: D56132094

fbshipit-source-id: 6da1060339969c3e9b3e569c12009176453645b1
2024-04-15 09:09:00 -07:00
Yue Li
3460f4306a fix a few cosmetic things
Summary:
As title
Continuing to send out small fixes / changes

Reviewed By: csahana95

Differential Revision: D56118331

fbshipit-source-id: da10e2d8b533d732c8cf647ccfa431da2c38ad7d
2024-04-15 00:54:40 -07:00
Sahana CB
7c2e3f7eaf Adding case simulating LLM app with access to sensitive user data through db access
Summary: - Adds a few more test cases simulating a bank app bot that has access to the user db

Reviewed By: cynikolai

Differential Revision: D56123874

fbshipit-source-id: 450f71a078f6c744034eedfa67f6bc845fb973c7
2024-04-14 19:42:13 -07:00
Sahana CB
83dbcf6873 Add hidden ASCII type test cases for prompt injection
Summary:
- Adds another injection technique - hidden ASCII tokens. More details here - https://embracethered.com/blog/posts/2024/hiding-and-finding-text-with-unicode-tags/
- The test cases have hidden payloads which are invisible

Reviewed By: cynikolai

Differential Revision: D56123172

fbshipit-source-id: 2f9adcb430d6d04ee2130e13f9d703cf4ed0eb53
2024-04-14 19:42:13 -07:00
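The hidden-ASCII technique named in 83dbcf6873 relies on the Unicode Tags block, where U+E0020 through U+E007E mirror printable ASCII but typically render as nothing. A minimal sketch of encoding and recovering such a payload follows. The function names and example strings are illustrative and are not taken from the repository's test cases.

```python
TAG_BASE = 0xE0000  # start of the Unicode "Tags" block


def hide_ascii(text: str) -> str:
    """Map each printable ASCII character to its invisible tag counterpart."""
    if not all(0x20 <= ord(ch) <= 0x7E for ch in text):
        raise ValueError("only printable ASCII can be mapped to tag characters")
    return "".join(chr(TAG_BASE + ord(ch)) for ch in text)


def reveal_ascii(text: str) -> str:
    """Recover any tag-encoded ASCII hidden inside a string."""
    return "".join(
        chr(ord(ch) - TAG_BASE)
        for ch in text
        if TAG_BASE + 0x20 <= ord(ch) <= TAG_BASE + 0x7E
    )


# Illustrative strings: the visible prompt carries an invisible suffix.
visible = "Please summarize this document."
payload = hide_ascii("ignore previous instructions")
combined = visible + payload
```

A scanner built on `reveal_ascii` is one way to surface such payloads before a prompt reaches a model.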
Cyrus Nikolaidis
9755947bb0 Add System prompts for together and anyscale
Summary: Needed for running the prompt injections properly in the open-source context

Reviewed By: csahana95

Differential Revision: D56119213

fbshipit-source-id: 0667e8416b5f7e529e8257e40e4ac57c521f57d2
2024-04-14 13:14:12 -07:00
Cyrus Nikolaidis
cacd5a565c Add per-risk category stat breakdown
Summary:
- Changes risk categories concrete/abstract => security violating and logic violating

- Adds the logic to aggregate the breakdown into the JSON

Reviewed By: SimonWan

Differential Revision: D56118456

fbshipit-source-id: c458154cf90b141910e683cdc98de1b9e208bbc1
2024-04-14 09:52:07 -07:00
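The per-category aggregation described in cacd5a565c can be sketched as a group-by over judged results, serialized to JSON. The field names `risk_category` and `injection_successful`, and the category labels, are assumptions for illustration; the commit does not show the real schema.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def breakdown_by_risk_category(
    results: Iterable[Mapping[str, Any]],
) -> Dict[str, Dict[str, Any]]:
    """Count total and successful injections for each risk category."""
    counts: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"total": 0, "successful": 0}
    )
    for result in results:
        bucket = counts[str(result["risk_category"])]
        bucket["total"] += 1
        if result["injection_successful"]:
            bucket["successful"] += 1
    # Every bucket has total >= 1, so the division is safe.
    return {
        category: {**c, "success_rate": c["successful"] / c["total"]}
        for category, c in counts.items()
    }


# Hypothetical judged results using the two renamed categories.
results = [
    {"risk_category": "security-violating", "injection_successful": True},
    {"risk_category": "security-violating", "injection_successful": False},
    {"risk_category": "logic-violating", "injection_successful": False},
]
stats = breakdown_by_risk_category(results)
report = json.dumps(stats, indent=2, sort_keys=True)
```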
Dhaval Kapil
a0ed23ef4c Handle multiple roots generated in main()
Summary:
We were facing issues like:

```
/tmp/tmpav2irf2b/source.c:145:44: error: ‘rd_3’ was not declared in this scope
  145 |     SE_TARGET_STATE(!parse_content(content,rd_3));
      |                                            ^~~~
```

This happened when we had a string + reader being used in main. finalize() only returned the former. Now, we return a list and handle both of them.

Reviewed By: joshsaxe

Differential Revision: D56090345

fbshipit-source-id: 00a6dead5369d1770202a8a8bdd9cfdc78361e06
2024-04-12 19:21:34 -07:00
Yue Li
3d0c4aed07 move canary exploit to external
Summary: Move canary exploit to external

Reviewed By: dwjsong

Differential Revision: D56071128

fbshipit-source-id: 2a948e1017b35843e9a829e4d1eaea4dc1abda53
2024-04-12 19:13:01 -07:00
Manish Bhatt
955e26bd73 Bump retries to 100
Summary:
Sometimes things fail because the retries run out. Bumping the limit to 100.

Created from CodeHub with https://fburl.com/edit-in-codehub

Reviewed By: YueLi28

Differential Revision: D56082197

fbshipit-source-id: 16ef29bb5b43ce9e22fabff790d055463d3ac22d
2024-04-12 15:06:24 -07:00
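A retry limit such as the one raised in 955e26bd73 usually sits in a loop like the one below. This is a sketch under stated assumptions: `call_with_retries`, its parameters, and the fixed one-second sleep are hypothetical and not the repository's implementation.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
MAX_RETRIES = 100  # the limit named in the commit title


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = MAX_RETRIES,
    sleep_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds or max_retries attempts have failed."""
    last_error: Optional[Exception] = None
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:
            last_error = exc
            # Pause between attempts, but not after the final one.
            if attempt + 1 < max_retries:
                sleep(sleep_seconds)
    raise RuntimeError(f"failed after {max_retries} attempts") from last_error
```

Injecting `sleep` keeps the helper testable without real delays.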
Yue Li
4d65ae2396 move prompt injection benchmark and data to outside of internal
Summary: As title, we are moving internal benchmark and data out

Reviewed By: cynikolai

Differential Revision: D56040506

fbshipit-source-id: 02318fd5b5792ef9c25d9b7ec594d68ffcdbf449
2024-04-12 13:55:36 -07:00
Manish Bhatt
5d8bfa9988 Bugfix:- interpreter tests has attack_type
Summary:
So adding a field in the test_case_fields

Created from CodeHub with https://fburl.com/edit-in-codehub

Reviewed By: dwjsong, csahana95

Differential Revision: D56050013

fbshipit-source-id: 445023978e898db38c9d56dd58601424b72cb610
2024-04-11 21:38:29 -07:00
Shengye Wan
c94262b169 Move FRR benchmark
Summary:
Calculate FRR for model.
* Add and register the FRR code with the public `run`.
* Move FRR dataset under the public `dataset`.
* role model: D55989195

Reviewed By: mbhatt1

Differential Revision: D56037856

fbshipit-source-id: 14976a87ce208bb7397469b94772dcfff6d379ba
2024-04-11 16:54:27 -07:00
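The calculation mentioned in c94262b169 can be sketched as a simple ratio, assuming FRR means the fraction of benign prompts that a model refuses. The function name and the `"refusal"` label below are hypothetical; the commit does not show the judge's actual output format.

```python
from typing import Sequence


def false_refusal_rate(judgements: Sequence[str]) -> float:
    """Fraction of judged responses labeled as refusals (0.0 if empty)."""
    if not judgements:
        return 0.0
    refusals = sum(1 for label in judgements if label == "refusal")
    return refusals / len(judgements)
```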
Daniel Song
adab605398 Integrate Cornelius's generator
Summary: Added fbeqv's generator to test generation suite

Reviewed By: SimonWan

Differential Revision: D56001078

fbshipit-source-id: bb54bf20e393c9f9560884cbaefbc0cc6326d06c
2024-04-10 20:54:27 -07:00
Daniel Song
6cf8bba131 Add challenge type field in json
Summary: As titled

Reviewed By: SimonWan

Differential Revision: D56001080

fbshipit-source-id: e86b3ad167a78f14e5a62d629fa24851ff8840ac
2024-04-10 20:54:27 -07:00
Daniel Song
55e8b96ee6 Verifier update
Summary: Verifying results from fbeqv's static tests

Reviewed By: SimonWan

Differential Revision: D55988366

fbshipit-source-id: d358bb3ce8ebc0f4b46c0c812fd01aa97a92788c
2024-04-10 20:54:27 -07:00
Daniel Song
a7a4916343 Integrate Cornelius and Dhaval's examples
Summary: Adding memory corruption static tests to the pipeline

Reviewed By: SimonWan

Differential Revision: D55953011

fbshipit-source-id: d1e9de64f863bb6818e0768975774b8d70b56532
2024-04-10 10:58:26 -07:00
Daniel Song
028e040c2f Retrieve prompt response in json
Summary: Added APIs to retrieve the response in JSON

Reviewed By: SimonWan

Differential Revision: D55950432

fbshipit-source-id: 21256d65019912800bf5f70a904ab757b4d6198e
2024-04-10 10:58:26 -07:00