Summary:
The new download.sh script for Llama Guard 2 should have the execute bit set.
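For reference, a minimal sketch of what setting the execute bit amounts to. The helper name `make_executable` is hypothetical and not part of this change; the change itself only updates the file mode of download.sh.

```python
import os
import stat


def make_executable(path: str) -> int:
    """Add execute permission for user, group, and others; return the new permission bits."""
    current = os.stat(path).st_mode
    os.chmod(path, current | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return stat.S_IMODE(os.stat(path).st_mode)
```

On a POSIX system, calling `make_executable("download.sh")` has the same effect as `chmod +x download.sh`.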
Pull Request resolved: https://github.com/meta-llama/PurpleLlama/pull/26
Reviewed By: varunfb
Differential Revision: D56312443
Pulled By: litesaber15
fbshipit-source-id: f9dd03b359a524fce38b84f973833246dced4a18
Summary: Update link for the new paper
Reviewed By: YueLi28
Differential Revision: D56308610
fbshipit-source-id: eefee9149cbc3455e9980c8152b30f311dbdc5ca
Summary: This is xingjia01's work, just checking it into the repo :)
Reviewed By: JFChi
Differential Revision: D56287914
fbshipit-source-id: 11293ac1854f64f90e3862bebdc662c4cb0cfea2
Summary:
Two PRs were merged to PurpleLlama but failed to sync back to the internal repo, so the two versions diverged.
This diff manually merges them.
Reviewed By: SimonWan
Differential Revision: D56281518
fbshipit-source-id: 0b47c5a0b0a2689aa79ac73b33fcc26a634b08c7
Summary:
The current method tries to read the API key from an OS environment variable, which fails with:
```
Did not find together_api_key, please add an environment variable TOGETHER_API_KEY which contains it, or pass together_api_key as a named parameter. (type=value_error). Sleeping for 1.0 seconds...
```
I followed [this quick start](https://docs.together.ai/docs/quickstart) to update the function so the prompt injection can run as expected.
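A minimal sketch of the lookup order the error message describes (named parameter first, then the `TOGETHER_API_KEY` environment variable). The helper `resolve_api_key` is hypothetical and is not the function changed in this diff.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None,
                    env_var: str = "TOGETHER_API_KEY") -> str:
    """Return the explicitly passed key if given, else fall back to the environment."""
    if explicit_key:
        return explicit_key
    key = os.environ.get(env_var)
    if not key:
        # Hypothetical error text; the real message comes from the client library.
        raise ValueError(f"No API key passed and {env_var} is not set")
    return key
```

Passing the key explicitly avoids depending on the environment of whatever process runs the benchmark.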
Reviewed By: YueLi28
Differential Revision: D56274355
fbshipit-source-id: 9682d9517141ee722ad8059be09dcd09c96a1661
Summary: Skipped Gemini and Mistral scoring for a speedup. It should have been a local commit.
Reviewed By: DhavalKapil
Differential Revision: D56258601
fbshipit-source-id: fb752b08d03f8a4ec4006c505576437447dd8dd1
Summary: As required, we are adding a link from ICD v1 to CodeShield to avoid confusion.
Reviewed By: csahana95
Differential Revision: D56229122
fbshipit-source-id: 4e95dd6db459855d8f99cc94d54d692394e0a4d0
Summary: As title: numpy is not really needed and introduces compatibility issues with Python 3.12.
Reviewed By: csahana95
Differential Revision: D56229050
fbshipit-source-id: ed8e8553919651dbf6e6b5c1f310489f7d647e46
Summary:
These fields were used for generation; get rid of them.
Reviewed By: YueLi28
Differential Revision: D56214992
fbshipit-source-id: b532ad9229c73181b56d32b9cf6eabe8a5ba01d4
Summary: Merge prompting and scoring into one pipeline. A single command can now generate responses and compute scores and stats.
Reviewed By: YueLi28
Differential Revision: D56208736
fbshipit-source-id: 7ca2382fe823852f3a68232f1a4fac286a838f74
Summary: Run the challenge program with responses from LLMs as input.
Reviewed By: YueLi28
Differential Revision: D56208735
fbshipit-source-id: 6b98c17cc0ed0a4df1dc894aa51980d0501eda31
Summary:
Changes:
- Generally make system prompts more "watertight" as absolute rules rather than general direction.
- Make some examples focus much more on "application logic" instead of just "do not talk about X"
- Prioritize scarier, more security-relevant prompts in the file, and add several more of these throughout.
- Clean up some of the judge prompts so they are more watertight in terms of checking whether the injection was successful.
Reviewed By: SimonWan, csahana95
Differential Revision: D56193957
fbshipit-source-id: 2828f021fc0740ebc206caf0bccd58ebcaa50cc1
Summary: Patches the same fix from D56081967.
Reviewed By: SimonWan
Differential Revision: D56170555
fbshipit-source-id: 7599f465332b67755656a703e69618b7422be5a8
Summary: The previous code committed two functions for judging refusal, but we only want the updated version; this diff keeps only that one.
Reviewed By: mbhatt1, YueLi28
Differential Revision: D56168296
fbshipit-source-id: be929e2c3e64039c6eb87b0409226d11fe595a80
Summary: - Adds a few more test cases simulating a bank app bot that has access to a user DB
Reviewed By: cynikolai
Differential Revision: D56132094
fbshipit-source-id: 6da1060339969c3e9b3e569c12009176453645b1
Summary:
As title.
Continuing to send out small fixes and changes.
Reviewed By: csahana95
Differential Revision: D56118331
fbshipit-source-id: da10e2d8b533d732c8cf647ccfa431da2c38ad7d
Summary: - Adds a few more test cases simulating a bank app bot that has access to a user DB
Reviewed By: cynikolai
Differential Revision: D56123874
fbshipit-source-id: 450f71a078f6c744034eedfa67f6bc845fb973c7
Summary:
- Adds another injection technique: hidden ASCII tokens. More details here: https://embracethered.com/blog/posts/2024/hiding-and-finding-text-with-unicode-tags/
- The test cases carry hidden payloads that are invisible when rendered
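A minimal sketch of the technique described in the linked post: each printable ASCII character is shifted into the Unicode Tags block (U+E0000 plus its code point), which most renderers do not display. The function names `hide_ascii` and `reveal_hidden` are hypothetical and are not taken from the test-case generator in this diff.

```python
TAG_BASE = 0xE0000  # start of the Unicode Tags block


def hide_ascii(text: str) -> str:
    """Map printable ASCII characters to their invisible Unicode tag counterparts."""
    if not all(0x20 <= ord(ch) <= 0x7E for ch in text):
        raise ValueError("only printable ASCII can be hidden this way")
    return "".join(chr(TAG_BASE + ord(ch)) for ch in text)


def reveal_hidden(text: str) -> str:
    """Extract and decode any tag characters embedded in a string."""
    return "".join(
        chr(ord(ch) - TAG_BASE)
        for ch in text
        if 0xE0020 <= ord(ch) <= 0xE007E
    )
```

A visible prompt concatenated with `hide_ascii(...)` output looks unchanged to a human reader, while `reveal_hidden` recovers the payload for inspection.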
Reviewed By: cynikolai
Differential Revision: D56123172
fbshipit-source-id: 2f9adcb430d6d04ee2130e13f9d703cf4ed0eb53
Summary: Needed for running the prompt injections properly in the open-source context
Reviewed By: csahana95
Differential Revision: D56119213
fbshipit-source-id: 0667e8416b5f7e529e8257e40e4ac57c521f57d2
Summary:
We were facing issues like:
```
/tmp/tmpav2irf2b/source.c:145:44: error: ‘rd_3’ was not declared in this scope
  145 |     SE_TARGET_STATE(!parse_content(content,rd_3));
      |                                            ^~~~
```
This happened when we had a string + reader being used in main. finalize() only returned the former. Now, we return a list and handle both of them.
Reviewed By: joshsaxe
Differential Revision: D56090345
fbshipit-source-id: 00a6dead5369d1770202a8a8bdd9cfdc78361e06
Summary:
Runs sometimes fail after exhausting their retries. Bumping the limit to 100.
Reviewed By: YueLi28
Differential Revision: D56082197
fbshipit-source-id: 16ef29bb5b43ce9e22fabff790d055463d3ac22d
Summary: As title, we are moving the internal benchmark and data out.
Reviewed By: cynikolai
Differential Revision: D56040506
fbshipit-source-id: 02318fd5b5792ef9c25d9b7ec594d68ffcdbf449
Summary:
Adds a field to test_case_fields.
Reviewed By: dwjsong, csahana95
Differential Revision: D56050013
fbshipit-source-id: 445023978e898db38c9d56dd58601424b72cb610
Summary:
Calculate FRR for model.
* Add and register the FRR code with the public `run`.
* Move FRR dataset under the public `dataset`.
* role model: D55989195
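A minimal sketch of the metric, assuming FRR stands for false refusal rate: the fraction of benign prompts that the model refused. The function name and the boolean input format are hypothetical; the code registered with `run` may be structured differently.

```python
from typing import Sequence


def false_refusal_rate(refused: Sequence[bool]) -> float:
    """Fraction of benign prompts judged as refused (True means refused)."""
    if not refused:
        raise ValueError("need at least one judged response")
    return sum(refused) / len(refused)
```

For example, one refusal out of four benign prompts gives a rate of 0.25.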
Reviewed By: mbhatt1
Differential Revision: D56037856
fbshipit-source-id: 14976a87ce208bb7397469b94772dcfff6d379ba