Commit Graph

59 Commits

Kyle Corbitt
a1b03ddad1 Merge pull request #109 from OpenPipe/debug-prompts
Add debug modal for output cells
2023-08-01 22:51:39 -07:00
Kyle Corbitt
6be32bea4c Add debug modal for output cells
See the actual input that a model got for a specific cell. The formatting isn't great right now; we should probably iterate on that.
2023-08-01 22:49:38 -07:00
arcticfly
72c70e2a55 Improve conversion to/from Claude (#108)
* Increase min width of prompt variant

* Increase width of custom instructions input

* Start recording API docs

* Provide better instructions for converting to/from Claude

* Fix prettier
2023-08-01 21:03:23 -07:00
arcticfly
026532f2c2 Model selection styling changes (#107)
* Model selection styling changes

* Fix prettier
2023-08-01 18:45:15 -07:00
Kyle Corbitt
6316eaae6d dummy key at build time 2023-07-31 18:03:51 -07:00
Kyle Corbitt
8513924ea5 give the openai lib a dummy default value to try to fix the build 2023-07-31 17:39:45 -07:00
arcticfly
26b6fa4f0c Requeue rate-limited query model tasks (#99)
* Continue polling stats until all evals complete

* Return evaluation changes early, before the evaluation has run

* Add task for running new eval

* Requeue rate-limited tasks

* Fix prettier
2023-07-26 16:30:50 -07:00
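
The requeue described in this commit could look roughly like the sketch below, written against a graphile-worker-style task (commit b98eb9b729 elsewhere in this log mentions adding graphile workers). The task name, payload shape, helper functions, and retry delay are illustrative assumptions, not the repo's actual code.

```ts
import type { Task } from "graphile-worker";

// Placeholder for the project's real model call; returns an HTTP-like status.
async function runModelQuery(cellId: string): Promise<{ status: number; output?: unknown }> {
  return { status: 200, output: { cellId } };
}

// Hypothetical "queryLLM" task: on a 429, requeue the same job with a delay
// instead of letting it fail permanently.
const queryLLM: Task = async (payload, helpers) => {
  const { cellId } = payload as { cellId: string };
  const response = await runModelQuery(cellId);

  if (response.status === 429) {
    await helpers.addJob("queryLLM", { cellId }, {
      runAt: new Date(Date.now() + 10_000), // arbitrary 10-second delay
    });
    return;
  }

  // ...persist response.output as a ModelOutput here...
};

export default queryLLM;
```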
arcticfly
d4fb8b689a Ensure evals run properly (#96)
* Run evals against llama output

* Continue polling in OutputCell until evals complete

* Remove unnecessary check
2023-07-25 20:01:58 -07:00
arcticfly
98b231c8bd Store multiple ModelResponses (#95)
* Store multiple ModelResponses

* Fix prettier

* Add CellContent container
2023-07-25 18:54:38 -07:00
Kyle Corbitt
e1cbeccb90 Better streaming
- Always stream the visible scenarios, if the modelProvider supports it
- Never stream the invisible scenarios

Also actually runs our query tasks in a background worker, which we weren't quite doing before.
2023-07-24 18:34:30 -07:00
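
A minimal sketch of the streaming rule those two bullets describe; the names below are illustrative, not the repo's actual API.

```ts
// Stream only when the scenario is visible on screen and the provider can stream.
type ModelProvider = { supportsStreaming: boolean };

function shouldStream(provider: ModelProvider, scenarioVisible: boolean): boolean {
  return scenarioVisible && provider.supportsStreaming;
}

// A streaming-capable provider still isn't streamed for a hidden scenario.
console.log(shouldStream({ supportsStreaming: true }, false)); // false
console.log(shouldStream({ supportsStreaming: true }, true)); // true
```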
arcticfly
d6b97b29f7 Allow experiment forking (#89)
* Move DeleteButton into a separate file

* Rename plural relations

* Add ability to fork

* Fork automatically after auth upon return

* Add experiment card skeleton

* Create HeaderButtons component

* Return no header buttons while the experiment is loading

* Fix prettier

* Remove unused variable

* Remove newline

* Default json values to undefined

* Change header styles

* Fix prettier

* Give AddScenario icon less width

* Move useEffect

* Skip invalidating experiments list after forking

* Require user to be able to view experiment to fork it

* Move experiment creation into same transaction

* Only return the forked experiment id

* Put delete button in experiment settings drawer

* Move useEffect hook
2023-07-24 18:10:59 -07:00
arcticfly
2b2e0ab8ee Define refinement actions in the model providers (#87)
* Add descriptions of fields in llama 2 input schema

* Let GPT-4 know when the provider stays the same

* Allow refetching in the event of any errors

* Define refinement actions in model providers

* Fix prettier
2023-07-23 17:37:08 -07:00
arcticfly
6fb7a82d72 Add support for switching to Llama models (#80)
* Add support for switching to Llama models

* Fix prettier
2023-07-21 20:10:59 -07:00
Kyle Corbitt
52d1d5c7ee Copy over evals when new cell created
Fixes a bug where new cells generated as clones of existing cells didn't get the eval results cloned as well.
2023-07-21 18:40:40 -07:00
Kyle Corbitt
7e1fbb3767 Slightly better typings for ModelProviders
Still not great because the `any`s loosen some call sites up more than I'd like, but better than the broken types before.
2023-07-21 06:50:05 -07:00
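
One way to tighten those types is a generic provider interface; the field names below are assumptions for illustration, not the repo's actual definitions.

```ts
// Generic provider interface: input and output types flow through instead of `any`.
interface ModelProvider<TInput, TOutput> {
  name: string;
  getCompletion(input: TInput): Promise<TOutput>;
}

type ChatInput = { model: string; messages: { role: "system" | "user" | "assistant"; content: string }[] };
type ChatOutput = { text: string; promptTokens: number; completionTokens: number };

// Stubbed OpenAI-style provider showing how call sites stay fully typed.
export const openaiChatCompletion: ModelProvider<ChatInput, ChatOutput> = {
  name: "openai/ChatCompletion",
  getCompletion: async (input) => ({
    text: `stubbed response for ${input.model}`,
    promptTokens: 0,
    completionTokens: 0,
  }),
};
```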
David Corbitt
a5d972005e Add user's current prompt to prompt derivation 2023-07-21 00:43:39 -07:00
Kyle Corbitt
847753c32b replicate/llama2 provider
Still need to fix the types but it runs
2023-07-20 19:55:03 -07:00
Kyle Corbitt
332a2101c0 More work on modelProviders
I think everything that's OpenAI-specific is inside modelProviders at this point, so we can get started adding more providers.
2023-07-20 18:54:26 -07:00
Kyle Corbitt
ded6678e97 Prep for more model providers
Adds a `modelProvider` field to `promptVariants`, currently set to "openai/ChatCompletion" for all variants.

Adds a `modelProviders/` directory where we can define and store pluggable model providers. Currently just OpenAI. Not everything is pluggable yet -- notably, the code that actually generates completions hasn't been migrated to this setup.

Does a lot of work to get the types working. Prompts are now defined with a function `definePrompt(modelProvider, config)` instead of `prompt = config`. Added a script to migrate old prompt definitions.

This is still partial work, but the diff is large enough that I want to get it in. I don't think anything is broken but I haven't tested thoroughly.
2023-07-20 14:49:22 -07:00
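
The `definePrompt(modelProvider, config)` shape this commit describes might look roughly like the sketch below; the config fields and return type are illustrative assumptions.

```ts
// The provider is named explicitly, so non-OpenAI providers can plug in later.
type PromptDefinition<TConfig> = { modelProvider: string; config: TConfig };

function definePrompt<TConfig>(modelProvider: string, config: TConfig): PromptDefinition<TConfig> {
  return { modelProvider, config };
}

// Before: `prompt = config`, implicitly tied to OpenAI.
// After: the provider travels with the config.
export const prompt = definePrompt("openai/ChatCompletion", {
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Summarize: {{scenario.text}}" }],
});
```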
arcticfly
86dc36a656 Improve refinement (#69)
* Format construction function on return

* Add more refinement examples

* Treat 503 like 429

* Define prompt as object

* Fix prettier
2023-07-20 13:05:27 -07:00
arcticfly
e598e454d0 Add new predefined refinement options (#67)
* Add new predefined refinement options

* Fix prettier

* Add icon to SelectModelModal title
2023-07-19 20:10:08 -07:00
David Corbitt
6e3f90cd2f Add more info to refinement 2023-07-19 18:10:23 -07:00
arcticfly
e6e2c706c2 Change up refinement UI (#66)
* Remove unused ScenarioVariantCell fields

* Refine deriveNewConstructFn

* Fix prettier

* Remove migration script

* Add refine modal

* Fix prettier

* Fix diff checker overflow

* Decrease diff height

* Add more context to prompt refining

* Auto-expand prompt when refining
2023-07-19 17:19:45 -07:00
Kyle Corbitt
60765e51ac Remove model from promptVariant and add cost
Storing the model on promptVariant is problematic because it isn't always in sync with the actual prompt definition. I'm removing it for now to see if we can get away with that -- might have to add it back in later if this causes trouble.

Added `cost` to modelOutput as well so we can cache that, which is important given that the cost calculations won't be the same between different API providers.
2023-07-19 16:20:53 -07:00
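
A sketch of why caching `cost` per output is useful: each provider prices tokens differently, so the calculation depends on provider-specific rates. The rates and field names below are made up for illustration.

```ts
type Usage = { promptTokens: number; completionTokens: number };

// Hypothetical per-token prices in dollars; real rates differ by provider and model.
const pricePerToken: Record<string, { prompt: number; completion: number }> = {
  "openai/ChatCompletion": { prompt: 0.0000015, completion: 0.000002 },
  "replicate/llama2": { prompt: 0.0000005, completion: 0.0000005 },
};

function calculateCost(provider: string, usage: Usage): number {
  const rate = pricePerToken[provider] ?? { prompt: 0, completion: 0 };
  return usage.promptTokens * rate.prompt + usage.completionTokens * rate.completion;
}

// Computed once and stored on the output, so later price-table changes don't skew old stats.
console.log(calculateCost("openai/ChatCompletion", { promptTokens: 500, completionTokens: 200 }));
```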
arcticfly
4c97b9f147 Refine prompt (#63)
* Remove unused ScenarioVariantCell fields

* Refine deriveNewConstructFn

* Fix prettier

* Remove migration script

* Add refine modal

* Fix prettier

* Fix diff checker overflow

* Decrease diff height
2023-07-19 15:31:40 -07:00
arcticfly
58892d8b63 Remove unused fields, refine model translation (#62)
* Remove unused ScenarioVariantCell fields

* Refine deriveNewConstructFn

* Fix prettier
2023-07-19 13:59:11 -07:00
Kyle Corbitt
1dcdba04a6 User accounts
Allows for the creation of user accounts. A few notes on the specifics:

 - Experiments are the main access control objects. If you can view an experiment, you can view all its prompts/scenarios/evals. If you can edit it, you can edit or delete all of those as well.
 - Experiments are owned by Organizations in the database. Organizations can have multiple members and members can have roles of ADMIN, MEMBER or VIEWER.
 - Organizations can either be "personal" or general. Each user has a "personal" organization created as soon as they try to create an experiment. There's currently no UI support for creating general orgs or adding users to them; they're just in the database to future-proof all the ACL logic.
 - You can require that a user is signed-in to see a route using the `protectedProcedure` helper. When you use `protectedProcedure`, you also have to call `ctx.markAccessControlRun()` (or delegate to a function that does it for you; see accessControl.ts). This is to remind us to actually check for access control when we define a new endpoint.
2023-07-18 21:19:03 -07:00
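
A minimal sketch of the access-control convention these notes describe: a helper checks that the caller can view the experiment, then calls `ctx.markAccessControlRun()`. Everything here besides that method name and the accessControl.ts reference is an illustrative assumption.

```ts
type Ctx = {
  userId: string;
  markAccessControlRun(): void;
};

// Placeholder for the real organization-membership lookup.
async function userCanViewExperiment(_experimentId: string, _userId: string): Promise<boolean> {
  return true;
}

// Stand-in for a helper like those in accessControl.ts: verify access, then record
// that an access check actually ran so the endpoint doesn't silently skip it.
export async function requireCanViewExperiment(experimentId: string, ctx: Ctx): Promise<void> {
  const canView = await userCanViewExperiment(experimentId, ctx.userId);
  if (!canView) throw new Error("UNAUTHORIZED");
  ctx.markAccessControlRun();
}
```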
arcticfly
e0e64c4207 Allow user to create a version of their current prompt with a new model (#58)
* Add dropdown header for model switching

* Allow variant duplication

* Fix prettier

* Use env variable to restrict prisma logs

* Fix env.mjs

* Remove unnecessary scroll bar from function call output

* Properly record when 404 error occurs in queryLLM task

* Add SelectedModelInfo in SelectModelModal

* Add react-select

* Calculate new prompt after switching model

* Send newly selected model with creation request

* Get new prompt construction function back from GPT-4

* Fix prettier

* Fix prettier
2023-07-18 18:24:04 -07:00
arcticfly
fa5b1ab1c5 Allow user to duplicate prompt (#57)
* Add dropdown header for model switching

* Allow variant duplication

* Fix prettier
2023-07-18 13:49:33 -07:00
David Corbitt
999a4c08fa Fix lint and prettier 2023-07-18 11:11:20 -07:00
arcticfly
374d0237ee Escape characters in Regex evaluations, minor UI fixes (#56)
* Fix ScenariosHeader stickiness

* Move meta tag from _app.tsx to _document.tsx

* Show spinner when saving variant

* Escape quotes and regex in evaluations
2023-07-18 11:07:04 -07:00
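
The "Escape quotes and regex in evaluations" bullet above follows a common pattern; this is a sketch of it, not necessarily the repo's exact implementation.

```ts
// Escape regex metacharacters so the expected string is matched literally.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function outputMatches(output: string, expected: string): boolean {
  return new RegExp(escapeRegExp(expected), "i").test(output);
}

// Without escaping, "(yes?)" would be parsed as a regex group with an optional "s".
console.log(outputMatches('He said "sure (yes?)"', "(yes?)")); // true
```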
Kyle Corbitt
7d41e94ca2 cache eval outputs and add gpt4 eval 2023-07-17 17:55:36 -07:00
Kyle Corbitt
011b12abb9 cache output evals 2023-07-17 17:52:30 -07:00
Kyle Corbitt
54369dba54 Fix seeds and update eval field names 2023-07-17 14:14:20 -07:00
Kyle Corbitt
26ee8698be Make it so you can't delete the last prompt or scenario
There's no reason for an experiment to have 0 prompts or 0 scenarios, and allowing it makes the UI look bad.
2023-07-14 15:49:42 -07:00
arcticfly
b98eb9b729 Trigger llm output retrieval on server (#39)
* Rename tables, add graphile workers, update types

* Add dev:worker command

* Update pnpm-lock.yaml

* Remove sentry config import from worker.ts

* Stop generating new cells in cell router get query

* Generate new cells for new scenarios, variants, and experiments

* Remove most error throwing from queryLLM.task.ts

* Remove promptVariantId and testScenarioId from ModelOutput

* Remove duplicate index from ModelOutput

* Move inputHash from cell to output

* Add TODO

* Add todo

* Show cost and time for each cell

* Always show output stats if there is output

* Trigger LLM outputs when scenario variables are updated

* Add newlines to ends of files

* Add another newline

* Cascade ModelOutput deletion

* Fix linting and prettier

* Return instead of throwing for non-pending cell

* Remove pnpm dev:worker from pnpm:dev

* Update pnpm-lock.yaml
2023-07-14 16:38:46 -06:00
Kyle Corbitt
a5378b106b store model and use to calculate completion costs 2023-07-14 11:06:07 -07:00
Kyle Corbitt
4770ea34a8 Use javascript functions for prompt completions instead of templated json 2023-07-13 18:01:07 -07:00
arcticfly
187d6492f8 Reevaluate all prompt stats when scenario is hidden (#32)
* Reevaluate when scenario is hidden

* Add newline
2023-07-10 13:51:40 -06:00
arcticfly
e64a94e06e Record experiment updated in more places (#24)
* Record experiment updated in more places

* Update experiment updatedAt in same transaction
2023-07-10 12:00:24 -06:00
arcticfly
32a80f8475 Limit evaluations to visible test scenarios (#28) 2023-07-10 02:10:23 -06:00
Kyle Corbitt
a8db6cadfd format with prettier 3 2023-07-08 22:12:47 -07:00
Kyle Corbitt
8e0722cd22 wrong denominator 2023-07-07 17:48:34 -07:00
Kyle Corbitt
46344d8fc4 small bugfixes 2023-07-07 12:22:27 -07:00
arcticfly
a2c7ef73ec Retry requests that receive 429 (#15)
* List number of scenarios

* Retry requests after 429

* Rename requestCallback

* Add sleep function

* Allow manual retry on frontend

* Remove unused utility functions

* Auto refetch

* Display wait time with Math.ceil

* Take one second modulo into account

* Add pluralize
2023-07-06 21:39:23 -07:00
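
A rough sketch of the retry flow the bullets above outline: a sleep helper, a retry on 429, and the remaining wait shown in whole seconds via Math.ceil. The backoff values and function names are illustrative.

```ts
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function fetchWithRetry(url: string, maxAttempts = 3): Promise<Response> {
  for (let attempt = 1; ; attempt++) {
    const response = await fetch(url);
    if (response.status !== 429 || attempt >= maxAttempts) return response;

    const waitMs = 1000 * 2 ** attempt; // arbitrary backoff
    // Math.ceil so the UI never shows "0 seconds" while a retry is still pending.
    console.log(`Rate limited; retrying in ${Math.ceil(waitMs / 1000)}s`);
    await sleep(waitMs);
  }
}
```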
arcticfly
fe501a80cb Add total token cost to variant stats (#13)
* Add total token cost to variant stats

* Copy over token counts for new variants

* Update invalidate call
2023-07-06 15:33:49 -07:00
Kyle Corbitt
1fa0d7bc62 bugfixes 2023-07-06 15:22:35 -07:00
arcticfly
92c240e7b8 Add request cost to OutputStats (#12) 2023-07-06 14:36:31 -07:00
Kyle Corbitt
f728027ef6 add evaluations 2023-07-06 13:44:03 -07:00
arcticfly
1ae5612d55 Add promptTokens and completionTokens to model output (#11)
* Default to streaming in config

* Add tokens to database

* Add NEXT_PUBLIC_SOCKET_URL to .env.example

* Disable streaming for functions

* Add newline to types
2023-07-06 13:12:59 -07:00