mirror of
https://github.com/lm-sys/RouteLLM.git
synced 2024-07-11 08:05:43 +03:00
Add indep benchmark plots
This commit is contained in:
BIN
assets/indep-benchmarks-llama.png
Normal file
BIN
assets/indep-benchmarks-llama.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 46 KiB |
BIN
assets/indep-benchmarks.png
Normal file
BIN
assets/indep-benchmarks.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 46 KiB |
@@ -4,7 +4,12 @@
|
||||
|
||||
In our [blog](https://lmsys.org/blog/2024-07-01-routellm/), we report MT Bench results comparing our routers to [Martian](https://withmartian.com) and [Unify AI](https://unify.ai), two commercial offerings for routing. We conduct these benchmarks using the official [FastChat](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge) repository, replacing model calls to either Martian or Unify AI using an OpenAI-compatible server.
|
||||
|
||||
For Unify AI, we pick the best-performing router on the user dashboard `router@q:1|c:1.71e-03|t:1.10e-05|i:1.09e-03` and use it for benchmarking, allowing calls to either GPT-4 Turbo or Mixtral 8x7B. Using this router, we obtain a MT Bench score of `8.757862` with `45.625%` of calls routed to GPT-4.
|
||||
<p align="center">
|
||||
<img src="../assets/indep-benchmarks.png" width="50%" />
|
||||
</p>
|
||||
|
||||
For Unify AI, we pick the best-performing router on the user dashboard `router@q:1|c:1.71e-03|t:1.10e-05|i:1.09e-03` and use it for benchmarking, allowing calls to either GPT-4 Turbo or Mixtral 8x7B. Using this router, we obtain a MT Bench score of `8.757862` with `45.625%` of calls routed to GPT-4. In comparison, our best-performing router achieves the same performance with 25.40% GPT-4 calls.
|
||||
|
||||
```python
|
||||
client = openai.OpenAI(
|
||||
base_url="https://api.unify.ai/v0/",
|
||||
@@ -16,7 +21,11 @@ response = client.chat.completions.create(
|
||||
)
|
||||
```
|
||||
|
||||
For Martian, we allow calls to either GPT-4 Turbo or Llama 2 70B Chat based on the [list of supported models](https://docs.withmartian.com/martian-model-router/model-router/supported-models-router). Because the API does not return which model each request is routed to, we use the `max_cost_per_million_tokens` parameter to estimate the % of GPT-4 calls. Specifically, we set the `max_cost_per_million_tokens` to be $10.45, a value approximated using public inference costs for `llama-2-70b-chat` and `gpt-4-turbo-2024-04-09` from [Together.AI](https://www.together.ai/pricing) and [OpenAI](https://openai.com/api/pricing/). Given a per M tokens cost of $0.90 for Llama 2 70B Chat and per M tokens cost of $20 for GPT-4 Turbo (assuming a 1:1 input:output ratio), we calculate `($20 + $0.90) / 2 = $10.45` so that approximately 50% of calls are routed to GPT-4. Using this, we obtain a MT Bench score of `8.3125`.
|
||||
<p align="center">
|
||||
<img src="../assets/indep-benchmarks-llama.png" width="50%" />
|
||||
</p>
|
||||
|
||||
For Martian, we allow calls to either GPT-4 Turbo or Llama 2 70B Chat based on the [list of supported models](https://docs.withmartian.com/martian-model-router/model-router/supported-models-router). Because the API does not return which model each request is routed to, we use the `max_cost_per_million_tokens` parameter to estimate the % of GPT-4 calls. Specifically, we set the `max_cost_per_million_tokens` to be $10.45, a value approximated using public inference costs for `llama-2-70b-chat` and `gpt-4-turbo-2024-04-09` from [Together.AI](https://www.together.ai/pricing) and [OpenAI](https://openai.com/api/pricing/). Given a per M tokens cost of $0.90 for Llama 2 70B Chat and per M tokens cost of $20 for GPT-4 Turbo (assuming a 1:1 input:output ratio), we calculate `($20 + $0.90) / 2 = $10.45` so that approximately 50% of calls are routed to GPT-4. Using this, we obtain a MT Bench score of `8.3125`. In comparison, our best-performing router achieves the same performance with 29.66% GPT-4 calls.
|
||||
|
||||
```python
|
||||
client = openai.OpenAI(
|
||||
|
||||
Reference in New Issue
Block a user