1562 Commits

Author SHA1 Message Date
Varshith Bathini
6ed76b3493 Merge branch 'main' into main 2024-07-28 01:00:08 +05:30
Varshith
54993995dc conflicts 2024-07-28 00:44:41 +05:30
Varshith
9d2616b9cf sharded inference 2024-07-28 00:30:34 +05:30
Alex Cheema
faa1319470 disable chatgpt api integration test, github changed something in their mac runners? perhaps time to switch over to circleci like mlx 2024-07-26 23:52:22 -07:00
Alex Cheema
67a1aaa823 check processes in github workflow 2024-07-26 23:37:26 -07:00
Alex Cheema
9a3ac273a9 Merge pull request #77 from Cloud1590/main
Update device_capabilities.py (first PR ever... hope I did okay lol)
2024-07-26 22:52:15 -07:00
Alex Cheema
628d8679b0 force mlx inference engine in github workflow, where it defaults to tinygrad because it's running on 'model': 'Apple Virtual Machine 1', 'chip': 'Apple M1 (Virtual)' 2024-07-26 21:46:41 -07:00
Alex Cheema
e856d7f7f9 log chatgpt integration test output from each process on github workflow failure 2024-07-26 21:37:42 -07:00
Mark Kockerbeck
d2fa7b247e Showing the message only if successfully decoded, #75 2024-07-26 12:06:17 -07:00
Mark Kockerbeck
f1cd5ae7a6 Merge branch 'main' of github.com:xeb/exo 2024-07-26 12:04:18 -07:00
Mark Kockerbeck
4f5ab78d9d Addressing issue #75 to avoid decoding binary packets 2024-07-26 12:03:49 -07:00
Varshith
7cbf6a35bd working test 2024-07-26 19:12:42 +05:30
Alex Cheema
5a23376059 add log_request middleware if DEBUG>=2 to chatgpt api to debug api issues, default always to llama-3.1-8b 2024-07-25 20:33:26 -07:00
Varshith
803a442141 init 2024-07-26 06:00:29 +05:30
Alex Cheema
2084784470 per-request kv cache, remove all explicit reset functionality as it wasn't used. fixes #67 2024-07-25 17:09:34 -07:00
Alex Cheema
dd8c5d63a9 add support for mistral nemo and mistral large 2024-07-25 16:40:23 -07:00
Alex Cheema
03fe7a058c more robust message parsing fixes #81 2024-07-25 12:58:55 -07:00
Cloud1590
0770c59d5f Update main.py 2024-07-25 00:25:47 -05:00
Cloud1590
e1792e29b9 chore: Update argparse action for --disable-tui flag 2024-07-25 00:15:35 -05:00
Cloud1590
2c71a4b1ac Update device_capabilities.py
Added flops for most modern NVIDIA and AMD GPUs.
2024-07-25 00:00:27 -05:00
Alex Cheema
942012577a styling for tinychat model selector 2024-07-24 14:27:58 -07:00
Alex Cheema
5ac6b6a717 clearer documentation on accessing web UI and chatgpt-api 2024-07-24 14:27:37 -07:00
Alex Cheema
9a373c2bb0 make configurable discovery timeout 2024-07-23 20:04:13 -07:00
Alex Cheema
63a05d5b4f make configurable discovery timeout 2024-07-23 20:03:31 -07:00
Alex Cheema
8d2bb819bf add llama-3.1 notice to README 2024-07-23 15:51:29 -07:00
Alex Cheema
7a2fbf22b9 add model selection to tinychat 2024-07-23 15:51:19 -07:00
Alex Cheema
bbfd5adc20 add support for llama3.1 (8b, 70b, 405b). bump mlx up to 0.16.0 and mlx-lm up to 0.16.1. fixes #66 2024-07-23 14:41:34 -07:00
Alex Cheema
5496cd85f5 Revert "smart model downloading for mlx #16"
This reverts commit 3a230f3b44.
2024-07-22 22:32:20 -07:00
Alex Cheema
3a230f3b44 smart model downloading for mlx #16 2024-07-22 22:10:10 -07:00
Alex Cheema
174cff071e Merge pull request #58 from jakobdylanc/main
Inference engine selection improvements
2024-07-22 12:11:12 -07:00
Alex Cheema
b0e7dd9d2d add max-generate-tokens flag fixes #54 2024-07-22 11:47:49 -07:00
JakobDylanC
f2f61ccee6 inference engine selection improvements 2024-07-22 10:13:52 -04:00
Alex Cheema
4e46232364 add simple prometheus metrics collection, with a prometheus / grafana instance for live dashboard. related: #22 2024-07-22 02:38:37 -07:00
Alex Cheema
2e419ba211 Merge pull request #48 from itsknk/intel-mac
Implement dynamic inference engine selection #45
2024-07-22 00:29:47 -07:00
itsknk
e934664168 implement dynamic inference engine selection
implement the system detection and inference engine selection
remove inconsistency
2024-07-21 21:56:13 -07:00
Alex Cheema
1fcbe18baa fix m2 ultra flops 2024-07-20 21:37:27 -07:00
Alex Cheema
9d9d257eb2 reduce chatgpt api response timeout in test 2024-07-20 19:19:28 -07:00
Alex Cheema
8850187b8a tell the mofo in the workflow to keep responses concise 2024-07-20 18:11:47 -07:00
Alex Cheema
052ee1c7e9 cache isolation per workflow job 2024-07-20 17:55:42 -07:00
Alex Cheema
ce41e653c0 check cached files in workflow 2024-07-20 17:50:56 -07:00
Alex Cheema
3d82338c21 debug cached files in workflow 2024-07-20 17:49:42 -07:00
Alex Cheema
aec58b3b36 remove redundant discovery check in automated test 2024-07-20 16:13:01 -07:00
Alex Cheema
9785e250c0 formatting if 2024-07-20 15:15:05 -07:00
Alex Cheema
7708b47020 Merge pull request #49 from apotl/disable-viz-flag
Flag to disable Viz TUI
2024-07-20 15:13:44 -07:00
Alex Cheema
08b2f37532 test output spacing 2024-07-20 15:12:05 -07:00
Alec Potluri
db583a863f disable tui flag 2024-07-20 17:46:15 -04:00
Alex Cheema
821f114bf9 add tests badge 2024-07-20 14:02:18 -07:00
Alex Cheema
71b8c660be test workflow 2024-07-20 13:21:36 -07:00
Alex Cheema
6c871562e4 fix huggingface cache 2024-07-20 13:20:39 -07:00
Alex Cheema
cf98cc50fa trigger workflow 2024-07-20 12:45:34 -07:00