Varshith Bathini | 6ed76b3493 | Merge branch 'main' into main | 2024-07-28 01:00:08 +05:30
Varshith | 54993995dc | conflicts | 2024-07-28 00:44:41 +05:30
Varshith | 9d2616b9cf | sharded inference | 2024-07-28 00:30:34 +05:30
Alex Cheema | faa1319470 | disable chatgpt api integration test, github changed something in their mac runners? perhaps time to switch over to circleci like mlx | 2024-07-26 23:52:22 -07:00
Alex Cheema | 67a1aaa823 | check processes in github workflow | 2024-07-26 23:37:26 -07:00
Alex Cheema | 9a3ac273a9 | Merge pull request #77 from Cloud1590/main: Update device_capabilities.py (first PR ever... hope I did okay lol) | 2024-07-26 22:52:15 -07:00
Alex Cheema | 628d8679b0 | force mlx inference engine in github workflow, where it defaults to tinygrad because it's running on 'model': 'Apple Virtual Machine 1', 'chip': 'Apple M1 (Virtual)' | 2024-07-26 21:46:41 -07:00
Alex Cheema | e856d7f7f9 | log chatgpt integration test output from each process on github workflow failure | 2024-07-26 21:37:42 -07:00
Mark Kockerbeck | d2fa7b247e | Showing the message only if successfully decoded, #75 | 2024-07-26 12:06:17 -07:00
Mark Kockerbeck | f1cd5ae7a6 | Merge branch 'main' of github.com:xeb/exo | 2024-07-26 12:04:18 -07:00
Mark Kockerbeck | 4f5ab78d9d | Addressing issue #75 to avoid decoding binary packets | 2024-07-26 12:03:49 -07:00
Varshith | 7cbf6a35bd | working test | 2024-07-26 19:12:42 +05:30
Alex Cheema | 5a23376059 | add log_request middleware if DEBUG>=2 to chatgpt api to debug api issues, default always to llama-3.1-8b | 2024-07-25 20:33:26 -07:00
Varshith | 803a442141 | init | 2024-07-26 06:00:29 +05:30
Alex Cheema | 2084784470 | per-request kv cache, remove all explicit reset functionality as it wasn't used. fixes #67 | 2024-07-25 17:09:34 -07:00
Alex Cheema | dd8c5d63a9 | add support for mistral nemo and mistral large | 2024-07-25 16:40:23 -07:00
Alex Cheema | 03fe7a058c | more robust message parsing fixes #81 | 2024-07-25 12:58:55 -07:00
Cloud1590 | 0770c59d5f | Update main.py | 2024-07-25 00:25:47 -05:00
Cloud1590 | e1792e29b9 | chore: Update argparse action for --disable-tui flag | 2024-07-25 00:15:35 -05:00
Cloud1590 | 2c71a4b1ac | Update device_capabilities.py: Added flops for most modern NVIDIA and AMD GPUs. | 2024-07-25 00:00:27 -05:00
Alex Cheema | 942012577a | styling for tinychat model selector | 2024-07-24 14:27:58 -07:00
Alex Cheema | 5ac6b6a717 | clearer documentation on accessing web UI and chatgpt-api | 2024-07-24 14:27:37 -07:00
Alex Cheema | 9a373c2bb0 | make configurable discovery timeout | 2024-07-23 20:04:13 -07:00
Alex Cheema | 63a05d5b4f | make configurable discovery timeout | 2024-07-23 20:03:31 -07:00
Alex Cheema | 8d2bb819bf | add llama-3.1 notice to README | 2024-07-23 15:51:29 -07:00
Alex Cheema | 7a2fbf22b9 | add model selection to tinychat | 2024-07-23 15:51:19 -07:00
Alex Cheema | bbfd5adc20 | add support for llama3.1 (8b, 70b, 405b). bump mlx up to 0.16.0 and mlx-lm up to 0.16.1. fixes #66 | 2024-07-23 14:41:34 -07:00
Alex Cheema | 5496cd85f5 | Revert "smart model downloading for mlx #16". This reverts commit 3a230f3b44. | 2024-07-22 22:32:20 -07:00
Alex Cheema | 3a230f3b44 | smart model downloading for mlx #16 | 2024-07-22 22:10:10 -07:00
Alex Cheema | 174cff071e | Merge pull request #58 from jakobdylanc/main: Inference engine selection improvements | 2024-07-22 12:11:12 -07:00
Alex Cheema | b0e7dd9d2d | add max-generate-tokens flag fixes #54 | 2024-07-22 11:47:49 -07:00
JakobDylanC | f2f61ccee6 | inference engine selection improvements | 2024-07-22 10:13:52 -04:00
Alex Cheema | 4e46232364 | add simple prometheus metrics collection, with a prometheus / grafana instance for live dashboard. related: #22 | 2024-07-22 02:38:37 -07:00
Alex Cheema | 2e419ba211 | Merge pull request #48 from itsknk/intel-mac: Implement dynamic inference engine selection #45 | 2024-07-22 00:29:47 -07:00
itsknk | e934664168 | implement dynamic inference engine selection: implement the system detection and inference engine selection; remove inconsistency | 2024-07-21 21:56:13 -07:00
Alex Cheema | 1fcbe18baa | fix m2 ultra flops | 2024-07-20 21:37:27 -07:00
Alex Cheema | 9d9d257eb2 | reduce chatgpt api response timeout in test | 2024-07-20 19:19:28 -07:00
Alex Cheema | 8850187b8a | tell the mofo in the workflow to keep responses concise | 2024-07-20 18:11:47 -07:00
Alex Cheema | 052ee1c7e9 | cache isolation per workflow job | 2024-07-20 17:55:42 -07:00
Alex Cheema | ce41e653c0 | check cached files in workflow | 2024-07-20 17:50:56 -07:00
Alex Cheema | 3d82338c21 | debug cached files in workflow | 2024-07-20 17:49:42 -07:00
Alex Cheema | aec58b3b36 | remove redundant discovery check in automated test | 2024-07-20 16:13:01 -07:00
Alex Cheema | 9785e250c0 | formatting if | 2024-07-20 15:15:05 -07:00
Alex Cheema | 7708b47020 | Merge pull request #49 from apotl/disable-viz-flag: Flag to disable Viz TUI | 2024-07-20 15:13:44 -07:00
Alex Cheema | 08b2f37532 | test output spacing | 2024-07-20 15:12:05 -07:00
Alec Potluri | db583a863f | disable tui flag | 2024-07-20 17:46:15 -04:00
Alex Cheema | 821f114bf9 | add tests badge | 2024-07-20 14:02:18 -07:00
Alex Cheema | 71b8c660be | test workflow | 2024-07-20 13:21:36 -07:00
Alex Cheema | 6c871562e4 | fix huggingface cache | 2024-07-20 13:20:39 -07:00
Alex Cheema | cf98cc50fa | trigger workflow | 2024-07-20 12:45:34 -07:00