Alex Cheema | b01f69bb6b | add support for multiple concurrent requests with request ids | 2024-07-13 23:11:01 -07:00
Alex Cheema | 7077652c8e | graceful node shutdown | 2024-07-13 20:43:37 -07:00
Alex Cheema | ca6095c04d | a generic test for every inference engine | 2024-07-13 18:25:26 -07:00
Alex Cheema | 850b72d3ea | make StatefulShardedModel callable, add some tests for mlx sharded inference | 2024-07-13 15:41:15 -07:00
Alex Cheema | 6ee0547eff | fix layer calculation for sharded llama | 2024-07-13 15:39:31 -07:00
Alex Cheema | 445eda156c | dynamically assign shards to nodes deterministically weighted by memory | 2024-06-25 21:17:58 +01:00
Alex Cheema | 36b8456798 | collect global topology with local peer visibility, ring memory weighted partitioning strategy | 2024-06-25 12:32:16 +01:00
Alex Cheema | 3a66a0a4a8 | add requirements.txt | 2024-06-24 21:00:04 +01:00
Alex Cheema | ee96c6b023 | add another test for device capabilities on MacBook Air | 2024-06-24 20:59:55 +01:00
Alex Cheema | 6c8c9ee7b1 | topology with partitioning strategy | 2024-06-24 20:56:50 +01:00
Alex Cheema | 563dcb56b0 | mlx sharded implementation with example of distributed inference | 2024-06-24 19:35:57 +01:00
Alex Cheema | a21f59ff45 | scaffolding for networking, inference and orchestration | 2024-06-23 23:28:10 +01:00