1562 Commits

Author SHA1 Message Date
Alex Cheema
b01f69bb6b add support for multiple concurrent requests with request ids 2024-07-13 23:11:01 -07:00
Alex Cheema
7077652c8e graceful node shutdown 2024-07-13 20:43:37 -07:00
Alex Cheema
ca6095c04d a generic test for every inference engine 2024-07-13 18:25:26 -07:00
Alex Cheema
850b72d3ea make StatefulShardedModel callable, add some tests for mlx sharded inference 2024-07-13 15:41:15 -07:00
Alex Cheema
6ee0547eff fix layer calculation for sharded llama 2024-07-13 15:39:31 -07:00
Alex Cheema
445eda156c dynamically assign shards to nodes deterministically weighted by memory 2024-06-25 21:17:58 +01:00
Alex Cheema
36b8456798 collect global topology with local peer visibility, ring memory weighted partitioning strategy 2024-06-25 12:32:16 +01:00
Alex Cheema
3a66a0a4a8 add requirements.txt 2024-06-24 21:00:04 +01:00
Alex Cheema
ee96c6b023 add another test for device capabilities on MacBook Air 2024-06-24 20:59:55 +01:00
Alex Cheema
6c8c9ee7b1 topology with partitioning strategy 2024-06-24 20:56:50 +01:00
Alex Cheema
563dcb56b0 mlx sharded implementation with example of distributed inference 2024-06-24 19:35:57 +01:00
Alex Cheema
a21f59ff45 scaffolding for networking, inference and orchestration 2024-06-23 23:28:10 +01:00