Olivier Dehaene fa9a088467 Add load testing
2022-10-11 10:36:51 +02:00

BLOOM Inference

A Rust and gRPC server for BLOOM Inference.

Install

cd server
pip install .

cd ../router
cargo build --release

Run

python server/bloom_inference/main.py bigscience/bloom --num-gpus 8 --shard-directory /dev/shm/models
./router/target/release/router
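
Once both processes are up, generations can be requested from the router over HTTP. The payload shape below (an `inputs` string plus a `parameters` object with `max_new_tokens`), and the route/port mentioned in the comment, are assumptions for illustration; check the router source for the actual API.

```python
import json

# Hypothetical payload builder for the router's generate endpoint.
# Field names ("inputs", "parameters", "max_new_tokens") are assumed,
# not confirmed by this README.
def build_generate_request(prompt: str, max_new_tokens: int = 20) -> str:
    return json.dumps({
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    })

# The resulting JSON could then be POSTed to the router, e.g. with curl:
#   curl -X POST http://localhost:<port>/generate \
#        -H "Content-Type: application/json" -d '<payload>'
print(build_generate_request("Hello BLOOM"))
```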

TODO:

  • Improve model download
    • Store "shardable" layers separately and layer by layer
  • Add batching args to router CLI
  • Add docstrings + comments everywhere as the codebase is fairly complicated
  • Add tests
  • Add shutdown logic in router and server
  • Improve multi-processing logic in server
  • Improve error handling everywhere
  • Improve past key layer indexing?
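
A minimal sketch of the shutdown-logic TODO above, assuming a Python serving loop on the server side: a signal handler flips a stop event so the loop can drain in-flight batches before exiting. The names here are illustrative, not taken from the codebase.

```python
import signal
import threading

# Stop flag that a serving loop would poll between batches.
stop_event = threading.Event()

def handle_shutdown(signum, frame):
    # Mark the server as stopping; the loop exits after the current batch.
    stop_event.set()

# Register for both SIGTERM (e.g. `docker stop`) and SIGINT (Ctrl-C).
for sig in (signal.SIGTERM, signal.SIGINT):
    signal.signal(sig, handle_shutdown)
```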