# BLOOM Inference

A Rust and gRPC server for BLOOM inference.
## Install

Install the Python server:

```shell
cd server
pip install .
```

Build the Rust router:

```shell
cd router
cargo build --release
```
## Run

First launch the sharded Python model server:

```shell
python server/bloom_inference/main.py bigscience/bloom --num-gpus 8 --shard-directory /dev/shm/models
```

Then start the Rust router:

```shell
./router/target/release/router
```
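Once both processes are up, generation requests go through the router over HTTP. The exact interface is not documented in this README, so the sketch below is an assumption: it supposes the router listens on `localhost:3000` and exposes a `POST /generate` route accepting a JSON body with `inputs` and `parameters` fields. Verify the port, route, and payload schema against the router source before relying on them.

```python
# Minimal client sketch. The address, route, and payload schema below are
# assumptions (they are not documented in this README); check the router
# source for the actual interface.
import requests

response = requests.post(
    "http://localhost:3000/generate",  # assumed default router address
    json={
        "inputs": "Hello, my name is",
        "parameters": {"max_new_tokens": 20},  # assumed parameter name
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())
```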
## TODO

- Improve model download
  - Store "shardable" layers separately, layer by layer
- Add batching args to the router CLI
- Add docstrings and comments throughout, as the codebase is fairly complicated
- Add tests
- Add shutdown logic to the router and server
- Improve the multiprocessing logic in the server
- Improve error handling everywhere
- Improve past key values layer indexing?