# BLOOM Inference

A Rust router and Python gRPC server for BLOOM inference.
## Install

```shell
# Install the Python model server
cd server
pip install .

# Build the Rust router
cd ../router
cargo build --release
```
## Run

```shell
# Launch the sharded Python model server
python server/bloom_inference/main.py bigscience/bloom --num-gpus 8 --shard-directory /dev/shm/models

# Launch the Rust router
./router/target/release/router
```
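
Once both processes are up, generations can be requested through the router over HTTP. A minimal client sketch, assuming the router listens on `localhost:3000` and exposes a `POST /generate` endpoint taking `inputs` and `parameters`; check the router source for the actual address and schema:

```python
# Hypothetical client: the port, route, and JSON schema below are
# assumptions, not guaranteed by this repository.
import requests

response = requests.post(
    "http://localhost:3000/generate",
    json={
        "inputs": "Hello, I am",
        "parameters": {"max_new_tokens": 20},
    },
)
response.raise_for_status()
print(response.json())
```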
## TODO
- Improve model download
- Store "shardable" layers separately and layer by layer
- Add batching args to router CLI
- Add docstrings + comments everywhere as the codebase is fairly complicated
- Add tests
- Add shutdown logic in router and server (one possible server-side sketch follows this list)
- Improve multi-processing logic in server
- Improve error handling everywhere
- Improve past key layer indexing?
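
For the shutdown item above, one possible shape on the server side is to translate termination signals into a graceful gRPC stop. A minimal sketch using `grpcio`'s asyncio API; the entrypoint, bind address, and servicer registration are placeholders, not the actual `bloom_inference` code:

```python
import asyncio
import signal

from grpc import aio


async def serve(bind_address: str = "unix:///tmp/bloom-inference-0") -> None:
    server = aio.server()
    # ... register the generated text-generation servicer here ...
    server.add_insecure_port(bind_address)
    await server.start()

    # Turn SIGTERM/SIGINT into an event so in-flight requests can
    # finish before the process exits.
    shutdown = asyncio.Event()
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, shutdown.set)

    await shutdown.wait()
    await server.stop(grace=30.0)  # wait up to 30s for pending RPCs


if __name__ == "__main__":
    asyncio.run(serve())
```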