Olivier Dehaene fa9a088467 Add load testing
2022-10-11 10:36:51 +02:00

BLOOM Inference

A Rust and gRPC server for BLOOM Inference.

Install

cd server
pip install .

cd ../router
cargo build --release

Run

python server/bloom_inference/main.py bigscience/bloom --num-gpus 8 --shard-directory /dev/shm/models
./router/target/release/router
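
Once both processes are up, generations can be requested from the router over HTTP. The payload shape below (an `inputs` string plus a `parameters` object with `max_new_tokens`), and the route/port mentioned in the comment, are assumptions for illustration; check the router source for the actual API.

```python
import json

# Hypothetical payload builder for the router's generate endpoint.
# Field names ("inputs", "parameters", "max_new_tokens") are assumed,
# not confirmed by this README.
def build_generate_request(prompt: str, max_new_tokens: int = 20) -> str:
    return json.dumps({
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    })

# The resulting JSON could then be POSTed to the router, e.g. with curl:
#   curl -X POST http://localhost:<port>/generate \
#        -H "Content-Type: application/json" -d '<payload>'
print(build_generate_request("Hello BLOOM"))
```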

TODO:

  • Improve model download
    • Store "shardable" layers separately and layer by layer
  • Add batching args to router CLI
  • Add docstrings + comments everywhere as the codebase is fairly complicated
  • Add tests
  • Add shutdown logic in router and server
  • Improve multi-processing logic in server
  • Improve error handling everywhere
  • Improve past key layer indexing?
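
A minimal sketch of the shutdown-logic TODO above, assuming a Python serving loop on the server side: a signal handler flips a stop event so the loop can drain in-flight batches before exiting. The names here are illustrative, not taken from the codebase.

```python
import signal
import threading

# Stop flag that a serving loop would poll between batches.
stop_event = threading.Event()

def handle_shutdown(signum, frame):
    # Mark the server as stopping; the loop exits after the current batch.
    stop_event.set()

# Register for both SIGTERM (e.g. `docker stop`) and SIGINT (Ctrl-C).
for sig in (signal.SIGTERM, signal.SIGINT):
    signal.signal(sig, handle_shutdown)
```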