241 Commits

Author SHA1 Message Date
OlivierDehaene
1883d8ecde feat(docker): improve flash_attention caching (#160) 2023-04-09 19:59:16 +02:00
OlivierDehaene
3f2542bb6a fix(server): fix escape characters in stop sequence (#155) 2023-04-05 19:37:41 +02:00
OlivierDehaene
c0aeb32583 feat(server): flash santacoder (#153) 2023-04-03 19:06:42 +02:00
OlivierDehaene
fef1a1c381 v0.4.3 (#152) 2023-03-30 17:28:14 +02:00
OlivierDehaene
84722f3e33 v0.4.2 (#151) 2023-03-30 17:10:01 +02:00
OlivierDehaene
08b7e4a282 fix(server): fix flash neox rotary embeddings (#150) 2023-03-30 16:12:23 +02:00
OlivierDehaene
610bb1f978 feat(benchmark): tui based benchmarking tool (#149) 2023-03-30 15:26:27 +02:00
OlivierDehaene
c9bdaa8b73 feat(server): reduce mlp and attn in one op for flash neox (#145) 2023-03-28 16:51:41 +02:00
OlivierDehaene
f000068944 feat(server): clear cache on error (#143) 2023-03-28 11:29:35 +02:00
Nick Hill
8e8dd984d8 feat(server): Add mypy-protobuf (#141)
Generates .pyi files for protobuf stubs which provide strong typing
information. Very helpful for IDE auto-completion, etc.
2023-03-27 09:25:15 +02:00
Nick Hill
462530c2b0 fix(server): Avoid using try/except to determine kind of AutoModel (#142) 2023-03-27 09:23:22 +02:00
OlivierDehaene
ab5fd8cf93 v0.4.1 (#140) 2023-03-26 16:37:51 +02:00
OlivierDehaene
678b2f3900 feat(server): cleanup flash neox loading (#139) 2023-03-26 16:37:21 +02:00
OlivierDehaene
d6a93fe992 fix(server): fix flash-neox scores warping (#137) 2023-03-24 18:21:41 +01:00
OlivierDehaene
05e9a796cc feat(server): flash neoX (#133) 2023-03-24 14:02:14 +01:00
OlivierDehaene
b49dbf2d88 fix(server): use server tokenizer as gt (#128) 2023-03-16 12:12:26 +01:00
OlivierDehaene
8ad60b752f fix(server): add position ids to neox (#126) 2023-03-15 13:12:49 +01:00
OlivierDehaene
cbd36aa4d1 fix(server): revert gpt-neox optims (#123) 2023-03-13 22:57:08 +01:00
OlivierDehaene
411d6247f4 v0.4.0 (#119) 2023-03-09 16:07:01 +01:00
OlivierDehaene
c0795de2f2 fix(server): do not warp prefill logits (#116) 2023-03-09 13:00:10 +01:00
OlivierDehaene
1a2d68250a feat: support typical sampling (#114)
closes #112
2023-03-09 11:33:57 +01:00
OlivierDehaene
941cd42e0c fix(server): fix index out of range for watermarking (#110) 2023-03-08 18:29:08 +01:00
OlivierDehaene
b1485e18c5 fix(server): fix galactica batch (#106)
closes #105
2023-03-07 20:05:21 +01:00
OlivierDehaene
3fef90d50f feat(clients): Python client (#103) 2023-03-07 18:52:22 +01:00
OlivierDehaene
cd5961b5da feat: allow local models (#101)
closes #99
2023-03-06 14:39:36 +01:00
OlivierDehaene
9b205d33cc fix(server): fix generate_stream by forcing tokens to be decoded correctly (#100) 2023-03-06 13:22:58 +01:00
OlivierDehaene
1c19b0934e v0.3.2 (#97) 2023-03-03 18:42:20 +01:00
OlivierDehaene
0b6807caa4 feat(server): fix transformers commit (#96) 2023-03-03 17:56:27 +01:00
OlivierDehaene
2d39f199ae feat(server): update to hf_transfer==0.1.2 (#93) 2023-03-03 11:26:27 +01:00
OlivierDehaene
9b8ea6a6c7 feat(server): add logits watermark (#90) 2023-03-02 12:30:41 +01:00
OlivierDehaene
65e2f1624e fix(server): fix token_is_special (#87) 2023-02-24 17:20:00 +01:00
OlivierDehaene
0ac184ce77 feat(server): add special token bool (#85) 2023-02-24 15:55:57 +01:00
OlivierDehaene
4b1c9720c0 v0.3.1 (#84) 2023-02-24 13:27:41 +01:00
OlivierDehaene
44ce098c10 feat(server): pre-allocate max attention mask (#75) 2023-02-24 12:49:21 +01:00
OlivierDehaene
78063c0569 fix(server): remove position_ids from galactica forward (#82)
closes #80
2023-02-20 19:28:57 +01:00
OlivierDehaene
17bc841b1b feat(server): enable hf-transfer (#76) 2023-02-18 14:04:11 +01:00
OlivierDehaene
c720555adc v0.3.0 (#72) 2023-02-16 17:28:29 +01:00
OlivierDehaene
439fcaf810 feat(router): add prometheus metrics scrape endpoint (#71) 2023-02-16 17:18:53 +01:00
OlivierDehaene
c5a4a1faf3 feat(server): improve download logging (#66) 2023-02-15 16:11:32 +01:00
OlivierDehaene
0fbc691946 feat: add safetensors conversion (#63) 2023-02-14 13:02:16 +01:00
OlivierDehaene
9af454142a feat: add distributed tracing (#62) 2023-02-13 13:02:45 +01:00
OlivierDehaene
1ad3250b89 fix(docker): increase shm size (#60) 2023-02-08 17:53:33 +01:00
OlivierDehaene
c503a639b1 feat(server): support t5 (#59) 2023-02-07 18:25:17 +01:00
OlivierDehaene
2fe5e1b30e V0.2.1 (#58) 2023-02-07 15:40:25 +01:00
OlivierDehaene
4acc42a605 fix(server): better handling of inference mode (#57) 2023-02-07 15:38:22 +01:00
OlivierDehaene
20c3c5940c feat(router): refactor API and add openAPI schemas (#53) 2023-02-03 12:43:37 +01:00
OlivierDehaene
b1482d9048 breaking(router): modify /generate API to only return generated text (#50)
@njhill, @yk FYI

generated_text was concatenated to the user prompt for legacy reasons. We
want to remove this behaviour as we don't think it is useful and it is even
detrimental to usability.

We also remove the unused Vec.
2023-02-02 15:02:04 +01:00
OlivierDehaene
df227ac20d fix(server): allow greedy repetition penalty (#51) 2023-02-02 10:34:35 +01:00
OlivierDehaene
775115e3a5 feat(server): allow the server to use a local weight cache (#49) 2023-02-01 16:22:10 +01:00
OlivierDehaene
313194f6d7 feat(server): support repetition penalty (#47) 2023-02-01 15:58:42 +01:00