Commit Graph

  • 05dd14fdb9 Fix tokenizers==0.13.4 . (#838) [main] Nicolas Patry 2023-08-14 19:26:19 +02:00
  • 5fa4676221 Tp ready. [idefics] Nicolas Patry 2023-08-14 17:05:21 +00:00
  • 0e992ff615 Adding Idefics multi modal model. Nicolas Patry 2023-08-14 16:05:47 +00:00
  • d8f1337e7e README edit -- running the service with no GPU or CUDA support (#773) Pasquale Minervini 2023-08-14 15:41:13 +02:00
  • a072660bf5 fix: LlamaTokenizerFast to AutoTokenizer at flash_llama.py (#619) Dong Shin 2023-08-14 21:20:18 +09:00
  • b5087c4f4e Fix rope dynamic + factor (#822) Nicolas Patry 2023-08-14 14:09:51 +02:00
  • 3ffcd9d311 Added two more features in readme.md file (#831) sawan Rawat 2023-08-14 17:39:20 +05:30
  • d71237fc8b Have snippets in Python/JavaScript in quicktour (#809) Omar Sanseviero 2023-08-14 13:47:32 +02:00
  • 09eca64227 Version 1.0.1 (#836) [v1.0.1] Nicolas Patry 2023-08-14 11:23:11 +02:00
  • 89a4e723d2 Attempting to fix torch leak. [fix_leak] Nicolas Patry 2023-08-12 09:06:49 +02:00
  • a2a913eec5 Added streaming for InferenceClient (#821) Merve Noyan 2023-08-11 18:05:19 +03:00
  • cc7bb5084d Upgrade transformers (fix protobuf==3.20 issue) (#795) Nicolas Patry 2023-08-11 16:46:08 +02:00
  • d0e30771c2 Added ChatUI Screenshot to Docs (#823) Merve Noyan 2023-08-11 17:42:43 +03:00
  • 4a9615e8ff Add to ToC [streaming_conceptual] osanseviero 2023-08-11 15:05:10 +02:00
  • 6daee77c09 Add embedded space osanseviero 2023-08-11 15:03:56 +02:00
  • 5df4c7c0d7 [docs] Build docs only when doc files change (#812) Mishig 2023-08-11 07:07:53 +02:00
  • e58ad6dd66 Added CLI docs (#799) Merve Noyan 2023-08-10 16:00:30 +03:00
  • 7dbaef3f5b Minor docs style fixes (#806) Omar Sanseviero 2023-08-10 14:32:51 +02:00
  • 04f7c2d86b Fix gated docs (#805) Omar Sanseviero 2023-08-10 14:32:07 +02:00
  • 8bdb16ee9a Use destructuring in router arguments to avoid '.0' (#798) ivarflakstad 2023-08-10 10:52:50 +02:00
  • 43ed6c217a Dummy commit [test_docs] osanseviero 2023-08-10 10:33:52 +02:00
  • 647ae7a7d3 Setup for doc-builder and docs for TGI (#740) Merve Noyan 2023-08-10 11:24:52 +03:00
  • 0e8b47811e Llama change. (#793) Nicolas Patry 2023-08-08 13:43:40 +02:00
  • c4dac9f3dc Update __init__.py (#794) Nicolas Patry 2023-08-08 12:09:51 +02:00
  • 4ddb6681ac Add workflow to upload documentation [osanseviero-patch-1] Omar Sanseviero 2023-08-08 07:49:45 +02:00
  • 1fdc88ee90 Fixing non 4bits quantization. (#785) Nicolas Patry 2023-08-07 13:02:00 +02:00
  • 891e19cc51 Fix dynamic rope. (#783) Nicolas Patry 2023-08-07 12:28:19 +02:00
  • 16fadcec57 Merge BNB 4bit. (#770) Nicolas Patry 2023-08-03 23:00:59 +02:00
  • f91e9d282d fix build tokenizer in quantize and remove duplicate import (#768) zspo 2023-08-04 04:21:33 +08:00
  • 6ec5288ab7 This should prevent the PyTorch overriding. (#767) Nicolas Patry 2023-08-03 21:54:39 +02:00
  • ac736fd89c feat(server): Add native support for PEFT Lora models (#762) Nicolas Patry 2023-08-03 17:22:45 +02:00
  • 8b0d608f1f Automatically map deduplicated safetensors weights to their original values (#501) (#761) Nicolas Patry 2023-08-02 20:24:37 +02:00
  • bd3088748e add FastLinear import (#750) zspo 2023-08-03 02:04:46 +08:00
  • e994ad1172 Added InferenceClient [model_compat_log] Merve Noyan 2023-08-02 17:57:01 +03:00
  • bb83f333b7 Added consuming TGI with ChatUI Merve Noyan 2023-08-02 17:40:56 +03:00
  • 564bc99a7b fix toc Merve Noyan 2023-08-01 14:13:28 +03:00
  • 470dcdfe7b Separated querying section and emphasized self generating docs Merve Noyan 2023-08-01 14:10:45 +03:00
  • 21ca70e0eb Added supported models and hardware Merve Noyan 2023-08-01 14:02:14 +03:00
  • 2675d934e5 Update local_launch.md Merve Noyan 2023-08-01 12:44:25 +03:00
  • 7766fee9b1 fix typo for dynamic rotary (#745) [compat_logger] Florian Zimmermeister 2023-07-31 18:58:46 +02:00
  • d3d8f1bd6b Typo fix. (#746) Nicolas Patry 2023-07-31 18:57:29 +02:00
  • 15fc64668f fix(server): Failing quantize config after local read. (#743) Nicolas Patry 2023-07-31 17:51:26 +02:00
  • c86dcbeeb1 Update build_pr_documentation.yml Merve Noyan 2023-07-31 18:16:29 +03:00
  • d65bbb333d Update build_pr_documentation.yml Merve Noyan 2023-07-31 18:13:32 +03:00
  • b2268272ad Added installation and launch notes and re-structured toc Merve Noyan 2023-07-31 17:35:36 +03:00
  • 2a13f1a046 chore: fix typo in mpt_modeling.py (#737) Ikko Eltociear Ashimine 2023-07-31 22:43:44 +09:00
  • 932bdd93ff Adding Rope scaling. (#741) Nicolas Patry 2023-07-31 15:38:47 +02:00
  • 41bd0e4af1 Added index.md and other initial files Merve Noyan 2023-07-31 15:56:29 +03:00
  • b9633c46d0 Fix typing in Model.generate_token (#733) Jae-Won Chung 2023-07-31 08:35:14 -04:00
  • dc631b5be5 Setup for doc-builder and added TOC Merve Noyan 2023-07-31 14:18:20 +03:00
  • 92bb56b0c1 Local gptq support. (#738) Nicolas Patry 2023-07-31 10:32:52 +02:00
  • 66cea49d57 Cargo fmt [dev] Nicolas Patry 2023-07-31 09:57:18 +02:00
  • 4b3e24f843 feat(server): Add bitsandbytes 4bit quantization (#626) krzim 2023-07-21 03:53:05 -04:00
  • 3ef5ffbc64 v1.0.0 (#727) [v1.0.0] OlivierDehaene 2023-07-28 17:43:46 +02:00
  • bde25e62b3 chore: update license to HFOIL (#725) OlivierDehaene 2023-07-28 15:59:46 +02:00
  • afd04dc71e feat(server): update vllm version (#723) OlivierDehaene 2023-07-28 15:36:38 +02:00
  • f848decee6 docs: Add hardware section to TOC in README (#721) regisss 2023-07-28 11:20:03 +02:00
  • 5a1cccbb98 Add section about TGI on other AI hardware accelerators in README (#715) regisss 2023-07-28 09:14:03 +02:00
  • 9f18f4c006 v0.9.4 (#713) [v0.9.4] OlivierDehaene 2023-07-27 19:25:15 +02:00
  • ab96b9aec3 feat(server): support new falcon config (#712) OlivierDehaene 2023-07-27 18:38:57 +02:00
  • 2efd46ef95 fix(server): fix missing datasets in quantize OlivierDehaene 2023-07-27 14:50:45 +02:00
  • 8bd0adb135 fix(server): fix quantization python requirements (#708) OlivierDehaene 2023-07-27 12:28:10 +02:00
  • e64a65891b docs(README): update readme OlivierDehaene 2023-07-25 19:45:25 +02:00
  • a0d55358d2 feat(server): Using quantize_config.json instead of GPTQ_BITS env variables. (#671) Nicolas Patry 2023-07-25 12:00:27 +01:00
  • 37df6df38e fix(server): fix exllama buffers (#689) OlivierDehaene 2023-07-24 14:25:43 +02:00
  • 73a4d65d26 feat: add cuda memory fraction (#659) OlivierDehaene 2023-07-24 11:43:58 +02:00
  • 1da642bd0e feat(server): add local prom and health routes if running w/ ngrok OlivierDehaene 2023-07-21 16:56:30 +02:00
  • 15b3e9ffb0 Directly load GPTBigCode to specified device (#618) Yang, Bo 2023-07-21 02:27:31 -07:00
  • d5b5bc750f feat(server): Add exllama GPTQ CUDA kernel support #553 (#666) Nicolas Patry 2023-07-21 10:59:00 +02:00
  • f555dabca8 Putting back header inclusion (seems unused but still) [simpler_exllama] Nicolas Patry 2023-07-20 15:46:51 +00:00
  • 5ca0508d02 Simpler exllama Nicolas Patry 2023-07-20 15:36:53 +00:00
  • bf94df3c71 fix(server): use mem_get_info to get kv cache size (#664) OlivierDehaene 2023-07-20 17:23:49 +02:00
  • 08b8eec1d7 fix(server): Fixing non parameters in quantize script bigcode/starcoder was an example. (#661) Nicolas Patry 2023-07-20 16:04:15 +02:00
  • 362883f259 fix(server): llama v2 GPTQ (#648) fxmarty 2023-07-20 15:02:54 +02:00
  • 214c06f510 Add trust_remote_code to quantize script (#647) cdawg 2023-07-20 13:53:08 +02:00
  • 6bf7090ecd fix per-column quantization Felix Marty 2023-07-19 17:55:41 +00:00
  • edfbfdfb3f Merge branch 'main' into gptq-cuda-kernels Félix Marty 2023-07-19 16:58:54 +02:00
  • 5a1512c025 docs: Update README.md (#643) Nicolas Patry 2023-07-19 13:39:12 +02:00
  • 1c81df15cd docs: Update README.md (#639) Nicolas Patry 2023-07-19 13:38:52 +02:00
  • b66b190403 feat(router): ngrok edge (#642) OlivierDehaene 2023-07-19 11:59:58 +02:00
  • fe80f5360c feat(server): auto max_batch_total_tokens for flash att models (#630) OlivierDehaene 2023-07-19 09:31:25 +02:00
  • 5e6ddfd6a4 fix(server): fix llamav2 config (#635) [v0.9.3] OlivierDehaene 2023-07-18 18:49:42 +02:00
  • cf83f9b66f v0.9.3 (#634) OlivierDehaene 2023-07-18 18:11:20 +02:00
  • 211b211ec0 feat(server): add support for llamav2 (#633) Nicolas Patry 2023-07-18 18:09:53 +02:00
  • 3b71c38558 feat(server): flash attention v2 (#624) OlivierDehaene 2023-07-18 16:21:18 +02:00
  • 4d38a1c4ad feat(server): Reworking the quantization script so it's still universal (not llama specific) (#587) Nicolas Patry 2023-07-18 12:19:05 +02:00
  • 44acf72a73 fea(launcher): debug logs (#623) OlivierDehaene 2023-07-17 19:03:07 +02:00
  • bc2873246c fix(launcher): Rename b-float16 to bfloat16 in the launcher arg (#621) Nicolas Patry 2023-07-17 18:38:16 +02:00
  • a2cf1bdb2f fix(server): empty_cache when stopped OlivierDehaene 2023-07-15 13:57:31 +02:00
  • c58a0c185b v0.9.2 (#616) [v0.9.2] OlivierDehaene 2023-07-14 16:31:48 +02:00
  • 5b9de4a1d3 fix(server): blacklist local files (#609) OlivierDehaene 2023-07-13 21:54:55 +02:00
  • c8b077be79 docs: README: Add logo + baseline (#611) Victor Muštar 2023-07-13 21:45:20 +02:00
  • 982ce3227b feat(router): explicit warning if revision is not set (#608) OlivierDehaene 2023-07-13 18:59:38 +02:00
  • 74e6d6e54e fix the usual merge mess Felix Marty 2023-07-13 15:48:55 +00:00
  • 9401e10210 Merge branch 'main' into gptq-cuda-kernels Félix Marty 2023-07-13 17:45:52 +02:00
  • 0036084294 support all, test llama Felix Marty 2023-07-13 15:41:57 +00:00
  • b7327205a6 feat(launcher): add arg validation and drop subprocess (#595) OlivierDehaene 2023-07-13 14:22:37 +02:00
  • 2ae65b45a8 fix tests Felix Marty 2023-07-13 10:38:08 +00:00
  • 38c2be5926 fix test Felix Marty 2023-07-12 18:31:49 +00:00
  • 3628559516 GPTQ Env vars: catch correct type of error (#596) ssmi153 2023-07-13 01:57:46 +08:00