Commit Graph

  • b940f4ce64 Update streaming.md fix_stream Omar Sanseviero 2023-08-22 23:41:47 +02:00
  • cf0453182e Restructure custom_model_docs Merve Noyan 2023-08-22 23:45:56 +03:00
  • bdf36659a6 Update non_core_models.md Merve Noyan 2023-08-22 23:44:53 +03:00
  • 98a5e6f26a Update docs/source/basic_tutorials/non_core_models.md Merve Noyan 2023-08-22 23:44:18 +03:00
  • abde90c493 Update docs/source/basic_tutorials/non_core_models.md Merve Noyan 2023-08-22 23:44:13 +03:00
  • ef5a99ffc9 Update consuming_tgi.md add_gradio Merve Noyan 2023-08-22 23:38:00 +03:00
  • b4b52c6f32 Update _toctree.yml Merve Noyan 2023-08-22 23:35:48 +03:00
  • b33a66148c Update and rename custom_models.md to non_core_models.md Merve Noyan 2023-08-22 23:35:16 +03:00
  • 5b995926f8 Update docs/source/basic_tutorials/consuming_tgi.md Merve Noyan 2023-08-22 23:32:40 +03:00
  • 7098f37ddd Update docs/source/basic_tutorials/consuming_tgi.md Merve Noyan 2023-08-22 23:30:22 +03:00
  • 7dcd953969 Initial commit Merve Noyan 2023-08-22 23:26:08 +03:00
  • ee19513cf7 Initial commit safetensors_docs Merve Noyan 2023-08-22 23:02:17 +03:00
  • 98afdbbc1d Add to toctree paged-attention-docs Merve Noyan 2023-08-22 22:04:38 +03:00
  • 3a522aa6e1 Explained HBM & SRAM flash-attention-docs Merve Noyan 2023-08-22 21:49:23 +03:00
  • 7037d0259f Update flash_attention.md Merve Noyan 2023-08-22 21:43:52 +03:00
  • 3fe2836a54 paged attention initial commit Merve Noyan 2023-08-22 21:18:50 +03:00
  • 172d262adf Update docs/source/conceptual/flash_attention.md Merve Noyan 2023-08-22 17:53:26 +03:00
  • 7ee207f75c Update docs/source/conceptual/flash_attention.md Merve Noyan 2023-08-22 17:53:20 +03:00
  • abc4fda615 Update docs/source/conceptual/flash_attention.md Merve Noyan 2023-08-22 17:53:14 +03:00
  • 27baaeffe0 Update tensor_parallelism.md tp-docs Merve Noyan 2023-08-22 14:50:16 +03:00
  • 095b6d9178 Added to toctree Merve Noyan 2023-08-22 14:20:29 +03:00
  • 048d44cfcd Added paper Merve Noyan 2023-08-22 14:19:17 +03:00
  • df8330194f Update docs/source/conceptual/flash_attention.md Merve Noyan 2023-08-22 13:57:33 +03:00
  • 2035f3b7bc Update docs/source/conceptual/flash_attention.md Merve Noyan 2023-08-21 11:31:30 +03:00
  • 181dcb6219 Nits Merve Noyan 2023-08-21 11:14:05 +03:00
  • 4bac76241d Update server.rs self-generating-docs Merve Noyan 2023-08-21 11:10:57 +03:00
  • 00cd4d0b2f Update server.rs Merve Noyan 2023-08-21 11:09:45 +03:00
  • 6541d8d8d9 Update server.rs Merve Noyan 2023-08-21 11:08:38 +03:00
  • 09fee2f6fb fix Merve Noyan 2023-08-21 10:43:15 +03:00
  • bb8c24f5b7 Update docs/source/conceptual/tensor_parallelism.md Merve Noyan 2023-08-21 10:39:14 +03:00
  • 08bf10ca17 initial commit Merve Noyan 2023-08-21 00:23:10 +03:00
  • 2fa5e31839 Update server.rs Merve Noyan 2023-08-20 23:13:29 +03:00
  • 11d5c603ee Add to toctree Merve Noyan 2023-08-20 23:02:06 +03:00
  • 4446d0f838 Create tensor_parallelism.md Merve Noyan 2023-08-20 01:42:56 +03:00
  • 2416cc66cf Remove redundant content Omar Sanseviero 2023-08-19 18:17:35 +02:00
  • c4422e5678 Adding small benchmark script. (#881) main Nicolas Patry 2023-08-18 19:28:56 +02:00
  • 8f1d266e69 Update consuming_tgi.md Merve Noyan 2023-08-18 17:14:52 +03:00
  • 3a2a13ecd5 Added diff and dark/light mode for demo Merve Noyan 2023-08-18 17:13:31 +03:00
  • 117425564e Added space and replaced screenshots with llama Merve Noyan 2023-08-18 16:13:20 +03:00
  • e49ecbf4e5 Update docs/source/basic_tutorials/consuming_tgi.md Merve Noyan 2023-08-18 16:03:23 +03:00
  • 16a4c8f8b4 Cargo fmt. feat/return-top-tokens Nicolas Patry 2023-08-18 14:35:58 +02:00
  • bfa070611d Add streaming guide (#858) Omar Sanseviero 2023-08-18 13:27:08 +02:00
  • 08593dc180 Addressed Omar's comments Merve Noyan 2023-08-18 14:05:33 +03:00
  • cf43528538 remove stream since its a separate PR improve-docs philschmid 2023-08-18 12:57:36 +02:00
  • 7b349f9b13 Update docs/source/basic_tutorials/request_parameters.md Philipp Schmid 2023-08-18 12:54:57 +02:00
  • 1a67050c5c Removing dead code + "Fix" test. Nicolas Patry 2023-08-18 12:41:10 +02:00
  • 2c6e07395f remove image philschmid 2023-08-18 09:27:48 +02:00
  • 52cacff4a4 fix wording philschmid 2023-08-18 09:27:13 +02:00
  • eccb8a0099 fix library philschmid 2023-08-18 09:16:26 +02:00
  • 69c3d79a1c add docs philschmid 2023-08-18 09:13:39 +02:00
  • 5b9488f713 Fix after rebase. Nicolas Patry 2023-08-17 15:18:14 +00:00
  • 6606b481b6 Fix typo in batch concatination Vincent Brouwers 2023-08-09 08:39:18 +00:00
  • d89b99ef70 Only return top_tokens field when requested Vincent Brouwers 2023-08-02 13:03:19 +00:00
  • be9585a5e4 Add max_top_n_tokens CLI argument Vincent Brouwers 2023-08-02 12:42:59 +00:00
  • 338713d70f Defer building top-token objects to Rust Vincent Brouwers 2023-08-01 15:02:30 +00:00
  • 4ec593e23c Skip top-n tokens in prefill Vincent Brouwers 2023-08-01 13:55:38 +00:00
  • 010c7e78fa Allocate top_n_token tensor in Batch Vincent Brouwers 2023-07-31 13:09:45 +00:00
  • e6235fd67d Return more top-n-tokens when probabilities are equal Vincent Brouwers 2023-07-28 14:21:11 +00:00
  • 8c60077a55 Implement top-n-tokens for all models Vincent Brouwers 2023-07-26 15:12:57 +00:00
  • 16eb3891f9 Share computation for top-n-token decoding Vincent Brouwers 2023-07-25 14:55:32 +00:00
  • 66722c088a Add batched top-n-tokens to FlashCausalLM Vincent Brouwers 2023-07-25 14:17:25 +00:00
  • e8d66797e0 Add top-n-tokens support to benchmark Vincent Brouwers 2023-07-24 14:02:56 +00:00
  • 1350c3b589 Add WIP support for returning top tokens Vincent Brouwers 2023-07-14 19:48:15 +00:00
  • 452f8f3c2b Update consuming_tgi.md Merve Noyan 2023-08-17 16:31:11 +03:00
  • bce5e22444 Adding Idefics multi modal model. (#842) Nicolas Patry 2023-08-17 14:38:49 +02:00
  • b9e33c4953 Upgrading versions of python client. (#862) Nicolas Patry 2023-08-17 09:15:35 +02:00
  • 2e68ac01c0 "Fix" for rw-1b. (#860) Nicolas Patry 2023-08-17 09:05:41 +02:00
  • d9bceb8e6b Misc improvements for InferenceClient docs (#852) Omar Sanseviero 2023-08-16 14:29:54 +02:00
  • 2774b0ab44 Fixing watermark. (#851) Nicolas Patry 2023-08-16 07:17:26 +02:00
  • 737d5781e4 Update README.md (#848) Adarsh Shirawalmath 2023-08-15 22:43:52 +05:30
  • 05dd14fdb9 Fix tokenizers==0.13.4 . (#838) Nicolas Patry 2023-08-14 19:26:19 +02:00
  • d8f1337e7e README edit -- running the service with no GPU or CUDA support (#773) Pasquale Minervini 2023-08-14 15:41:13 +02:00
  • a072660bf5 fix: LlamaTokenizerFast to AutoTokenizer at flash_llama.py (#619) Dong Shin 2023-08-14 21:20:18 +09:00
  • b5087c4f4e Fix rope dynamic + factor (#822) Nicolas Patry 2023-08-14 14:09:51 +02:00
  • 3ffcd9d311 Added two more features in readme.md file (#831) sawan Rawat 2023-08-14 17:39:20 +05:30
  • d71237fc8b Have snippets in Python/JavaScript in quicktour (#809) Omar Sanseviero 2023-08-14 13:47:32 +02:00
  • 09eca64227 Version 1.0.1 (#836) v1.0.1 Nicolas Patry 2023-08-14 11:23:11 +02:00
  • 89a4e723d2 Attempting to fix torch leak. fix_leak Nicolas Patry 2023-08-12 09:06:49 +02:00
  • a2a913eec5 Added streaming for InferenceClient (#821) Merve Noyan 2023-08-11 18:05:19 +03:00
  • cc7bb5084d Upgrade transformers (fix protobuf==3.20 issue) (#795) Nicolas Patry 2023-08-11 16:46:08 +02:00
  • d0e30771c2 Added ChatUI Screenshot to Docs (#823) Merve Noyan 2023-08-11 17:42:43 +03:00
  • 4a9615e8ff Add to ToC streaming_conceptual osanseviero 2023-08-11 15:05:10 +02:00
  • 6daee77c09 Add embedded space osanseviero 2023-08-11 15:03:56 +02:00
  • 5df4c7c0d7 [docs] Build docs only when doc files change (#812) Mishig 2023-08-11 07:07:53 +02:00
  • e58ad6dd66 Added CLI docs (#799) Merve Noyan 2023-08-10 16:00:30 +03:00
  • 7dbaef3f5b Minor docs style fixes (#806) Omar Sanseviero 2023-08-10 14:32:51 +02:00
  • 04f7c2d86b Fix gated docs (#805) Omar Sanseviero 2023-08-10 14:32:07 +02:00
  • 8bdb16ee9a Use destructuring in router arguments to avoid '.0' (#798) ivarflakstad 2023-08-10 10:52:50 +02:00
  • 43ed6c217a Dummy commit test_docs osanseviero 2023-08-10 10:33:52 +02:00
  • 647ae7a7d3 Setup for doc-builder and docs for TGI (#740) Merve Noyan 2023-08-10 11:24:52 +03:00
  • 0e8b47811e Llama change. (#793) Nicolas Patry 2023-08-08 13:43:40 +02:00
  • c4dac9f3dc Update __init__.py (#794) Nicolas Patry 2023-08-08 12:09:51 +02:00
  • 4ddb6681ac Add workflow to upload documentation osanseviero-patch-1 Omar Sanseviero 2023-08-08 07:49:45 +02:00
  • 1fdc88ee90 Fixing non 4bits quantization. (#785) Nicolas Patry 2023-08-07 13:02:00 +02:00
  • 891e19cc51 Fix dynamic rope. (#783) Nicolas Patry 2023-08-07 12:28:19 +02:00
  • 16fadcec57 Merge BNB 4bit. (#770) Nicolas Patry 2023-08-03 23:00:59 +02:00
  • f91e9d282d fix build tokenizer in quantize and remove duplicate import (#768) zspo 2023-08-04 04:21:33 +08:00
  • 6ec5288ab7 This should prevent the PyTorch overriding. (#767) Nicolas Patry 2023-08-03 21:54:39 +02:00
  • ac736fd89c feat(server): Add native support for PEFT Lora models (#762) Nicolas Patry 2023-08-03 17:22:45 +02:00
  • 8b0d608f1f Automatically map deduplicated safetensors weights to their original values (#501) (#761) Nicolas Patry 2023-08-02 20:24:37 +02:00