text-generation-inference

alihan/text-generation-inference

mirror of https://github.com/huggingface/text-generation-inference.git synced 2023-08-23 10:47:54 +03:00

Commit Graph

b940f4ce64 Update streaming.md fix_stream Omar Sanseviero 2023-08-22 23:41:47 +02:00
cf0453182e Restructure custom_model_docs Merve Noyan 2023-08-22 23:45:56 +03:00
bdf36659a6 Update non_core_models.md Merve Noyan 2023-08-22 23:44:53 +03:00
98a5e6f26a Update docs/source/basic_tutorials/non_core_models.md Merve Noyan 2023-08-22 23:44:18 +03:00
abde90c493 Update docs/source/basic_tutorials/non_core_models.md Merve Noyan 2023-08-22 23:44:13 +03:00
ef5a99ffc9 Update consuming_tgi.md add_gradio Merve Noyan 2023-08-22 23:38:00 +03:00
b4b52c6f32 Update _toctree.yml Merve Noyan 2023-08-22 23:35:48 +03:00
b33a66148c Update and rename custom_models.md to non_core_models.md Merve Noyan 2023-08-22 23:35:16 +03:00
5b995926f8 Update docs/source/basic_tutorials/consuming_tgi.md Merve Noyan 2023-08-22 23:32:40 +03:00
7098f37ddd Update docs/source/basic_tutorials/consuming_tgi.md Merve Noyan 2023-08-22 23:30:22 +03:00
7dcd953969 Initial commit Merve Noyan 2023-08-22 23:26:08 +03:00
ee19513cf7 Initial commit safetensors_docs Merve Noyan 2023-08-22 23:02:17 +03:00
98afdbbc1d Add to toctree paged-attention-docs Merve Noyan 2023-08-22 22:04:38 +03:00
3a522aa6e1 Explained HBM & SRAM flash-attention-docs Merve Noyan 2023-08-22 21:49:23 +03:00
7037d0259f Update flash_attention.md Merve Noyan 2023-08-22 21:43:52 +03:00
3fe2836a54 paged attention initial commit Merve Noyan 2023-08-22 21:18:50 +03:00
172d262adf Update docs/source/conceptual/flash_attention.md Merve Noyan 2023-08-22 17:53:26 +03:00
7ee207f75c Update docs/source/conceptual/flash_attention.md Merve Noyan 2023-08-22 17:53:20 +03:00
abc4fda615 Update docs/source/conceptual/flash_attention.md Merve Noyan 2023-08-22 17:53:14 +03:00
27baaeffe0 Update tensor_parallelism.md tp-docs Merve Noyan 2023-08-22 14:50:16 +03:00
095b6d9178 Added to toctree Merve Noyan 2023-08-22 14:20:29 +03:00
048d44cfcd Added paper Merve Noyan 2023-08-22 14:19:17 +03:00
df8330194f Update docs/source/conceptual/flash_attention.md Merve Noyan 2023-08-22 13:57:33 +03:00
2035f3b7bc Update docs/source/conceptual/flash_attention.md Merve Noyan 2023-08-21 11:31:30 +03:00
181dcb6219 Nits Merve Noyan 2023-08-21 11:14:05 +03:00
4bac76241d Update server.rs self-generating-docs Merve Noyan 2023-08-21 11:10:57 +03:00
00cd4d0b2f Update server.rs Merve Noyan 2023-08-21 11:09:45 +03:00
6541d8d8d9 Update server.rs Merve Noyan 2023-08-21 11:08:38 +03:00
09fee2f6fb fix Merve Noyan 2023-08-21 10:43:15 +03:00
bb8c24f5b7 Update docs/source/conceptual/tensor_parallelism.md Merve Noyan 2023-08-21 10:39:14 +03:00
08bf10ca17 initial commit Merve Noyan 2023-08-21 00:23:10 +03:00
2fa5e31839 Update server.rs Merve Noyan 2023-08-20 23:13:29 +03:00
11d5c603ee Add to toctree Merve Noyan 2023-08-20 23:02:06 +03:00
4446d0f838 Create tensor_parallelism.md Merve Noyan 2023-08-20 01:42:56 +03:00
2416cc66cf Remove redundant content Omar Sanseviero 2023-08-19 18:17:35 +02:00
c4422e5678 Adding small benchmark script. (#881) main Nicolas Patry 2023-08-18 19:28:56 +02:00
8f1d266e69 Update consuming_tgi.md Merve Noyan 2023-08-18 17:14:52 +03:00
3a2a13ecd5 Added diff and dark/light mode for demo Merve Noyan 2023-08-18 17:13:31 +03:00
117425564e Added space and replaced screenshots with llama Merve Noyan 2023-08-18 16:13:20 +03:00
e49ecbf4e5 Update docs/source/basic_tutorials/consuming_tgi.md Merve Noyan 2023-08-18 16:03:23 +03:00
16a4c8f8b4 Cargo fmt. feat/return-top-tokens Nicolas Patry 2023-08-18 14:35:58 +02:00
bfa070611d Add streaming guide (#858) Omar Sanseviero 2023-08-18 13:27:08 +02:00
08593dc180 Addressed Omar's comments Merve Noyan 2023-08-18 14:05:33 +03:00
cf43528538 remove stream since its a separate PR improve-docs philschmid 2023-08-18 12:57:36 +02:00
7b349f9b13 Update docs/source/basic_tutorials/request_parameters.md Philipp Schmid 2023-08-18 12:54:57 +02:00
1a67050c5c Removing dead code + "Fix" test. Nicolas Patry 2023-08-18 12:41:10 +02:00
2c6e07395f remove image philschmid 2023-08-18 09:27:48 +02:00
52cacff4a4 fix wording philschmid 2023-08-18 09:27:13 +02:00
eccb8a0099 fix library philschmid 2023-08-18 09:16:26 +02:00
69c3d79a1c add docs philschmid 2023-08-18 09:13:39 +02:00
5b9488f713 Fix after rebase. Nicolas Patry 2023-08-17 15:18:14 +00:00
6606b481b6 Fix typo in batch concatination Vincent Brouwers 2023-08-09 08:39:18 +00:00
d89b99ef70 Only return top_tokens field when requested Vincent Brouwers 2023-08-02 13:03:19 +00:00
be9585a5e4 Add max_top_n_tokens CLI argument Vincent Brouwers 2023-08-02 12:42:59 +00:00
338713d70f Defer building top-token objects to Rust Vincent Brouwers 2023-08-01 15:02:30 +00:00
4ec593e23c Skip top-n tokens in prefill Vincent Brouwers 2023-08-01 13:55:38 +00:00
010c7e78fa Allocate top_n_token tensor in Batch Vincent Brouwers 2023-07-31 13:09:45 +00:00
e6235fd67d Return more top-n-tokens when probabilities are equal Vincent Brouwers 2023-07-28 14:21:11 +00:00
8c60077a55 Implement top-n-tokens for all models Vincent Brouwers 2023-07-26 15:12:57 +00:00
16eb3891f9 Share computation for top-n-token decoding Vincent Brouwers 2023-07-25 14:55:32 +00:00
66722c088a Add batched top-n-tokens to FlashCausalLM Vincent Brouwers 2023-07-25 14:17:25 +00:00
e8d66797e0 Add top-n-tokens support to benchmark Vincent Brouwers 2023-07-24 14:02:56 +00:00
1350c3b589 Add WIP support for returning top tokens Vincent Brouwers 2023-07-14 19:48:15 +00:00
452f8f3c2b Update consuming_tgi.md Merve Noyan 2023-08-17 16:31:11 +03:00
bce5e22444 Adding Idefics multi modal model. (#842) Nicolas Patry 2023-08-17 14:38:49 +02:00
b9e33c4953 Upgrading versions of python client. (#862) Nicolas Patry 2023-08-17 09:15:35 +02:00
2e68ac01c0 "Fix" for rw-1b. (#860) Nicolas Patry 2023-08-17 09:05:41 +02:00
d9bceb8e6b Misc improvements for InferenceClient docs (#852) Omar Sanseviero 2023-08-16 14:29:54 +02:00
2774b0ab44 Fixing watermark. (#851) Nicolas Patry 2023-08-16 07:17:26 +02:00
737d5781e4 Update README.md (#848) Adarsh Shirawalmath 2023-08-15 22:43:52 +05:30
05dd14fdb9 Fix tokenizers==0.13.4 . (#838) Nicolas Patry 2023-08-14 19:26:19 +02:00
d8f1337e7e README edit -- running the service with no GPU or CUDA support (#773) Pasquale Minervini 2023-08-14 15:41:13 +02:00
a072660bf5 fix: LlamaTokenizerFast to AutoTokenizer at flash_llama.py (#619) Dong Shin 2023-08-14 21:20:18 +09:00
b5087c4f4e Fix rope dynamic + factor (#822) Nicolas Patry 2023-08-14 14:09:51 +02:00
3ffcd9d311 Added two more features in readme.md file (#831) sawan Rawat 2023-08-14 17:39:20 +05:30
d71237fc8b Have snippets in Python/JavaScript in quicktour (#809) Omar Sanseviero 2023-08-14 13:47:32 +02:00
09eca64227 Version 1.0.1 (#836) v1.0.1 Nicolas Patry 2023-08-14 11:23:11 +02:00
89a4e723d2 Attempting to fix torch leak. fix_leak Nicolas Patry 2023-08-12 09:06:49 +02:00
a2a913eec5 Added streaming for InferenceClient (#821) Merve Noyan 2023-08-11 18:05:19 +03:00
cc7bb5084d Upgrade transformers (fix protobuf==3.20 issue) (#795) Nicolas Patry 2023-08-11 16:46:08 +02:00
d0e30771c2 Added ChatUI Screenshot to Docs (#823) Merve Noyan 2023-08-11 17:42:43 +03:00
4a9615e8ff Add to ToC streaming_conceptual osanseviero 2023-08-11 15:05:10 +02:00
6daee77c09 Add embedded space osanseviero 2023-08-11 15:03:56 +02:00
5df4c7c0d7 [docs] Build docs only when doc files change (#812) Mishig 2023-08-11 07:07:53 +02:00
e58ad6dd66 Added CLI docs (#799) Merve Noyan 2023-08-10 16:00:30 +03:00
7dbaef3f5b Minor docs style fixes (#806) Omar Sanseviero 2023-08-10 14:32:51 +02:00
04f7c2d86b Fix gated docs (#805) Omar Sanseviero 2023-08-10 14:32:07 +02:00
8bdb16ee9a Use destructuring in router arguments to avoid '.0' (#798) ivarflakstad 2023-08-10 10:52:50 +02:00
43ed6c217a Dummy commit test_docs osanseviero 2023-08-10 10:33:52 +02:00
647ae7a7d3 Setup for doc-builder and docs for TGI (#740) Merve Noyan 2023-08-10 11:24:52 +03:00
0e8b47811e Llama change. (#793) Nicolas Patry 2023-08-08 13:43:40 +02:00
c4dac9f3dc Update __init__.py (#794) Nicolas Patry 2023-08-08 12:09:51 +02:00
4ddb6681ac Add workflow to upload documentation osanseviero-patch-1 Omar Sanseviero 2023-08-08 07:49:45 +02:00
1fdc88ee90 Fixing non 4bits quantization. (#785) Nicolas Patry 2023-08-07 13:02:00 +02:00
891e19cc51 Fix dynamic rope. (#783) Nicolas Patry 2023-08-07 12:28:19 +02:00
16fadcec57 Merge BNB 4bit. (#770) Nicolas Patry 2023-08-03 23:00:59 +02:00
f91e9d282d fix build tokenizer in quantize and remove duplicate import (#768) zspo 2023-08-04 04:21:33 +08:00
6ec5288ab7 This should prevent the PyTorch overriding. (#767) Nicolas Patry 2023-08-03 21:54:39 +02:00
ac736fd89c feat(server): Add native support for PEFT Lora models (#762) Nicolas Patry 2023-08-03 17:22:45 +02:00
8b0d608f1f Automatically map deduplicated safetensors weights to their original values (#501) (#761) Nicolas Patry 2023-08-02 20:24:37 +02:00