Commit Graph

  • 05dd14fdb9 Fix tokenizers==0.13.4 . (#838) [main] Nicolas Patry 2023-08-14 19:26:19 +02:00
  • 5fa4676221 Tp ready. [idefics] Nicolas Patry 2023-08-14 17:05:21 +00:00
  • 0e992ff615 Adding Idefics multi modal model. Nicolas Patry 2023-08-14 16:05:47 +00:00
  • d8f1337e7e README edit -- running the service with no GPU or CUDA support (#773) Pasquale Minervini 2023-08-14 15:41:13 +02:00
  • a072660bf5 fix: LlamaTokenizerFast to AutoTokenizer at flash_llama.py (#619) Dong Shin 2023-08-14 21:20:18 +09:00
  • b5087c4f4e Fix rope dynamic + factor (#822) Nicolas Patry 2023-08-14 14:09:51 +02:00
  • 3ffcd9d311 Added two more features in readme.md file (#831) sawan Rawat 2023-08-14 17:39:20 +05:30
  • d71237fc8b Have snippets in Python/JavaScript in quicktour (#809) Omar Sanseviero 2023-08-14 13:47:32 +02:00
  • 09eca64227 Version 1.0.1 (#836) [v1.0.1] Nicolas Patry 2023-08-14 11:23:11 +02:00
  • 89a4e723d2 Attempting to fix torch leak. [fix_leak] Nicolas Patry 2023-08-12 09:06:49 +02:00
  • a2a913eec5 Added streaming for InferenceClient (#821) Merve Noyan 2023-08-11 18:05:19 +03:00
  • cc7bb5084d Upgrade transformers (fix protobuf==3.20 issue) (#795) Nicolas Patry 2023-08-11 16:46:08 +02:00
  • d0e30771c2 Added ChatUI Screenshot to Docs (#823) Merve Noyan 2023-08-11 17:42:43 +03:00
  • 4a9615e8ff Add to ToC [streaming_conceptual] osanseviero 2023-08-11 15:05:10 +02:00
  • 6daee77c09 Add embedded space osanseviero 2023-08-11 15:03:56 +02:00
  • 5df4c7c0d7 [docs] Build docs only when doc files change (#812) Mishig 2023-08-11 07:07:53 +02:00
  • e58ad6dd66 Added CLI docs (#799) Merve Noyan 2023-08-10 16:00:30 +03:00
  • 7dbaef3f5b Minor docs style fixes (#806) Omar Sanseviero 2023-08-10 14:32:51 +02:00
  • 04f7c2d86b Fix gated docs (#805) Omar Sanseviero 2023-08-10 14:32:07 +02:00
  • 8bdb16ee9a Use destructuring in router arguments to avoid '.0' (#798) ivarflakstad 2023-08-10 10:52:50 +02:00
  • 43ed6c217a Dummy commit [test_docs] osanseviero 2023-08-10 10:33:52 +02:00
  • 647ae7a7d3 Setup for doc-builder and docs for TGI (#740) Merve Noyan 2023-08-10 11:24:52 +03:00
  • 0e8b47811e Llama change. (#793) Nicolas Patry 2023-08-08 13:43:40 +02:00
  • c4dac9f3dc Update __init__.py (#794) Nicolas Patry 2023-08-08 12:09:51 +02:00
  • 4ddb6681ac Add workflow to upload documentation [osanseviero-patch-1] Omar Sanseviero 2023-08-08 07:49:45 +02:00
  • 1fdc88ee90 Fixing non 4bits quantization. (#785) Nicolas Patry 2023-08-07 13:02:00 +02:00
  • 891e19cc51 Fix dynamic rope. (#783) Nicolas Patry 2023-08-07 12:28:19 +02:00
  • 16fadcec57 Merge BNB 4bit. (#770) Nicolas Patry 2023-08-03 23:00:59 +02:00
  • f91e9d282d fix build tokenizer in quantize and remove duplicate import (#768) zspo 2023-08-04 04:21:33 +08:00
  • 6ec5288ab7 This should prevent the PyTorch overriding. (#767) Nicolas Patry 2023-08-03 21:54:39 +02:00
  • ac736fd89c feat(server): Add native support for PEFT Lora models (#762) Nicolas Patry 2023-08-03 17:22:45 +02:00
  • 8b0d608f1f Automatically map deduplicated safetensors weights to their original values (#501) (#761) Nicolas Patry 2023-08-02 20:24:37 +02:00
  • bd3088748e add FastLinear import (#750) zspo 2023-08-03 02:04:46 +08:00
  • e994ad1172 Added InferenceClient [model_compat_log] Merve Noyan 2023-08-02 17:57:01 +03:00
  • bb83f333b7 Added consuming TGI with ChatUI Merve Noyan 2023-08-02 17:40:56 +03:00
  • 564bc99a7b fix toc Merve Noyan 2023-08-01 14:13:28 +03:00
  • 470dcdfe7b Separated querying section and emphasized self generating docs Merve Noyan 2023-08-01 14:10:45 +03:00
  • 21ca70e0eb Added supported models and hardware Merve Noyan 2023-08-01 14:02:14 +03:00
  • 2675d934e5 Update local_launch.md Merve Noyan 2023-08-01 12:44:25 +03:00
  • 7766fee9b1 fix typo for dynamic rotary (#745) [compat_logger] Florian Zimmermeister 2023-07-31 18:58:46 +02:00
  • d3d8f1bd6b Typo fix. (#746) Nicolas Patry 2023-07-31 18:57:29 +02:00
  • 15fc64668f fix(server): Failing quantize config after local read. (#743) Nicolas Patry 2023-07-31 17:51:26 +02:00
  • c86dcbeeb1 Update build_pr_documentation.yml Merve Noyan 2023-07-31 18:16:29 +03:00
  • d65bbb333d Update build_pr_documentation.yml Merve Noyan 2023-07-31 18:13:32 +03:00
  • b2268272ad Added installation and launch notes and re-structured toc Merve Noyan 2023-07-31 17:35:36 +03:00
  • 2a13f1a046 chore: fix typo in mpt_modeling.py (#737) Ikko Eltociear Ashimine 2023-07-31 22:43:44 +09:00
  • 932bdd93ff Adding Rope scaling. (#741) Nicolas Patry 2023-07-31 15:38:47 +02:00
  • 41bd0e4af1 Added index.md and other initial files Merve Noyan 2023-07-31 15:56:29 +03:00
  • b9633c46d0 Fix typing in Model.generate_token (#733) Jae-Won Chung 2023-07-31 08:35:14 -04:00
  • dc631b5be5 Setup for doc-builder and added TOC Merve Noyan 2023-07-31 14:18:20 +03:00
  • 92bb56b0c1 Local gptq support. (#738) Nicolas Patry 2023-07-31 10:32:52 +02:00
  • 66cea49d57 Cargo fmt [dev] Nicolas Patry 2023-07-31 09:57:18 +02:00
  • 4b3e24f843 feat(server): Add bitsandbytes 4bit quantization (#626) krzim 2023-07-21 03:53:05 -04:00
  • 3ef5ffbc64 v1.0.0 (#727) [v1.0.0] OlivierDehaene 2023-07-28 17:43:46 +02:00
  • bde25e62b3 chore: update license to HFOIL (#725) OlivierDehaene 2023-07-28 15:59:46 +02:00
  • afd04dc71e feat(server): update vllm version (#723) OlivierDehaene 2023-07-28 15:36:38 +02:00
  • f848decee6 docs: Add hardware section to TOC in README (#721) regisss 2023-07-28 11:20:03 +02:00
  • 5a1cccbb98 Add section about TGI on other AI hardware accelerators in README (#715) regisss 2023-07-28 09:14:03 +02:00
  • 9f18f4c006 v0.9.4 (#713) [v0.9.4] OlivierDehaene 2023-07-27 19:25:15 +02:00
  • ab96b9aec3 feat(server): support new falcon config (#712) OlivierDehaene 2023-07-27 18:38:57 +02:00
  • 2efd46ef95 fix(server): fix missing datasets in quantize OlivierDehaene 2023-07-27 14:50:45 +02:00
  • 8bd0adb135 fix(server): fix quantization python requirements (#708) OlivierDehaene 2023-07-27 12:28:10 +02:00
  • e64a65891b docs(README): update readme OlivierDehaene 2023-07-25 19:45:25 +02:00
  • a0d55358d2 feat(server): Using quantize_config.json instead of GPTQ_BITS env variables. (#671) Nicolas Patry 2023-07-25 12:00:27 +01:00
  • 37df6df38e fix(server): fix exllama buffers (#689) OlivierDehaene 2023-07-24 14:25:43 +02:00
  • 73a4d65d26 feat: add cuda memory fraction (#659) OlivierDehaene 2023-07-24 11:43:58 +02:00
  • 1da642bd0e feat(server): add local prom and health routes if running w/ ngrok OlivierDehaene 2023-07-21 16:56:30 +02:00
  • 15b3e9ffb0 Directly load GPTBigCode to specified device (#618) Yang, Bo 2023-07-21 02:27:31 -07:00
  • d5b5bc750f feat(server): Add exllama GPTQ CUDA kernel support #553 (#666) Nicolas Patry 2023-07-21 10:59:00 +02:00
  • f555dabca8 Putting back header inclusion (seems unused but still) [simpler_exllama] Nicolas Patry 2023-07-20 15:46:51 +00:00
  • 5ca0508d02 Simpler exllama Nicolas Patry 2023-07-20 15:36:53 +00:00
  • bf94df3c71 fix(server): use mem_get_info to get kv cache size (#664) OlivierDehaene 2023-07-20 17:23:49 +02:00
  • 08b8eec1d7 fix(server): Fixing non parameters in quantize script bigcode/starcoder was an example. (#661) Nicolas Patry 2023-07-20 16:04:15 +02:00
  • 362883f259 fix(server): llama v2 GPTQ (#648) fxmarty 2023-07-20 15:02:54 +02:00
  • 214c06f510 Add trust_remote_code to quantize script (#647) cdawg 2023-07-20 13:53:08 +02:00
  • 6bf7090ecd fix per-column quantization Felix Marty 2023-07-19 17:55:41 +00:00
  • edfbfdfb3f Merge branch 'main' into gptq-cuda-kernels Félix Marty 2023-07-19 16:58:54 +02:00
  • 5a1512c025 docs: Update README.md (#643) Nicolas Patry 2023-07-19 13:39:12 +02:00
  • 1c81df15cd docs: Update README.md (#639) Nicolas Patry 2023-07-19 13:38:52 +02:00
  • b66b190403 feat(router): ngrok edge (#642) OlivierDehaene 2023-07-19 11:59:58 +02:00
  • fe80f5360c feat(server): auto max_batch_total_tokens for flash att models (#630) OlivierDehaene 2023-07-19 09:31:25 +02:00
  • 5e6ddfd6a4 fix(server): fix llamav2 config (#635) [v0.9.3] OlivierDehaene 2023-07-18 18:49:42 +02:00
  • cf83f9b66f v0.9.3 (#634) OlivierDehaene 2023-07-18 18:11:20 +02:00
  • 211b211ec0 feat(server): add support for llamav2 (#633) Nicolas Patry 2023-07-18 18:09:53 +02:00
  • 3b71c38558 feat(server): flash attention v2 (#624) OlivierDehaene 2023-07-18 16:21:18 +02:00
  • 4d38a1c4ad feat(server): Reworking the quantization script so it's still universal (not llama specific) (#587) Nicolas Patry 2023-07-18 12:19:05 +02:00
  • 44acf72a73 fea(launcher): debug logs (#623) OlivierDehaene 2023-07-17 19:03:07 +02:00
  • bc2873246c fix(launcher): Rename b-float16 to bfloat16 in the launcher arg (#621) Nicolas Patry 2023-07-17 18:38:16 +02:00
  • a2cf1bdb2f fix(server): empty_cache when stopped OlivierDehaene 2023-07-15 13:57:31 +02:00
  • c58a0c185b v0.9.2 (#616) [v0.9.2] OlivierDehaene 2023-07-14 16:31:48 +02:00
  • 5b9de4a1d3 fix(server): blacklist local files (#609) OlivierDehaene 2023-07-13 21:54:55 +02:00
  • c8b077be79 docs: README: Add logo + baseline (#611) Victor Muštar 2023-07-13 21:45:20 +02:00
  • 982ce3227b feat(router): explicit warning if revision is not set (#608) OlivierDehaene 2023-07-13 18:59:38 +02:00
  • 74e6d6e54e fix the usual merge mess Felix Marty 2023-07-13 15:48:55 +00:00
  • 9401e10210 Merge branch 'main' into gptq-cuda-kernels Félix Marty 2023-07-13 17:45:52 +02:00
  • 0036084294 support all, test llama Felix Marty 2023-07-13 15:41:57 +00:00
  • b7327205a6 feat(launcher): add arg validation and drop subprocess (#595) OlivierDehaene 2023-07-13 14:22:37 +02:00
  • 2ae65b45a8 fix tests Felix Marty 2023-07-13 10:38:08 +00:00
  • 38c2be5926 fix test Felix Marty 2023-07-12 18:31:49 +00:00
  • 3628559516 GPTQ Env vars: catch correct type of error (#596) ssmi153 2023-07-13 01:57:46 +08:00