90 Commits

Author SHA1 Message Date
Nicolas Patry
16fadcec57 Merge BNB 4bit. (#770)
# What does this PR do?


See #626 

---------

Co-authored-by: krzim <zimmerk4@live.com>
2023-08-03 23:00:59 +02:00
Nicolas Patry
ac736fd89c feat(server): Add native support for PEFT Lora models (#762)
- Detects a `peft` model by finding `adapter_config.json`.
- This triggers a fully dedicated `download-weights` path.
- This path loads the adapter config and finds the base model_id.
- It loads the base_model.
- Then the peft_model.
- Then `merge_and_unload()`.
- Then `save_pretrained(.., safe_serialization=True)`.
- Adds back the config + tokenizer.
- The chosen location is a **local folder with the name of the
  user-chosen model id**.
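
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper names `find_base_model_id` and `merge_adapter` are hypothetical, the sketch assumes a causal LM base model, and it assumes the adapter config stores the base model id under `base_model_name_or_path`.

```python
import json
from pathlib import Path
from typing import Optional


def find_base_model_id(model_dir: str) -> Optional[str]:
    """Return the base model id if `model_dir` holds a PEFT adapter, else None."""
    config_path = Path(model_dir) / "adapter_config.json"
    if not config_path.is_file():
        return None  # not a PEFT adapter: the regular download path applies
    with config_path.open() as f:
        adapter_config = json.load(f)
    # Assumption: the base model id lives under this key in the adapter config.
    return adapter_config.get("base_model_name_or_path")


def merge_adapter(adapter_dir: str, output_dir: str) -> None:
    """Merge the LoRA weights into the base model and save them as safetensors."""
    # Imported lazily so the detection helper works without these packages.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base_model_id = find_base_model_id(adapter_dir)
    if base_model_id is None:
        raise ValueError(f"{adapter_dir} does not contain adapter_config.json")

    # Assumption: the base model is a causal LM.
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    peft_model = PeftModel.from_pretrained(base_model, adapter_dir)
    merged = peft_model.merge_and_unload()
    # Writes the merged weights (and the model config) to the output folder.
    merged.save_pretrained(output_dir, safe_serialization=True)
    # Add the tokenizer back next to the merged weights.
    AutoTokenizer.from_pretrained(base_model_id).save_pretrained(output_dir)
```

Keeping detection separate from merging means a folder without `adapter_config.json` can fall through to the normal download path without importing `peft` at all.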

PROs:

- Easier than expecting users to merge manually
- Barely any change outside of `download-weights` command.
- This means everything will work in a single load.
- Should enable out of the box SM + HFE

CONs:

- Creates a local merged model in an unusual location, potentially
  not saved across docker reloads, or overwriting some files if the PEFT
  itself was local and contained other files in addition to the lora

Alternatives considered:
- Add `local_files_only=True` everywhere (discarded because it is a massive
  code change for not a good enough reason)
- Return something to `launcher` about the new model-id (a cleaner
  location for this new model), but it would
  introduce new communication somewhere where we didn't need it before.
- Using the HF cache folder and *stopping* the flow after
  `download-weights` and asking user to restart with the actual local
  model location


Fix #482 


2023-08-03 17:22:45 +02:00
Nicolas Patry
932bdd93ff Adding Rope scaling. (#741)
# What does this PR do?


- Adds Rope NTK scaling.

Done because
https://github.com/huggingface/text-generation-inference/pull/529 was
closed
Took some code from
https://github.com/huggingface/transformers/pull/24653

- `--rope-scaling` and `--rope-factor` are added separately. I
considered having a single one and parsing something like "linear:4.0"
or "dynamic", but decided against
it because it would push more parsing and validation a bit everywhere (both
in the launcher and the server).
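
For illustration, here is a rough sketch of how the two options could affect the rotary inverse frequencies. It is not the code from this PR: the function name and parameters are hypothetical, and the dynamic NTK base adjustment follows the formula used in the referenced transformers PR as I understand it.

```python
from typing import List, Optional


def rope_inv_freq(
    dim: int,
    base: float = 10000.0,
    scaling: Optional[str] = None,  # None, "linear" or "dynamic"
    factor: float = 1.0,
    seq_len: int = 0,
    max_position_embeddings: int = 2048,
) -> List[float]:
    """Return rotary inverse frequencies for an even head dimension `dim` > 2."""
    if scaling == "dynamic" and seq_len > max_position_embeddings:
        # Dynamic NTK: grow the base once the sequence exceeds the trained length.
        base = base * (
            (factor * seq_len / max_position_embeddings) - (factor - 1)
        ) ** (dim / (dim - 2))
    inv_freq = [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]
    if scaling == "linear":
        # Linear scaling divides positions by `factor`; since the rotation angle
        # is position * inv_freq, dividing inv_freq is equivalent.
        inv_freq = [f / factor for f in inv_freq]
    return inv_freq
```

With two separate values, each side only needs to validate a known scaling name and a float, instead of parsing a combined string.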


Fixes #512




2023-07-31 15:38:47 +02:00
OlivierDehaene
73a4d65d26 feat: add cuda memory fraction (#659)
Close #673
2023-07-24 11:43:58 +02:00
OlivierDehaene
b66b190403 feat(router): ngrok edge (#642) 2023-07-19 11:59:58 +02:00
OlivierDehaene
fe80f5360c feat(server): auto max_batch_total_tokens for flash att models (#630) 2023-07-19 09:31:25 +02:00
OlivierDehaene
44acf72a73 fea(launcher): debug logs (#623) 2023-07-17 19:03:07 +02:00
Nicolas Patry
bc2873246c fix(launcher): Rename b-float16 to bfloat16 in the launcher arg (#621) 2023-07-17 18:38:16 +02:00
OlivierDehaene
c58a0c185b v0.9.2 (#616) 2023-07-14 16:31:48 +02:00
OlivierDehaene
982ce3227b feat(router): explicit warning if revision is not set (#608) 2023-07-13 18:59:38 +02:00
OlivierDehaene
b7327205a6 feat(launcher): add arg validation and drop subprocess (#595) 2023-07-13 14:22:37 +02:00
OlivierDehaene
6f42942772 feat(router): add argument for hostname in router (#545) (#550)
# What does this PR do?

In title. Adds argument `--hostname` in router to support something like
`--hostname ::`. Tested with

```commandline
cargo run -- --port 8080 --hostname ::
curl -I -X GET 'http://[::1]:8080/health'  # failed before this commit
```
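
As a side note on what `--hostname ::` means, the sketch below classifies a bind address with Python's standard `ipaddress` module. The helper name is hypothetical and this is not part of the router; it only illustrates that `::` is the IPv6 unspecified address (all interfaces), while `::1` used in the `curl` test is the IPv6 loopback.

```python
import ipaddress


def describe_bind_address(hostname: str) -> str:
    """Describe the address family and scope of a literal bind address."""
    addr = ipaddress.ip_address(hostname)
    family = "IPv6" if addr.version == 6 else "IPv4"
    # The unspecified address (0.0.0.0 or ::) means "listen on all interfaces".
    scope = "all interfaces" if addr.is_unspecified else "a single address"
    return f"{family}, {scope}"
```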

Trigger CI

---------

Co-authored-by: Phil Chen <philchen2000@gmail.com>
2023-07-05 18:28:45 +02:00
OlivierDehaene
e28a809004 v0.9.0 (#525) 2023-07-01 19:25:41 +02:00
OlivierDehaene
2b53d71991 fix(launcher): fix issue where launcher does not properly report shard failures (#522) 2023-06-30 23:09:20 +02:00
Nicolas Patry
ecf6dc3a5a feat: Add the option to force another dtype than f16. (#513) 2023-06-30 20:30:09 +02:00
OlivierDehaene
3b0c979efc feat(router): arg validation (#519) 2023-06-30 20:07:49 +02:00
OlivierDehaene
e74bd41e0f feat(server): add paged attention to flash models (#516)
Closes #478
2023-06-30 19:09:59 +02:00
OlivierDehaene
f59fb8b630 feat(router): add ngrok integration (#453) 2023-06-16 16:25:11 +02:00
A.J
d4eb60f48d docs(launcher): fix CUDA_VISIBLE_DEVICES helper comment (#441)
# What does this PR do?
It fixes a typo in the comments referencing the environment
variable `CUDA_VISIBLE_DEVICES`. No misspelled references to this
variable were found in code logic, so there is no resulting undefined
behaviour or bug. This PR does not modify any code logic.
2023-06-12 13:59:22 +02:00
OlivierDehaene
83b84486ad feat(launcher): parse oom signal (#404) 2023-06-02 14:17:27 +02:00
OlivierDehaene
95d3546976 feat(server): load santacoder/starcoder models with safetensors (#393)
Fix #366
2023-06-01 12:10:35 +02:00
OlivierDehaene
49a6c8c1b2 fix(launcher): parse num cuda devices from CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES 2023-05-30 13:27:48 +02:00
OlivierDehaene
146e72c3be fix(launcher): parse num cuda devices from CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES 2023-05-30 12:52:18 +02:00
OlivierDehaene
e3e487dc71 feat(server): support trust_remote_code (#363) 2023-05-23 20:40:39 +02:00
OlivierDehaene
e71471bec9 feat: add snapshot testing (#282) 2023-05-15 23:36:30 +02:00
Nicolas Patry
76a48cd365 feat(server): GPTQ quantization (step1) (#277)
Changes only the type from `bool` to `Option<Enum>` pretty much
everywhere.
- Uses `Optional[str]` in Python (easier to manage than importing the type
everywhere), except for the CLI, where the type is used to get proper validation.
- Updated all models to gracefully handle the new values (error out on an
unknown value, or on gptq since it is not implemented).
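
A minimal Python sketch of that pattern, under the assumption that the CLI validates against an enum and everything downstream passes a plain `Optional[str]`. The names `Quantization`, `parse_cli_quantize` and `resolve_quantize` are hypothetical and chosen for illustration only.

```python
from enum import Enum
from typing import Optional


class Quantization(str, Enum):
    """Values accepted on the command line (illustrative set)."""

    bitsandbytes = "bitsandbytes"
    gptq = "gptq"


def parse_cli_quantize(value: Optional[str]) -> Optional[str]:
    """Validate at the CLI boundary, then hand a plain string to the models."""
    if value is None:
        return None
    # Enum lookup by value raises ValueError for anything not listed above.
    return Quantization(value).value


def resolve_quantize(quantize: Optional[str]) -> Optional[str]:
    """Model-side handling: accept known values, error out on the rest."""
    if quantize is None or quantize == "bitsandbytes":
        return quantize
    if quantize == "gptq":
        raise NotImplementedError("gptq quantization is not implemented yet")
    raise ValueError(f"Unknown quantization: {quantize}")
```

Only the CLI needs to import the enum; model code compares strings, which matches the "easier to manage" motivation given above.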

2023-05-12 14:46:41 +02:00
OlivierDehaene
e250282213 feat(docker): add benchmarking tool to docker image (#298) 2023-05-09 13:19:31 +02:00
Nicolas Patry
e68509add7 feat(launcher): Improve error message when download process fails. (#276) 2023-05-04 15:29:29 +02:00
OlivierDehaene
b67908e0cf fix(launcher): pass weights cache override to the download process (#274)
closes #273
2023-05-03 23:39:35 +02:00
OlivierDehaene
85aa7e2e7b feat(server): support hf endpoint weight layout (#266) 2023-05-03 11:36:24 +02:00
Nicolas Patry
411b0d4e1f chore(github): add templates (#264) 2023-05-02 15:43:19 +02:00
Nicolas Patry
b0b97fd9a7 doc(launcher): add more docs to the launcher itself and link in the README (#257) 2023-04-29 11:53:42 +02:00
Nicolas Patry
db2b4e0754 feat(router): new healthcheck that skips the queue (#244)
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2023-04-26 20:23:54 +02:00
Nicolas Patry
77758f603b chore(launcher): refactor logic (#242)
Hopefully it's cleaner
2023-04-26 14:43:36 +02:00
OlivierDehaene
ebc74d5666 feat(router): use number of tokens in batch as input for dynamic batching (#226)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-04-24 17:59:00 +02:00
OlivierDehaene
6ded76a4ae v0.6.0 (#222) 2023-04-21 21:00:57 +02:00
OlivierDehaene
252f42c1e6 fix(router): add auth token to get model info (#207) 2023-04-19 20:06:06 +02:00
OlivierDehaene
2475aede61 feat(router): add info route (#196)
close #125
2023-04-18 16:16:06 +02:00
OlivierDehaene
7a1ba58557 fix(docker): fix docker image dependencies (#187) 2023-04-17 00:26:47 +02:00
OlivierDehaene
e3a63b6fbc fix(launcher): revert change on shard errors (#173) 2023-04-13 11:07:11 +02:00
OlivierDehaene
6f0f1d70f6 v0.5.0 (#168) 2023-04-11 20:32:18 +02:00
OlivierDehaene
f26dfd0dc1 feat(server): support OPT models (#55)
OPT models do not all have a `tokenizer.json` file on the hub at the
moment. Can't merge for now.
2023-04-11 19:16:41 +02:00
OlivierDehaene
299217c95c feat(server): add flash attention llama (#144) 2023-04-11 16:38:22 +02:00
OlivierDehaene
e63a21eb4d feat(launcher): allow disabling hf_transfer (#161) 2023-04-09 20:00:05 +02:00
OlivierDehaene
fef1a1c381 v0.4.3 (#152) 2023-03-30 17:28:14 +02:00
OlivierDehaene
84722f3e33 v0.4.2 (#151) 2023-03-30 17:10:01 +02:00
OlivierDehaene
ab5fd8cf93 v0.4.1 (#140) 2023-03-26 16:37:51 +02:00
OlivierDehaene
411d6247f4 v0.4.0 (#119) 2023-03-09 16:07:01 +01:00
OlivierDehaene
55bd4fed7d feat(router): add best_of parameter (#117) 2023-03-09 15:30:54 +01:00
OlivierDehaene
5fd2dcb513 feat(launcher): default num_shard to CUDA_VISIBLE_DEVICES if possible (#108) 2023-03-08 13:53:41 +01:00