Mirror of https://github.com/microsoft/LLMLingua.git
Synced 2024-01-23 02:05:46 +03:00
Feature(LLMLingua): support phi-2 (#67)
Co-authored-by: Siyun Zhao <siyunzhao@microsoft.com>
Co-authored-by: Qianhui Wu <wuqh_thu@foxmail.com>
Co-authored-by: Xufang Luo <34053802+XufangLuo@users.noreply.github.com>
Co-authored-by: Yuqing Yang <justin.yqyang@gmail.com>
DOCUMENT.md (16 lines changed)
@@ -141,9 +141,21 @@ recovered_response = llm_lingua.recover(
 ## Advanced Usage
 
-### Utilizing Quantized Small Models
+### Utilizing Small Models
 
-(Long)LLMLingua supports the use of quantized small models such as `TheBloke/Llama-2-7b-Chat-GPTQ`, which require less than 8GB of GPU memory.
+### Using phi-2
+
+Thanks to the efforts of the community, phi-2 is now available for use in LLMLingua.
+
+Before using it, please update your transformers to the GitHub version by running `pip install -U git+https://github.com/huggingface/transformers.git`.
+
+```python
+llm_lingua = PromptCompressor("microsoft/phi-2")
+```
+
+### Quantized Models
+
+(Long)LLMLingua supports the use of quantized small models such as `TheBloke/Llama-2-7b-Chat-GPTQ`, which require less than 8GB of GPU memory.
 
 To begin, ensure you install the necessary packages with:
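For readers trying out the new path, here is a minimal end-to-end sketch of the phi-2 compressor introduced above; the `prompt` value and the `target_token` setting are illustrative placeholders rather than part of the commit:

```python
# Minimal sketch of the phi-2 path added in this commit.
# Assumes transformers has already been upgraded to the GitHub version:
#   pip install -U git+https://github.com/huggingface/transformers.git
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor("microsoft/phi-2")

prompt = "..."  # placeholder; substitute your own long prompt
compressed = llm_lingua.compress_prompt(
    prompt, instruction="", question="", target_token=200
)
# The returned dict also carries the 'ratio' and 'saving' keys
# shown in the README output quoted in the next hunk.
print(compressed["compressed_prompt"])
```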
@@ -119,6 +119,10 @@ compressed_prompt = llm_lingua.compress_prompt(prompt, instruction="", question=
 # 'ratio': '11.2x',
 # 'saving': ', Saving $0.1 in GPT-4.'}
 
+## Or use the phi-2 model.
+## Before that, you need to update transformers to the GitHub version: pip install -U git+https://github.com/huggingface/transformers.git
+llm_lingua = PromptCompressor("microsoft/phi-2")
+
 ## Or use a quantized model, such as TheBloke/Llama-2-7b-Chat-GPTQ, which needs less than 8GB of GPU memory.
 ## Before that, you need to pip install optimum auto-gptq
 llm_lingua = PromptCompressor("TheBloke/Llama-2-7b-Chat-GPTQ", model_config={"revision": "main"})
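And a matching sketch for the quantized path: the model name, `model_config`, and install line come straight from the diff above, while the surrounding scaffolding is illustrative:

```python
# Sketch of the quantized-model path shown in the hunk above.
# Assumes the extra dependencies are installed first:
#   pip install optimum auto-gptq
from llmlingua import PromptCompressor

# GPTQ checkpoint; needs less than 8GB of GPU memory per the note above.
llm_lingua = PromptCompressor(
    "TheBloke/Llama-2-7b-Chat-GPTQ",
    model_config={"revision": "main"},
)
```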