From afaaef2b05fdbe2e631794cb6cdf459bc498b4c7 Mon Sep 17 00:00:00 2001
From: Huiqiang Jiang
Date: Thu, 18 Jan 2024 21:47:26 +0800
Subject: [PATCH] Feature(LLMLingua): support phi-2 (#67)

Co-authored-by: Siyun Zhao
Co-authored-by: Qianhui Wu
Co-authored-by: Xufang Luo <34053802+XufangLuo@users.noreply.github.com>
Co-authored-by: Yuqing Yang
---
 DOCUMENT.md | 16 ++++++++++++++--
 README.md   |  4 ++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/DOCUMENT.md b/DOCUMENT.md
index 13ab13d..d2ad221 100644
--- a/DOCUMENT.md
+++ b/DOCUMENT.md
@@ -141,9 +141,21 @@ recovered_response = llm_lingua.recover(
 
 ## Advanced Usage
 
-### Utilizing Quantized Small Models
+### Utilizing Small Models
 
-(LLong)LLMLingua supports the use of quantized small models such as `TheBloke/Llama-2-7b-Chat-GPTQ`, which require less than 8GB of GPU memory.
+#### Using phi-2
+
+Thanks to the efforts of the community, phi-2 is now available for use in LLMLingua.
+
+Before using it, please update your `transformers` installation to the GitHub version by running `pip install -U git+https://github.com/huggingface/transformers.git`.
+
+```python
+llm_lingua = PromptCompressor("microsoft/phi-2")
+```
+
+#### Quantized Models
+
+(Long)LLMLingua supports the use of quantized small models such as `TheBloke/Llama-2-7b-Chat-GPTQ`, which require less than 8GB of GPU memory.
 
 To begin, ensure you install the necessary packages with:
 
diff --git a/README.md b/README.md
index fb61933..79e89fb 100644
--- a/README.md
+++ b/README.md
@@ -119,6 +119,10 @@ compressed_prompt = llm_lingua.compress_prompt(prompt, instruction="", question=
 # 'ratio': '11.2x',
 # 'saving': ', Saving $0.1 in GPT-4.'}
 
+## Or use the phi-2 model.
+## Before that, you need to update transformers to the GitHub version: pip install -U git+https://github.com/huggingface/transformers.git
+llm_lingua = PromptCompressor("microsoft/phi-2")
+
 ## Or use the quantation model, like TheBloke/Llama-2-7b-Chat-GPTQ, only need <8GB GPU memory.
 ## Before that, you need to pip install optimum auto-gptq
 llm_lingua = PromptCompressor("TheBloke/Llama-2-7b-Chat-GPTQ", model_config={"revision": "main"})
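For anyone trying the change out, a minimal end-to-end sketch of the phi-2 path this patch documents follows. It is a sketch under assumptions, not part of the patch itself: the `prompt` string and `target_token` value are illustrative placeholders, and the result keys are taken from the README excerpt in the hunk above (`ratio`, `saving`) plus the compressor's standard `compressed_prompt` field.

```python
# Minimal sketch of the phi-2 compressor path added by this patch.
# Assumes the llmlingua package is installed, plus the GitHub version of
# transformers (pip install -U git+https://github.com/huggingface/transformers.git).
from llmlingua import PromptCompressor

# Load phi-2 as the small compression model, as documented above.
llm_lingua = PromptCompressor("microsoft/phi-2")

# Hypothetical placeholder for a long context you want to compress.
prompt = "... a long context to compress ..."

result = llm_lingua.compress_prompt(
    prompt, instruction="", question="", target_token=200
)

# Per the README excerpt above, the result carries the compressed text
# alongside statistics such as 'ratio' and 'saving'.
print(result["compressed_prompt"])
print(result["ratio"])
```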
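Likewise, a hedged sketch of the quantized-model path referenced in the surrounding README context lines; the model name, revision, and extra packages are copied from those lines, and the memory note restates the documentation above.

```python
# Sketch of the GPTQ quantized-model path shown in the README context lines.
# Assumes the extra packages from those lines are installed:
#   pip install optimum auto-gptq
from llmlingua import PromptCompressor

# Needs less than 8GB of GPU memory per the documentation above.
llm_lingua = PromptCompressor(
    "TheBloke/Llama-2-7b-Chat-GPTQ",
    model_config={"revision": "main"},
)
```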