mirror of
https://github.com/YerbaPage/LongCodeZip.git
synced 2025-10-22 23:19:46 +03:00
packaging
103
README.md
@@ -21,37 +21,20 @@ LongCodeZip introduces a two-stage code compression framework specifically desig

The method is plug-and-play and can be integrated with existing code LLMs to achieve significant compression ratios while maintaining or improving task performance.

## Repository Structure

This repository contains implementations and experiments for three code-related tasks:

```
LongCodeZip/
├── repo-qa/                   # Code Retrieval Task
│   ├── main.py                # Main evaluation script
│   ├── run.sh                 # Experiment runner
│   ├── code_compressor.py     # Core compression implementation
│   ├── compute_score.py       # Evaluation metrics
│   └── ...
├── long-code-completion/      # Code Completion Task
│   ├── main.py                # Main evaluation script
│   ├── run.sh                 # Experiment runner
│   ├── code_compressor.py     # Core compression implementation
│   ├── utils.py               # Utility functions
│   └── ...
├── module-summarization/      # Code Summarization Task
│   ├── main.py                # Main evaluation script
│   ├── run.sh                 # Experiment runner
│   ├── code_compressor.py     # Core compression implementation
│   ├── utils.py               # Utility functions
│   └── ...
└── README.md
```

## Installation

You can install directly from the GitHub repository:

```bash
pip install -r requirements.txt
pip install git+https://github.com/YerbaPage/LongCodeZip.git
```

Or clone and install in development mode:

```bash
git clone https://github.com/YerbaPage/LongCodeZip.git
cd LongCodeZip
pip install -e .
```

## Quick Demo

@@ -62,36 +45,21 @@ We provide a simple demo (`demo.py`) to help you get started with LongCodeZip:
python demo.py
```

The demo exercises the core compression functionality: it compresses a short code snippet containing several functions (`add`, `quick_sort`, `search_with_binary_search`) against a query about quick sort. The compressor will:

1. Rank functions by relevance to the query
2. Apply fine-grained compression to maximize information density
3. Generate a compressed prompt suitable for code LLMs

**Example output:**

```python
# Original: ~150 tokens
# Compressed: ~64 tokens (target)
# Selected: quick_sort function (most relevant to query)
```
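
The three steps above can be illustrated with a toy pipeline. This is only a sketch under loud assumptions: the word-overlap score, the whitespace token count, and the helper names (`rank_functions`, `select_within_budget`, `build_prompt`) are all hypothetical stand-ins, not LongCodeZip's actual ranking or fine-grained compression. Step 2 in particular is replaced here by keeping whole functions that fit a budget.

```python
import re


def tokenize(text):
    """Crude tokenizer: lowercase alphabetic runs, so quick_sort -> quick, sort."""
    return re.findall(r"[a-z]+", text.lower())


def rank_functions(functions, query):
    """Step 1 (stand-in): order function names by word overlap with the query."""
    query_words = set(tokenize(query))

    def score(name):
        return len(query_words & set(tokenize(functions[name])))

    # sorted() is stable, so ties keep their original order
    return sorted(functions, key=score, reverse=True)


def select_within_budget(functions, ranked, budget):
    """Step 2 (stand-in): keep whole functions, best first, while they fit."""
    kept, used = [], 0
    for name in ranked:
        cost = len(functions[name].split())  # whitespace tokens, an assumption
        if used + cost <= budget:
            kept.append(name)
            used += cost
    return kept


def build_prompt(functions, kept, instruction, query):
    """Step 3: assemble instruction, retained code, and query into one prompt."""
    code = "\n\n".join(functions[name] for name in kept)
    return f"{instruction}\n\n{code}\n\n{query}"


# Toy snippets with placeholder bodies; only the names come from the demo
functions = {
    "add": "def add(a, b):\n    return a + b",
    "quick_sort": "def quick_sort(items):\n    return sorted(items)",
    "search_with_binary_search": (
        "def search_with_binary_search(items, target):\n    return target in items"
    ),
}
query = "How is quick sort implemented?"

ranked = rank_functions(functions, query)
kept = select_within_budget(functions, ranked, budget=5)
prompt = build_prompt(functions, kept, "Answer the question based on the code.", query)
print(prompt)
```

With a budget of 5 whitespace tokens, only the top-ranked function fits, which mirrors the "Selected" line in the example output.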

## Core API Usage

LongCodeZip provides a simple and powerful API for compressing long code contexts. Here's how to use it:

### Basic Example
## Basic Example

```python
from longcodezip import CodeCompressor
from longcodezip import LongCodeZip

# Initialize the compressor
compressor = CodeCompressor(model_name="Qwen/Qwen2.5-Coder-7B-Instruct")
compressor = LongCodeZip(model_name="Qwen/Qwen2.5-Coder-7B-Instruct")

# Compress code with a query
result = compressor.compress_code_file(
    code=your_code_string,
    query="What does this function do?",
    instruction="Answer the question based on the code.",
    code=<your_code_string>,
    query=<your_query>,
    instruction=<your_instruction>,
    rate=0.5,  # Keep 50% of tokens
    rank_only=False,  # Set to True to only rank and select contexts without fine-grained compression
)

# Access compressed results
@@ -99,41 +67,6 @@ compressed_code = result['compressed_code']
compressed_prompt = result['compressed_prompt']  # Full prompt with instruction
compression_ratio = result['compression_ratio']
```
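
As a small illustration of working with the returned dictionary, the sketch below summarizes the three keys shown above. The helper `summarize_result` is hypothetical, the `example` dictionary is hand-written rather than real compressor output, and treating `compression_ratio` as a number is an assumption.

```python
def summarize_result(result):
    """Return a one-line summary of a compression result dictionary."""
    ratio = result["compression_ratio"]  # assumed to be numeric
    code_chars = len(result["compressed_code"])
    prompt_chars = len(result["compressed_prompt"])
    return f"ratio={ratio:.2f}, code={code_chars} chars, prompt={prompt_chars} chars"


# Hand-written stand-in for the dictionary returned by compress_code_file
example = {
    "compressed_code": "def f(): pass",
    "compressed_prompt": "Explain:\ndef f(): pass",
    "compression_ratio": 0.5,
}

print(summarize_result(example))
```

In real use, `example` would be replaced by the `result` returned from `compress_code_file`.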

## Usage

### Quick Start

Each task directory contains a `run.sh` script. Navigate to the task directory you want and run it:

```bash
cd <task_directory>
bash run.sh
```

### Code Retrieval (RepoQA)

Navigate to the `repo-qa` directory and run experiments with different compression ratios:

```bash
cd repo-qa
bash run.sh
```

The script evaluates LongCodeZip on the RepoQA dataset at different compression ratios, running the experiments in parallel on multiple GPUs.

**Key Parameters:**
- `--compression-ratio`: Controls the compression level
- `--model`: Specifies the base LLM model
- `--backend`: Backend for model inference (vllm)
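
To show how flags like these are typically wired up, here is a minimal `argparse` sketch. Only the three flag names come from the list above; the types, defaults, help strings, and the `build_parser` helper are assumptions for illustration and may not match the repository's `main.py`.

```python
import argparse


def build_parser():
    """Build a parser for the three key parameters (types and defaults assumed)."""
    parser = argparse.ArgumentParser(description="Hypothetical RepoQA evaluation options")
    parser.add_argument("--compression-ratio", type=float, default=0.5,
                        help="Controls the compression level (default is an assumption)")
    parser.add_argument("--model", type=str, required=True,
                        help="Base LLM model to evaluate")
    parser.add_argument("--backend", type=str, default="vllm",
                        help="Backend for model inference")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained
args = build_parser().parse_args(
    ["--model", "Qwen/Qwen2.5-Coder-7B-Instruct", "--compression-ratio", "0.25"]
)
print(args.compression_ratio, args.model, args.backend)
```

`argparse` maps `--compression-ratio` to the attribute `args.compression_ratio`, replacing the hyphen with an underscore.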

### Code Completion

Navigate to the `long-code-completion` directory:

```bash
cd long-code-completion
bash run.sh
```

## References
