
[arXiv](https://arxiv.org/abs/2510.00446) [ASE 2025](https://conf.researchr.org/details/ase-2025/ase-2025-papers/121/LongCodeZip-Compress-Long-Context-for-Code-Language-Models) [Python 3.9.7](https://www.python.org/downloads/release/python-397/) [GitHub](https://github.com/YerbaPage/LongCodeZip) [License](LICENSE)
# LongCodeZip
This repository is the official implementation of LongCodeZip, a novel two-stage long code compression method. Our paper "LongCodeZip: Compress Long Context for Code Language Models" has been accepted to **ASE 2025**.
## Method Overview

LongCodeZip introduces a two-stage code compression framework specifically designed for code LLMs:
1. **Coarse-grained Compression**: Function-based chunking and ranking using conditional perplexity with respect to the query to select the most relevant functions.
2. **Fine-grained Compression**: Entropy-based block detection combined with 0/1 knapsack optimization to maximize relevance within adaptive token budgets.

The method is plug-and-play and can be integrated with existing code LLMs to achieve significant compression ratios while maintaining or improving task performance.
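
The two stages above can be illustrated with a toy sketch. This is not the repository's implementation: the function names (`perplexity`, `rank_chunks`, `knapsack_select`), the input shapes, and all numbers are hypothetical. It assumes that the per-token log-probabilities of the query given each chunk have already been obtained from a language model, and that a lower conditional perplexity means a more relevant chunk.

```python
import math
from typing import Dict, List, Sequence, Tuple


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity is exp of the negative mean token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def rank_chunks(query_logprobs_by_chunk: Dict[str, Sequence[float]]) -> List[str]:
    """Stage 1 (sketch): order chunks by ascending perplexity of the query
    conditioned on each chunk. Assumption: lower perplexity = more relevant."""
    return sorted(
        query_logprobs_by_chunk,
        key=lambda name: perplexity(query_logprobs_by_chunk[name]),
    )


def knapsack_select(blocks: List[Tuple[str, int, float]], budget: int) -> List[str]:
    """Stage 2 (sketch): classic 0/1 knapsack over (name, token_cost, relevance)
    blocks, maximizing total relevance within a token budget."""
    n = len(blocks)
    # best[i][b] = best total relevance using the first i blocks within budget b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i, (_, cost, value) in enumerate(blocks, start=1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip block i
            if cost <= b:
                candidate = best[i - 1][b - cost] + value  # option 2: keep it
                if candidate > best[i][b]:
                    best[i][b] = candidate
    # Walk back through the table to recover which blocks were kept.
    chosen = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            name, cost, _ = blocks[i - 1]
            chosen.append(name)
            b -= cost
    return chosen[::-1]


# Hypothetical log-probabilities of the query tokens given each function chunk.
query_logprobs = {
    "parse_config": [-0.2, -0.3, -0.1],
    "render_html": [-1.5, -2.0, -1.0],
    "load_file": [-0.6, -0.4, -0.5],
}
print(rank_chunks(query_logprobs))
# ['parse_config', 'load_file', 'render_html']

# Hypothetical blocks inside one selected function: (name, token cost, relevance).
blocks = [("imports", 10, 1.0), ("signature", 20, 5.0),
          ("body", 40, 6.0), ("docstring", 30, 4.0)]
print(knapsack_select(blocks, budget=60))
# ['signature', 'body']
```

In the toy example, keeping `signature` and `body` uses the full 60-token budget for a total relevance of 11.0, which beats the three-block alternative (`imports`, `signature`, `docstring`) at 10.0.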
## Installation
You can install directly from the GitHub repository:
```bash
pip install git+https://github.com/YerbaPage/LongCodeZip.git
```
Or clone and install in development mode:
```bash
git clone https://github.com/YerbaPage/LongCodeZip.git
cd LongCodeZip
pip install -e .
```
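
After either install route, a quick way to confirm that the package is importable is to look it up from Python. The helper name `is_installed` is hypothetical; the module name `longcodezip` is the one imported in the example further down.

```python
import importlib.util


def is_installed(package: str = "longcodezip") -> bool:
    """Return True if the top-level package can be found on the import path."""
    return importlib.util.find_spec(package) is not None


print(is_installed())
```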
## Quick Demo
We provide a simple demo (`demo.py`) to help you get started with LongCodeZip:
```bash
python demo.py
```
## Basic Example
```python
from longcodezip import LongCodeZip
# Initialize the compressor
compressor = LongCodeZip(model_name="Qwen/Qwen2.5-Coder-7B-Instruct")
# Compress code with a query
result = compressor.compress_code_file(
code=