## Online Meeting

<a target="_blank" href="https://colab.research.google.com/github/microsoft/LLMLingua/blob/main/examples/OnlineMeeting.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

Using generative AI such as ChatGPT in online meetings (e.g., in **Teams**) can greatly improve work efficiency. However, the context in such applications tends to be conversational, with a high degree of redundancy and a large number of tokens (more than **40k**). Compressing prompts with LLMLingua significantly reduces their length, which in turn reduces latency. This makes the AI more responsive in real-time communication scenarios such as online meetings. We use meeting transcripts from the [**MeetingBank** dataset](https://huggingface.co/datasets/lytang/MeetingBank-transcript) as an example to demonstrate the capabilities of LLMLingua.

### MeetingBank Dataset

Next, we demonstrate LongLLMLingua on the **MeetingBank** dataset, where it can achieve similar or even better performance with significantly fewer tokens. The online meeting scenario is similar to RAG in that it also suffers from the "lost in the middle" issue, where noise at the beginning or end of the prompt interferes with the LLM's ability to extract key information. This dataset closely resembles real-world online meeting scenarios, with prompt lengths exceeding **60k** tokens at their longest.

The original dataset can be found at https://huggingface.co/datasets/lytang/MeetingBank-transcript
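
One way to reduce the effect of position is to reorder context segments so that the most relevant ones come first. The sketch below illustrates only that idea: `reorder_by_relevance` and the scores are hypothetical, and this is not LLMLingua's internal implementation, where the ranking would come from a model rather than hand-written numbers.

```python
from typing import List, Sequence


def reorder_by_relevance(segments: Sequence[str], scores: Sequence[float]) -> List[str]:
    """Return segments ordered from most to least relevant.

    `scores` are hypothetical relevance scores (higher = more relevant).
    """
    if len(segments) != len(scores):
        raise ValueError("segments and scores must have the same length")
    # sorted() is stable, so segments with equal scores keep their original order.
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    return [segments[i] for i in order]


segments = ["opening remarks", "crime statistics", "closing remarks"]
scores = [0.1, 0.9, 0.2]  # made-up values for illustration
print(reorder_by_relevance(segments, scores))
# ['crime statistics', 'closing remarks', 'opening remarks']
```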

In [1]:
# Install dependencies. The cells below use the legacy openai<1.0 interface (openai.ChatCompletion).
!pip install llmlingua datasets "openai<1.0"

In [6]:
# Download the original prompt and dataset
from datasets import load_dataset
dataset = load_dataset("lytang/MeetingBank-transcript")["train"]

In [8]:
# Using the OpenAI API (OAI)
import openai
openai.api_key = "<insert_openai_key>"

In [10]:
# Or using Azure OpenAI (AOAI)
import openai
openai.api_key = "<insert_openai_key>"
openai.api_base = "https://xxxx.openai.azure.com/"
openai.api_type = 'azure'
openai.api_version = '2023-05-15'

### Setup Data

In [12]:
# select an example from MeetingBank
contexts = dataset[1]["source"]

### Q1

In [13]:
question = "Question: How much did the crime rate increase last year?\nAnswer:"
reference = "5.4%"

In [15]:
# The response from original prompt, using GPT-4-32k
import json
prompt = "\n\n".join([contexts, question])

message = [
    {"role": "user", "content": prompt},
]

request_data = {
    "messages": message,
    "max_tokens": 100,
    "temperature": 0,
    "top_p": 1,
    "n": 1,
    "stream": False,
}
response = openai.ChatCompletion.create(
    model="gpt-4-32k",  # with Azure OpenAI, pass engine="<deployment_name>" instead
    **request_data,
)
print(json.dumps(response, indent=4))

{
    "id": "chatcmpl-8FNC3cZSVtzUCxOVhB04RxnEUVrf8",
    "object": "chat.completion",
    "created": 1698674767,
    "model": "gpt-4-32k",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "The crime rate increased by 5.4% year to date."
            }
        }
    ],
    "usage": {
        "prompt_tokens": 30096,
        "completion_tokens": 14,
        "total_tokens": 30110
    }
}
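
The `reference` variable defined for Q1 is not used by the cells above. A minimal sketch of how a response could be checked against it follows; `answer_contains_reference` is a hypothetical helper, and the dictionary is a trimmed copy of the response printed above.

```python
def answer_contains_reference(response: dict, reference: str) -> bool:
    """Check whether the first choice's answer text contains the reference string."""
    answer = response["choices"][0]["message"]["content"]
    return reference.lower() in answer.lower()


# Trimmed copy of the response printed above.
example_response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "The crime rate increased by 5.4% year to date.",
            }
        }
    ]
}
print(answer_contains_reference(example_response, "5.4%"))  # True
```

Substring matching is a crude check (for example, "15.4%" would also match "5.4%"), so it suits a quick sanity test rather than a proper evaluation.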


In [1]:
# Setup LLMLingua
from llmlingua import PromptCompressor
llm_lingua = PromptCompressor()

In [17]:
# Compress the prompt to a target of about 200 tokens
compressed_prompt = llm_lingua.compress_prompt(
    contexts.split("\n"),
    instruction="",
    question=question,
    target_token=200,
    condition_compare=True,
    condition_in_question='after',
    rank_method='longllmlingua',
    use_sentence_level_filter=False,
    context_budget="+100",
    dynamic_context_compression_ratio=0.4, # enable dynamic_context_compression_ratio
    reorder_context="sort"
)
message = [
    {"role": "user", "content": compressed_prompt["compressed_prompt"]},
]

request_data = {
    "messages": message,
    "max_tokens": 100,
    "temperature": 0,
    "top_p": 1,
    "n": 1,
    "stream": False,
}
response = openai.ChatCompletion.create(
    model="gpt-4-32k",  # with Azure OpenAI, pass engine="<deployment_name>" instead
    **request_data,
)

print(json.dumps(compressed_prompt, indent=4))
print("Response:", response)

{
    "compressed_prompt": "aker3., the.\n\naker : Thank you Counciloman Yes,'s. 5.4% increase to date That after this a 1.4 increase in crime in. 1 From Police. Let the police. : day. Our department will continue to evolve and move forward, building on our existing strengths and taking advantage of opportunities for growth and renewal. Our priorities around crime and homelessness, employee and community wellness and open communication will help guide us further into 21st century policing, while also supporting the shared responsibility of public safety in the city of Long Beach. Thank you. Myself and Bureau Chief Josie Murray stand ready to answer any questions they can.\n\nQuestion: How much did the crime rate increase last year?\nAnswer:",
    "origin_tokens": 30089,
    "compressed_tokens": 149,
    "ratio": "201.9x",
    "saving": ", Saving $1.8 in GPT-4."
}
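
The `ratio` and `saving` fields above can be reproduced from the two token counts. The sketch below assumes a price of $0.06 per 1,000 prompt tokens; that rate is not stated in the output and is used here only because it is consistent with the reported figure.

```python
origin_tokens = 30089      # "origin_tokens" from the output above
compressed_tokens = 149    # "compressed_tokens" from the output above

# Assumed price: $0.06 per 1,000 prompt tokens (not stated in the output).
PRICE_PER_1K_PROMPT_TOKENS = 0.06

ratio = origin_tokens / compressed_tokens
saving = (origin_tokens - compressed_tokens) * PRICE_PER_1K_PROMPT_TOKENS / 1000

print(f"ratio: {ratio:.1f}x")    # ratio: 201.9x
print(f"saving: ${saving:.1f}")  # saving: $1.8
```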
Response: {
  "id": "chatcmpl-8FNIg6iVYBfI1354r72xYE9X4tDDE",
  "object": "chat.completion",
  "created": 1698675178,
  "mod

### Q2

In [18]:
question = "Question: What is the homicide clearance rate?\nAnswer:"
reference = "77%"

In [19]:
# The response from original prompt, using GPT-4-32k
import json
prompt = "\n\n".join([contexts, question])

message = [
    {"role": "user", "content": prompt},
]

request_data = {
    "messages": message,
    "max_tokens": 100,
    "temperature": 0,
    "top_p": 1,
    "n": 1,
    "stream": False,
}
response = openai.ChatCompletion.create(
    model="gpt-4-32k",  # with Azure OpenAI, pass engine="<deployment_name>" instead
    **request_data,
)
print(json.dumps(response, indent=4))

{
    "id": "chatcmpl-8FNJi0fTohhSuLHTF13uWBBcslAtx",
    "object": "chat.completion",
    "created": 1698675242,
    "model": "gpt-4-32k",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "The homicide clearance rate for the Long Beach Fire Department is 77%."
            }
        }
    ],
    "usage": {
        "prompt_tokens": 30093,
        "completion_tokens": 14,
        "total_tokens": 30107
    }
}
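
Every question in this notebook sends the same request settings, changing only the prompt and `max_tokens`. As a sketch, the repeated dictionary could be built by a small helper; `build_request` is a hypothetical name, not part of LLMLingua or the OpenAI SDK.

```python
from typing import Any, Dict


def build_request(prompt: str, max_tokens: int = 100) -> Dict[str, Any]:
    """Build the chat-completion keyword arguments used in this notebook."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0,  # deterministic decoding, as in the cells above
        "top_p": 1,
        "n": 1,
        "stream": False,
    }


example_request = build_request(
    "Question: What is the homicide clearance rate?\nAnswer:"
)
print(example_request["max_tokens"])  # 100
```

The returned dictionary can be unpacked into the completion call in the same way as `request_data` in the cells above.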


In [56]:
# Compress the prompt to a target of about 200 tokens, with sentence-level filtering enabled
compressed_prompt = llm_lingua.compress_prompt(
    contexts.split("\n"),
    instruction="",
    question=question,
    target_token=200,
    condition_compare=True,
    condition_in_question='after',
    rank_method='longllmlingua',
    use_sentence_level_filter=True,
    context_budget="+100",
    reorder_context="sort"
)
message = [
    {"role": "user", "content": compressed_prompt["compressed_prompt"]},
]

request_data = {
    "messages": message,
    "max_tokens": 100,
    "temperature": 0,
    "top_p": 1,
    "n": 1,
    "stream": False,
}
response = openai.ChatCompletion.create(
    model="gpt-4-32k",  # with Azure OpenAI, pass engine="<deployment_name>" instead
    **request_data,
)

print(json.dumps(compressed_prompt, indent=4))
print("Response:", response)

{
    "compressed_prompt": "\n\nEvery we discuss a variety of public we provide, emergency response and calls for service criminal investig, and advoc, safarding while protect infrastr and and threats.\n you see we experiencing, exempl how our are working.\n51% these arrests forb by law from possessing firear.\n this alone have seized  firear includes a 23% increase in the recovery manufactured firearms knownimps or ghost guns.And while every homic tragic, we not dissuaded and continue to toward bringing justice to the families and loved ones of victimsAmong accomplish,'ll see we have a homicide clearance rate of 77%.\nThere are many factors that contribute to our effectiveness in this area, including a rapid reaction and response by patrol officers, immediate follow up by our Special Investigations Division and the excellent investigative efforts of our homicide detectives.\nTo help increase our communication, transparency and engagement, we've developed a community advisory committee

### Q3

In [57]:
question = "Question: what are the arrangements the Police Department will make this year?"
reference = "enhancing community engagement and internal communication models, building a culture of accountability and transparency, and prioritizing recruitment and retention."

In [58]:
# The response from original prompt, using GPT-4-32k
import json
prompt = "\n\n".join([contexts, question])

message = [
    {"role": "user", "content": prompt},
]

request_data = {
    "messages": message,
    "max_tokens": 500,
    "temperature": 0,
    "top_p": 1,
    "n": 1,
    "stream": False,
}
response = openai.ChatCompletion.create(
    model="gpt-4-32k",  # with Azure OpenAI, pass engine="<deployment_name>" instead
    **request_data,
)
print(json.dumps(response, indent=4))

{
    "id": "chatcmpl-8FNz2YdueWIGpFTnRAM0ZbWKPNWIY",
    "object": "chat.completion",
    "created": 1698677804,
    "model": "gpt-4-32k",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "The Police Department plans to focus on addressing the steady increase in call volume and maintaining or improving response times to fires, emergency medical and other emergency responses. They will also prioritize firefighter safety and behavioral health, increase diversity in all ranks of the department through recruitment and training opportunities, and maintain staffing and resources to meet service demands of citywide growth. The department will also begin preparing for the upcoming emergency service demands brought on by the 2028 Summer Olympic Games. They plan to replace front line vehicles and improve compliance with mandated fire prevention inspections. The departm

In [61]:
# Compress the prompt to a target of about 2000 tokens
compressed_prompt = llm_lingua.compress_prompt(
    contexts.split("\n"),
    instruction="",
    question=question,
    target_token=2000,
    condition_compare=True,
    condition_in_question='after',
    rank_method='longllmlingua',
    use_sentence_level_filter=False,
    context_budget="+100",
    dynamic_context_compression_ratio=0.4, # enable dynamic_context_compression_ratio
    reorder_context="sort"
)
message = [
    {"role": "user", "content": compressed_prompt["compressed_prompt"]},
]

request_data = {
    "messages": message,
    "max_tokens": 500,
    "temperature": 0,
    "top_p": 1,
    "n": 1,
    "stream": False,
}
response = openai.ChatCompletion.create(
    model="gpt-4-32k",  # with Azure OpenAI, pass engine="<deployment_name>" instead
    **request_data,
)

print(json.dumps(compressed_prompt, indent=4))
print("Response:", response)

{
    "compressed_prompt": "Speaker3: Thank., the\n\nSpe Thank. Next keep the\n  Thank. Councilwoman Yes,'s5% year date. is after this year with a74%.: Mr. Mods,able Mayor and of the' very be presenting the Polices3 budget. for.ented. police and have experienced increased and de, they. their work purpose they are needed. to leave or, vast majority have toers with Department I believe because not typicalre worldized to maintain andation mental, qualityach programs as are, and to' mistakesre, is. of the or, the officers to here each a Theyageance In of, rising crime un police, theirment andre everyone our Every year we a of safety services,gency and, victim and ouringucture and resource, should also we ourhips and like with the Cityation,uma Program to toaborating with the Department Communic to responses. joining department as part the many other reason're we. Here volumere which. Year10 calls nearly60. Although' had to make modifications through the years to one or the of about 5 Like 