Upload Code for MiniRAG
21
LICENSE
Normal file
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Gustavo Ye

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
124
README.md
Normal file
@@ -0,0 +1,124 @@
# MiniRAG: Towards Extremely Simple Retrieval-Augmented Generation

<div align="center">



This repository hosts the code of the paper: **MiniRAG: Towards Extremely Simple Retrieval-Augmented Generation**

<br />

[Tianyu Fan](https://tianyufan0504.github.io/), [Jingyuan Wang](), [Xubin Ren](https://ren-xubin.github.io/), [Chao Huang](https://sites.google.com/view/chaoh)* (*Correspondence)<br />
</div>

## 🌍 README Translations

[中文说明](./README_CN.md)

## TLDR

MiniRAG is an extremely simple retrieval-augmented generation framework that enables small models to achieve good RAG performance through heterogeneous graph indexing and lightweight topology-enhanced retrieval.

## Abstract

The growing demand for efficient and lightweight Retrieval-Augmented Generation (RAG) systems has highlighted significant challenges when deploying Small Language Models (SLMs) in existing RAG frameworks. Current approaches face severe performance degradation due to SLMs' limited semantic understanding and text processing capabilities, creating barriers for widespread adoption in resource-constrained scenarios. To address these fundamental limitations, we present **MiniRAG**, a novel RAG system designed for extreme simplicity and efficiency. **MiniRAG** introduces two key technical innovations: (1) a semantic-aware heterogeneous graph indexing mechanism that combines text chunks and named entities in a unified structure, reducing reliance on complex semantic understanding, and (2) a lightweight topology-enhanced retrieval approach that leverages graph structures for efficient knowledge discovery without requiring advanced language capabilities. Our extensive experiments demonstrate that **MiniRAG** achieves comparable performance to LLM-based methods even when using SLMs while requiring only 25% of the storage space. Additionally, we contribute a comprehensive benchmark dataset, LiHua-World, for evaluating lightweight RAG systems under realistic on-device scenarios with complex queries.
## Install

* Install from source (recommended)

```bash
cd MiniRAG
pip install -e .
```

* Install from PyPI (our code is based on [LightRAG](https://github.com/HKUDS/LightRAG), so you can install that package directly)

```bash
pip install lightrag-hku
```
## Quick Start

* All the code can be found in the `./reproduce` directory.
* Download the dataset you need.
* Put the dataset in the `./dataset` directory.
* Note: We have already placed the LiHua-World dataset in `./dataset/LiHua-World/data/` as `LiHuaWorld.zip`. If you want to use another dataset, put it under `./dataset/xxx`.

Then use the following commands to index the dataset and run the QA step:

```bash
python ./reproduce/Step_0_index.py
python ./reproduce/Step_1_QA.py
```
Or, use the code in `./main.py` to initialize MiniRAG.
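Before indexing, the LiHua-World archive needs to be unpacked. The sketch below shows one way to do that with the standard library; the helper name `extract_dataset` is ours for illustration, not part of the repo's API, and the paths follow the layout described above.

```python
# Sketch: unpack a dataset archive (e.g. LiHuaWorld.zip) before indexing.
# extract_dataset is an illustrative helper, not a MiniRAG API.
import zipfile
from pathlib import Path

def extract_dataset(zip_path: str, dest_dir: str) -> list:
    """Unpack a dataset archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return zf.namelist()

# Example, using the paths named in the Quick Start notes:
# extract_dataset("./dataset/LiHua-World/data/LiHuaWorld.zip",
#                 "./dataset/LiHua-World/data/")
```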
### Overall Performance Table

| Model | NaiveRAG | | GraphRAG | | LightRAG | | **MiniRAG** | |
|-------|----------|----------|-----------|----------|-----------|----------|----------|----------|
| | acc↑ | err↓ | acc↑ | err↓ | acc↑ | err↓ | acc↑ | err↓ |
| LiHua-World | | | | | | | | |
| Phi-3.5-mini-instruct | 41.22% | 23.20% | / | / | 39.81% | 25.39% | **53.29%** | 23.35% |
| GLM-Edge-1.5B-Chat | 42.79% | 24.76% | / | / | 35.74% | 25.86% | **52.51%** | 25.71% |
| Qwen2.5-3B-Instruct | 43.73% | 24.14% | / | / | 39.18% | 28.68% | **48.75%** | 26.02% |
| MiniCPM3-4B | 43.42% | 17.08% | / | / | 35.42% | 21.94% | **51.25%** | 21.79% |
| gpt-4o-mini | 46.55% | 19.12% | 35.27% | 37.77% | **56.90%** | 20.85% | 54.08% | 19.44% |
| MultiHop-RAG | | | | | | | | |
| Phi-3.5-mini-instruct | 42.72% | 31.34% | / | / | 27.03% | 11.78% | **49.96%** | 28.44% |
| GLM-Edge-1.5B-Chat | 44.44% | 24.26% | / | / | / | / | **51.41%** | 23.44% |
| Qwen2.5-3B-Instruct | 39.48% | 31.69% | / | / | 21.91% | 13.73% | **48.55%** | 33.10% |
| MiniCPM3-4B | 39.24% | 31.42% | / | / | 19.48% | 10.41% | **47.77%** | 26.88% |
| gpt-4o-mini | 53.60% | 27.19% | 60.92% | 16.86% | 64.91% | 19.37% | **68.43%** | 19.41% |

In the table, / means the method struggles to generate effective responses.

## Reproduce

All the code can be found in the `./reproduce` directory.

## Code Structure

```
├── dataset
│   └── LiHua-World
│       ├── data
│       └── qa
│           ├── query_set.csv
│           └── query_set.json
├── minirag
│   ├── kg
│   │   ├── __init__.py
│   │   ├── neo4j_impl.py
│   │   └── oracle_impl.py
│   ├── __init__.py
│   ├── base.py
│   ├── llm.py
│   ├── minirag.py
│   ├── operate.py
│   ├── prompt.py
│   ├── storage.py
│   └── utils.py
├── reproduce
│   ├── Step_0_index.py
│   └── Step_1_QA.py
├── exp.py
├── LICENSE
├── main.py
├── main.sh
├── README.md
└── requirements.txt
```

## Dataset: LiHua-World

LiHua-World is a dataset specifically designed for on-device RAG scenarios, containing one year of chat records from a virtual user named LiHua. The dataset includes three types of questions: single-hop, multi-hop, and summary, with each question paired with manually annotated answers and supporting documents.

## Acknowledgements

Our framework and code repository build on [nano-graphrag](https://github.com/gusye1234/nano-graphrag) and [LightRAG](https://github.com/HKUDS/LightRAG). Thanks for their wonderful work.

## 🌟Citation

```python
```

**Thank you for your interest in our work!**

118
README_CN.md
Normal file
@@ -0,0 +1,118 @@
# MiniRAG: Towards Extremely Simple Retrieval-Augmented Generation

<div align="center">



This repository hosts the code of the paper: **MiniRAG: Towards Extremely Simple Retrieval-Augmented Generation**.

<br />

[Tianyu Fan](https://tianyufan0504.github.io/), [Jingyuan Wang](), [Xubin Ren](https://ren-xubin.github.io/), [Chao Huang](https://sites.google.com/view/chaoh)* (*Correspondence)<br />
</div>

## TLDR

MiniRAG is an extremely simple retrieval-augmented generation framework that enables small models to achieve good RAG performance through heterogeneous graph indexing and lightweight topology-enhanced retrieval.

## Abstract

The growing demand for efficient and lightweight Retrieval-Augmented Generation (RAG) systems has highlighted significant challenges when deploying Small Language Models (SLMs) in existing RAG frameworks. Current approaches face severe performance degradation due to SLMs' limited semantic understanding and text processing capabilities, creating barriers for widespread adoption in resource-constrained scenarios. To address these fundamental limitations, we present **MiniRAG**, a novel RAG system designed for extreme simplicity and efficiency. **MiniRAG** introduces two key technical innovations: (1) a semantic-aware heterogeneous graph indexing mechanism that combines text chunks and named entities in a unified structure, reducing reliance on complex semantic understanding, and (2) a lightweight topology-enhanced retrieval approach that leverages graph structures for efficient knowledge discovery without requiring advanced language capabilities. Our extensive experiments demonstrate that **MiniRAG** achieves comparable performance to LLM-based methods even when using SLMs while requiring only 25% of the storage space. Additionally, we contribute a comprehensive benchmark dataset, LiHua-World, for evaluating lightweight RAG systems under realistic on-device scenarios with complex queries.
## Install

* Install from source (recommended)

```bash
cd MiniRAG
pip install -e .
```

* Install from PyPI (our code is based on [LightRAG](https://github.com/HKUDS/LightRAG), so you can install that package directly)

```bash
pip install lightrag-hku
```

## Quick Start

* All the reproduction code can be found in the `./reproduce` directory.
* Download the dataset you need.
* Put the dataset in the `./dataset` directory.
* Note: We have already placed the LiHua-World dataset in `./dataset/LiHua-World/data/` as `LiHuaWorld.zip`. If you want to use another dataset, put it under `./dataset/xxx`.

Then use the following commands to index the dataset and run the QA step:

```bash
python ./reproduce/Step_0_index.py
python ./reproduce/Step_1_QA.py
```

Or, use the code in `./main.py` to initialize MiniRAG.

### Overall Performance Table

| Model | NaiveRAG | | GraphRAG | | LightRAG | | **MiniRAG** | |
|-------|----------|----------|-----------|----------|-----------|----------|----------|----------|
| | acc↑ | err↓ | acc↑ | err↓ | acc↑ | err↓ | acc↑ | err↓ |
| LiHua-World | | | | | | | | |
| Phi-3.5-mini-instruct | 41.22% | 23.20% | / | / | 39.81% | 25.39% | **53.29%** | 23.35% |
| GLM-Edge-1.5B-Chat | 42.79% | 24.76% | / | / | 35.74% | 25.86% | **52.51%** | 25.71% |
| Qwen2.5-3B-Instruct | 43.73% | 24.14% | / | / | 39.18% | 28.68% | **48.75%** | 26.02% |
| MiniCPM3-4B | 43.42% | 17.08% | / | / | 35.42% | 21.94% | **51.25%** | 21.79% |
| gpt-4o-mini | 46.55% | 19.12% | 35.27% | 37.77% | **56.90%** | 20.85% | 54.08% | 19.44% |
| MultiHop-RAG | | | | | | | | |
| Phi-3.5-mini-instruct | 42.72% | 31.34% | / | / | 27.03% | 11.78% | **49.96%** | 28.44% |
| GLM-Edge-1.5B-Chat | 44.44% | 24.26% | / | / | / | / | **51.41%** | 23.44% |
| Qwen2.5-3B-Instruct | 39.48% | 31.69% | / | / | 21.91% | 13.73% | **48.55%** | 33.10% |
| MiniCPM3-4B | 39.24% | 31.42% | / | / | 19.48% | 10.41% | **47.77%** | 26.88% |
| gpt-4o-mini | 53.60% | 27.19% | 60.92% | 16.86% | 64.91% | 19.37% | **68.43%** | 19.41% |

In the table, / means the method struggles to generate effective responses.

## Reproduce

All the code can be found in the `./reproduce` directory.

## Code Structure

```
├── dataset
│   └── LiHua-World
│       ├── data
│       └── qa
│           ├── query_set.csv
│           └── query_set.json
├── minirag
│   ├── kg
│   │   ├── __init__.py
│   │   ├── neo4j_impl.py
│   │   └── oracle_impl.py
│   ├── __init__.py
│   ├── base.py
│   ├── llm.py
│   ├── minirag.py
│   ├── operate.py
│   ├── prompt.py
│   ├── storage.py
│   └── utils.py
├── reproduce
│   ├── Step_0_index.py
│   └── Step_1_QA.py
├── exp.py
├── LICENSE
├── main.py
├── main.sh
├── README.md
└── requirements.txt
```

## Dataset: LiHua-World

LiHua-World is a dataset specifically designed for on-device RAG scenarios, containing one year of chat records from a virtual user named LiHua. The dataset includes three types of questions: single-hop, multi-hop, and summary, with each question paired with manually annotated answers and supporting documents.

## Acknowledgements

Our framework and code repository build on [nano-graphrag](https://github.com/gusye1234/nano-graphrag) and [LightRAG](https://github.com/HKUDS/LightRAG). Thanks for their wonderful work.

## 🌟Citation

```python
```

**Thank you for your interest in our work!**

48
dataset/LiHua-World/README.md
Normal file
@@ -0,0 +1,48 @@
# LiHua-World Dataset

[中文说明](./README_CN.md)

LiHua-World is a dataset specifically designed for local RAG (Retrieval-Augmented Generation) scenarios. It contains one year's worth of chat records from a virtual user named LiHua.

## Dataset Features

- Includes three types of questions:
  - Single-hop
  - Multi-hop
  - Summary
- Each question is accompanied by manually annotated answers and supporting documents
- The chat records cover various aspects of daily life, including:
  - Social interactions
  - Fitness training
  - Entertainment activities
  - Life affairs
  - ...

## Dataset Structure

The dataset mainly consists of the following parts:

### 1. Original Chat Records (/data)

- Chat messages organized in chronological order
- Each message contains:
  - Timestamp
  - Sender
  - Message content
  - Message type

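As a sketch of how one such record might be represented in code: the field names below follow the list above, while the class itself and the `text` type value are our assumptions (the real on-disk format and type vocabulary are defined by the files in `/data`). The timestamp parser handles the `YYYYMMDD_HH:MM` stamps used throughout the QA evidence.

```python
# Illustrative record type for one chat message; field names follow the
# README's list above, but this class is not part of the dataset's tooling.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ChatMessage:
    timestamp: datetime
    sender: str
    content: str
    message_type: str  # assumed value, e.g. "text"

def parse_timestamp(stamp: str) -> datetime:
    """Parse the YYYYMMDD_HH:MM stamps used in the QA evidence fields."""
    return datetime.strptime(stamp, "%Y%m%d_%H:%M")

msg = ChatMessage(parse_timestamp("20260121_10:00"), "Adam Smith",
                  "Reminder about building maintenance.", "text")
```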
### 2. Q&A Data (/qa)

- query_set.csv: Contains questions, standard answers, and evidence
- query_set.json: JSON format version of the CSV file

### 3. Metadata

- User information
- Time range: January 2026 to December 2026
- List of conversation participants

## Usage Instructions

Step 1. Unzip the `LiHuaWorld.zip` file in the `./data` directory to obtain the original chat records.

Step 2. Use all the chat records in the `./data` directory as the knowledge base.

Step 3. Use `query_set.csv` or `query_set.json` in the `./qa` directory as the question set to conduct RAG testing.
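Step 3 can be scripted with the standard library. A minimal sketch for reading the question set and splitting the `<and>`-joined evidence timestamps (the column names follow `query_set.csv`'s header; the helper functions are ours, not shipped with the dataset):

```python
# Illustrative loader for query_set.csv; not part of the dataset's tooling.
import csv
from datetime import datetime

def load_query_set(path: str) -> list:
    """Read query_set.csv rows into dicts keyed by the CSV header
    (Question, Gold Answer, Evidence, Type)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def parse_evidence(evidence: str) -> list:
    """Split an <and>-joined evidence string into datetime objects,
    e.g. "20260121_10:00<and>20260701_10:00"."""
    return [datetime.strptime(s, "%Y%m%d_%H:%M")
            for s in evidence.split("<and>")]
```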
47
dataset/LiHua-World/README_CN.md
Normal file
@@ -0,0 +1,47 @@
# LiHua-World Dataset

LiHua-World is a dataset specifically designed for local RAG (Retrieval-Augmented Generation) scenarios. It contains one year's worth of chat records from a virtual user named LiHua.

## Dataset Features

- Includes three types of questions:
  - Single-hop
  - Multi-hop
  - Summary
- Each question is accompanied by manually annotated answers and supporting documents
- The chat records cover various aspects of daily life, including:
  - Social interactions
  - Fitness training
  - Entertainment activities
  - Life affairs
  - ...

## Dataset Structure

The dataset mainly consists of the following parts:

### 1. Original Chat Records (./data)

- Chat messages organized in chronological order
- Each message contains:
  - Timestamp
  - Sender
  - Message content
  - Message type

For ease of organization, each folder contains one week of chat records.

### 2. Q&A Data (/qa)

- query_set.csv: Contains questions, standard answers, and evidence
- query_set.json: JSON format version of the CSV file

### 3. Metadata

- User information
- Time range: January 2026 to December 2026
- List of conversation participants

## Usage Instructions

Step 1. Unzip `LiHuaWorld.zip` in the `./data` directory to obtain the original chat records.

Step 2. Use all the chat records in the `./data` directory as the knowledge base.

Step 3. Use `query_set.csv` or `query_set.json` in the `./qa` directory as the question set to conduct RAG testing.
BIN
dataset/LiHua-World/data/LiHuaWorld.zip
Normal file
Binary file not shown.
638
dataset/LiHua-World/qa/query_set.csv
Normal file
@@ -0,0 +1,638 @@
Question,Gold Answer,Evidence,Type
Did Adam Smith send a message to Li Hua about the upcoming building maintenance schedule before the administrators announced a temporary change in the construction schedule due to weather conditions?,Yes,20260121_10:00<and>20260701_10:00,Multi
"Did Wolfgang ask Li Hua about watching ""Star Wars: A New Hope"" after he asked Li Hua about going to see ""Overwatch 3""?",Yes,20260121_13:00<and>20261009_17:00,Multi
Did Li Hua agree to go out for dinner after Wolfgang first asked him if he wanted to go out for dinner?,Yes,20260123_17:00<and>20260930_16:00,Multi
Did Li Hua send a message to Jennifer thanking her for the new training schedule before he requested a change in his training schedule for Thursday?,Yes,20260204_15:00<and>20260204_16:00<and>20260211_19:00,Multi
Did Li Hua ask Jennifer for advice on how to prevent muscle soreness after an intense workout session before he told her that he feels soreness in his arm muscles after the workout this week?,Yes,20260206_16:00<and>20260811_11:00<and>20261008_14:00<and>20261211_11:00,Multi
Did Li Hua send a message to Jennifer asking for her opinion on protein supplements before he consulted her about his daily protein powder consumption?,Yes,20260213_16:00<and>20261120_20:00,Multi
"Did Yuriko ask Li Hua for help with her studio's homepage before she booked a seat at the ""Central Perk"" cafe?",Yes,20260223_15:00<and>20260225_15:00,Multi
Did Li Hua discuss his progress with the fitness plan before he shared a blog post about his recent fitness achievements?,Yes,20260305_17:00<and>20260325_19:00<and>20260610_16:00<and>20260630_18:00<and>20260708_14:00<and>20260817_12:15<and>20261022_22:00<and>20261202_14:00,Multi
Did Li Hua send a message to Jennifer asking if he can turn the Thursday class to Friday after he requested a change in his training schedule for Thursday?,Yes,20260211_19:00<and>20260309_12:00,Multi
Did Li Hua ask Yuriko to play music together before Wolfgang proposed to pause playing musical instruments?,Yes,20260318_15:10<and>20260416_21:00,Multi
"Did Wolfgang Schulz recommend the band learns ""Viva la Vida"" by Coldplay after he and Li Hua discussed what song to play this Sunday?",Yes,20260318_15:30<and>20260625_19:00,Multi
Did Wolfgang's promotion announcement occur before he invited Li Hua for dinner on 20260430?,Yes,20260428_18:30<and>20260930_16:00,Multi
Did Turalyon announce the construction updates and feedback from residents after Illidan Stormrage complained about the construction noise?,Yes,20260526_15:40<and>20260527_10:00<and>20260528_15:00,Multi
Did Hailey announce the new line of high-protein breads before inviting Li Hua to the special bakery event?,Yes,20260611_12:45<and>20260622_13:30,Multi
Did Chae tell Li Hua that taking a warm shower before sleeping can improve the sleep quality before sharing the neuroscience article with her?,No,20260711_11:00<and>20260926_10:30,Multi
Did Li Hua ask Jennifer Moore for book recommendations on fitness nutrition before she announced the special guest speaker at the gym?,No,20260713_19:30<and>20260817_12:15<and>20260831_19:00,Multi
Did Jennifer remind Li Hua about proper nutrition and hydration before Jake shared his tips for staying hydrated during the match?,Yes,20260804_14:00<and>20261001_18:00,Multi
Did the group members talk about their favorite characters in the TV series Game of Thrones after Emily started a vote on the most hateable character?,Yes,20260902_16:00<and>20260915_12:00,Multi
Did Jennifer remind Li Hua to consume enough protein after the workout before she shared tips with the group members on common mistakes to avoid after an intense workout?,Yes,20260919_10:00<and>20261101_11:00,Multi
|
||||
Did Li Hua ask Thane about his opinion on The Last of Us before he asked about Sekiro: Shadows Die Twice?,No,20261001_20:00<and>20261228_14:00,Multi
|
||||
Did Jake Watson and Li Hua discuss the classic matches between FC Barcelona and FC Bayern Munich before the group members discussed the classic matches between FC Barcelona and Real Madrid?,Yes,20261005_10:05<and>20261026_16:00,Multi
|
||||
Did the group members debate about the best football manager in the Premier League history after they debated if Pep Guardiola is the greatest soccer manager in football history?,Yes,20261006_10:00<and>20261110_10:00,Multi
|
||||
Did the discussion about Jaime Lannister's character occur after the discussion about Cersei Lannister's character?,Yes,20261010_10:10<and>20261026_20:00,Multi
|
||||
Did Wolfgang ask Li Hua if she wants to have pizza for dinner after work today before he wondered if she wanted to have Sichuan hot pot for dinner tonight?,No,20260930_16:00<and>20261015_15:00,Multi
|
||||
Did Jennifer challenge Li Hua to do 60 pull-ups in a training session after she challenged him to do 100 pushups?,Yes,20261104_18:00<and>20261129_19:00,Multi
|
||||
Did Jake share common knowledge about offside in soccer with Li Hua before he passed practical techniques to Li Hua on how to avoid offside for a forward?,Yes,20261105_15:00<and>20261130_11:30,Multi
|
||||
Did Wolfgang arrive in Hong Kong after he informed Li Hua about his upcoming trip?,Yes,20261219_19:00<and>20261223_23:00,Multi
|
||||
"What time does Li Hua watch the movie ""Overwatch 3""?",20260122,20260122_17:00<and>20260121_13:00,Multi
|
||||
"Who does Li Hua go to watch the movie ""Overwatch 3"" with?",Wolfgang,20260122_17:00<and>20260121_13:00,Multi
|
||||
Has Wolfgang ever been to Hong Kong?,Yes,20261219_19:00<and>20261220_20:00<and>20261221_12:00<and>20261228_10:00,Multi
|
||||
Who knows about Wolfgang going to Hong Kong?,LiHua & Chae & Yuriko,20261219_19:00<and>20261220_20:00<and>20261221_12:00<and>20261228_10:00,Multi
|
||||
Who wished Li Hua a happy Lunar New Year?,Adam Smith & Jennifer Moore & Wolfgang Schulz,20260119_11:30<and>20260119_14:30<and>20260119_09:30,Multi
|
||||
Who introduced the bread delivery service and recommend Alice for the delivery?,HaileyJohnson,20260318_15:00<and>20260329_13:00,Multi
|
||||
What is the opportunity that makes Wolfgang and Yuriko acquaitances?,LiHua introduce them to each other by saying that they can play music together every Sunday,20260318_15:10<and>20260319_16:00,Multi
|
||||
What was the content of the first-ever delivery from Hailey to LiHua and what was LiHua's opinion about it?,a fresh sourdough loaf and a bottle of milk and LiHua praises Hailey's bread and milk,20260317_08:00<and>20260318_15:00,Multi
|
||||
What opportunity did LiHua create for Chae to meet Wolfgang and Yuriko?,LiHua introduced Chae to Wolfgang and Yuriko during the band's gathering on Sunday evening,20260425_21:00<and>20260425_23:30,Multi
|
||||
What special offerings did Hailey have for her backery shop in the month of May?,a special Mother's Day bakery promotion & a special summer promotion on ice cream & a free baking class at the end of May & banana durian cheesecake,20260508_08:00<and>20260514_14:00<and>20260527_16:00<and>20260531_20:00,Multi
|
||||
What feedbacks does Hailey ask from LiHua in July?,feedback on the bread delivery service & customer feedback on a new line of artisanal donuts,20260710_08:30<and>20260712_16:00,Multi
|
||||
How long does it take in total from LiHua planning on getting the air-conditioner to the air-conditioner been installed?,about 27 days,20260716_10:00<and>20260812_11:00,Multi
|
||||
Did it take more than 3 weeks from LiHua planning on getting the air-conditioner to the air-conditioner been actually installed?,Yes,20260716_10:00<and>20260812_11:00,Multi
|
||||
Did it take more than a week from Adam asking LiHua about the ideal installation date to Adam reminding LiHua about the contractor team installing air-conditioner at 18:00?,Yes,20260803_13:00<and>20260812_11:00,Multi
|
||||
Who does LiHua want to invite to the photo exhibition and who goes with him (during August)?,Wolfgang,20260801_19:00<and>20260805_16:00,Multi
|
||||
Is the time interval between LiHua asking JakeWatson to help him with dribbling skills and Li Hua asking the group about classic must-watch UCL matches more than 2 days (restrict your search within August)?,Yes,20260819_10:00<and>20260821_15:00,Multi
|
||||
Is the time interval more than 3 days between LiHua asking Adam to help him install a curtain on the basement window and Adam asking LiHua to measure the size of the window?,Yes,20260921_16:00<and>20260928_10:00,Multi
|
||||
Is the time interval more than 7 days between Adam asking LiHua to measure the size of the window and Adam informing Li Hua that he has booked the curtain of the right size?,Yes,20260928_10:00<and>20261007_12:00,Multi
|
||||
Is the time interval more than 3 days between LiHua confirming that he has received the curtain and Adam asking LiHua if the curtain is all good?,Yes,20261012_10:00<and>20261019_20:00,Multi
|
||||
Is the time interval more than 3 days between LiHua first asking Adam if he can buy a small fridge for the basement and Adam asking LiHua about the size of the fridge?,Yes,20261110_11:00<and>20261116_16:00,Multi
|
||||
Is the time interval more than 7 days between Adam asking LiHua about the size of the fridge and Adam informing LiHua that the fridge will be delivered at 4pm next day?,Yes,20261116_16:00<and>20261123_23:00,Multi
|
||||
Wolfgang suddenly becomes very concerned about good body shape and healthy food choices in December. What are the two conversations he had with LiHua in December that reflect this?,20261202_14:00 & 20261209_19:00,20261202_14:00<and>20261209_19:00,Multi
|
||||
Did Li Hua agree to have dinner with Wolfgang after he told Wolfgang about the lunch arrangement?,Yes,20260105_11:00<and>20260930_16:00,Multi
|
||||
Did Li Hua ask Wolfgang Schulz for a recommendation on a gym or fitness center before asking Jennifer Moore for book recommendations on fitness nutrition?,Yes,20260111_08:00<and>20260831_19:00,Multi
|
||||
Did Li Hua ask Wolfgang Schulz if he wants to go to the gym together before Jennifer reminded Li Hua to participate in the gym's membership feedback activity?,Yes,20260112_10:00<and>20260605_11:00,Multi
|
||||
Did Li Hua send a message to Wolfgang Schulz saying that he has prepared all the delicious food for tonight's Chinese Lunar New Year before Wolfgang sent a message to Li Hua wishing him a happy Lunar New Year?,Yes,20260113_11:00<and>20260118_12:00<and>20260119_11:30<and>20260119_14:30<and>20260119_09:30,Multi
|
||||
Did Li Hua provide feedback to Jennifer Moore on his new meal plan before he asked her for advice on a healthy meal plan?,No,20260115_16:45<and>20260122_15:00,Multi
|
||||
Did Li Hua's complaint about the customer who modifies their requirements occur before Wolfgang comforted him?,No,20260123_17:30<and>20260131_14:00,Multi
|
||||
Did Adam Smith send Li Hua a reminder about the upcoming rent due date before Li Hua sent a message about having already transferred the rent on 20260301?,Yes,20260127_20:30<and>20260227_18:30<and>20260301_10:00<and>20260330_18:00<and>20260331_17:00<and>20260429_17:00<and>20260429_18:00,Multi
|
||||
Did Li Hua share a blog post about his recent fitness achievements after Jennifer sent him a motivational message?,Yes,20260129_14:00<and>20260520_18:00<and>20260606_09:00<and>20260817_12:15<and>20261022_22:00<and>20261202_14:00,Multi
|
||||
Did Li Hua send a follow-up message to Jennifer before she asked him about his latest sleeping schedule?,Yes,20260205_13:00<and>20260725_10:00,Multi
|
||||
Did Li Hua ask Adam Smith about placing potted plants in the basement before he asked about decorating the basement?,No,20260219_20:00<and>20261214_14:00,Multi
|
||||
Did Li Hua ask Wolfgang for advice on renovating the basement before he invited Adam Smith to check the progress of the basement renovation?,Yes,20260219_20:10<and>20260223_17:00<and>20260707_16:00,Multi
|
||||
Did Li Hua tell Wolfgang about making a new friend before Yuriko reminded the group members to meet for the music festival?,Yes,20260302_18:00<and>20260318_14:27<and>20261212_12:00,Multi
|
||||
"Did Yuriko tell Li Hua about booking a seat at the ""Central Perk"" cafe before Li Hua sent her a message to confirm the details of their next meeting?",Yes,20260225_15:00<and>20260303_09:30,Multi
|
||||
What time does Li Hua check in with Adam about moving in?,5:30 PM,20260105_14:00,Single
|
||||
When was the first time Li Hua had dinner with Wolfgang this year?,20260108,20260108_11:00,Single
|
||||
Where was the first time Li Hua had dinner with Wolfgang this year?,the cozy café downtown,20260108_11:00,Single
|
||||
What time is Li Hua's lunch with Wolfgang Schulz at the cozy café downtown?,20260108,20260108_11:00,Single
|
||||
What is the Wi-Fi password at Li Hua's house?,Family123,20260106_09:00,Single
|
||||
What does Adam say about having friends over?,having friends over occasionally is fine,20260106_09:00,Single
|
||||
What house rule does Adam mention?,keep noise to a minimum during late hours,20260106_09:00,Single
|
||||
What does Li Hua report to Adam on January 6th?,the water tab in the apartment is broken,20260106_13:00,Single
|
||||
When does Adam confirm the plumber will arrive?,tomorrow at 10 AM,20260106_15:00,Single
|
||||
What does Li Hua ask Adam about the door hinge?,a small repair,20260107_15:00,Single
|
||||
What is the name of the gym that Wolfgang recommended LiHua to go to?,FitZone,20260111_08:00,Single
|
||||
When does Li Hua ask Jennifer Moore about adjusting the protein in her meal plan?,20260122,20260122_15:00<and>20260122_15:00,Multi
|
||||
"What time does Li Hua watch the movie ""Overwatch 3""?",20260122,20260122_17:00<and>20260121_13:00,Multi
|
||||
"Who does Li Hua go to watch the movie ""Overwatch 3"" with?",Wolfgang,20260122_17:00<and>20260121_13:00,Multi
|
||||
When does Li Hua plan to celebrate Chinese Lunar New Year?,20260118,20260113_11:00,Single
|
||||
What does Li Hua plan to celebrate Chinese Lunar New Year?,dumplings,20260113_11:00,Single
|
||||
Who does Li Hua plan to celebrate Chinese Lunar New Year with?,Wolfgang,20260113_11:00,Single
|
||||
Li Hua accidentally broke a light fixture once. What did Adam say about the light fixture?,I'll arrange for a professional to take a look,20260108_19:00,Single
|
||||
Li Hua has a difficult client. When did he solve that client's project?,20260209,20260123_17:30<and>20260209_21:00,Multi
|
||||
When did Li Hua express frustration about a client changing requirements?,20260131_14:00,20260131_14:00,Single
|
||||
Who suggested Li Hua keep a clear log of all the changes requested by the client?,Wolfgang Schulz,20260131_14:00,Single
|
||||
What did Wolfgang Schulz suggest Li Hua to do when Li Hua face a difficult client?,keep a clear log of all the changes requested by the client,20260131_14:00,Single
|
||||
What does Li Hua plan to do to clear their head?,go for a walk and find a new café,20260131_14:00,Single
|
||||
What is the name of the café Wolfgang Schulz recommends to Li Hua?,The Java Spot,20260131_14:00,Single
|
||||
Can Li Hua hang his artworks on the wall?,Yes,20260201_20:00,Single
|
||||
What does Adam Smith suggest Li Hua use to hang artwork without damaging the walls?,use removable hooks,20260201_20:00,Single
How many times does Li Hua train per week now?,2,20260204_15:00,Single
When did Li Hua change the training plan?,20260204,20260204_15:00,Single
What days does Li Hua take for the training sessions?,Tuesdays and Thursdays,20260204_15:00,Single
When did Li Hua ask Jennifer Moore for tips on dealing with muscle soreness?,20260206_16:00,20260206_16:00,Single
What are some of Jennifer Moore's tips for dealing with muscle soreness?,hydration&active recovery&stretching&foam rolling&rest,20260206_16:00,Single
Why did Li Hua ask for extra exercises to do at home?,to complement his training schedule,20260205_13:00,Single
What exercises did Jennifer Moore suggest Li Hua do at home?,bodyweight squats&plank holds&push-ups&lunges&glute bridges,20260205_13:00,Single
What is Adam Smith's condition for Li Hua to add decorations to the basement?,must be reversible and not damage anything,20260219_20:00,Single
When was the first time Wolfgang went to Li Hua's basement?,20260221,20260221_15:00,Single
What ideas did Wolfgang Schulz suggest for Li Hua's basement practice spot?,good lighting&soundproofing&a comfy chair,20260219_20:10,Single
Why did Adam remind Li Hua not to play guitar late at night?,a few neighbors have mentioned they're hearing guitar music late at night,20260216_10:00,Single
What type of ambiance does YurikoYamamoto want for her studio's homepage?,more welcoming and engaging,20260223_15:00,Single
When did Li Hua invite Adam Smith to check the basement renovation progress?,20260223_19:00,20260223_17:00,Single
What is the name of the café where Li Hua and YurikoYamamoto first meet to talk about Yuriko's website?,Central Perk,20260225_15:00,Single
What is the profession of YurikoYamamoto whom Li Hua is helping with her homepage?,speech therapy,20260223_15:00,Single
What type of instrument does Li Hua play in the basement?,guitar,20260223_17:00,Single
When did Adam Smith inform Li Hua about potential issues with the pipes in the basement?,20260301_13:00,20260301_13:00,Single
When did Li Hua inform Adam Smith that the rent was transferred?,20260301_10:00,20260301_10:00,Single
When is the music concert that Wolfgang invites Li Hua to?,20260307_18:00,20260302_18:00,Single
What dish does Li Hua agree to bring to the neighborhood potluck dinner?,Homemade pasta salad,20260302_18:45,Single
Who is Li Hua meeting with to discuss homepage design updates?,Yuriko Yamamoto,20260303_09:30,Single
What new feature does Yuriko Yamamoto consider adding to her studio's homepage?,A blog section,20260307_13:00,Single
What time is the power outage in the neighborhood?,2 PM to 3 PM,20260307_14:45,Single
What suggestions does Li Hua give for promoting the new scheduling feature?,Showcase it on social media platforms and include a short tutorial and send out a newsletter to clients,20260311_14:30,Single
Who invites Li Hua to join the community bake sale?,Adam Smith,20260312_12:30,Single
What day and time is the community bake sale taking place?,Sunday at 3 PM,20260312_12:30,Single
When does Li Hua request a delivery from Hailey Johnson?,Tuesday,20260314_17:00,Single
What is the address where Li Hua wants the bread delivery to be made?,123 Sunny Street,20260314_17:00,Single
What service does Hailey Johnson offer to Li Hua?,Doorstep delivery service for fresh milk and bread,20260314_17:00,Single
What time does Hailey Johnson start baking?,4 AM,20260317_08:00,Single
Where does Li Hua plan to meet Yuriko Yamamoto to show the final website?,Central Perk café,20260317_15:30,Single
What does Li Hua suggest to Hailey regarding the frequency of bread delivery?,Twice a week on Mondays and Fridays at 8am,20260318_15:00,Single
What does Li Hua agree to bring to the bonfire singing party hosted by Chae Song-hwa?,Li Hua will bring his guitar,20260320_18:00,Single
What is the focus of Li Hua's next month's fitness plan according to Jennifer?,Strengthening lower limbs,20260325_19:00,Single
What is the building's policy that Adam reminds Li Hua about?,Recycling policy,20260328_15:00,Single
What is the topic of the online tutorial Yuriko shares with the group?,Advanced drum techniques,20260329_10:00,Single
What is Wolfgang looking for in his new drums?,Something versatile that sounds good for both rock and softer tunes like The Beatles,20260326_16:00,Single
What song does Li Hua suggest for the jam session on 20260405?,Viva la Vida,20260405_10:00,Single
What does Li Hua think about the rosemary focaccia?,Li Hua thinks the rosemary focaccia is amazing,20260331_14:00,Single
When does Li Hua confirm the rent transfer to Adam?,20260331_17:00,20260331_17:00,Single
What joke does Wolfgang make as an April Fool's joke?,That Wolfgang bought a set of expensive drums,20260401_15:00,Single
Who is delivering the bread to Li Hua on 20260403?,Alice,20260403_08:00,Single
What does Li Hua think about improvisation during the jam session?,Improvisation sounds great,20260402_19:00,Single
When is ChaeSong-hwa hosting the community medical knowledge lecture?,7 PM on Saturday,20260407_19:00,Single
What topics will be covered in the community medical knowledge lecture?,Basics of common health issues and how to prevent them,20260407_19:00,Single
What new song does the Jolly band decide to work on for the jam session according to their discussion on 20260410?,Stand By Me,20260410_11:00,Single
What is Li Hua's feedback on Chae Song-hwa's medical knowledge lecture?,It is insightful and makes complex topics easy to understand,20260411_21:00,Single
When is the anniversary event of Hailey Johnson's bakery shop?,April 15 to 17,20260413_21:00,Single
What does Li Hua want to have at Hailey's bakery shop anniversary event?,Sourdough and sweet pastries,20260413_21:00,Single
Why does Li Hua ask ChaeSong-hwa about whether neurosurgeons actually use test tubes in their work?,Li Hua is trying to get some insights for a website design,20260414_16:00,Single
Who proposes that the band take a break from jamming this week?,Wolfgang Schulz,20260416_21:00,Single
What suggestions does Li Hua propose to Adam Smith about the upcoming community garden renovation?,Add more seating areas for people to relax and enjoy the space and some flower beds with native plants,20260417_11:00,Single
What kinds of flowers does Li Hua recommend to Adam Smith for the flower beds?,Lavender and coneflowers and fresh herbs,20260417_11:00,Single
What will be a gift for Li Hua if he chooses to renew the fitness contract with Jennifer Moore?,A cool fitness bag as a gift for all the gym activities,20260420_21:00,Single
When is the karaoke activity organized by ChaeSong-hwa?,Saturday at 7 PM,20260421_16:00,Single
Who is Li Hua bringing to the band's jam session according to their discussion on 20260425?,ChaeSong-hwa,20260425_21:00,Single
What garden-related activity is Thrall planning to organize?,A community planting day,20260426_13:30,Single
What is the proposed solution for making the garden more inviting on sunny days?,Adding shade with umbrellas or trees,20260427_10:30,Single
What is the main topic of the conversation on 2026-04-28 at 5 PM?,Breathing techniques and tips for squats during workouts,20260428_17:00,Single
When is Wolfgang Schulz's promotion celebration dinner?,6 PM on the day after tomorrow (implied to be 2026-04-30),20260428_18:30,Single
What is the name of the Italian restaurant where Wolfgang and Li Hua are having dinner to celebrate Wolfgang's promotion?,Venedia Grancaffe,20260430_17:00,Single
What is Li Hua's suggestion for scheduling the water pipe repairs in the garden?,During off-peak hours,20260501_16:00,Single
When is the community meeting for the garden project scheduled according to the discussion on 20260507?,Saturday at 10 am,20260507_16:00,Single
What percentage discount is Hailey Johnson offering for Mother's Day pastries?,15%,20260508_08:00,Single
Which two specific pastries does Hailey Johnson recommend for Mother's Day?,Raspberry-filled croissants and chocolate eclairs,20260508_08:00,Single
What type of stretches does JenniferMoore suggest before and after workouts?,Dynamic stretches before and static stretches after,20260510_11:30,Single
When is the web design seminar at Wolfgang's company happening?,Thursday at 3 PM,20260511_11:00,Single
What is Li Hua looking forward to trying from the summer promotion?,Fruity ice cream flavors and a mango-coconut pastry,20260514_14:00,Single
What did Li Hua enjoy the most about the restaurant that he and Wolfgang visited for dinner on 20260514?,The pasta dish and the dessert,20260514_22:00,Single
Why is Chae Song-hwa unable to join the rehearsal?,She has to attend a medical lecture,20260515_10:00,Single
What type of lighting is preferred for the seating area in the community garden?,Soft white lights,20260516_10:00,Single
When is the construction of the garden supposed to start?,This Wednesday (20260520),20260518_10:00,Single
Which flowers does RexxarRemar suggest for a vibrant vibe?,Bluebell and Camellia and Tulip,20260518_10:00,Single
How does RexxarRemar plan to spend time in the garden once it's done?,For family gatherings and relaxing afternoons,20260518_10:00,Single
What does LiHua find as a perfect place to work in his conversation with Chae?,The Lighthouse Cafe,20260521_15:00,Single
What songs does Chae propose to the Jolly band to try out on Sunday according to the conversation on 20260521?,"The Yellow Wind Rises and ""To the West""",20260521_20:00,Single
What is IllidanStormrage's suggestion to the ongoing garden construction according to the discussion on 20260522?,plan some quiet hours when the kids are playing,20260522_20:00,Single
What songs does LiHua suggest adding to the karaoke playlist in his conversation with Chae?,"I Will Survive and ""Sweet Caroline""",20260525_15:00,Single
What time does Turalyon plan to limit noisy activities?,From 2-3 pm,20260527_10:00,Single
Why does WolfgangSchulz inquire of the band members about health products?,Wolfgang is feeling very tired lately with the overtime and wants to boost up his energy,20260527_23:00,Single
What is the name of the song LiHua suggests revisiting for the band's music session according to their conversation on 20260529?,The History of Everything,20260529_17:00,Single
Which kind of pastry does Li Hua express his interest in trying in his conversation with Hailey on 20260531?,the new banana durian cheesecake,20260531_20:00,Single
When did Li Hua inform Adam Smith that the rent for last month (implied to be May) had been transferred?,2026-06-02 at 10:00,20260602_10:00,Single
What is the name of Hailey Johnson's new weekly flavor cheese?,Hazelnut Basque Roasted Cheese,20260603_09:45,Single
Why can't people sit on the benches when Turalyon informs the community that the benches have been installed on 20260604?,the paint isn't dry yet,20260604_18:00,Single
What is the phone number for the maintenance worker that Adam Smith provided to Li Hua for the broken streetlight?,314159,20260605_18:45,Single
What breathing technique does Jennifer Moore suggest for running?,Inhaling for 3 steps and exhaling for 2 steps,20260606_09:00,Single
When is the Freelancer Group Meeting scheduled for according to the conversation between LiHua and Yuriko?,This Wednesday at 3 pm,20260608_14:30,Single
What suggestions does Jennifer give to LiHua to help boost his endurance?,try incorporating longer cardio sessions and interval training into your routine,20260610_16:00,Single
What new pastry does Hailey think is perfect for a fitness lover like Li Hua?,high-protein breads,20260611_12:45,Single
Who proposes adding some outdoor games or a small water feature to the children's play area in the community discussion?,Thrall,20260612_15:00,Single
Who proposes adding picnic tables for families to enjoy some snacks after playing?,Li Hua,20260612_15:00,Single
Who proposes creating a little garden area where kids can help plant flowers or vegetables?,GromHellscream,20260612_15:00,Single
Why does LiHua bring up Adam Smith in the band's conversation on 20260616?,LiHua thinks it is really nice for Mr. Smith to rent this basement to us for practice,20260616_10:40,Single
What event does LiHua think is an opportunity for the Jolly band to perform in front of the crowd?,the poster saying that the town is holding a local music festival,20260617_17:00,Single
What dietary restrictions does LiHua have in his conversation with Hailey?,No dietary restrictions,20260618_11:30,Single
What will the special guest speaker be talking about as Jennifer mentions to LiHua?,about nutrition for athletes,20260619_08:15,Single
What is the survey that Jennifer wants LiHua to fill out about?,the experience with our training sessions so far,20260622_11:00,Single
When does the special event at Hailey's bakery start according to Hailey and LiHua's conversation on 20260622?,The event starts at 10 AM this Saturday,20260622_13:30,Single
What video does LiHua send to the band group chat on 20260624?,"a video of himself playing the intro to ""Stairway to Heaven""",20260624_14:00,Single
What song does Wolfgang propose adding to the band's practice set on 20260625?,Viva la Vida by Coldplay,20260625_19:00,Single
What is the main topic of the conversation between LiHua and ChaeSong-hwa on 20260626?,Chae and her team making a breakthrough in the research study and LiHua congratulating her,20260626_11:00,Single
What songs does Chae recommend in the band's group discussion on 20260629?,"Uptown Funk or ""Happy""",20260629_18:30,Single
What tip does Jennifer give to the gym members on 20260630?,Staying hydrated during workout is very important,20260630_15:30,Single
Why does the construction have to be postponed according to Tirion Fordring?,the storm and the rain have been going on for days,20260701_10:00,Single
What kinds of Thai food does Wolfgang want to try out at the Thai restaurant on 20260702?,pad thai and maybe some spring rolls,20260702_15:00,Single
What song does Yuriko propose that the band can practice this weekend according to the band's discussion on 20260703?,Take Me Home Country Roads by John Denver,20260703_11:45,Single
What is the name of the song that Yuriko recommends to the band on 20260706?,Rolling in the Deep by Adele,20260706_19:30,Single
Why are LiHua and Adam concerned about the basement in their conversation on 20260707?,They want to check if there were any issues in the basement after those rainstorms,20260707_16:00,Single
Why is Jennifer checking in in the gym group chat on 20260708?,She wants to hear how the members are all doing and offer some personalized advice,20260708_14:00,Single
What is the article that Chae shares with LiHua about?,how to fall asleep faster at night,20260711_11:00,Single
What is LiHua's feedback on Hailey's new artisanal donuts?,The flavors are so unique and delicious,20260712_16:00,Single
What new flavor is LiHua looking for as he mentions to Hailey in their conversation on 20260714?,a matcha flavor,20260714_12:00,Single
Which performance does Chae want the band members to check out in their discussion on 20260715?,"the amazing live performance of ""Bohemian Rhapsody"" by Queen",20260715_19:00,Single
Why is LiHua praising Chae in their conversation on 20260717?,LiHua was really amazed by Chae's performance in the band last Sunday because Chae's singing has really leveled up,20260717_12:00,Single
What cool thing does Wolfgang find that he wants to share with LiHua?,this awesome guitar Wolfgang found online,20260718_18:00,Single
What tips on balancing work and personal hobbies does LiHua give to Chae?,try setting specific hours for work and separate times for your hobbies and don't forget to schedule some fun time for yourself,20260722_13:00,Single
What is the size of the basement?,The basement is approximately 15 feet by 20 feet,20260723_14:00,Single
What is Wolfgang's favorite superhero?,Iron Man,20260726_16:00,Single
What is LiHua's favorite superhero?,Spider-Man,20260726_16:00,Single
What sports would LiHua like to see on the community sports day?,some team sports like soccer or basketball,20260727_16:30,Single
What activities are being discussed for the community sports day?,Soccer and basketball and tug-of-war and sack race are mentioned as potential activities for the community sports day,20260727_16:30,Single
What new hobbies does Wolfgang Schulz propose to Li Hua?,Wolfgang Schulz proposes pottery or painting as a hobby to Li Hua,20260728_18:00,Single
What brands of air-conditioners does LiHua recommend to Wolfgang?,Mitsubishi and Daikin,20260729_14:00,Single
Who is recommended by Jennifer to share some fitness tips in the gym group chat on 20260730?,LiHua,20260730_17:30,Single
What does Hailey Johnson want Li Hua to help with at the bakery according to their discussion on 20260731?,Hailey Johnson wants Li Hua to help gather feedback from customers at the bread-tasting event,20260731_13:00,Single
What day do Li Hua and Wolfgang Schulz plan to visit the photography exhibition?,Li Hua and Wolfgang Schulz plan to visit the photography exhibition on Friday evening,20260801_19:00,Single
When are Wolfgang and LiHua going to the photography exhibition on 20260805?,19:00,20260805_16:00,Single
What new components are being added to the garden project as Turalyon announces it on 20260806?,incorporating sustainable practices like recycling and composting into our renovation project,20260806_16:30,Single
Why is Yuriko asking for LiHua's help on 20260807?,She has got ten logo designs for her speech therapy studio and she would love LiHua's opinion on which one stands out the most,20260807_12:00,Single
When is the installation of the air-conditioner for the basement?,It's set for Wednesday next week,20260809_12:00,Single
Why does Adam reach out to LiHua on 20260810?,To check if LiHua had a chance to look over the warranty and maintenance plans for the air conditioner,20260810_18:00,Single
What tips does Jennifer give to LiHua about preventing muscle soreness after a tough workout?,make sure to warm up properly before workouts and cool down afterward & Stretching & Stay hydrated and consider foam rolling post-session to help with recovery,20260811_11:00,Single
When is the contractor team going to install the air-conditioner?,20260812 6PM,20260812_11:00,Single
When is the garage sale being discussed?,20260814_15:00,20260814_15:00,Single
What is the reason Thrall cannot participate in the garage sale?,Thrall has a lot happening with the garden renovations,20260814_15:00,Single
What does RexxarRemar suggest to make the garage sale more enjoyable?,RexxarRemar suggests having snacks or drinks at the garage sale,20260814_15:00,Single
Who offers to help with setting up for the garage sale?,AdamSmith,20260814_15:00,Single
What type of stretches does Sage prefer before hitting the gym?,Sage prefers dynamic stretches,20260816_17:00,Single
What does JenniferMoore suggest to improve performance and prevent injuries?,JenniferMoore suggests incorporating stretching techniques,20260816_17:00,Single
What does Viper plan to do to increase their stamina?,Viper plans to focus on cardio,20260817_12:15,Single
What does Sova plan to do to keep their cardio interesting?,Sova plans to mix it up with different types of cardio exercises,20260817_12:15,Single
How much discount is Hailey willing to give to LiHua according to their conversation on 20260816?,15% off,20260816_20:00,Single
Who does Li Hua mention as their favorite character from The Witcher 3?,Geralt,20260818_10:00,Single
Why does LiHua like Geralt from The Witcher 3?,He's such a complex character with that no-nonsense attitude but deep down he has a great sense of morality & his monster-slaying skills are just epic,20260818_10:00,Single
What is Li Hua's opinion about Yennefer's character?,Intense & fiercely independent & her relationship with Geralt evolves,20260818_10:00,Single
Which scene stood out for Li Hua in The Witcher 3?,"The ""Battle of Kaer Morhen""",20260818_10:00,Single
Which scene in The Witcher 3 does ThaneChambers consider as one of his favorites involving Yennefer?,The scene when Geralt is looking for Yennefer in the early part of the game,20260818_10:00,Single
What is Li Hua's impression of the Blood and Wine expansion?,Incredible & new area is stunning & story feels like a mini-epic,20260818_10:00,Single
What is Li Hua's opinion on the characters in Succession?,Intense family dynamics and business war,20260818_14:00,Single
What TV shows does EmilyBurnett recommend to Li Hua after discussing Succession?,The Crown and Ted Lasso,20260818_14:00,Single
When does Li Hua plan to meet JakeWatson for soccer practice?,Saturday afternoon at 3 PM,20260819_10:00,Single
What are some of the upcoming PS5 exclusives discussed?,Final Fantasy XVI & Marvel's Spider-Man 2 & Ghostwire: Tokyo,20260819_18:00,Single
Which upcoming PS5 exclusive game does Gavriel express curiosity about?,Ghostwire: Tokyo,20260819_18:00,Single
Which UCL match does Jasper recommend as a classic must-watch?,The 2005 final between Liverpool and AC Milan,20260821_15:00,Single
What skill does Li Hua plan to improve with WolfgangSchulz's help?,Data analysis and coding related to AI tools,20260822_17:00,Single
Who shared a music theory tutorial with the group?,Chae Song-hwa,20260825_10:45,Single
Which regions in Witcher 3 did Li Hua and Thane Chambers discuss?,Skellige and Toussaint,20260826_17:00,Single
What is Kendall's relationship with Logan like according to the group's discussion?,Toxic,20260826_18:00,Single
What is the topic of discussion between Li Hua and Wolfgang Schulz on burger preferences?,Medium rare vs. well-done meat patty in a classic American burger,20260828_10:00,Single
What is Li Hua's preference for a burger patty's doneness?,Medium rare,20260828_10:00,Single
What is Wolfgang's preference for a burger patty's doneness?,well-done,20260828_10:00,Single
Why does Wolfgang prefer a well-done meat patty in a burger?,It is safer and more flavorful & he likes a little char on his burger,20260828_10:00,Single
What toppings does Wolfgang Schulz prefer on his burger?,Cheese and bacon,20260828_10:00,Single
What books does Jennifer recommend to LiHua for deepening understanding of fitness nutrition?,"The New Rules of Lifting and ""Precision Nutrition""",20260831_19:00,Single
Why are Wolfgang and LiHua going to the local music store?,check out some new gear and get inspired for our next jam session,20260831_19:00,Single
What is the main theme of the game group's discussion on 20260901_13:00?,Dutch van der Linde's character and leadership,20260901_13:00,Single
Who is the most hateable character in Game of Thrones in Orion's opinion?,Ramsay Bolton,20260902_16:00,Single
Who is the most hateable character in Game of Thrones in Merrick's opinion?,Cersei Lannister,20260902_16:00,Single
What is the common interest between Li Hua and Jake Watson?,Soccer,20260903_17:00,Single
What kind of songs are Li Hua and Chae Song-hwa considering adding to their playlist for babies?,Lullabies,20260904_10:00,Single
What makes Wolfgang feel under a lot of pressure according to his conversation with LiHua on 20260905?,this new software project,20260905_14:00,Single
Why does AdamSmith ask LiHua about the basement on 20260907?,check in and see how the basement held up after the storm,20260907_14:00,Single
How is the basement after the storm?,The basement is all good with no water leakage at all,20260907_14:00,Single
What is the latest good news from Wolfgang as of 20260908?,software project finally made some significant progress,20260908_15:00,Single
What new songs are LiHua working on lately on his guitar as of 20260908?,"Blackbird and ""Hotel California""",20260908_15:00,Single
What mission does Gavriel mention that shows Arthur's growth in the game?,the scene where Arthur tells John to take care of his family,20260909_10:00,Single
What does Aisling find emotionally impactful about Red Dead Redemption 2's ending?,Arthur's realization of his fate and the music during the last ride,20260909_10:00,Single
What side mission in Red Dead Redemption 2 resonated with Jareth?,the side mission with the widow in the honor system,20260909_10:00,Single
What was ThaneChambers' favorite horse-related side quest in Red Dead Redemption 2?,the “troubled” horse,20260909_10:00,Single
What funny glitch did Fionnuala experience in the game Red Dead Redemption 2?,Arthur ended up floating in mid-air after a cutscene,20260909_10:00,Single
What was the final showdown that Elara found unforgettable in Red Dead Redemption 2?,with Dutch and Micah,20260909_10:00,Single
How does Bronwyn describe the epilogue of the game Red Dead Redemption 2?,captivating after seeing John try to make a life for himself,20260909_10:00,Single
Has LiHua ever met anyone like Sheldon Cooper in his real life?,No,20260910_12:00,Single
Has EmilyBurnett ever met anyone like Sheldon Cooper in her real life?,No,20260910_12:00,Single
What games are LiHua and ThaneChambers planning to check out during their shopping trip according to their discussion on 20260911?,"Spider-Man: Miles Morales and ""Demon's Souls""",20260911_14:00,Single
Who is the best midfielder in the past decade in JakeWatson's opinion?,Luka Modrić,20260912_16:00,Single
How many servings of vegetables should LiHua aim for each day to keep his body in good shape according to Jennifer?,at least 5 servings of veggies a day,20260913_18:00,Single
What does LiHua ask Jake about soccer?,the best strategies for soccer players to avoid injuries during games,20260914_10:00,Single
What is Jake's first tip for avoiding soccer injuries?,proper warm-up and stretching,20260914_10:00,Single
What does Jake suggest to improve flexibility and prevent muscle issues?,staying hydrated,20260914_10:00,Single
Why is wearing the right footwear important in soccer?,to avoid slips or sprains,20260914_10:00,Single
How does strength training help prevent soccer injuries?,It builds muscles around joints & reduces injury risk & improves stability and endurance,20260914_10:00,Single
What type of exercises does Jake recommend for strength training?,squats & lunges & planks & balance exercises like single-leg stands,20260914_10:00,Single
What is the purpose of starting with bodyweight exercises?,for those who are new to strength training,20260914_10:00,Single
Who does EmilyBurnett think is a standout character in the Game of Thrones series?,Tyrion Lannister,20260915_12:00,Single
What does Lachlan appreciate about Arya Stark's character development?,Her transition from an innocent girl to a fierce assassin,20260915_12:00,Single
What moment does Rowan find mind-blowing in Arya's storyline?,When Arya confronts and takes down the Night King,20260915_12:00,Single
Which moment involving Tyrion is Merrick's favorite?,When Tyrion blows up the Wildfire to save King's Landing,20260915_12:00,Single
What memorable quote is mentioned by Phaedra?,"Jon Snow's ""You know nothing"" scene with Ygritte",20260915_12:00,Single
"What does Rowan love about Tyrion's ""I drink and I know things"" line?",It captures his cleverness,20260915_12:00,Single
What aspect of Jaime Lannister's storyline does Rowan find intriguing?,His transformation after meeting Brienne of Tarth,20260915_12:00,Single
How does Phaedra view Jaime's character arc?,"Fascinating as he goes from ""Kingslayer"" to someone who values honor",20260915_12:00,Single
What does Lachlan think about Jaime's final decision to protect Cersei?,It complicates his redemption and leaves mixed feelings,20260915_12:00,Single
How does Saffron feel about Jaime's choice to protect Cersei?,Conflicted because it seems like he was slipping back into old ways,20260915_12:00,Single
What does Kieran wish Jaime had done differently?,Chosen a different path after his character development,20260915_12:00,Single
How does Niamh think Jaime could've ended up differently?,By staying true to the lessons he learned from Brienne,20260915_12:00,Single
What could Jaime and Brienne have done to rebuild trust according to Merrick?,Worked on rebuilding trust through actions rather than just words,20260915_12:00,Single
"What is LiHua curious about regarding the ""Man in Black"" in Westworld?",The motive behind his character,20260916_16:00,Single
"What does EmilyBurnett think is the main motive for the ""Man in Black""?",To find deeper meaning and fulfillment in Westworld,20260916_16:00,Single
"How does LiHua feel about the ""Man in Black's"" search for meaning?",Intrigued as it adds depth to the story,20260916_16:00,Single
"What is EmilyBurnett's favorite moment of the ""Man in Black""?",When he confronts the truth about himself and his choices,20260916_16:00,Single
"Which moment does LiHua find memorable for the ""Man in Black""?",When he shows vulnerability,20260916_16:00,Single
What does LiHua think about the themes of Westworld?,They are thought-provoking by diving into consciousness and free will and what it means to be human,20260916_16:00,Single
Which theme of Westworld resonates the most for LiHua?,The exploration of free will and choice,20260916_16:00,Single
What does EmilyBurnett think about the future relevance of Westworld's themes?,They will become even more crucial as technology advances,20260916_16:00,Single
What does WolfgangSchulz suggest incorporating into practice sessions according to the discussion on 20260916?,Improvisational solos,20260916_19:00,Single
How does LiHua feel about the idea of improvisation based on the chat on 20260916?,It will make sessions more fun and creative,20260916_19:00,Single
Who does JakeWatson consider the best defender in the history of FC Barcelona?,Carles Puyol,20260917_10:00,Single
What does LiHua think about Gerard Piqué as a defender?,He deserves a mention for his skill and intelligence,20260917_10:00,Single
Who are JakeWatson's and LiHua's favorite defenders?,Carles Puyol and Gerard Piqué,20260917_10:00,Single
What is JakeWatson's favorite memory of Puyol and Piqué playing together?,The comeback against PSG in the Champions League,20260917_10:00,Single
What match does JakeWatson cherish from the Champions League final against Manchester United?,The 2009 final,20260917_10:00,Single
What is LiHua's favorite match involving Barcelona?,The 2013 Champions League match against AC Milan,20260917_10:00,Single
Which goal does JakeWatson never forget from Messi?,Messi's solo goal against Getafe in 2007,20260917_10:00,Single
What is LiHua's favorite player moment from Messi?,When Messi scored a header against Manchester United in 2011,20260917_10:00,Single
What does LiHua think makes Messi the best?,His ability to create chances as well as his vision and work ethic and humility,20260917_10:00,Single
How does JakeWatson view Messi's impact on future generations of players?,His dedication and consistency set the bar high for young players,20260917_10:00,Single
Who suggests hitting Starbucks after work on 20260918?,WolfgangSchulz,20260918_18:00,Single
What is JenniferMoore's reminder to LiHua about on 20260919?,Getting enough protein after workouts,20260919_10:00,Single
Why is protein important according to JenniferMoore?,For keeping the body in shape,20260919_10:00,Single
What is ThaneChambers' question to LiHua on 20260920?,LiHua's favorite first-person shooter game,20260920_16:00,Single
Which game does LiHua mention he's been playing on 20260920?,Call of Duty: Warzone,20260920_16:00,Single
"What does ThaneChambers say about ""Warzone""?",It's a blast,20260920_16:00,Single
"What does LiHua like most about ""Warzone""?",The strategy involved and the adrenaline rush,20260920_16:00,Single
"What is ThaneChambers' memorable ""Warzone"" experience?",Coming back from a tough spot to win at the last second,20260920_16:00,Single
"What was LiHua's unforgettable match in ""Warzone""?",Being down to the last two and outsmarting the last team,20260920_16:00,Single
"How does ThaneChambers feel about upcoming ""Warzone"" updates?",He's looking forward to them,20260920_16:00,Single
"Does LiHua plan to play ""Warzone"" after the updates?",Yes because he loves exploring fresh content,20260920_16:00,Single
"What does ThaneChambers propose to do with the ""Warzone"" updates?",Hop on together and tackle the new stuff,20260920_16:00,Single
Who initiated the discussion about the community garden renovation and its benefits to local businesses?,Turalyon,20260921_10:00,Single
What idea did IllidanStormrage propose to help involve the community in supporting local businesses?,Putting together a flyer with a list of local businesses,20260921_10:00,Single
How does GromHellscream believe the local shops will benefit from the completed garden?,More foot traffic means more customers for them,20260921_10:00,Single
What does ArthasMenethil describe the outcome of the garden project and supporting local vendors as?,A win-win,20260921_10:00,Single
What does Thrall suggest they create to support local businesses?,A list of businesses they love to support,20260921_10:00,Single
How does TirionFordring emphasize the importance of balancing the garden renovation with supporting local vendors?,By stating that both aspects are important for the best outcome,20260921_10:00,Single
What does AdamSmith agree is crucial to align with the garden project schedule?,Support for local businesses,20260921_10:00,Single
Who does LiHua ask for help with installing a curtain?,AdamSmith,20260921_16:00,Single
When does AdamSmith say he is available to help with the curtain installation?,Wednesday or Thursday afternoon,20260921_16:00,Single
On which day do LiHua and AdamSmith agree to install the curtain?,Thursday afternoon,20260921_16:00,Single
|
||||
Who will bring the tools for the curtain installation?,AdamSmith,20260921_16:00,Single
|
||||
Who suggests exploring the PS5 settings for accessibility features and Game Help?,ThaneChambers,20260922_12:00,Single
|
||||
What feature on the PS5 does Ileana praise for making switching between games smoother?,Control Center,20260922_12:00,Single
|
||||
How does Fionnuala describe the convenience of the Control Center's audio settings adjustment?,You can adjust audio settings on the fly without having to pause the game,20260922_12:00,Single
|
||||
What aspect does ThaneChambers enjoy about sharing clips on the PS5?,It's a great way to relive intense moments with friends,20260922_12:00,Single
|
||||
Which game is LiHua currently obsessed with for its graphics and storytelling?,God of War,20260922_12:00,Single
|
||||
What game is Helios currently playing and finds the world and gameplay mechanics stunning?,Horizon Forbidden West,20260922_12:00,Single
|
||||
"How does Dyllan describe the combat in ""Horizon Forbidden West""?",It can be tricky sometimes,20260922_12:00,Single
|
||||
"What strategy does Helios recommend for dealing with flying machines in ""Horizon Forbidden West""?",Using a bow with elemental arrows and focus to track their movements,20260922_12:00,Single
|
||||
Which game does Helios suggest for tactical fun with a great story and character development?,Fire Emblem,20260922_12:00,Single
|
||||
What type of combat does Gavriel prefer in games?,Turn-based combat,20260922_12:00,Single
|
||||
What game has Gavriel been hooked on recently for its blend of RPG elements and social simulation?,Persona 5 Royal,20260922_12:00,Single
|
||||
Which game does Dyllan recommend for turn-based combat and art style?,Octopath Traveler,20260922_12:00,Single
|
||||
"Who is ThaneChambers' favorite character in ""Octopath Traveler""?",Primrose,20260922_12:00,Single
|
||||
"What is Aisling's opinion on using traps in ""Octopath Traveler""?",They are a game changer,20260922_12:00,Single
|
||||
"How does Dyllan describe the importance of a balanced party in ""Octopath Traveler""?",It allows for more flexibility in battles and helps adapt to different challenges,20260922_12:00,Single
|
||||
"What is Bronwyn's experience with going full support in tough fights in ""Octopath Traveler""?",It can be risky but totally viable,20260922_12:00,Single
|
||||
"Which boss did Gavriel struggle with the most in ""Octopath Traveler""?",The final boss,20260922_12:00,Single
|
||||
What does Elara enjoy doing to unwind after intense battles in games?,Wander around and collect items or side quests,20260922_12:00,Single
|
||||
What type of gameplay does Jareth enjoy after a big challenge?,Relaxing gameplay,20260922_12:00,Single
|
||||
Who asks Emily if she has read The Lord of the Rings novels?,LiHua,20260922_17:00,Single
|
||||
What TV series does LiHua mention that they are curious about in relation to The Lord of the Rings?,The Rings of Power,20260922_17:00,Single
|
||||
"What aspect of ""The Rings of Power"" does Emily appreciate in terms of visuals?",How they are bringing Middle-earth to life,20260922_17:00,Single
|
||||
Which character does LiHua mention as their favorite and describe as inspiring?,Galadriel,20260922_17:00,Single
|
||||
What aspect of Elrond's character does Emily appreciate?,His mix of wisdom and vulnerability,20260922_17:00,Single
|
||||
"What does LiHua predict will happen in ""The Rings of Power""?",Epic battles and alliances forming,20260922_17:00,Single
|
||||
What does Emily think will happen next in the series in terms of story development?,The stakes will rise and alliances will evolve,20260922_17:00,Single
|
||||
What do LiHua and Emily agree on regarding the anticipation for the next episode?,They both can't wait for the next episode,20260922_17:00,Single
|
||||
What does LiHua suggest they should do as the series unfolds?,Keep sharing their thoughts,20260922_17:00,Single
|
||||
What does Emily say about the series making her feel like she is really getting to know the characters?,The emphasis on character development and their backstories,20260922_17:00,Single
|
||||
Who initiates the suggestion to go to the grocery store after work on 20260923?,WolfgangSchulz,20260923_16:00,Single
|
||||
What is the purpose of going to the grocery store on 20260923 according to WolfgangSchulz?,To grab some snacks for their next jam,20260923_16:00,Single
|
||||
What time does WolfgangSchulz propose for going to the grocery store on 20260923?,Around 5:30,20260923_16:00,Single
|
||||
How does LiHua feel about the bench press challenge proposed by JenniferMoore?,LiHua feels pretty strong and is up for the challenge,20260924_20:00,Single
|
||||
What exercises does JakeWatson recommend for leg strength?,Squats and lunges,20260925_21:00,Single
|
||||
What additional advice does JakeWatson give for maintaining muscle flexibility?,Stretch after workouts,20260925_21:00,Single
|
||||
Which stretches does JakeWatson recommend for soccer-specific areas?,Hamstring stretch and quad stretch,20260925_21:00,Single
|
||||
How long should each stretch be held according to JakeWatson?,About 30 seconds,20260925_21:00,Single
|
||||
When and where does JakeWatson propose to meet for the practice session according to the conversation on 20260925?,Saturday at 4 PM at the usual spot,20260925_21:00,Single
|
||||
What is LiHua's response to the suggestion of taking a warm shower before bed?,LiHua plans to try it tonight,20260926_10:30,Single
|
||||
Who asks LiHua to measure the window in the basement?,AdamSmith,20260928_10:00,Single
|
||||
Why does AdamSmith want the window measured?,To get some curtains made,20260928_10:00,Single
|
||||
What are the dimensions of the window that LiHua measures?,150 cm wide and 120 cm high,20260928_10:00,Single
|
||||
Who initiates the discussion about binaural tones?,ChaeSong-hwa,20260929_11:00,Single
|
||||
What does ChaeSong-hwa suggest binaural tones could enhance?,The sound and listening experience for their audience,20260929_11:00,Single
|
||||
How does YurikoYamamoto feel about adding a new dimension to their music with binaural tones?,Excited to give it a try,20260929_11:00,Single
|
||||
Who congratulates LiHua on pushing their limits with the bench press?,JenniferMoore,20260930_14:00,Single
|
||||
What does LiHua express feeling after the encouragement from Jennifer?,Strong and motivated,20260930_14:00,Single
|
||||
What is the focus for the next session suggested by JenniferMoore on 20260930?,Form and control,20260930_14:00,Single
|
||||
What technique does JenniferMoore suggest to gradually increase weight of the bench press exercise?,Progressive overload,20260930_14:00,Single
|
||||
Who suggests going out for hot pot?,LiHua,20260930_16:00,Single
What type of hot pot does WolfgangSchulz prefer?,Sichuan,20260930_16:00,Single
What time does LiHua propose for dinner on 20260930?,7 pm,20260930_16:00,Single
Who comes up with the idea of picking up drinks on the way to hot pot?,LiHua,20260930_16:00,Single
What type of drink does LiHua suggest to have with hot pot?,Cold beer,20260930_16:00,Single
Who reminds everyone to drink water before and during the match?,JakeWatson,20261001_18:00,Single
What does Ivor do the day before a game to help with hydration?,Hydrate well,20261001_18:00,Single
What drink does JakeWatson recommend for hydration after a match?,Coconut water,20261001_18:00,Single
What does Giselle emphasize about passing drills?,Accuracy is key in games,20261001_18:00,Single
What practice idea does Dacey suggest to improve passing skills?,Target practice with cones,20261001_18:00,Single
What does Briar remind everyone to bring to practice?,Water bottles,20261001_18:00,Single
What does Giselle remind everyone to do before practice?,Stretch,20261001_18:00,Single
Who is considering buying Sekiro: Shadows Die Twice and asks for ThaneChambers' opinion?,LiHua,20261001_20:00,Single
What does ThaneChambers think about Sekiro: Shadows Die Twice?,It's amazing and worth grabbing,20261001_20:00,Single
What aspects of Sekiro: Shadows Die Twice does ThaneChambers highlight?,Unique combat system & timing & strategy & world design,20261001_20:00,Single
Does ThaneChambers recommend Sekiro: Shadows Die Twice to those who love a challenge?,Yes,20261001_20:00,Single
What game is ThaneChambers currently playing?,Black Myth: Wukong,20261001_20:00,Single
What does LiHua think about the graphics and gameplay of Black Myth: Wukong?,The graphics look insane and the gameplay seems really fluid,20261001_20:00,Single
What part of Black Myth: Wukong does ThaneChambers like the most?,The fresh take on the classic tale and the unique combat mechanics,20261001_20:00,Single
Which aspect of Black Myth: Wukong does LiHua find immersive?,The combat style and how it blends with the storyline,20261001_20:00,Single
What does ThaneChambers love about the boss battles in Black Myth: Wukong?,The thrill of strategizing how to take them down and the breathtaking scenery,20261001_20:00,Single
Who informs LiHua that the new software is almost done?,WolfgangSchulz,20261002_13:00,Single
What is LiHua's reaction to the news about the new software?,LiHua is excited to check it out,20261002_13:00,Single
Who asks Jake to play as the goalkeeper for soccer training?,LiHua,20261003_12:00,Single
Does Jake agree to play as the goalkeeper?,Yes,20261003_12:00,Single
Who discusses classic matches between Barca and Bayern with Jake?,LiHua,20261005_10:05,Single
In which year did Barca lose 4-0 against Bayern as mentioned in the conversation?,2013,20261005_10:05,Single
Which goal does Jake consider incredible from the 2015 match?,Neymar's goal,20261005_10:05,Single
Who does Jake believe has the potential to be a game-changer for Barca?,Pedri,20261005_10:05,Single
What is LiHua looking forward to in the upcoming matches?,Seeing how Pedri and the team perform and if they can go far in the UCL,20261005_10:05,Single
What are Jake's go-to snacks for match days?,Chips and dip & pizza,20261005_10:05,Single
What are LiHua's favorite snacks for the game?,Nachos with cheese and spicy salsa,20261005_10:05,Single
Who does LiHua pick as the best football manager of all time?,Pep Guardiola,20261006_10:00,Single
What aspect of Guardiola's career does JakeWatson admire?,The transformation of the teams he's managed,20261006_10:00,Single
"Which team's style of play does Farrah remember as ""Total football""?",Barcelona,20261006_10:00,Single
What playing style is associated with Guardiola's Barcelona team?,Tiki-taka,20261006_10:00,Single
Which player's development under Guardiola does Jasper mention?,Phil Foden,20261006_10:00,Single
What player's midfield importance does Aurora highlight?,Rodri,20261006_10:00,Single
Who will bring cones for drills at the match?,Farrah,20261006_10:00,Single
What type of passes does Briar suggest adding to the practice?,Quick one-twos,20261006_10:00,Single
What will LiHua bring for everyone to enjoy after the match?,Snacks,20261006_10:00,Single
Which match does Jasper mention that Manchester City played recently?,Against Chelsea in the Premier League,20261006_10:00,Single
Who scored a stunner in the Manchester City vs. Chelsea match as mentioned in the conversation?,Phil Foden,20261006_10:00,Single
What aspect of Rodri's performance does Henley praise?,His interceptions and passing,20261006_10:00,Single
What does Farrah admire about Rodri's play?,His ability to break up opposition plays and transition the ball,20261006_10:00,Single
Who checks in with LiHua about their arm muscles after the bench press session?,JenniferMoore,20261008_14:00,Single
How does LiHua feel about the soreness in their arm muscles?,It feels good and they are ready for the next session,20261008_14:00,Single
What snack does LiHua agree to bring for the movie?,Chips,20261009_17:00,Single
What drinks does WolfgangSchulz want to have while watching the movie?,Soda and juice,20261009_17:00,Single
Who will pick up the snacks and drinks on their way before the movie?,LiHua,20261009_17:00,Single
Who initiated the discussion about Cersei's journey in Game of Thrones?,EmilyBurnett,20261010_10:10,Single
How does Rowan describe Cersei's transformation throughout the series?,From vulnerable to ruthless,20261010_10:10,Single
What significant event in Cersei's life does Saffron highlight as a turning point?,Losing her kids,20261010_10:10,Single
Which character's dynamic with Cersei is mentioned as complicated?,Jaime,20261010_10:10,Single
What does Lachlan find haunting about Cersei's end?,How much she fought to hold onto her power,20261010_10:10,Single
What idea does Rowan say Cersei represents?,Power comes at a cost,20261010_10:10,Single
What aspect of Cersei's character does Quillan say often shadows her vulnerability?,Her desperation for control,20261010_10:10,Single
What relationship could have potentially changed Cersei's path according to Phaedra?,Her relationship with Tyrion,20261010_10:10,Single
How does Kieran describe Tywin's influence on Cersei?,He taught her ruthless methods,20261010_10:10,Single
What does Orion say about Cersei's mix of loyalty and fear towards Tywin?,It was a mix of both,20261010_10:10,Single
What does Rowan say about Cersei's independence?,It made her stronger but also more isolated,20261010_10:10,Single
What does LiHua say Cersei's power came at the price of?,Meaningful connections,20261010_10:10,Single
What relationship could have shown Cersei a different perspective on power according to Lachlan?,Her relationship with Tyrion,20261010_10:10,Single
What does Tamara say Cersei's love led to?,Her downfall,20261010_10:10,Single
What could have changed everything for Cersei if she had embraced it according to Tamara?,Her family ties,20261010_10:10,Single
What does EmilyBurnett say Cersei's choices led to?,Her isolation,20261010_10:10,Single
What does Quillan suggest could have been a game-changer for Cersei?,Accepting Tyrion's advice,20261010_10:10,Single
What does Saffron say Cersei was blinded by?,Her need for control,20261010_10:10,Single
Who initiates the discussion about game storytelling?,ThaneChambers,20261011_11:00,Single
What game does ThaneChambers consider their top choice for storytelling?,The Last of Us,20261011_11:00,Single
Which game's narrative does Bronwyn appreciate for its father-son dynamic?,God of War,20261011_11:00,Single
What game does Gavriel mention for its characters and plot twists?,Final Fantasy VII Remake,20261011_11:00,Single
Which game does ThaneChambers recommend for its choice-based storytelling?,Life is Strange,20261011_11:00,Single
What game does Elara mention for making her emotional?,Bioshock Infinite,20261011_11:00,Single
Which game does Jareth praise for its storytelling in side quests?,The Witcher 3,20261011_11:00,Single
What game does Dyllan describe as epic for its samurai culture?,Ghost of Tsushima,20261011_11:00,Single
Which game does Helios call a masterpiece for making players feel connected to the characters?,Red Dead Redemption 2,20261011_11:00,Single
"What scene from ""Red Dead Redemption 2"" does Bronwyn mention as emotional?",The scene with Arthur and John on the mountain,20261011_11:00,Single
What game does Ileana say made her question the morality of her actions?,Red Dead Redemption 2,20261011_11:00,Single
Which game does Caelum mention for encouraging emotional engagement with the narrative?,Red Dead Redemption 2,20261011_11:00,Single
What game does Bronwyn say mixes daily life with deep story arcs?,Persona 5,20261011_11:00,Single
Which game does Ileana mention for its unique world and storytelling?,Horizon Zero Dawn,20261011_11:00,Single
What game does LiHua mention for exploring themes of choice & consciousness & humanity?,Detroit: Become Human,20261011_11:00,Single
Which game does Dyllan say raised interesting questions and moral dilemmas?,Detroit: Become Human,20261011_11:00,Single
What game does Dyllan recall for its intimate feel and dialogue?,Firewatch,20261011_11:00,Single
Which game does LiHua mention for playing with various storytelling tropes?,The Stanley Parable,20261011_11:00,Single
When did Li Hua receive the curtains?,Yesterday (implied to be 20261011),20261012_10:00,Single
What does Li Hua like about the curtains?,the patterns,20261012_10:00,Single
Who offers to help with the installation of the curtains?,AdamSmith,20261012_10:00,Single
What does Turalyon want to hear everyone's thoughts on?,the new benches,20261013_13:30,Single
What does MuradinBronzebeard think about the cushions idea?,It sounds like a nice idea,20261013_13:30,Single
What does MalfurionStormrage suggest for sunny days?,umbrellas for shade,20261013_13:30,Single
Who proposes the idea of adding tables near the benches?,RexxarRemar,20261013_13:30,Single
What does GromHellscream suggest to brighten the area in the evening?,lanterns or string lights,20261013_13:30,Single
What type of lanterns does MalfurionStormrage recommend?,solar-powered lanterns,20261013_13:30,Single
Who is excited to see how the garden evolves?,Thrall,20261013_13:30,Single
What does Turalyon think about the suggestions?,They are going to make the garden a fantastic community spot,20261013_13:30,Single
What does ChaeSong-hwa suggest for the garden once everything is set up?,a little gathering,20261013_13:30,Single
What kind of activities does MalfurionStormrage suggest for the gathering?,games for kids & flower planting workshops & good food,20261013_13:30,Single
Who offers to check with friends for live music performance?,ArthasMenethil,20261013_13:30,Single
What does MuradinBronzebeard think about live music?,It would really enhance the experience,20261013_13:30,Single
Who suggests doing a group purchase of the new games?,ThaneChambers,20261014_14:00,Single
What does ThaneChambers think about the new games?,excited and interested,20261014_14:00,Single
What feature are they hoping for in the new game?,co-op missions and shared loot system,20261014_14:00,Single
What does Helios want to know about the game?,whether it will feature crossplay,20261014_14:00,Single
What does Dyllan think about crossplay?,it was awesome and makes playing together much easier,20261014_14:00,Single
What does Ileana think is essential for the new game?,a robust matchmaking system,20261014_14:00,Single
What does Fionnuala think about matchmaking systems?,they can make or break the experience,20261014_14:00,Single
What does Helios think would be awesome for character development?,a deep skill tree or upgrade system,20261014_14:00,Single
What does Gavriel think about crafting systems?,they offer a chance to create unique weapons and gear,20261014_14:00,Single
What does Elara think about resource management?,it always makes it feel more immersive,20261014_14:00,Single
What does Fionnuala think about potions and temporary boosts?,they are always helpful,20261014_14:00,Single
What does ThaneChambers think about stealth boosts?,they add a whole new layer to gameplay,20261014_14:00,Single
What does Bronwyn think about the idea of stealth missions?,it will be all about timing and communication,20261014_14:00,Single
What does LiHua think about the potential for stealth in the game?,it could lead to some really engaging and strategic gameplay,20261014_14:00,Single
What team does JakeWatson admire for their comebacks?,Real Madrid,20261014_20:00,Single
What is one characteristic of Real Madrid that JakeWatson mentions?,they thrive under pressure,20261014_20:00,Single
Which match does JakeWatson remember for Real Madrid's comeback?,Champions League against Manchester City in 2022,20261014_20:00,Single
What was special about the match against Manchester City in 2022 according to the group discussion?,Rodrygo's stoppage-time goal,20261014_20:00,Single
What does Aurora say about the atmosphere at the Bernabéu during tough games?,It was electric,20261014_20:00,Single
What does Evangeline say about Real Madrid's history in the Champions League?,They have a legendary history and a winning culture,20261014_20:00,Single
What does Briar think about Real Madrid's chances for the rest of the season?,They have a solid chance to challenge for every title,20261014_20:00,Single
Who is Real Madrid's coach according to Ivor?,Ancelotti,20261014_20:00,Single
What does Dacey propose to practice on Saturday?,soccer shooting skills,20261014_20:00,Single
Where do they plan to meet for soccer practice?,the local park,20261014_20:00,Single
What time do they plan to meet for soccer practice?,3 PM,20261014_20:00,Single
Who offers to bring cones for the soccer practice?,Aurora,20261014_20:00,Single
What else does Henley plan to work on during the practice?,dribbling skills,20261014_20:00,Single
Who suggests getting pizza for dinner?,WolfgangSchulz,20261015_15:00,Single
What does LiHua want to know about WolfgangSchulz's pizza preference?,his favorite place for pizza,20261015_15:00,Single
What new pizza place does WolfgangSchulz suggest?,a new pizza place downtown,20261015_15:00,Single
What time do they agree to meet for pizza?,around 7,20261015_15:00,Single
Who invites LiHua to taste a new bread recipe?,HaileyJohnson,20261016_16:00,Single
What does HaileyJohnson want LiHua to taste?,a new bread recipe,20261016_16:00,Single
What type of pizza does WolfgangSchulz prefer?,margherita,20261017_17:00,Single
What pizza does LiHua like?,pepperoni with extra cheese,20261017_17:00,Single
What drink does WolfgangSchulz usually have with his pizza?,soda or craft beer,20261017_17:00,Single
What is LiHua's suggestion regarding pizza places?,to check out the new downtown place,20261017_17:00,Single
What do LiHua and Wolfgang plan to do together according to their conversation on 20261017?,plan a pizza night,20261017_17:00,Single
What songs does ChaeSong-hwa suggest starting with according to the discussion on 20261019?,"""Sweet Home Alabama"" or ""Let It Be""",20261019_19:00,Single
What type of songs does YurikoYamamoto want to try according to the discussion on 20261019?,upbeat pop songs,20261019_19:00,Single
What songs does ChaeSong-hwa suggest adding to the mix according to the discussion on 20261019?,"""Shallow"" or ""Shake It Off""",20261019_19:00,Single
What does ChaeSong-hwa plan to bring to karaoke night?,snacks,20261019_19:00,Single
What does WolfgangSchulz suggest the group do before the upcoming karaoke night?,pick their songs,20261019_19:00,Single
How does LiHua feel about the upcoming karaoke night?,excited and can't wait,20261019_19:00,Single
What does ChaeSong-hwa say about the karaoke night?,it's going to be a blast,20261019_19:00,Single
What is the current status of the curtains according to LiHua?,They are perfect and more than satisfy him,20261019_20:00,Single
What offer does AdamSmith extend to LiHua regarding the basement?,It can be used for extra storage whenever LiHua is ready,20261019_20:00,Single
What game does ThaneChambers remember as LiHua's favorite?,Black Myth: Wukong,20261020_10:00,Single
"What aspect of ""Black Myth: Wukong"" does LiHua find the most impressive?",the visuals,20261020_10:00,Single
"What does ThaneChambers like about the game ""Black Myth: Wukong""?",the storytelling,20261020_10:00,Single
"Who is LiHua's favorite character in the game ""Black Myth: Wukong""?",Wukong,20261020_10:00,Single
"What memorable moment does LiHua share from the game ""Black Myth: Wukong""?",an amazing battle scene where he had to outsmart a giant enemy,20261020_10:00,Single
"What memorable moment does ThaneChambers share from the game ""Black Myth: Wukong""?",a moment where he faced a tricky puzzle,20261020_10:00,Single
"How does LiHua describe the balance of combat and puzzles in the game ""Black Myth: Wukong""?",It's refreshing to switch between action and thinking,20261020_10:00,Single
Does LiHua play the game solo or with friends?,Both,20261020_10:00,Single
Would LiHua be interested in playing co-op with the group?,Yes,20261020_10:00,Single
What TV series does EmilyBurnett remember LiHua likes?,Chernobyl,20261021_21:00,Single
"What does LiHua appreciate about the series ""Chernobyl""?",the powerful storytelling and the cinematography,20261021_21:00,Single
What is EmilyBurnett's favorite TV show?,Game of Thrones,20261021_21:00,Single
"What memorable scene from ""Chernobyl"" does LiHua mention?",the scene where they're trying to contain the radiation,20261021_21:00,Single
"Who is LiHua's favorite actor in ""Chernobyl""?",Stellan Skarsgård as Boris Shcherbina,20261021_21:00,Single
What TV series is EmilyBurnett currently watching?,The Last of Us,20261021_21:00,Single
What TV show does EmilyBurnett recommend LiHua to watch?,The Last of Us,20261021_21:00,Single
"What is LiHua's plan for watching ""The Last of Us""?",binge it that weekend,20261021_21:00,Single
What does JenniferMoore congratulate LiHua for on 20261022?,for pushing his limits with the planks this week,20261022_22:00,Single
What does JakeWatson suggest for refueling after a soccer game?,a snack with carbs and protein and staying hydrated,20261023_23:00,Single
What are LiHua's usual recovery snacks after a game?,a protein shake and a fruit smoothie,20261023_23:00,Single
What event is the community organizing?,a weekend picnic,20261024_11:00,Single
What is the purpose of the community picnic?,to bring the community together and get to know each other better,20261024_11:00,Single
What type of food event did Turalyon suggest?,a potluck,20261024_11:00,Single
What does RexxarRemar suggest for the community picnic atmosphere?,organizing some music,20261024_11:00,Single
What does TirionFordring suggest bringing for seating?,blankets or chairs,20261024_11:00,Single
What day did the community members decide on for the picnic?,next Saturday,20261024_11:00,Single
What does RexxarRemar suggest for keeping everyone energized during the picnic?,a mix of snacks and drinks,20261024_11:00,Single
What does RexxarRemar suggest to have available to keep everyone hydrated during the picnic?,water and drinks,20261024_11:00,Single
What does IllidanStormrage say he will bring to keep everyone entertained during the picnic?,some epic games,20261024_11:00,Single
What places in Europe does WolfgangSchulz remember as stunning?,the fjords in Norway and the lavender fields in Provence,20261025_16:00,Single
What does WolfgangSchulz like about Prague?,its architecture and the vibe of the old town,20261025_16:00,Single
What local dish did WolfgangSchulz try in Prague?,goulash,20261025_16:00,Single
What is WolfgangSchulz's favorite risotto memory?,having seafood risotto in Venice,20261025_16:00,Single
What is on WolfgangSchulz's list for a future trip?,Italy,20261025_16:00,Single
What does WolfgangSchulz want to check out in Italy?,Cinque Terre,20261025_16:00,Single
What does LiHua say about the seafood in Cinque Terre?,it's supposed to be fresh and delicious,20261025_16:00,Single
What time will Li Hua and Wolfgang meet for breakfast on the morning of the 9th?,Insufficient information,N/A,Null
What type of equipment does Wolfgang Schulz use when he works out at the gym?,Insufficient information,N/A,Null
"What type of exercise does Li Hua prefer to do at the gym, and what time does Wolfgang usually go to the gym?",Insufficient information,N/A,Null
What type of traditional games did Wolfgang Schulz play with Li Hua during the Lunar New Year celebration?,Insufficient information,N/A,Null
"What did Li Hua eat for dinner on January 20, 2026?",Insufficient information,N/A,Null
What specific suggestions did Li Hua have regarding the construction schedule during the last community meeting?,Insufficient information,N/A,Null
What movie did Li Hua and Wolfgang decide to watch together on New Year's Eve 2026?,Insufficient information,N/A,Null
What dish did Li Hua order for dessert after having Sichuan hot pot with Wolfgang Schulz?,Insufficient information,N/A,Null
What were the specific requirements that the customers modified and how did Li Hua respond to each change?,Insufficient information,N/A,Null
"What payment method did Li Hua use to purchase groceries from the store on April 15, 2026?",Insufficient information,N/A,Null
What specific diet did Li Hua follow to achieve his fitness results that Jennifer Moore recommended?,Insufficient information,N/A,Null
What alternative training sports does Li Hua consider besides the current routine mentioned in his conversation with Jennifer?,Insufficient information,N/A,Null
What type of diet is Li Hua following to support his training regimen and sleep schedule?,Insufficient information,N/A,Null
What type of protein shake does Li Hua prefer to drink before his workout sessions?,Insufficient information,N/A,Null
What specific brand of protein supplement does Jennifer recommend for Li Hua's weight loss journey?,Insufficient information,N/A,Null
What specific colors of paint does Li Hua plan to use for the basement walls after decorating with potted plants?,Insufficient information,N/A,Null
What color did Li Hua decide to paint the walls of the basement after completing the renovation?,Insufficient information,N/A,Null
"What specific design features did Li Hua suggest for Yuriko's studio homepage during their meeting at ""Central Perk""?",Insufficient information,N/A,Null
What is the name of the concert that Wolfgang and Li Hua will attend on March 7th?,Insufficient information,N/A,Null
What type of cake does Li Hua plan to bring to the meeting with Yuriko to celebrate her studio's homepage redesign?,Insufficient information,N/A,Null
What color did Li Hua paint his house before the community garden renovation began?,Insufficient information,N/A,Null
What specific dietary changes did Li Hua implement in his training regimen as a result of Jennifer's advice on endurance and flexibility?,Insufficient information,N/A,Null
"What specific feedback did Yuriko give to Li Hua about the demo website during their meeting at the cafe ""Central Perk"" on Thursday morning, and how did Li Hua respond to her comments?",Insufficient information,N/A,Null
What is the exact reason for Li Hua's unexpected work meeting on Thursday?,Insufficient information,N/A,Null
"What toppings did Hailey put on the bread for Li Hua's next delivery, and what is the name of the bakery she gets her bread from?",Insufficient information,N/A,Null
What is the favorite type of music that Li Hua and Yuriko plan to play together?,Insufficient information,N/A,Null
What are the specific reasons why Li Hua prefers classical music over pop music in their discussions?,Insufficient information,N/A,Null
What song did Yuriko and Wolfgang decide to perform together after watching the drum tutorial?,Insufficient information,N/A,Null
"What type of feedback did Li Hua provide to Chae regarding the community medical knowledge lecture, and what is Wolfgang's role in the band rehearsal?",Insufficient information,N/A,Null
What is the name of the last song Wolfgang played using the new drum practice app?,Insufficient information,N/A,Null
What flavor of new bread products did Li Hua really enjoy at the bakery's anniversary event?,Insufficient information,N/A,Null
What is the name of the song that Li Hua will sing at the karaoke on 20260425?,Insufficient information,N/A,Null
What flavor of cake did Hailey plan to bake for the bakery's anniversary celebration in February?,Insufficient information,N/A,Null
|
||||
What type of dessert did Wolfgang plan to order for Li Hua during their dinner celebration?,Insufficient information,N/A,Null
|
||||
What were Raze's personal reasons for becoming a fitness coach before discussing pull-up techniques?,Insufficient information,N/A,Null
|
||||
"What new species of flowers will be featured in the community garden renovation, according to the feedback provided by Li Hua during the progress reports?",Insufficient information,N/A,Null
|
||||
What specific construction projects were discussed at the meeting between Turalyon and the residents regarding noise control?,Insufficient information,N/A,Null
|
||||
What is the amount of rent Li Hua owes to Adam Smith for the months of April and May?,Insufficient information,N/A,Null
|
||||
What song did Li Hua perform at the local music festival in 2026?,Insufficient information,N/A,Null
|
||||
"What is the name of the restaurant where Li Hua and Wolfgang had dinner on the night of June 9, 2026?",Insufficient information,N/A,Null
|
||||
What is the nutritional value of the new line of high-protein breads compared to traditional white bread?,Insufficient information,N/A,Null
|
||||
What specific sleep techniques did Li Hua use to improve her study habits after reading the neuroscience article and discussing the warm shower with Chae?,Insufficient information,N/A,Null
|
||||
What type of special dietary restrictions does Li Hua follow when preparing meals for his family?,Insufficient information,N/A,Null
|
||||
"What is the total cost of the air conditioner installation, including labor and materials, if Li Hua had previously discussed a budget of $2,000 with a different contractor?",Insufficient information,N/A,Null
|
||||
What type of fusion music does Wolfgang Schulz plan to create with Li Hua during their weekend trip to the music store?,Insufficient information,N/A,Null
|
||||
What type of flowers were planted in the garden based on the residents' suggestions discussed by Turalyon?,Insufficient information,N/A,Null
|
||||
"What type of air-conditioner did Li Hua select for Adam's living room, and how does its temperature regulation compare to the one in the basement?",Insufficient information,N/A,Null
|
||||
What were Li Hua's specific fitness goals and how did Jennifer's advice on nutrition influence them during his training for a marathon?,Insufficient information,N/A,Null
|
||||
What are the sales figures for the PS5 exclusive games released in 2027 that Thane discussed with group members?,Insufficient information,N/A,Null
|
||||
"What specific techniques did Li Hua use to prepare for a marathon race that took place on September 1, 2026?",Insufficient information,N/A,Null
|
||||
What food did Emily order for the group's discussion about Game of Thrones characters?,Insufficient information,N/A,Null
|
||||
"What type of protein supplements did Li Hua use after the workout on September 19, 2026?",Insufficient information,N/A,Null
|
||||
What color was the curtain that Li Hua chose for his living room?,Insufficient information,N/A,Null
|
||||
What specific measurements did Li Hua take for the window size before the installation of the curtain?,Insufficient information,N/A,Null
|
||||
What is Thane's favorite type of food that he enjoys while playing video games?,Insufficient information,N/A,Null
|
||||
What were the specific details of the negotiation between Jake Watson and Li Hua regarding player transfers from FC Barcelona to FC Bayern Munich?,Insufficient information,N/A,Null
|
||||
What was the final match score of the 2025 UEFA Champions League final?,Insufficient information,N/A,Null
|
||||
What was the exact date and time when Cersei Lannister first met Jaime Lannister in the Game of Thrones series?,Insufficient information,N/A,Null
|
||||
What type of dessert did Wolfgang plan to have with Li Hua after their hot pot dinner on a different day?,Insufficient information,N/A,Null
|
||||
What did Jennifer say to Li Hua about their plan for a team swimming competition in December?,Insufficient information,N/A,Null
|
||||
What strategy did Li Hua use to score a goal in the championship match against their rival team?,Insufficient information,N/A,Null
|
||||
What is Li Hua's favorite type of exercise outside of pull-ups?,Insufficient information,N/A,Null
|
||||
What time did Chae first suggest they visit the music festival in the morning?,Insufficient information,N/A,Null
|
||||
What are the reasons behind Jennifer and Li Hua's decision to change their training routine for the following month?,Insufficient information,N/A,Null
|
||||
What specific gifts did Wolfgang buy for Li Hua during his trip to Hong Kong?,Insufficient information,N/A,Null
|
||||
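Each row in this query set is plain CSV: question, expected answer (uniformly "Insufficient information" in this null-query block), evidence, and gold label. The four-column layout is inferred from the rows themselves. A minimal sketch of parsing one row with the stdlib csv module:

```python
import csv
import io

# One row copied from the null-query set above.
row_text = '"What did Li Hua eat for dinner on January 20, 2026?",Insufficient information,N/A,Null'

# csv handles the quoted first field, which itself contains a comma.
question, answer, evidence, gold = next(csv.reader(io.StringIO(row_text)))
```

The quoted question survives its internal comma intact, which is why these rows quote only the fields that need it.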
3824
dataset/LiHua-World/qa/query_set.json
Normal file
File diff suppressed because it is too large
89
main.py
Normal file
@@ -0,0 +1,89 @@
# from huggingface_hub import login
# your_token = "INPUT YOUR TOKEN HERE"
# login(your_token)

import os
import sys
import csv
from tqdm import trange
from minirag import MiniRAG, QueryParam
from minirag.llm import gpt_4o_mini_complete, hf_model_complete, hf_embedding, openai_embedding
from minirag.utils import EmbeddingFunc
from transformers import AutoModel, AutoTokenizer
from datetime import datetime

EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"

import argparse

def get_args():
    parser = argparse.ArgumentParser(description="MiniRAG")
    parser.add_argument('--model', type=str, default='PHI')
    parser.add_argument('--outputpath', type=str, default='./logs/Default_output.csv')
    parser.add_argument('--workingdir', type=str, default='./LiHua-World')
    parser.add_argument('--datapath', type=str, default='./dataset/LiHua-World/data/')
    parser.add_argument('--querypath', type=str, default='./dataset/LiHua-World/qa/query_set.csv')
    args = parser.parse_args()
    return args

args = get_args()

if args.model == 'PHI':
    LLM_MODEL = "microsoft/Phi-3.5-mini-instruct"
elif args.model == 'GLM':
    LLM_MODEL = "THUDM/glm-edge-1.5b-chat"
elif args.model == 'MiniCPM':
    LLM_MODEL = "openbmb/MiniCPM3-4B"
elif args.model == 'qwen':
    LLM_MODEL = "Qwen/Qwen2.5-3B-Instruct"
else:
    print("Invalid model name")
    exit(1)

WORKING_DIR = args.workingdir
DATA_PATH = args.datapath
QUERY_PATH = args.querypath
OUTPUT_PATH = args.outputpath
print("USING LLM:", LLM_MODEL)
print("USING WORKING DIR:", WORKING_DIR)

if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

rag = MiniRAG(
    working_dir=WORKING_DIR,
    llm_model_func=hf_model_complete,
    llm_model_max_token_size=200,
    llm_model_name=LLM_MODEL,
    embedding_func=EmbeddingFunc(
        embedding_dim=384,
        max_token_size=1000,
        func=lambda texts: hf_embedding(
            texts,
            tokenizer=AutoTokenizer.from_pretrained(EMBEDDING_MODEL),
            embed_model=AutoModel.from_pretrained(EMBEDDING_MODEL),
        ),
    ),
)

# Now indexing
def find_txt_files(root_path):
    txt_files = []
    for root, dirs, files in os.walk(root_path):
        for file in files:
            if file.endswith('.txt'):
                txt_files.append(os.path.join(root, file))
    return txt_files

WEEK_LIST = find_txt_files(DATA_PATH)
for idx, WEEK in enumerate(WEEK_LIST):
    print(f"{idx}/{len(WEEK_LIST)}")
    with open(WEEK) as f:
        rag.insert(f.read())

# A toy query
query = "What does LiHua predict will happen in \"The Rings of Power\"?"
answer = rag.query(query, param=QueryParam(mode="mini")).replace("\n", "").replace("\r", "")
print(answer)
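The indexing loop in main.py collects every `.txt` transcript under the data directory via `find_txt_files`. A standalone sketch of that helper run against a temporary directory (same function body, no MiniRAG install needed):

```python
import os
import tempfile

def find_txt_files(root_path):
    # Same walk as in main.py: collect every .txt file, recursively.
    txt_files = []
    for root, dirs, files in os.walk(root_path):
        for file in files:
            if file.endswith('.txt'):
                txt_files.append(os.path.join(root, file))
    return txt_files

with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, "week1"))
    open(os.path.join(d, "week1", "chat.txt"), "w").close()
    open(os.path.join(d, "README.md"), "w").close()
    found = find_txt_files(d)
```

Only the `.txt` file is collected; the markdown file is skipped, which is why non-transcript files can live alongside the data.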
5
minirag/__init__.py
Normal file
@@ -0,0 +1,5 @@
from .minirag import MiniRAG as MiniRAG, QueryParam as QueryParam

__version__ = "1.0.0"
__author__ = "Tianyu Fan"
__url__ = "https://github.com/HKUDS/MiniRAG"
BIN
minirag/__pycache__/__init__.cpython-39.pyc
Normal file
Binary file not shown.
BIN
minirag/__pycache__/base.cpython-39.pyc
Normal file
Binary file not shown.
BIN
minirag/__pycache__/lightrag.cpython-39.pyc
Normal file
Binary file not shown.
BIN
minirag/__pycache__/llm.cpython-39.pyc
Normal file
Binary file not shown.
BIN
minirag/__pycache__/minirag.cpython-39.pyc
Normal file
Binary file not shown.
BIN
minirag/__pycache__/operate.cpython-39.pyc
Normal file
Binary file not shown.
BIN
minirag/__pycache__/prompt.cpython-39.pyc
Normal file
Binary file not shown.
BIN
minirag/__pycache__/storage.cpython-39.pyc
Normal file
Binary file not shown.
BIN
minirag/__pycache__/utils.cpython-39.pyc
Normal file
Binary file not shown.
127
minirag/base.py
Normal file
@@ -0,0 +1,127 @@
from dataclasses import dataclass, field
from typing import TypedDict, Union, Literal, Generic, TypeVar

import numpy as np

from .utils import EmbeddingFunc

TextChunkSchema = TypedDict(
    "TextChunkSchema",
    {"tokens": int, "content": str, "full_doc_id": str, "chunk_order_index": int},
)

T = TypeVar("T")


@dataclass
class QueryParam:
    mode: Literal["light", "naive", "mini"] = "mini"
    only_need_context: bool = False
    response_type: str = "Multiple Paragraphs"
    # Number of top-k items to retrieve; corresponds to entities in "local" mode and relationships in "global" mode.
    top_k: int = 5
    # Number of tokens for the original chunks.
    max_token_for_text_unit: int = 2000
    # Number of tokens for the relationship descriptions.
    max_token_for_global_context: int = 2000
    # Number of tokens for the entity descriptions (Light/Graph modes).
    max_token_for_local_context: int = 2000
    # For Mini mode; if too long, the SLM may fail to generate any response.
    max_token_for_node_context: int = 500
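QueryParam carries the per-query knobs. A standalone sketch of how its defaults behave (the dataclass is re-declared here in trimmed form for illustration, not imported from minirag):

```python
from dataclasses import dataclass
from typing import Literal

# Trimmed re-declaration of the QueryParam dataclass above, for illustration.
@dataclass
class QueryParam:
    mode: Literal["light", "naive", "mini"] = "mini"
    only_need_context: bool = False
    top_k: int = 5
    max_token_for_node_context: int = 500

default = QueryParam()                      # mini-mode retrieval, top-5 items
naive = QueryParam(mode="naive", top_k=3)   # override per query
```

Callers construct one of these per query, so overriding a single field leaves every other default untouched.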
@dataclass
class StorageNameSpace:
    namespace: str
    global_config: dict

    async def index_done_callback(self):
        """Commit the storage operations after indexing."""
        pass

    async def query_done_callback(self):
        """Commit the storage operations after querying."""
        pass


@dataclass
class BaseVectorStorage(StorageNameSpace):
    embedding_func: EmbeddingFunc
    meta_fields: set = field(default_factory=set)

    async def query(self, query: str, top_k: int) -> list[dict]:
        raise NotImplementedError

    async def upsert(self, data: dict[str, dict]):
        """Use the 'content' field from each value for embedding, and the key as id.
        If embedding_func is None, use the 'embedding' field from the value.
        """
        raise NotImplementedError


@dataclass
class BaseKVStorage(Generic[T], StorageNameSpace):
    embedding_func: EmbeddingFunc

    async def all_keys(self) -> list[str]:
        raise NotImplementedError

    async def get_by_id(self, id: str) -> Union[T, None]:
        raise NotImplementedError

    async def get_by_ids(
        self, ids: list[str], fields: Union[set[str], None] = None
    ) -> list[Union[T, None]]:
        raise NotImplementedError

    async def filter_keys(self, data: list[str]) -> set[str]:
        """Return the keys that do not yet exist in storage."""
        raise NotImplementedError

    async def upsert(self, data: dict[str, T]):
        raise NotImplementedError

    async def drop(self):
        raise NotImplementedError


@dataclass
class BaseGraphStorage(StorageNameSpace):
    embedding_func: EmbeddingFunc = None

    async def has_node(self, node_id: str) -> bool:
        raise NotImplementedError

    async def has_edge(self, source_node_id: str, target_node_id: str) -> bool:
        raise NotImplementedError

    async def node_degree(self, node_id: str) -> int:
        raise NotImplementedError

    async def edge_degree(self, src_id: str, tgt_id: str) -> int:
        raise NotImplementedError

    async def get_node(self, node_id: str) -> Union[dict, None]:
        raise NotImplementedError

    async def get_edge(
        self, source_node_id: str, target_node_id: str
    ) -> Union[dict, None]:
        raise NotImplementedError

    async def get_node_edges(
        self, source_node_id: str
    ) -> Union[list[tuple[str, str]], None]:
        raise NotImplementedError

    async def upsert_node(self, node_id: str, node_data: dict[str, str]):
        raise NotImplementedError

    async def upsert_edge(
        self, source_node_id: str, target_node_id: str, edge_data: dict[str, str]
    ):
        raise NotImplementedError

    async def delete_node(self, node_id: str):
        raise NotImplementedError

    async def embed_nodes(self, algorithm: str) -> tuple[np.ndarray, list[str]]:
        raise NotImplementedError("Node embedding is not used in minirag.")
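The storage classes in base.py are async stubs that concrete backends override. A minimal dict-backed sketch of the BaseKVStorage contract (the `InMemoryKV` name is hypothetical, and it deliberately does not inherit the real base class so the snippet stays self-contained):

```python
import asyncio

class InMemoryKV:
    """Hypothetical dict-backed store following BaseKVStorage's method contract."""

    def __init__(self):
        self._data = {}

    async def all_keys(self):
        return list(self._data)

    async def get_by_id(self, id: str):
        return self._data.get(id)

    async def filter_keys(self, data):
        # Mirror filter_keys's contract: return the keys not yet present.
        return {k for k in data if k not in self._data}

    async def upsert(self, data: dict):
        self._data.update(data)

async def demo():
    kv = InMemoryKV()
    await kv.upsert({"doc-1": {"content": "hello"}})
    return await kv.filter_keys(["doc-1", "doc-2"])

missing = asyncio.run(demo())
```

The indexing pipeline uses `filter_keys` exactly this way: it asks the store which chunk ids are new before embedding them.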
1
minirag/kg/__init__.py
Normal file
@@ -0,0 +1 @@
# print ("init package vars here. ......")
BIN
minirag/kg/__pycache__/__init__.cpython-39.pyc
Normal file
Binary file not shown.
BIN
minirag/kg/__pycache__/neo4j_impl.cpython-39.pyc
Normal file
Binary file not shown.
BIN
minirag/kg/__pycache__/oracle_impl.cpython-39.pyc
Normal file
Binary file not shown.
296
minirag/kg/neo4j_impl.py
Normal file
@@ -0,0 +1,296 @@
import asyncio
import os
from dataclasses import dataclass
from typing import Any, Union, Tuple, List, Dict
import inspect
from minirag.utils import logger
from ..base import BaseGraphStorage
from neo4j import (
    AsyncGraphDatabase,
    exceptions as neo4jExceptions,
    AsyncDriver,
    AsyncManagedTransaction,
)

from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)


@dataclass
class Neo4JStorage(BaseGraphStorage):
    @staticmethod
    def load_nx_graph(file_name):
        print("no preloading of graph with neo4j in production")

    def __init__(self, namespace, global_config):
        super().__init__(namespace=namespace, global_config=global_config)
        self._driver = None
        self._driver_lock = asyncio.Lock()
        URI = os.environ["NEO4J_URI"]
        USERNAME = os.environ["NEO4J_USERNAME"]
        PASSWORD = os.environ["NEO4J_PASSWORD"]
        self._driver: AsyncDriver = AsyncGraphDatabase.driver(
            URI, auth=(USERNAME, PASSWORD)
        )

    def __post_init__(self):
        self._node_embed_algorithms = {
            "node2vec": self._node2vec_embed,
        }

    async def close(self):
        if self._driver:
            await self._driver.close()
            self._driver = None

    async def __aexit__(self, exc_type, exc, tb):
        if self._driver:
            await self._driver.close()

    async def index_done_callback(self):
        print("KG successfully indexed.")
    async def has_node(self, node_id: str) -> bool:
        entity_name_label = node_id.strip('"')

        async with self._driver.session() as session:
            query = (
                f"MATCH (n:`{entity_name_label}`) RETURN count(n) > 0 AS node_exists"
            )
            result = await session.run(query)
            single_result = await result.single()
            logger.debug(
                f'{inspect.currentframe().f_code.co_name}:query:{query}:result:{single_result["node_exists"]}'
            )
            return single_result["node_exists"]

    async def has_edge(self, source_node_id: str, target_node_id: str) -> bool:
        entity_name_label_source = source_node_id.strip('"')
        entity_name_label_target = target_node_id.strip('"')

        async with self._driver.session() as session:
            query = (
                f"MATCH (a:`{entity_name_label_source}`)-[r]-(b:`{entity_name_label_target}`) "
                "RETURN COUNT(r) > 0 AS edgeExists"
            )
            result = await session.run(query)
            single_result = await result.single()
            logger.debug(
                f'{inspect.currentframe().f_code.co_name}:query:{query}:result:{single_result["edgeExists"]}'
            )
            return single_result["edgeExists"]
    async def get_node(self, node_id: str) -> Union[dict, None]:
        async with self._driver.session() as session:
            entity_name_label = node_id.strip('"')
            query = f"MATCH (n:`{entity_name_label}`) RETURN n"
            result = await session.run(query)
            record = await result.single()
            if record:
                node = record["n"]
                node_dict = dict(node)
                logger.debug(
                    f"{inspect.currentframe().f_code.co_name}: query: {query}, result: {node_dict}"
                )
                return node_dict
            return None

    async def node_degree(self, node_id: str) -> int:
        entity_name_label = node_id.strip('"')

        async with self._driver.session() as session:
            query = f"""
                MATCH (n:`{entity_name_label}`)
                RETURN COUNT{{ (n)--() }} AS totalEdgeCount
            """
            result = await session.run(query)
            record = await result.single()
            if record:
                edge_count = record["totalEdgeCount"]
                logger.debug(
                    f"{inspect.currentframe().f_code.co_name}:query:{query}:result:{edge_count}"
                )
                return edge_count
            else:
                return None

    async def edge_degree(self, src_id: str, tgt_id: str) -> int:
        entity_name_label_source = src_id.strip('"')
        entity_name_label_target = tgt_id.strip('"')
        src_degree = await self.node_degree(entity_name_label_source)
        trg_degree = await self.node_degree(entity_name_label_target)

        # Convert None to 0 for addition
        src_degree = 0 if src_degree is None else src_degree
        trg_degree = 0 if trg_degree is None else trg_degree

        degrees = int(src_degree) + int(trg_degree)
        logger.debug(
            f"{inspect.currentframe().f_code.co_name}:query:src_degree+trg_degree:result:{degrees}"
        )
        return degrees
    async def get_edge(
        self, source_node_id: str, target_node_id: str
    ) -> Union[dict, None]:
        """
        Find one edge between nodes of two given labels.

        Args:
            source_node_id (str): Label of the source node
            target_node_id (str): Label of the target node

        Returns:
            dict: Properties of the first matching relationship, or None
        """
        entity_name_label_source = source_node_id.strip('"')
        entity_name_label_target = target_node_id.strip('"')

        async with self._driver.session() as session:
            query = f"""
            MATCH (start:`{entity_name_label_source}`)-[r]->(end:`{entity_name_label_target}`)
            RETURN properties(r) as edge_properties
            LIMIT 1
            """

            result = await session.run(query)
            record = await result.single()
            if record:
                result = dict(record["edge_properties"])
                logger.debug(
                    f"{inspect.currentframe().f_code.co_name}:query:{query}:result:{result}"
                )
                return result
            else:
                return None

    async def get_node_edges(self, source_node_id: str) -> List[Tuple[str, str]]:
        """
        Retrieves all edges (relationships) for a particular node identified by its label.

        :return: List of (source_label, target_label) tuples
        """
        node_label = source_node_id.strip('"')

        query = f"""MATCH (n:`{node_label}`)
                OPTIONAL MATCH (n)-[r]-(connected)
                RETURN n, r, connected"""
        async with self._driver.session() as session:
            results = await session.run(query)
            edges = []
            async for record in results:
                source_node = record["n"]
                connected_node = record["connected"]

                source_label = (
                    list(source_node.labels)[0] if source_node.labels else None
                )
                target_label = (
                    list(connected_node.labels)[0]
                    if connected_node and connected_node.labels
                    else None
                )

                if source_label and target_label:
                    edges.append((source_label, target_label))

            return edges
    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=4, max=10),
        retry=retry_if_exception_type(
            (
                neo4jExceptions.ServiceUnavailable,
                neo4jExceptions.TransientError,
                neo4jExceptions.WriteServiceUnavailable,
            )
        ),
    )
    async def upsert_node(self, node_id: str, node_data: Dict[str, Any]):
        """
        Upsert a node in the Neo4j database.

        Args:
            node_id: The unique identifier for the node (used as label)
            node_data: Dictionary of node properties
        """
        label = node_id.strip('"')
        properties = node_data

        async def _do_upsert(tx: AsyncManagedTransaction):
            query = f"""
            MERGE (n:`{label}`)
            SET n += $properties
            """
            await tx.run(query, properties=properties)
            logger.debug(
                f"Upserted node with label '{label}' and properties: {properties}"
            )

        try:
            async with self._driver.session() as session:
                await session.execute_write(_do_upsert)
        except Exception as e:
            logger.error(f"Error during upsert: {str(e)}")
            raise

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=4, max=10),
        retry=retry_if_exception_type(
            (
                neo4jExceptions.ServiceUnavailable,
                neo4jExceptions.TransientError,
                neo4jExceptions.WriteServiceUnavailable,
            )
        ),
    )
    async def upsert_edge(
        self, source_node_id: str, target_node_id: str, edge_data: Dict[str, Any]
    ):
        """
        Upsert an edge and its properties between two nodes identified by their labels.

        Args:
            source_node_id (str): Label of the source node (used as identifier)
            target_node_id (str): Label of the target node (used as identifier)
            edge_data (dict): Dictionary of properties to set on the edge
        """
        source_node_label = source_node_id.strip('"')
        target_node_label = target_node_id.strip('"')
        edge_properties = edge_data

        async def _do_upsert_edge(tx: AsyncManagedTransaction):
            query = f"""
            MATCH (source:`{source_node_label}`)
            WITH source
            MATCH (target:`{target_node_label}`)
            MERGE (source)-[r:DIRECTED]->(target)
            SET r += $properties
            RETURN r
            """
            await tx.run(query, properties=edge_properties)
            logger.debug(
                f"Upserted edge from '{source_node_label}' to '{target_node_label}' with properties: {edge_properties}"
            )

        try:
            async with self._driver.session() as session:
                await session.execute_write(_do_upsert_edge)
        except Exception as e:
            logger.error(f"Error during edge upsert: {str(e)}")
            raise

    async def _node2vec_embed(self):
        print("Implemented but never called.")
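The Cypher in neo4j_impl.py is built by stripping surrounding quotes from the node id and interpolating it into a backtick-quoted label position. A standalone sketch of that string construction (no database required; the `has_node_query` helper name is illustrative):

```python
def has_node_query(node_id: str) -> str:
    # Mirrors the label handling above: strip surrounding quotes,
    # then backtick-quote the label inside the Cypher pattern.
    label = node_id.strip('"')
    return f"MATCH (n:`{label}`) RETURN count(n) > 0 AS node_exists"

q = has_node_query('"LI HUA"')
```

Backtick quoting is what lets labels with spaces (common for extracted entity names) survive as valid Cypher.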
700
minirag/kg/oracle_impl.py
Normal file
@@ -0,0 +1,700 @@
import asyncio

# import html
# import os
from dataclasses import dataclass
from typing import Union
import numpy as np
import array

from ..utils import logger
from ..base import (
    BaseGraphStorage,
    BaseKVStorage,
    BaseVectorStorage,
)

import oracledb


class OracleDB:
    def __init__(self, config, **kwargs):
        self.host = config.get("host", None)
        self.port = config.get("port", None)
        self.user = config.get("user", None)
        self.password = config.get("password", None)
        self.dsn = config.get("dsn", None)
        self.config_dir = config.get("config_dir", None)
        self.wallet_location = config.get("wallet_location", None)
        self.wallet_password = config.get("wallet_password", None)
        self.workspace = config.get("workspace", None)
        self.max = 12
        self.increment = 1
        logger.info(f"Using the label {self.workspace} for Oracle Graph as identifier")
        if self.user is None or self.password is None:
            raise ValueError("Missing database user or password in addon_params")

        try:
            oracledb.defaults.fetch_lobs = False

            self.pool = oracledb.create_pool_async(
                user=self.user,
                password=self.password,
                dsn=self.dsn,
                config_dir=self.config_dir,
                wallet_location=self.wallet_location,
                wallet_password=self.wallet_password,
                min=1,
                max=self.max,
                increment=self.increment,
            )
            logger.info(f"Connected to Oracle database at {self.dsn}")
        except Exception as e:
            logger.error(f"Failed to connect to Oracle database at {self.dsn}")
            logger.error(f"Oracle database error: {e}")
            raise
    def numpy_converter_in(self, value):
        """Convert numpy array to array.array"""
        if value.dtype == np.float64:
            dtype = "d"
        elif value.dtype == np.float32:
            dtype = "f"
        else:
            dtype = "b"
        return array.array(dtype, value)

    def input_type_handler(self, cursor, value, arraysize):
        """Set the type handler for the input data"""
        if isinstance(value, np.ndarray):
            return cursor.var(
                oracledb.DB_TYPE_VECTOR,
                arraysize=arraysize,
                inconverter=self.numpy_converter_in,
            )

    def numpy_converter_out(self, value):
        """Convert array.array to numpy array"""
        if value.typecode == "b":
            dtype = np.int8
        elif value.typecode == "f":
            dtype = np.float32
        else:
            dtype = np.float64
        return np.array(value, copy=False, dtype=dtype)

    def output_type_handler(self, cursor, metadata):
        """Set the type handler for the output data"""
        if metadata.type_code is oracledb.DB_TYPE_VECTOR:
            return cursor.var(
                metadata.type_code,
                arraysize=cursor.arraysize,
                outconverter=self.numpy_converter_out,
            )
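`numpy_converter_in` above maps numpy dtypes to `array.array` typecodes ('d' for float64, 'f' for float32, 'b' for everything else) so vectors can cross the Oracle driver boundary. A stdlib-only sketch of that mapping (the `typecode_for` helper is illustrative; numpy is omitted so the snippet runs anywhere):

```python
import array

def typecode_for(dtype_name: str) -> str:
    # Same mapping as numpy_converter_in: float64 -> 'd', float32 -> 'f', else 'b'.
    return {"float64": "d", "float32": "f"}.get(dtype_name, "b")

# A float32 embedding becomes a compact 'f'-typed array.array.
vec = array.array(typecode_for("float32"), [1.0, 2.0, 3.0])
```

`numpy_converter_out` reverses the mapping on the way back, so round-tripping a vector preserves its precision class.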
    async def check_tables(self):
        for k, v in TABLES.items():
            try:
                if k.lower() == "lightrag_graph":
                    await self.query(
                        "SELECT id FROM GRAPH_TABLE (lightrag_graph MATCH (a) COLUMNS (a.id)) fetch first row only"
                    )
                else:
                    await self.query("SELECT 1 FROM {k}".format(k=k))
            except Exception as e:
                logger.error(f"Failed to check table {k} in Oracle database")
                logger.error(f"Oracle database error: {e}")
                try:
                    # print(v["ddl"])
                    await self.execute(v["ddl"])
                    logger.info(f"Created table {k} in Oracle database")
                except Exception as e:
                    logger.error(f"Failed to create table {k} in Oracle database")
                    logger.error(f"Oracle database error: {e}")

        logger.info("Finished checking all tables in the Oracle database")
async def query(self, sql: str, multirows: bool = False) -> Union[dict, None]:
|
||||
async with self.pool.acquire() as connection:
|
||||
connection.inputtypehandler = self.input_type_handler
|
||||
connection.outputtypehandler = self.output_type_handler
|
||||
with connection.cursor() as cursor:
|
||||
try:
|
||||
await cursor.execute(sql)
|
||||
except Exception as e:
|
||||
logger.error(f"Oracle database error: {e}")
|
||||
print(sql)
|
||||
raise
|
||||
columns = [column[0].lower() for column in cursor.description]
|
||||
if multirows:
|
||||
rows = await cursor.fetchall()
|
||||
if rows:
|
||||
data = [dict(zip(columns, row)) for row in rows]
|
||||
else:
|
||||
data = []
|
||||
else:
|
||||
row = await cursor.fetchone()
|
||||
if row:
|
||||
data = dict(zip(columns, row))
|
||||
else:
|
||||
data = None
|
||||
return data
|
||||
|
||||
    async def execute(self, sql: str, data: list = None):
        try:
            async with self.pool.acquire() as connection:
                connection.inputtypehandler = self.input_type_handler
                connection.outputtypehandler = self.output_type_handler
                with connection.cursor() as cursor:
                    if data is None:
                        await cursor.execute(sql)
                    else:
                        await cursor.execute(sql, data)
                    await connection.commit()
        except Exception as e:
            logger.error(f"Oracle database error: {e}")
            logger.error(f"SQL: {sql}")
            logger.error(f"Data: {data}")
            raise

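The `query` helper above returns each Oracle row as a dict keyed by lower-cased column names via `zip(columns, row)`. A minimal stand-alone sketch of that mapping — the `description` and `rows` values here are made-up stand-ins for the driver's cursor state, not real `oracledb` objects:

```python
def rows_to_dicts(description, rows):
    # description mimics cursor.description: a sequence of (name, ...) tuples
    columns = [col[0].lower() for col in description]
    return [dict(zip(columns, row)) for row in rows]

description = [("ID",), ("CONTENT",)]
rows = [("doc-1", "hello"), ("doc-2", "world")]
print(rows_to_dicts(description, rows))
```

Lower-casing the column names up front is what lets the rest of the storage layer index results with keys like `res["degree"]` regardless of Oracle's upper-case metadata.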
@dataclass
class OracleKVStorage(BaseKVStorage):
    # The shared OracleDB instance should be assigned to self.db.
    def __post_init__(self):
        self._data = {}
        self._max_batch_size = self.global_config["embedding_batch_num"]

    ################ QUERY METHODS ################

    async def get_by_id(self, id: str) -> Union[dict, None]:
        """Fetch one record (e.g. a doc_full row) by id."""
        SQL = SQL_TEMPLATES["get_by_id_" + self.namespace].format(
            workspace=self.db.workspace, id=id
        )
        res = await self.db.query(SQL)
        return res if res else None

    async def get_by_ids(self, ids: list[str], fields=None) -> Union[list[dict], None]:
        """Fetch multiple records (e.g. doc_chunks rows) by id."""
        SQL = SQL_TEMPLATES["get_by_ids_" + self.namespace].format(
            workspace=self.db.workspace, ids=",".join([f"'{id}'" for id in ids])
        )
        res = await self.db.query(SQL, multirows=True)
        return res if res else None

    async def filter_keys(self, keys: list[str]) -> set[str]:
        """Filter out keys that already exist in the database."""
        SQL = SQL_TEMPLATES["filter_keys"].format(
            table_name=N_T[self.namespace],
            workspace=self.db.workspace,
            ids=",".join([f"'{k}'" for k in keys]),
        )
        res = await self.db.query(SQL, multirows=True)
        exist_keys = [row["id"] for row in res] if res else []
        return set(s for s in keys if s not in exist_keys)

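`filter_keys` keeps only the keys that are not already stored, so re-ingesting the same documents does no duplicate work. The set arithmetic can be sketched in isolation — `existing_rows` here stands in for the multirow query result:

```python
def filter_new_keys(candidate_keys, existing_rows):
    # existing_rows mimics the query result: a list of {"id": ...} dicts (or None)
    exist = {row["id"] for row in existing_rows or []}
    return {k for k in candidate_keys if k not in exist}

print(filter_new_keys(["a", "b", "c"], [{"id": "b"}]))
```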
    ################ INSERT METHODS ################
    async def upsert(self, data: dict[str, dict]):
        left_data = {k: v for k, v in data.items() if k not in self._data}
        self._data.update(left_data)
        if self.namespace == "text_chunks":
            list_data = [{"__id__": k, **v} for k, v in data.items()]
            contents = [v["content"] for v in data.values()]
            batches = [
                contents[i : i + self._max_batch_size]
                for i in range(0, len(contents), self._max_batch_size)
            ]
            embeddings_list = await asyncio.gather(
                *[self.embedding_func(batch) for batch in batches]
            )
            embeddings = np.concatenate(embeddings_list)
            for i, d in enumerate(list_data):
                d["__vector__"] = embeddings[i]
            for item in list_data:
                merge_sql = SQL_TEMPLATES["merge_chunk"].format(check_id=item["__id__"])
                values = [
                    item["__id__"],
                    item["content"],
                    self.db.workspace,
                    item["tokens"],
                    item["chunk_order_index"],
                    item["full_doc_id"],
                    item["__vector__"],
                ]
                await self.db.execute(merge_sql, values)

        if self.namespace == "full_docs":
            for k, v in self._data.items():
                merge_sql = SQL_TEMPLATES["merge_doc_full"].format(check_id=k)
                values = [k, v["content"], self.db.workspace]
                await self.db.execute(merge_sql, values)
        return left_data

    async def index_done_callback(self):
        if self.namespace in ["full_docs", "text_chunks"]:
            logger.info("Full doc and chunk data have been saved into the Oracle database.")

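The chunk upsert above splits `contents` into batches of at most `_max_batch_size` items before embedding them concurrently with `asyncio.gather`. The slicing pattern in isolation:

```python
def split_batches(items, max_batch_size):
    # Same slicing used by the upsert path: consecutive windows of max_batch_size
    return [
        items[i : i + max_batch_size]
        for i in range(0, len(items), max_batch_size)
    ]

print(split_batches([1, 2, 3, 4, 5], 2))
```

Every element lands in exactly one batch and the final batch may be shorter, which is why the code can `np.concatenate` the per-batch embeddings back into one array aligned with the original order.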
@dataclass
class OracleVectorDBStorage(BaseVectorStorage):
    cosine_better_than_threshold: float = 0.2

    def __post_init__(self):
        pass

    async def upsert(self, data: dict[str, dict]):
        """Insert data into the vector database (no-op: vectors are written by the KV and graph storages)."""
        pass

    async def index_done_callback(self):
        pass

    #################### query method ###############
    async def query(self, query: str, top_k=5) -> Union[dict, list[dict]]:
        """Run a similarity search against the vector store."""
        embeddings = await self.embedding_func([query])
        embedding = embeddings[0]
        dtype = str(embedding.dtype).upper()
        dimension = embedding.shape[0]
        embedding_string = ", ".join(map(str, embedding.tolist()))

        SQL = SQL_TEMPLATES[self.namespace].format(
            embedding_string=embedding_string,
            dimension=dimension,
            dtype=dtype,
            workspace=self.db.workspace,
            top_k=top_k,
            better_than_threshold=self.cosine_better_than_threshold,
        )
        results = await self.db.query(SQL, multirows=True)
        return results

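The vector-search templates rank rows by `VECTOR_DISTANCE(..., COSINE)` in ascending order and keep the `top_k` closest. A rough NumPy equivalent of that ranking, for intuition only — this is not the Oracle execution path, and it omits the template's threshold filter:

```python
import numpy as np

def top_k_by_cosine_distance(query_vec, matrix, top_k):
    # cosine distance = 1 - cosine similarity; smaller means more similar
    q = query_vec / np.linalg.norm(query_vec)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    dist = 1.0 - m @ q
    order = np.argsort(dist)[:top_k]  # ascending distance, like ORDER BY distance ASC
    return order.tolist()
```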
@dataclass
class OracleGraphStorage(BaseGraphStorage):
    """Graph storage backed by Oracle."""

    def __post_init__(self):
        """Initialize the embedding batch size from the global config."""
        self._max_batch_size = self.global_config["embedding_batch_num"]

    #################### insert method ################

    async def upsert_node(self, node_id: str, node_data: dict[str, str]):
        """Insert or update a node."""
        entity_name = node_id
        entity_type = node_data["entity_type"]
        description = node_data["description"]
        source_id = node_data["source_id"]
        content = entity_name + description
        contents = [content]
        batches = [
            contents[i : i + self._max_batch_size]
            for i in range(0, len(contents), self._max_batch_size)
        ]
        embeddings_list = await asyncio.gather(
            *[self.embedding_func(batch) for batch in batches]
        )
        embeddings = np.concatenate(embeddings_list)
        content_vector = embeddings[0]
        merge_sql = SQL_TEMPLATES["merge_node"].format(
            workspace=self.db.workspace, name=entity_name, source_chunk_id=source_id
        )
        await self.db.execute(
            merge_sql,
            [
                self.db.workspace,
                entity_name,
                entity_type,
                description,
                source_id,
                content,
                content_vector,
            ],
        )

    async def upsert_edge(
        self, source_node_id: str, target_node_id: str, edge_data: dict[str, str]
    ):
        """Insert or update an edge."""
        source_name = source_node_id
        target_name = target_node_id
        weight = edge_data["weight"]
        keywords = edge_data["keywords"]
        description = edge_data["description"]
        source_chunk_id = edge_data["source_id"]
        content = keywords + source_name + target_name + description
        contents = [content]
        batches = [
            contents[i : i + self._max_batch_size]
            for i in range(0, len(contents), self._max_batch_size)
        ]
        embeddings_list = await asyncio.gather(
            *[self.embedding_func(batch) for batch in batches]
        )
        embeddings = np.concatenate(embeddings_list)
        content_vector = embeddings[0]
        merge_sql = SQL_TEMPLATES["merge_edge"].format(
            workspace=self.db.workspace,
            source_name=source_name,
            target_name=target_name,
            source_chunk_id=source_chunk_id,
        )
        await self.db.execute(
            merge_sql,
            [
                self.db.workspace,
                source_name,
                target_name,
                weight,
                keywords,
                description,
                source_chunk_id,
                content,
                content_vector,
            ],
        )

    async def embed_nodes(self, algorithm: str) -> tuple[np.ndarray, list[str]]:
        """Generate node embeddings with the requested algorithm."""
        if algorithm not in self._node_embed_algorithms:
            raise ValueError(f"Node embedding algorithm {algorithm} not supported")
        return await self._node_embed_algorithms[algorithm]()

    async def _node2vec_embed(self):
        """Generate node embeddings with node2vec."""
        from graspologic import embed

        embeddings, nodes = embed.node2vec_embed(
            self._graph,
            **self.config["node2vec_params"],
        )

        nodes_ids = [self._graph.nodes[node_id]["id"] for node_id in nodes]
        return embeddings, nodes_ids

    async def index_done_callback(self):
        logger.info(
            "Node and edge data have already been saved into the Oracle database, so nothing to do here."
        )

    #################### query method #################
    async def has_node(self, node_id: str) -> bool:
        """Check whether a node exists by node id."""
        SQL = SQL_TEMPLATES["has_node"].format(
            workspace=self.db.workspace, node_id=node_id
        )
        res = await self.db.query(SQL)
        return bool(res)

    async def has_edge(self, source_node_id: str, target_node_id: str) -> bool:
        """Check whether an edge exists by source and target node ids."""
        SQL = SQL_TEMPLATES["has_edge"].format(
            workspace=self.db.workspace,
            source_node_id=source_node_id,
            target_node_id=target_node_id,
        )
        res = await self.db.query(SQL)
        return bool(res)

    async def node_degree(self, node_id: str) -> int:
        """Get the degree of a node by node id."""
        SQL = SQL_TEMPLATES["node_degree"].format(
            workspace=self.db.workspace, node_id=node_id
        )
        res = await self.db.query(SQL)
        return res["degree"] if res else 0

    async def edge_degree(self, src_id: str, tgt_id: str) -> int:
        """Get the degree of an edge as the sum of its endpoint degrees."""
        return await self.node_degree(src_id) + await self.node_degree(tgt_id)

    async def get_node(self, node_id: str) -> Union[dict, None]:
        """Get node data by node id."""
        SQL = SQL_TEMPLATES["get_node"].format(
            workspace=self.db.workspace, node_id=node_id
        )
        res = await self.db.query(SQL)
        return res if res else None

    async def get_edge(
        self, source_node_id: str, target_node_id: str
    ) -> Union[dict, None]:
        """Get edge data by source and target node ids."""
        SQL = SQL_TEMPLATES["get_edge"].format(
            workspace=self.db.workspace,
            source_node_id=source_node_id,
            target_node_id=target_node_id,
        )
        res = await self.db.query(SQL)
        return res if res else None

    async def get_node_edges(self, source_node_id: str):
        """Get all edges of a node by node id."""
        if await self.has_node(source_node_id):
            SQL = SQL_TEMPLATES["get_node_edges"].format(
                workspace=self.db.workspace, source_node_id=source_node_id
            )
            res = await self.db.query(sql=SQL, multirows=True)
            if res:
                return [(i["source_name"], i["target_name"]) for i in res]
            else:
                return []

N_T = {
    "full_docs": "LIGHTRAG_DOC_FULL",
    "text_chunks": "LIGHTRAG_DOC_CHUNKS",
    "chunks": "LIGHTRAG_DOC_CHUNKS",
    "entities": "LIGHTRAG_GRAPH_NODES",
    "relationships": "LIGHTRAG_GRAPH_EDGES",
}

TABLES = {
    "LIGHTRAG_DOC_FULL": {
        "ddl": """CREATE TABLE LIGHTRAG_DOC_FULL (
                    id varchar(256) PRIMARY KEY,
                    workspace varchar(1024),
                    doc_name varchar(1024),
                    content CLOB,
                    meta JSON,
                    createtime TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                    updatetime TIMESTAMP DEFAULT NULL
                    )"""
    },
    "LIGHTRAG_DOC_CHUNKS": {
        "ddl": """CREATE TABLE LIGHTRAG_DOC_CHUNKS (
                    id varchar(256) PRIMARY KEY,
                    workspace varchar(1024),
                    full_doc_id varchar(256),
                    chunk_order_index NUMBER,
                    tokens NUMBER,
                    content CLOB,
                    content_vector VECTOR,
                    createtime TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                    updatetime TIMESTAMP DEFAULT NULL
                    )"""
    },
    "LIGHTRAG_GRAPH_NODES": {
        "ddl": """CREATE TABLE LIGHTRAG_GRAPH_NODES (
                    id NUMBER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
                    workspace varchar(1024),
                    name varchar(2048),
                    entity_type varchar(1024),
                    description CLOB,
                    source_chunk_id varchar(256),
                    content CLOB,
                    content_vector VECTOR,
                    createtime TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                    updatetime TIMESTAMP DEFAULT NULL
                    )"""
    },
    "LIGHTRAG_GRAPH_EDGES": {
        "ddl": """CREATE TABLE LIGHTRAG_GRAPH_EDGES (
                    id NUMBER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
                    workspace varchar(1024),
                    source_name varchar(2048),
                    target_name varchar(2048),
                    weight NUMBER,
                    keywords CLOB,
                    description CLOB,
                    source_chunk_id varchar(256),
                    content CLOB,
                    content_vector VECTOR,
                    createtime TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                    updatetime TIMESTAMP DEFAULT NULL
                    )"""
    },
    "LIGHTRAG_LLM_CACHE": {
        "ddl": """CREATE TABLE LIGHTRAG_LLM_CACHE (
                    id varchar(256) PRIMARY KEY,
                    send clob,
                    return clob,
                    model varchar(1024),
                    createtime TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                    updatetime TIMESTAMP DEFAULT NULL
                    )"""
    },
    "LIGHTRAG_GRAPH": {
        "ddl": """CREATE OR REPLACE PROPERTY GRAPH lightrag_graph
                VERTEX TABLES (
                    lightrag_graph_nodes KEY (id)
                    LABEL entity
                    PROPERTIES (id,workspace,name) -- ,entity_type,description,source_chunk_id
                )
                EDGE TABLES (
                    lightrag_graph_edges KEY (id)
                    SOURCE KEY (source_name) REFERENCES lightrag_graph_nodes(name)
                    DESTINATION KEY (target_name) REFERENCES lightrag_graph_nodes(name)
                    LABEL has_relation
                    PROPERTIES (id,workspace,source_name,target_name) -- ,weight,keywords,description,source_chunk_id
                ) OPTIONS(ALLOW MIXED PROPERTY TYPES)"""
    },
}

SQL_TEMPLATES = {
    # SQL for KVStorage
    "get_by_id_full_docs": "select ID,NVL(content,'') as content from LIGHTRAG_DOC_FULL where workspace='{workspace}' and ID='{id}'",
    "get_by_id_text_chunks": "select ID,TOKENS,NVL(content,'') as content,CHUNK_ORDER_INDEX,FULL_DOC_ID from LIGHTRAG_DOC_CHUNKS where workspace='{workspace}' and ID='{id}'",
    "get_by_ids_full_docs": "select ID,NVL(content,'') as content from LIGHTRAG_DOC_FULL where workspace='{workspace}' and ID in ({ids})",
    "get_by_ids_text_chunks": "select ID,TOKENS,NVL(content,'') as content,CHUNK_ORDER_INDEX,FULL_DOC_ID from LIGHTRAG_DOC_CHUNKS where workspace='{workspace}' and ID in ({ids})",
    "filter_keys": "select id from {table_name} where workspace='{workspace}' and id in ({ids})",
    "merge_doc_full": """MERGE INTO LIGHTRAG_DOC_FULL a
        USING DUAL
        ON (a.id = '{check_id}')
        WHEN NOT MATCHED THEN
        INSERT(id,content,workspace) values(:1,:2,:3)""",
    "merge_chunk": """MERGE INTO LIGHTRAG_DOC_CHUNKS a
        USING DUAL
        ON (a.id = '{check_id}')
        WHEN NOT MATCHED THEN
        INSERT(id,content,workspace,tokens,chunk_order_index,full_doc_id,content_vector)
        values (:1,:2,:3,:4,:5,:6,:7)""",
    # SQL for VectorStorage
    "entities": """SELECT name as entity_name FROM
        (SELECT id,name,VECTOR_DISTANCE(content_vector,vector('[{embedding_string}]',{dimension},{dtype}),COSINE) as distance
        FROM LIGHTRAG_GRAPH_NODES WHERE workspace='{workspace}')
        WHERE distance>{better_than_threshold} ORDER BY distance ASC FETCH FIRST {top_k} ROWS ONLY""",
    "relationships": """SELECT source_name as src_id, target_name as tgt_id FROM
        (SELECT id,source_name,target_name,VECTOR_DISTANCE(content_vector,vector('[{embedding_string}]',{dimension},{dtype}),COSINE) as distance
        FROM LIGHTRAG_GRAPH_EDGES WHERE workspace='{workspace}')
        WHERE distance>{better_than_threshold} ORDER BY distance ASC FETCH FIRST {top_k} ROWS ONLY""",
    "chunks": """SELECT id FROM
        (SELECT id,VECTOR_DISTANCE(content_vector,vector('[{embedding_string}]',{dimension},{dtype}),COSINE) as distance
        FROM LIGHTRAG_DOC_CHUNKS WHERE workspace='{workspace}')
        WHERE distance>{better_than_threshold} ORDER BY distance ASC FETCH FIRST {top_k} ROWS ONLY""",
    # SQL for GraphStorage
    "has_node": """SELECT * FROM GRAPH_TABLE (lightrag_graph
        MATCH (a)
        WHERE a.workspace='{workspace}' AND a.name='{node_id}'
        COLUMNS (a.name))""",
    "has_edge": """SELECT * FROM GRAPH_TABLE (lightrag_graph
        MATCH (a) -[e]-> (b)
        WHERE e.workspace='{workspace}' AND a.workspace='{workspace}' AND b.workspace='{workspace}'
        AND a.name='{source_node_id}' AND b.name='{target_node_id}'
        COLUMNS (e.source_name,e.target_name))""",
    "node_degree": """SELECT count(1) as degree FROM GRAPH_TABLE (lightrag_graph
        MATCH (a)-[e]->(b)
        WHERE a.workspace='{workspace}' AND b.workspace='{workspace}'
        AND (a.name='{node_id}' OR b.name='{node_id}')
        COLUMNS (a.name))""",
    "get_node": """SELECT t1.name,t2.entity_type,t2.source_chunk_id as source_id,NVL(t2.description,'') AS description
        FROM GRAPH_TABLE (lightrag_graph
            MATCH (a)
            WHERE a.workspace='{workspace}' AND a.name='{node_id}'
            COLUMNS (a.name)
        ) t1 JOIN LIGHTRAG_GRAPH_NODES t2 on t1.name=t2.name
        WHERE t2.workspace='{workspace}'""",
    "get_edge": """SELECT t2.weight,t2.source_chunk_id as source_id,
            NVL(t2.description,'') AS description,NVL(t2.keywords,'') AS keywords
        FROM GRAPH_TABLE (lightrag_graph
            MATCH (a)-[e]->(b)
            WHERE e.workspace='{workspace}' AND a.workspace='{workspace}' AND b.workspace='{workspace}'
            AND a.name='{source_node_id}' AND b.name='{target_node_id}'
            COLUMNS (e.id)
        ) t1 JOIN LIGHTRAG_GRAPH_EDGES t2 on t1.id=t2.id""",
    "get_node_edges": """SELECT source_name,target_name
        FROM GRAPH_TABLE (lightrag_graph
            MATCH (a)-[e]->(b)
            WHERE e.workspace='{workspace}' AND a.workspace='{workspace}' AND b.workspace='{workspace}'
            AND a.name='{source_node_id}'
            COLUMNS (a.name as source_name,b.name as target_name))""",
    "merge_node": """MERGE INTO LIGHTRAG_GRAPH_NODES a
        USING DUAL
        ON (a.workspace = '{workspace}' and a.name='{name}' and a.source_chunk_id='{source_chunk_id}')
        WHEN NOT MATCHED THEN
        INSERT(workspace,name,entity_type,description,source_chunk_id,content,content_vector)
        values (:1,:2,:3,:4,:5,:6,:7)""",
    "merge_edge": """MERGE INTO LIGHTRAG_GRAPH_EDGES a
        USING DUAL
        ON (a.workspace = '{workspace}' and a.source_name='{source_name}' and a.target_name='{target_name}' and a.source_chunk_id='{source_chunk_id}')
        WHEN NOT MATCHED THEN
        INSERT(workspace,source_name,target_name,weight,keywords,description,source_chunk_id,content,content_vector)
        values (:1,:2,:3,:4,:5,:6,:7,:8,:9)""",
}

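These templates are rendered with plain `str.format`, so identifiers and filter values are interpolated directly into the SQL text rather than bound as `:n` parameters (only the MERGE value lists use bind variables); callers must therefore only pass trusted values. A small rendering example using the `filter_keys` template shape:

```python
template = "select id from {table_name} where workspace='{workspace}' and id in ({ids})"

sql = template.format(
    table_name="LIGHTRAG_DOC_FULL",
    workspace="demo",
    # each key is quoted individually, then joined into the IN (...) list
    ids=",".join(f"'{k}'" for k in ["k1", "k2"]),
)
print(sql)
```

Moving the interpolated values into bind variables would be the safer design, but the sketch above reflects how the templates in this file are actually consumed.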
815
minirag/llm.py
Normal file
@@ -0,0 +1,815 @@
import os
import copy
from functools import lru_cache
import json
import aioboto3
import aiohttp
import numpy as np
import ollama

from openai import (
    AsyncOpenAI,
    APIConnectionError,
    RateLimitError,
    Timeout,
    AsyncAzureOpenAI,
)

import base64
import struct

from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
from pydantic import BaseModel, Field
from typing import List, Dict, Callable, Any
from .base import BaseKVStorage
from .utils import compute_args_hash, wrap_embedding_func_with_attrs

os.environ["TOKENIZERS_PARALLELISM"] = "false"

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10),
    retry=retry_if_exception_type((RateLimitError, APIConnectionError, Timeout)),
)
async def openai_complete_if_cache(
    model,
    prompt,
    system_prompt=None,
    history_messages=[],
    base_url=None,
    api_key=None,
    **kwargs,
) -> str:
    if api_key:
        os.environ["OPENAI_API_KEY"] = api_key

    openai_async_client = (
        AsyncOpenAI() if base_url is None else AsyncOpenAI(base_url=base_url)
    )
    hashing_kv: BaseKVStorage = kwargs.pop("hashing_kv", None)
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.extend(history_messages)
    messages.append({"role": "user", "content": prompt})
    if hashing_kv is not None:
        args_hash = compute_args_hash(model, messages)
        if_cache_return = await hashing_kv.get_by_id(args_hash)
        if if_cache_return is not None:
            return if_cache_return["return"]

    response = await openai_async_client.chat.completions.create(
        model=model, messages=messages, **kwargs
    )

    if hashing_kv is not None:
        await hashing_kv.upsert(
            {args_hash: {"return": response.choices[0].message.content, "model": model}}
        )

    return response.choices[0].message.content

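All the `*_if_cache` wrappers in this file follow the same pattern: hash the (model, messages) pair, return the cached completion on a hit, otherwise call the model and store the result. A self-contained sketch of that pattern, with a local `cache_key` standing in for `compute_args_hash` and a plain dict standing in for the KV storage:

```python
import hashlib
import json

def cache_key(model, messages):
    # Deterministic key over the request payload (a stand-in for compute_args_hash)
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

cache = {}

def complete_if_cache(model, messages, call_model):
    key = cache_key(model, messages)
    if key in cache:
        return cache[key]          # cache hit: skip the model call entirely
    result = call_model(messages)  # cache miss: run the (expensive) completion
    cache[key] = result
    return result
```

Because the key covers the full message list, any change to the system prompt or history produces a new key, so stale answers are never served for a different conversation state.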
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10),
    retry=retry_if_exception_type((RateLimitError, APIConnectionError, Timeout)),
)
async def azure_openai_complete_if_cache(
    model,
    prompt,
    system_prompt=None,
    history_messages=[],
    base_url=None,
    api_key=None,
    **kwargs,
):
    if api_key:
        os.environ["AZURE_OPENAI_API_KEY"] = api_key
    if base_url:
        os.environ["AZURE_OPENAI_ENDPOINT"] = base_url

    openai_async_client = AsyncAzureOpenAI(
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    )

    hashing_kv: BaseKVStorage = kwargs.pop("hashing_kv", None)
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.extend(history_messages)
    if prompt is not None:
        messages.append({"role": "user", "content": prompt})
    if hashing_kv is not None:
        args_hash = compute_args_hash(model, messages)
        if_cache_return = await hashing_kv.get_by_id(args_hash)
        if if_cache_return is not None:
            return if_cache_return["return"]

    response = await openai_async_client.chat.completions.create(
        model=model, messages=messages, **kwargs
    )

    if hashing_kv is not None:
        await hashing_kv.upsert(
            {args_hash: {"return": response.choices[0].message.content, "model": model}}
        )
    return response.choices[0].message.content

class BedrockError(Exception):
    """Generic error for issues related to Amazon Bedrock."""


@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, max=60),
    retry=retry_if_exception_type((BedrockError)),
)
async def bedrock_complete_if_cache(
    model,
    prompt,
    system_prompt=None,
    history_messages=[],
    aws_access_key_id=None,
    aws_secret_access_key=None,
    aws_session_token=None,
    **kwargs,
) -> str:
    os.environ["AWS_ACCESS_KEY_ID"] = os.environ.get(
        "AWS_ACCESS_KEY_ID", aws_access_key_id
    )
    os.environ["AWS_SECRET_ACCESS_KEY"] = os.environ.get(
        "AWS_SECRET_ACCESS_KEY", aws_secret_access_key
    )
    os.environ["AWS_SESSION_TOKEN"] = os.environ.get(
        "AWS_SESSION_TOKEN", aws_session_token
    )

    # Fix message history format
    messages = []
    for history_message in history_messages:
        message = copy.copy(history_message)
        message["content"] = [{"text": message["content"]}]
        messages.append(message)

    # Add user prompt
    messages.append({"role": "user", "content": [{"text": prompt}]})

    # Initialize Converse API arguments
    args = {"modelId": model, "messages": messages}

    # Define system prompt
    if system_prompt:
        args["system"] = [{"text": system_prompt}]

    # Map and set up inference parameters
    inference_params_map = {
        "max_tokens": "maxTokens",
        "top_p": "topP",
        "stop_sequences": "stopSequences",
    }
    if inference_params := list(
        set(kwargs) & set(["max_tokens", "temperature", "top_p", "stop_sequences"])
    ):
        args["inferenceConfig"] = {}
        for param in inference_params:
            args["inferenceConfig"][inference_params_map.get(param, param)] = (
                kwargs.pop(param)
            )

    hashing_kv: BaseKVStorage = kwargs.pop("hashing_kv", None)
    if hashing_kv is not None:
        args_hash = compute_args_hash(model, messages)
        if_cache_return = await hashing_kv.get_by_id(args_hash)
        if if_cache_return is not None:
            return if_cache_return["return"]

    # Call model via Converse API
    session = aioboto3.Session()
    async with session.client("bedrock-runtime") as bedrock_async_client:
        try:
            response = await bedrock_async_client.converse(**args, **kwargs)
        except Exception as e:
            raise BedrockError(e)

        if hashing_kv is not None:
            await hashing_kv.upsert(
                {
                    args_hash: {
                        "return": response["output"]["message"]["content"][0]["text"],
                        "model": model,
                    }
                }
            )

        return response["output"]["message"]["content"][0]["text"]

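The Bedrock Converse API spells its inference parameters in camelCase, so the function above remaps the snake_case kwargs it receives before sending them. The remapping logic in isolation (`temperature` has no entry in the map and passes through unchanged; unrelated kwargs are left in place for the later `converse(**kwargs)` call):

```python
inference_params_map = {
    "max_tokens": "maxTokens",
    "top_p": "topP",
    "stop_sequences": "stopSequences",
}

def to_inference_config(kwargs):
    # Pop only the recognised parameters out of kwargs, renaming where needed
    allowed = {"max_tokens", "temperature", "top_p", "stop_sequences"}
    present = set(kwargs) & allowed
    return {inference_params_map.get(k, k): kwargs.pop(k) for k in present}

kwargs = {"max_tokens": 100, "temperature": 0.2, "foo": 1}
config = to_inference_config(kwargs)
print(config, kwargs)
```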
@lru_cache(maxsize=1)
def initialize_hf_model(model_name):
    hf_tokenizer = AutoTokenizer.from_pretrained(
        model_name, device_map="auto", trust_remote_code=True
    )
    hf_model = AutoModelForCausalLM.from_pretrained(
        model_name, device_map="auto", trust_remote_code=True
    )
    if hf_tokenizer.pad_token is None:
        hf_tokenizer.pad_token = hf_tokenizer.eos_token

    return hf_model, hf_tokenizer


async def hf_model_if_cache(
    model, prompt, system_prompt=None, history_messages=[], **kwargs
) -> str:
    model_name = model
    hf_model, hf_tokenizer = initialize_hf_model(model_name)
    hashing_kv: BaseKVStorage = kwargs.pop("hashing_kv", None)
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.extend(history_messages)
    messages.append({"role": "user", "content": prompt})

    if hashing_kv is not None:
        args_hash = compute_args_hash(model, messages)
        if_cache_return = await hashing_kv.get_by_id(args_hash)
        if if_cache_return is not None:
            return if_cache_return["return"]
    input_prompt = ""
    try:
        input_prompt = hf_tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
    except Exception:
        try:
            # Fall back to folding the system prompt into the first user message
            # for tokenizers whose chat template rejects a system role.
            ori_message = copy.deepcopy(messages)
            if messages[0]["role"] == "system":
                messages[1]["content"] = (
                    "<system>"
                    + messages[0]["content"]
                    + "</system>\n"
                    + messages[1]["content"]
                )
                messages = messages[1:]
            input_prompt = hf_tokenizer.apply_chat_template(
                messages, tokenize=False, add_generation_prompt=True
            )
        except Exception:
            # Last resort: wrap each message in simple <role>...</role> tags.
            len_message = len(ori_message)
            for msgid in range(len_message):
                input_prompt = (
                    input_prompt
                    + "<"
                    + ori_message[msgid]["role"]
                    + ">"
                    + ori_message[msgid]["content"]
                    + "</"
                    + ori_message[msgid]["role"]
                    + ">\n"
                )

    input_ids = hf_tokenizer(
        input_prompt, return_tensors="pt", padding=True, truncation=True
    ).to("cuda")
    torch.cuda.empty_cache()
    output = hf_model.generate(
        **input_ids, max_new_tokens=500, num_return_sequences=1, early_stopping=True
    )
    response_text = hf_tokenizer.decode(
        output[0][len(input_ids["input_ids"][0]) :], skip_special_tokens=True
    )

    # Truncate everything after the completion marker, if present.
    FINDSTRING = "<|COMPLETE|>"
    last_assistant_index = response_text.find(FINDSTRING)
    if last_assistant_index != -1:
        response_text = response_text[: last_assistant_index + len(FINDSTRING)]

    if hashing_kv is not None:
        await hashing_kv.upsert({args_hash: {"return": response_text, "model": model}})

    return response_text

async def ollama_model_if_cache(
    model, prompt, system_prompt=None, history_messages=[], **kwargs
) -> str:
    kwargs.pop("max_tokens", None)
    kwargs.pop("response_format", None)
    host = kwargs.pop("host", None)
    timeout = kwargs.pop("timeout", None)

    ollama_client = ollama.AsyncClient(host=host, timeout=timeout)
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})

    hashing_kv: BaseKVStorage = kwargs.pop("hashing_kv", None)
    messages.extend(history_messages)
    messages.append({"role": "user", "content": prompt})
    if hashing_kv is not None:
        args_hash = compute_args_hash(model, messages)
        if_cache_return = await hashing_kv.get_by_id(args_hash)
        if if_cache_return is not None:
            return if_cache_return["return"]

    response = await ollama_client.chat(model=model, messages=messages, **kwargs)

    result = response["message"]["content"]

    if hashing_kv is not None:
        await hashing_kv.upsert({args_hash: {"return": result, "model": model}})

    return result

@lru_cache(maxsize=1)
def initialize_lmdeploy_pipeline(
    model,
    tp=1,
    chat_template=None,
    log_level="WARNING",
    model_format="hf",
    quant_policy=0,
):
    from lmdeploy import pipeline, ChatTemplateConfig, TurbomindEngineConfig

    lmdeploy_pipe = pipeline(
        model_path=model,
        backend_config=TurbomindEngineConfig(
            tp=tp, model_format=model_format, quant_policy=quant_policy
        ),
        chat_template_config=ChatTemplateConfig(model_name=chat_template)
        if chat_template
        else None,
        log_level=log_level,
    )
    return lmdeploy_pipe

async def lmdeploy_model_if_cache(
    model,
    prompt,
    system_prompt=None,
    history_messages=[],
    chat_template=None,
    model_format="hf",
    quant_policy=0,
    **kwargs,
) -> str:
    """
    Args:
        model (str): The path to the model.
            It could be one of the following options:
                - i) A local directory path of a turbomind model which is
                    converted by the `lmdeploy convert` command or downloaded
                    from ii) and iii).
                - ii) The model_id of an lmdeploy-quantized model hosted
                    inside a model repo on huggingface.co, such as
                    "InternLM/internlm-chat-20b-4bit",
                    "lmdeploy/llama2-chat-70b-4bit", etc.
                - iii) The model_id of a model hosted inside a model repo
                    on huggingface.co, such as "internlm/internlm-chat-7b",
                    "Qwen/Qwen-7B-Chat", "baichuan-inc/Baichuan2-7B-Chat"
                    and so on.
        chat_template (str): needed when the model is a PyTorch model on
            huggingface.co, such as "internlm-chat-7b",
            "Qwen-7B-Chat", "Baichuan2-7B-Chat" and so on,
            and when the model name of a local path does not match the
            original model name on HF.
        tp (int): tensor parallelism degree
        prompt (Union[str, List[str]]): input texts to be completed.
        do_preprocess (bool): whether to pre-process the messages. Defaults to
            True, which means chat_template will be applied.
        skip_special_tokens (bool): whether to remove special tokens during
            decoding. Defaults to True.
        do_sample (bool): whether to use sampling; greedy decoding otherwise.
            Defaults to False, which means greedy decoding is applied.
    """
    try:
        import lmdeploy
        from lmdeploy import version_info, GenerationConfig
    except Exception:
        raise ImportError("Please install lmdeploy before initializing the lmdeploy backend.")

    kwargs.pop("response_format", None)
    max_new_tokens = kwargs.pop("max_tokens", 512)
    tp = kwargs.pop("tp", 1)
    skip_special_tokens = kwargs.pop("skip_special_tokens", True)
    do_preprocess = kwargs.pop("do_preprocess", True)
    do_sample = kwargs.pop("do_sample", False)
    gen_params = kwargs

    version = version_info
    if do_sample and version < (0, 6, 0):
        raise RuntimeError(
            "`do_sample` parameter is not supported by lmdeploy until "
            f"v0.6.0, but currently using lmdeploy {lmdeploy.__version__}"
        )
    gen_params.update(do_sample=do_sample)

    lmdeploy_pipe = initialize_lmdeploy_pipeline(
        model=model,
        tp=tp,
        chat_template=chat_template,
        model_format=model_format,
        quant_policy=quant_policy,
        log_level="WARNING",
    )

    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})

    hashing_kv: BaseKVStorage = kwargs.pop("hashing_kv", None)
    messages.extend(history_messages)
    messages.append({"role": "user", "content": prompt})
    if hashing_kv is not None:
        args_hash = compute_args_hash(model, messages)
        if_cache_return = await hashing_kv.get_by_id(args_hash)
        if if_cache_return is not None:
            return if_cache_return["return"]

    gen_config = GenerationConfig(
        skip_special_tokens=skip_special_tokens,
        max_new_tokens=max_new_tokens,
        **gen_params,
    )

    response = ""
    async for res in lmdeploy_pipe.generate(
        messages,
        gen_config=gen_config,
        do_preprocess=do_preprocess,
        stream_response=False,
        session_id=1,
    ):
        response += res.response

    if hashing_kv is not None:
        await hashing_kv.upsert({args_hash: {"return": response, "model": model}})
    return response

async def gpt_4o_complete(
    prompt, system_prompt=None, history_messages=[], **kwargs
) -> str:
    return await openai_complete_if_cache(
        "gpt-4o",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        **kwargs,
    )


async def gpt_4o_mini_complete(
    prompt, system_prompt=None, history_messages=[], **kwargs
) -> str:
    return await openai_complete_if_cache(
        "gpt-4o-mini",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        **kwargs,
    )


async def azure_openai_complete(
    prompt, system_prompt=None, history_messages=[], **kwargs
) -> str:
    return await azure_openai_complete_if_cache(
        "conversation-4o-mini",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        **kwargs,
    )


async def bedrock_complete(
    prompt, system_prompt=None, history_messages=[], **kwargs
) -> str:
    return await bedrock_complete_if_cache(
        "anthropic.claude-3-haiku-20240307-v1:0",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        **kwargs,
    )


async def hf_model_complete(
    prompt, system_prompt=None, history_messages=[], **kwargs
) -> str:
    model_name = kwargs["hashing_kv"].global_config["llm_model_name"]
    return await hf_model_if_cache(
        model_name,
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        **kwargs,
    )


async def ollama_model_complete(
    prompt, system_prompt=None, history_messages=[], **kwargs
) -> str:
    model_name = kwargs["hashing_kv"].global_config["llm_model_name"]
    return await ollama_model_if_cache(
        model_name,
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        **kwargs,
    )

@wrap_embedding_func_with_attrs(embedding_dim=1536, max_token_size=8192)
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=60),
    retry=retry_if_exception_type((RateLimitError, APIConnectionError, Timeout)),
)
async def openai_embedding(
    texts: list[str],
    model: str = "text-embedding-3-small",
    base_url: str = None,
    api_key: str = None,
) -> np.ndarray:
    if api_key:
        os.environ["OPENAI_API_KEY"] = api_key

    openai_async_client = (
        AsyncOpenAI() if base_url is None else AsyncOpenAI(base_url=base_url)
    )
    response = await openai_async_client.embeddings.create(
        model=model, input=texts, encoding_format="float"
    )
    return np.array([dp.embedding for dp in response.data])


@wrap_embedding_func_with_attrs(embedding_dim=1536, max_token_size=8192)
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10),
    retry=retry_if_exception_type((RateLimitError, APIConnectionError, Timeout)),
)
async def azure_openai_embedding(
    texts: list[str],
    model: str = "text-embedding-3-small",
    base_url: str = None,
    api_key: str = None,
) -> np.ndarray:
    if api_key:
        os.environ["AZURE_OPENAI_API_KEY"] = api_key
    if base_url:
        os.environ["AZURE_OPENAI_ENDPOINT"] = base_url

    openai_async_client = AsyncAzureOpenAI(
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    )

    response = await openai_async_client.embeddings.create(
        model=model, input=texts, encoding_format="float"
    )
    return np.array([dp.embedding for dp in response.data])


@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=60),
    retry=retry_if_exception_type((RateLimitError, APIConnectionError, Timeout)),
)
async def siliconcloud_embedding(
    texts: list[str],
    model: str = "netease-youdao/bce-embedding-base_v1",
    base_url: str = "https://api.siliconflow.cn/v1/embeddings",
    max_token_size: int = 512,
    api_key: str = None,
) -> np.ndarray:
    if api_key and not api_key.startswith("Bearer "):
        api_key = "Bearer " + api_key

    headers = {"Authorization": api_key, "Content-Type": "application/json"}

    truncate_texts = [text[0:max_token_size] for text in texts]

    payload = {"model": model, "input": truncate_texts, "encoding_format": "base64"}

    base64_strings = []
    async with aiohttp.ClientSession() as session:
        async with session.post(base_url, headers=headers, json=payload) as response:
            content = await response.json()
            if "code" in content:
                raise ValueError(content)
            base64_strings = [item["embedding"] for item in content["data"]]

    embeddings = []
    for string in base64_strings:
        decode_bytes = base64.b64decode(string)
        n = len(decode_bytes) // 4
        float_array = struct.unpack("<" + "f" * n, decode_bytes)
        embeddings.append(float_array)
    return np.array(embeddings)

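The SiliconCloud endpoint above returns each embedding as base64-packed little-endian float32 values; the decode step can be exercised on its own with a round trip (the helper name `decode_base64_floats` is illustrative):

```python
import base64
import struct

def decode_base64_floats(b64: str) -> list:
    raw = base64.b64decode(b64)
    n = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack("<" + "f" * n, raw))

# Round-trip: pack three float32 values, then decode them back.
packed = base64.b64encode(struct.pack("<3f", 0.5, -1.0, 2.25)).decode()
vec = decode_base64_floats(packed)
```

The test values are exactly representable in float32, so the round trip is lossless.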
# @wrap_embedding_func_with_attrs(embedding_dim=1024, max_token_size=8192)
# @retry(
#     stop=stop_after_attempt(3),
#     wait=wait_exponential(multiplier=1, min=4, max=10),
#     retry=retry_if_exception_type((RateLimitError, APIConnectionError, Timeout)),  # TODO: fix exceptions
# )
async def bedrock_embedding(
    texts: list[str],
    model: str = "amazon.titan-embed-text-v2:0",
    aws_access_key_id=None,
    aws_secret_access_key=None,
    aws_session_token=None,
) -> np.ndarray:
    os.environ["AWS_ACCESS_KEY_ID"] = os.environ.get(
        "AWS_ACCESS_KEY_ID", aws_access_key_id
    )
    os.environ["AWS_SECRET_ACCESS_KEY"] = os.environ.get(
        "AWS_SECRET_ACCESS_KEY", aws_secret_access_key
    )
    os.environ["AWS_SESSION_TOKEN"] = os.environ.get(
        "AWS_SESSION_TOKEN", aws_session_token
    )

    session = aioboto3.Session()
    async with session.client("bedrock-runtime") as bedrock_async_client:
        if (model_provider := model.split(".")[0]) == "amazon":
            embed_texts = []
            for text in texts:
                if "v2" in model:
                    body = json.dumps(
                        {
                            "inputText": text,
                            # "dimensions": embedding_dim,
                            "embeddingTypes": ["float"],
                        }
                    )
                elif "v1" in model:
                    body = json.dumps({"inputText": text})
                else:
                    raise ValueError(f"Model {model} is not supported!")

                response = await bedrock_async_client.invoke_model(
                    modelId=model,
                    body=body,
                    accept="application/json",
                    contentType="application/json",
                )

                response_body = await response.get("body").json()

                embed_texts.append(response_body["embedding"])
        elif model_provider == "cohere":
            body = json.dumps(
                {"texts": texts, "input_type": "search_document", "truncate": "NONE"}
            )

            response = await bedrock_async_client.invoke_model(
                modelId=model,
                body=body,
                accept="application/json",
                contentType="application/json",
            )

            response_body = json.loads(response.get("body").read())

            embed_texts = response_body["embeddings"]
        else:
            raise ValueError(f"Model provider '{model_provider}' is not supported!")

        return np.array(embed_texts)

async def hf_embedding(texts: list[str], tokenizer, embed_model) -> np.ndarray:
    embed_model.to("cuda:0")
    input_ids = tokenizer(
        texts, return_tensors="pt", padding=True, truncation=True
    ).input_ids.cuda()
    with torch.no_grad():
        outputs = embed_model(input_ids)
        embeddings = outputs.last_hidden_state.mean(dim=1)
    return embeddings.detach().cpu().numpy()

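The pooling step in `hf_embedding` averages the last hidden state over the sequence axis. The same reduction on a plain nested list (batch=1, seq_len=2, dim=2), without torch, for clarity:

```python
# hidden has shape (batch, seq_len, dim); mean_pool averages over seq_len,
# matching outputs.last_hidden_state.mean(dim=1) above.
hidden = [[[1.0, 2.0], [3.0, 4.0]]]

def mean_pool(batch):
    return [
        [sum(tok[d] for tok in seq) / len(seq) for d in range(len(seq[0]))]
        for seq in batch
    ]

pooled = mean_pool(hidden)
```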
async def ollama_embedding(texts: list[str], embed_model, **kwargs) -> np.ndarray:
    embed_text = []
    ollama_client = ollama.Client(**kwargs)
    for text in texts:
        data = ollama_client.embeddings(model=embed_model, prompt=text)
        embed_text.append(data["embedding"])

    return np.array(embed_text)

class Model(BaseModel):
    """
    A Pydantic model class named 'Model' that is used to define a custom language model.

    Attributes:
        gen_func (Callable[[Any], str]): A callable function that generates the response from the language model.
            The function should take any argument and return a string.
        kwargs (Dict[str, Any]): A dictionary that contains the arguments to pass to the callable function.
            This could include parameters such as the model name, API key, etc.

    Example usage:
        Model(gen_func=openai_complete_if_cache, kwargs={"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY_1"]})

    In this example, 'openai_complete_if_cache' is the callable function that generates the response from the OpenAI model.
    The 'kwargs' dictionary contains the model name and API key to be passed to the function.
    """

    gen_func: Callable[[Any], str] = Field(
        ...,
        description="A function that generates the response from the llm. The response must be a string",
    )
    kwargs: Dict[str, Any] = Field(
        ...,
        description="The arguments to pass to the callable function. Eg. the api key, model name, etc",
    )

    class Config:
        arbitrary_types_allowed = True

class MultiModel:
    """
    Distributes the load across multiple language models. Useful for circumventing low rate limits with certain API providers, especially if you are on the free tier.
    Can also be used to split requests across different models or providers.

    Attributes:
        models (List[Model]): A list of language models to be used.

    Usage example:
    ```python
    models = [
        Model(gen_func=openai_complete_if_cache, kwargs={"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY_1"]}),
        Model(gen_func=openai_complete_if_cache, kwargs={"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY_2"]}),
        Model(gen_func=openai_complete_if_cache, kwargs={"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY_3"]}),
        Model(gen_func=openai_complete_if_cache, kwargs={"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY_4"]}),
        Model(gen_func=openai_complete_if_cache, kwargs={"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY_5"]}),
    ]
    multi_model = MultiModel(models)
    rag = LightRAG(
        llm_model_func=multi_model.llm_model_func,
        # ...other args
    )
    ```
    """

    def __init__(self, models: List[Model]):
        self._models = models
        self._current_model = 0

    def _next_model(self):
        self._current_model = (self._current_model + 1) % len(self._models)
        return self._models[self._current_model]

    async def llm_model_func(
        self, prompt, system_prompt=None, history_messages=[], **kwargs
    ) -> str:
        kwargs.pop("model", None)  # stop from overwriting the custom model name
        next_model = self._next_model()
        args = dict(
            prompt=prompt,
            system_prompt=system_prompt,
            history_messages=history_messages,
            **kwargs,
            **next_model.kwargs,
        )

        return await next_model.gen_func(**args)

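The rotation in `MultiModel._next_model` is a plain modulo round-robin. A self-contained sketch over strings instead of `Model` objects (class name `RoundRobin` is illustrative):

```python
class RoundRobin:
    def __init__(self, items):
        self._items = items
        self._i = 0

    def next(self):
        self._i = (self._i + 1) % len(self._items)  # advance, wrap around
        return self._items[self._i]

rr = RoundRobin(["key-1", "key-2", "key-3"])
picks = [rr.next() for _ in range(4)]
```

Note that, like `_next_model`, the first pick advances past index 0 before returning.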

if __name__ == "__main__":
    import asyncio

    async def main():
        result = await gpt_4o_mini_complete("How are you?")
        print(result)

    asyncio.run(main())
402	minirag/minirag.py	Normal file
@@ -0,0 +1,402 @@
import asyncio
import os
from dataclasses import asdict, dataclass, field
from datetime import datetime
from functools import partial
from typing import Type, cast

from .llm import *

from .operate import (
    chunking_by_token_size,
    extract_entities,
    local_query,
    global_query,
    hybrid_query,
    minirag_query,
    naive_query,
)

from .utils import (
    EmbeddingFunc,
    compute_mdhash_id,
    limit_async_func_call,
    convert_response_to_json,
    logger,
    set_logger,
)
from .base import (
    BaseGraphStorage,
    BaseKVStorage,
    BaseVectorStorage,
    StorageNameSpace,
    QueryParam,
)

from .storage import (
    JsonKVStorage,
    NanoVectorDBStorage,
    NetworkXStorage,
)

from .kg.neo4j_impl import Neo4JStorage

from .kg.oracle_impl import OracleKVStorage, OracleGraphStorage, OracleVectorDBStorage

# future KG integrations

# from .kg.ArangoDB_impl import (
#     GraphStorage as ArangoDBStorage
# )


def always_get_an_event_loop() -> asyncio.AbstractEventLoop:
    try:
        return asyncio.get_event_loop()
    except RuntimeError:
        logger.info("Creating a new event loop in main thread.")
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        return loop

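The `always_get_an_event_loop` pattern (reuse the current loop, or create and register a fresh one when none exists) can be exercised standalone; the helper name `get_or_create_loop` is illustrative:

```python
import asyncio

def get_or_create_loop() -> asyncio.AbstractEventLoop:
    try:
        return asyncio.get_event_loop()
    except RuntimeError:
        # No current loop in this thread: create one and register it.
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        return loop

loop = get_or_create_loop()
result = loop.run_until_complete(asyncio.sleep(0, result="ok"))
```

This is what lets the synchronous `insert`/`query` wrappers below drive the async implementations from plain scripts.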

@dataclass
class MiniRAG:
    working_dir: str = field(
        default_factory=lambda: f"./minirag_cache_{datetime.now().strftime('%Y-%m-%d-%H:%M:%S')}"
    )

    # RAGmode: str = 'minirag'

    kv_storage: str = field(default="JsonKVStorage")
    vector_storage: str = field(default="NanoVectorDBStorage")
    graph_storage: str = field(default="NetworkXStorage")

    current_log_level = logger.level
    log_level: str = field(default=current_log_level)

    # text chunking
    chunk_token_size: int = 1200
    chunk_overlap_token_size: int = 100
    tiktoken_model_name: str = "gpt-4o-mini"

    # entity extraction
    entity_extract_max_gleaning: int = 1
    entity_summary_to_max_tokens: int = 500

    # node embedding
    node_embedding_algorithm: str = "node2vec"
    node2vec_params: dict = field(
        default_factory=lambda: {
            "dimensions": 1536,
            "num_walks": 10,
            "walk_length": 40,
            "window_size": 2,
            "iterations": 3,
            "random_seed": 3,
        }
    )

    embedding_func: EmbeddingFunc = field(default_factory=lambda: openai_embedding)
    embedding_batch_num: int = 32
    embedding_func_max_async: int = 16

    # LLM
    llm_model_func: callable = hf_model_complete  # or gpt_4o_mini_complete
    llm_model_name: str = "meta-llama/Llama-3.2-1B-Instruct"  # e.g. 'meta-llama/Llama-3.2-1B' or 'google/gemma-2-2b-it'
    llm_model_max_token_size: int = 32768
    llm_model_max_async: int = 16
    llm_model_kwargs: dict = field(default_factory=dict)

    # storage
    vector_db_storage_cls_kwargs: dict = field(default_factory=dict)

    enable_llm_cache: bool = True

    # extension
    addon_params: dict = field(default_factory=dict)
    convert_response_to_json_func: callable = convert_response_to_json

    def __post_init__(self):
        if not os.path.exists(self.working_dir):
            logger.info(f"Creating working directory {self.working_dir}")
            os.makedirs(self.working_dir)

        log_file = os.path.join(self.working_dir, "minirag.log")
        set_logger(log_file)
        logger.setLevel(self.log_level)

        logger.info(f"Logger initialized for working directory: {self.working_dir}")

        _print_config = ",\n  ".join([f"{k} = {v}" for k, v in asdict(self).items()])
        logger.debug(f"MiniRAG init with param:\n  {_print_config}\n")

        # @TODO: should move all storage setup here to leverage initial start params attached to self.

        self.key_string_value_json_storage_cls: Type[BaseKVStorage] = (
            self._get_storage_class()[self.kv_storage]
        )
        self.vector_db_storage_cls: Type[BaseVectorStorage] = self._get_storage_class()[
            self.vector_storage
        ]
        self.graph_storage_cls: Type[BaseGraphStorage] = self._get_storage_class()[
            self.graph_storage
        ]

        self.llm_response_cache = (
            self.key_string_value_json_storage_cls(
                namespace="llm_response_cache",
                global_config=asdict(self),
                embedding_func=None,
            )
            if self.enable_llm_cache
            else None
        )

        self.embedding_func = limit_async_func_call(self.embedding_func_max_async)(
            self.embedding_func
        )

        ####
        # add embedding func by walter
        ####
        self.full_docs = self.key_string_value_json_storage_cls(
            namespace="full_docs",
            global_config=asdict(self),
            embedding_func=self.embedding_func,
        )
        self.text_chunks = self.key_string_value_json_storage_cls(
            namespace="text_chunks",
            global_config=asdict(self),
            embedding_func=self.embedding_func,
        )
        self.chunk_entity_relation_graph = self.graph_storage_cls(
            namespace="chunk_entity_relation",
            global_config=asdict(self),
            embedding_func=self.embedding_func,
        )
        ####
        # add embedding func by walter over
        ####

        self.entities_vdb = self.vector_db_storage_cls(
            namespace="entities",
            global_config=asdict(self),
            embedding_func=self.embedding_func,
            meta_fields={"entity_name"},
        )

        self.entity_name_vdb = self.vector_db_storage_cls(
            namespace="entities_name",
            global_config=asdict(self),
            embedding_func=self.embedding_func,
            meta_fields={"entity_name"},
        )

        self.relationships_vdb = self.vector_db_storage_cls(
            namespace="relationships",
            global_config=asdict(self),
            embedding_func=self.embedding_func,
            meta_fields={"src_id", "tgt_id"},
        )
        self.chunks_vdb = self.vector_db_storage_cls(
            namespace="chunks",
            global_config=asdict(self),
            embedding_func=self.embedding_func,
        )

        self.llm_model_func = limit_async_func_call(self.llm_model_max_async)(
            partial(
                self.llm_model_func,
                hashing_kv=self.llm_response_cache,
                **self.llm_model_kwargs,
            )
        )

    def _get_storage_class(self) -> Type[BaseGraphStorage]:
        return {
            # kv storage
            "JsonKVStorage": JsonKVStorage,
            "OracleKVStorage": OracleKVStorage,
            # vector storage
            "NanoVectorDBStorage": NanoVectorDBStorage,
            "OracleVectorDBStorage": OracleVectorDBStorage,
            # graph storage
            "NetworkXStorage": NetworkXStorage,
            "Neo4JStorage": Neo4JStorage,
            "OracleGraphStorage": OracleGraphStorage,
            # "ArangoDBStorage": ArangoDBStorage
        }

    def insert(self, string_or_strings):
        loop = always_get_an_event_loop()
        return loop.run_until_complete(self.ainsert(string_or_strings))

    async def ainsert(self, string_or_strings):
        update_storage = False
        try:
            if isinstance(string_or_strings, str):
                string_or_strings = [string_or_strings]

            new_docs = {
                compute_mdhash_id(c.strip(), prefix="doc-"): {"content": c.strip()}
                for c in string_or_strings
            }
            _add_doc_keys = await self.full_docs.filter_keys(list(new_docs.keys()))
            new_docs = {k: v for k, v in new_docs.items() if k in _add_doc_keys}
            if not len(new_docs):
                logger.warning("All docs are already in the storage")
                return
            update_storage = True
            logger.info(f"[New Docs] inserting {len(new_docs)} docs")

            inserting_chunks = {}
            for doc_key, doc in new_docs.items():
                chunks = {
                    compute_mdhash_id(dp["content"], prefix="chunk-"): {
                        **dp,
                        "full_doc_id": doc_key,
                    }
                    for dp in chunking_by_token_size(
                        doc["content"],
                        overlap_token_size=self.chunk_overlap_token_size,
                        max_token_size=self.chunk_token_size,
                        tiktoken_model=self.tiktoken_model_name,
                    )
                }
                inserting_chunks.update(chunks)
            _add_chunk_keys = await self.text_chunks.filter_keys(
                list(inserting_chunks.keys())
            )
            inserting_chunks = {
                k: v for k, v in inserting_chunks.items() if k in _add_chunk_keys
            }
            if not len(inserting_chunks):
                logger.warning("All chunks are already in the storage")
                return
            logger.info(f"[New Chunks] inserting {len(inserting_chunks)} chunks")

            await self.chunks_vdb.upsert(inserting_chunks)

            logger.info("[Entity Extraction]...")
            maybe_new_kg = await extract_entities(
                inserting_chunks,
                knowledge_graph_inst=self.chunk_entity_relation_graph,
                entity_vdb=self.entities_vdb,
                entity_name_vdb=self.entity_name_vdb,
                relationships_vdb=self.relationships_vdb,
                global_config=asdict(self),
            )
            if maybe_new_kg is None:
                logger.warning("No new entities and relationships found")
                return
            self.chunk_entity_relation_graph = maybe_new_kg

            await self.full_docs.upsert(new_docs)
            await self.text_chunks.upsert(inserting_chunks)
        finally:
            if update_storage:
                await self._insert_done()

    async def _insert_done(self):
        tasks = []
        for storage_inst in [
            self.full_docs,
            self.text_chunks,
            self.llm_response_cache,
            self.entities_vdb,
            self.entity_name_vdb,
            self.relationships_vdb,
            self.chunks_vdb,
            self.chunk_entity_relation_graph,
        ]:
            if storage_inst is None:
                continue
            tasks.append(cast(StorageNameSpace, storage_inst).index_done_callback())
        await asyncio.gather(*tasks)

    def query(self, query: str, param: QueryParam = QueryParam()):
        loop = always_get_an_event_loop()
        return loop.run_until_complete(self.aquery(query, param))

    async def aquery(self, query: str, param: QueryParam = QueryParam()):
        if param.mode == "light":
            response = await hybrid_query(
                query,
                self.chunk_entity_relation_graph,
                self.entities_vdb,
                self.relationships_vdb,
                self.text_chunks,
                param,
                asdict(self),
            )
        elif param.mode == "mini":
            response = await minirag_query(
                query,
                self.chunk_entity_relation_graph,
                self.entities_vdb,
                self.entity_name_vdb,
                self.relationships_vdb,
                self.chunks_vdb,
                self.text_chunks,
                self.embedding_func,
                param,
                asdict(self),
            )
        elif param.mode == "naive":
            response = await naive_query(
                query,
                self.chunks_vdb,
                self.text_chunks,
                param,
                asdict(self),
            )
        else:
            raise ValueError(f"Unknown mode {param.mode}")
        await self._query_done()
        return response

    async def _query_done(self):
        tasks = []
        for storage_inst in [self.llm_response_cache]:
            if storage_inst is None:
                continue
            tasks.append(cast(StorageNameSpace, storage_inst).index_done_callback())
        await asyncio.gather(*tasks)

    def delete_by_entity(self, entity_name: str):
        loop = always_get_an_event_loop()
        return loop.run_until_complete(self.adelete_by_entity(entity_name))

    async def adelete_by_entity(self, entity_name: str):
        entity_name = f'"{entity_name.upper()}"'

        try:
            await self.entities_vdb.delete_entity(entity_name)
            await self.relationships_vdb.delete_relation(entity_name)
            await self.chunk_entity_relation_graph.delete_node(entity_name)

            logger.info(
                f"Entity '{entity_name}' and its relationships have been deleted."
            )
            await self._delete_by_entity_done()
        except Exception as e:
            logger.error(f"Error while deleting entity '{entity_name}': {e}")

    async def _delete_by_entity_done(self):
        tasks = []
        for storage_inst in [
            self.entities_vdb,
            self.relationships_vdb,
            self.chunk_entity_relation_graph,
        ]:
            if storage_inst is None:
                continue
            tasks.append(cast(StorageNameSpace, storage_inst).index_done_callback())
        await asyncio.gather(*tasks)
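The `ainsert` pipeline above first deduplicates documents by a hash of their content, then splits each document into overlapping chunks. A self-contained sketch of those two steps, assuming md5-based ids like `compute_mdhash_id` in `utils.py` and a plain token list standing in for tiktoken output:

```python
import hashlib

def compute_mdhash_id(content: str, prefix: str = "") -> str:
    # md5-of-content keys, as used for the "doc-"/"chunk-" ids above
    return prefix + hashlib.md5(content.encode()).hexdigest()

def chunk_tokens(tokens, max_token_size=1200, overlap_token_size=100):
    # Sliding window with overlap, mirroring chunking_by_token_size.
    step = max_token_size - overlap_token_size
    return [tokens[i : i + max_token_size] for i in range(0, len(tokens), step)]

# Deduplicate incoming docs against already-stored keys.
existing = {compute_mdhash_id("old doc", prefix="doc-")}
new_docs = {
    compute_mdhash_id(c.strip(), prefix="doc-"): {"content": c.strip()}
    for c in ["old doc", "new doc"]
}
to_insert = {k: v for k, v in new_docs.items() if k not in existing}

# Chunk a toy "document" of 10 tokens into windows of 4 with overlap 2.
chunks = chunk_tokens(list(range(10)), max_token_size=4, overlap_token_size=2)
```

Identical content always hashes to the same key, so re-inserting a document is a no-op, and consecutive chunks share `overlap_token_size` tokens of context.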
1413	minirag/operate.py	Normal file
File diff suppressed because it is too large
417	minirag/prompt.py	Normal file
@@ -0,0 +1,417 @@
GRAPH_FIELD_SEP = "<SEP>"

PROMPTS = {}

PROMPTS["DEFAULT_TUPLE_DELIMITER"] = "<|>"
PROMPTS["DEFAULT_RECORD_DELIMITER"] = "##"
PROMPTS["DEFAULT_COMPLETION_DELIMITER"] = "<|COMPLETE|>"
PROMPTS["process_tickers"] = ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"]

PROMPTS["DEFAULT_ENTITY_TYPES"] = ["organization", "person", "location", "event"]


PROMPTS["entity_extraction"] = """-Goal-
Given a text document that is potentially relevant to this activity and a list of entity types, identify all entities of those types from the text and all relationships among the identified entities.

-Steps-
1. Identify all entities. For each identified entity, extract the following information:
- entity_name: Name of the entity, using the same language as the input text. If English, capitalize the name.
- entity_type: One of the following types: [{entity_types}]
- entity_description: Comprehensive description of the entity's attributes and activities
Format each entity as ("entity"{tuple_delimiter}<entity_name>{tuple_delimiter}<entity_type>{tuple_delimiter}<entity_description>)

2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are *clearly related* to each other.
For each pair of related entities, extract the following information:
- source_entity: name of the source entity, as identified in step 1
- target_entity: name of the target entity, as identified in step 1
- relationship_description: explanation as to why you think the source entity and the target entity are related to each other
- relationship_strength: a numeric score indicating the strength of the relationship between the source entity and the target entity
- relationship_keywords: one or more high-level keywords that summarize the overarching nature of the relationship, focusing on concepts or themes rather than specific details
Format each relationship as ("relationship"{tuple_delimiter}<source_entity>{tuple_delimiter}<target_entity>{tuple_delimiter}<relationship_description>{tuple_delimiter}<relationship_keywords>{tuple_delimiter}<relationship_strength>)

3. Identify high-level keywords that summarize the main concepts, themes, or topics of the entire text. These should capture the overarching ideas present in the document.
Format the content-level keywords as ("content_keywords"{tuple_delimiter}<high_level_keywords>)

4. Return output in English as a single list of all the entities and relationships identified in steps 1 and 2. Use **{record_delimiter}** as the list delimiter.

5. When finished, output {completion_delimiter}

######################
-Examples-
######################
Example 1:

Entity_types: [person, technology, mission, organization, location]
Text:
while Alex clenched his jaw, the buzz of frustration dull against the backdrop of Taylor's authoritarian certainty. It was this competitive undercurrent that kept him alert, the sense that his and Jordan's shared commitment to discovery was an unspoken rebellion against Cruz's narrowing vision of control and order.

Then Taylor did something unexpected. They paused beside Jordan and, for a moment, observed the device with something akin to reverence. “If this tech can be understood..." Taylor said, their voice quieter, "It could change the game for us. For all of us.”

The underlying dismissal earlier seemed to falter, replaced by a glimpse of reluctant respect for the gravity of what lay in their hands. Jordan looked up, and for a fleeting heartbeat, their eyes locked with Taylor's, a wordless clash of wills softening into an uneasy truce.

It was a small transformation, barely perceptible, but one that Alex noted with an inward nod. They had all been brought here by different paths
################
Output:
("entity"{tuple_delimiter}"Alex"{tuple_delimiter}"person"{tuple_delimiter}"Alex is a character who experiences frustration and is observant of the dynamics among other characters."){record_delimiter}
("entity"{tuple_delimiter}"Taylor"{tuple_delimiter}"person"{tuple_delimiter}"Taylor is portrayed with authoritarian certainty and shows a moment of reverence towards a device, indicating a change in perspective."){record_delimiter}
|
||||
("entity"{tuple_delimiter}"Jordan"{tuple_delimiter}"person"{tuple_delimiter}"Jordan shares a commitment to discovery and has a significant interaction with Taylor regarding a device."){record_delimiter}
|
||||
("entity"{tuple_delimiter}"Cruz"{tuple_delimiter}"person"{tuple_delimiter}"Cruz is associated with a vision of control and order, influencing the dynamics among other characters."){record_delimiter}
|
||||
("entity"{tuple_delimiter}"The Device"{tuple_delimiter}"technology"{tuple_delimiter}"The Device is central to the story, with potential game-changing implications, and is revered by Taylor."){record_delimiter}
|
||||
("relationship"{tuple_delimiter}"Alex"{tuple_delimiter}"Taylor"{tuple_delimiter}"Alex is affected by Taylor's authoritarian certainty and observes changes in Taylor's attitude towards the device."{tuple_delimiter}"power dynamics, perspective shift"{tuple_delimiter}7){record_delimiter}
|
||||
("relationship"{tuple_delimiter}"Alex"{tuple_delimiter}"Jordan"{tuple_delimiter}"Alex and Jordan share a commitment to discovery, which contrasts with Cruz's vision."{tuple_delimiter}"shared goals, rebellion"{tuple_delimiter}6){record_delimiter}
|
||||
("relationship"{tuple_delimiter}"Taylor"{tuple_delimiter}"Jordan"{tuple_delimiter}"Taylor and Jordan interact directly regarding the device, leading to a moment of mutual respect and an uneasy truce."{tuple_delimiter}"conflict resolution, mutual respect"{tuple_delimiter}8){record_delimiter}
|
||||
("relationship"{tuple_delimiter}"Jordan"{tuple_delimiter}"Cruz"{tuple_delimiter}"Jordan's commitment to discovery is in rebellion against Cruz's vision of control and order."{tuple_delimiter}"ideological conflict, rebellion"{tuple_delimiter}5){record_delimiter}
|
||||
("relationship"{tuple_delimiter}"Taylor"{tuple_delimiter}"The Device"{tuple_delimiter}"Taylor shows reverence towards the device, indicating its importance and potential impact."{tuple_delimiter}"reverence, technological significance"{tuple_delimiter}9){record_delimiter}
|
||||
("content_keywords"{tuple_delimiter}"power dynamics, ideological conflict, discovery, rebellion"){completion_delimiter}
|
||||
#############################
|
||||
Example 2:
|
||||
|
||||
Entity_types: [person, technology, mission, organization, location]
|
||||
Text:
|
||||
They were no longer mere operatives; they had become guardians of a threshold, keepers of a message from a realm beyond stars and stripes. This elevation in their mission could not be shackled by regulations and established protocols—it demanded a new perspective, a new resolve.
|
||||
|
||||
Tension threaded through the dialogue of beeps and static as communications with Washington buzzed in the background. The team stood, a portentous air enveloping them. It was clear that the decisions they made in the ensuing hours could redefine humanity's place in the cosmos or condemn them to ignorance and potential peril.
|
||||
|
||||
Their connection to the stars solidified, the group moved to address the crystallizing warning, shifting from passive recipients to active participants. Mercer's latter instincts gained precedence— the team's mandate had evolved, no longer solely to observe and report but to interact and prepare. A metamorphosis had begun, and Operation: Dulce hummed with the newfound frequency of their daring, a tone set not by the earthly
|
||||
#############
|
||||
Output:
|
||||
("entity"{tuple_delimiter}"Washington"{tuple_delimiter}"location"{tuple_delimiter}"Washington is a location where communications are being received, indicating its importance in the decision-making process."){record_delimiter}
|
||||
("entity"{tuple_delimiter}"Operation: Dulce"{tuple_delimiter}"mission"{tuple_delimiter}"Operation: Dulce is described as a mission that has evolved to interact and prepare, indicating a significant shift in objectives and activities."){record_delimiter}
|
||||
("entity"{tuple_delimiter}"The team"{tuple_delimiter}"organization"{tuple_delimiter}"The team is portrayed as a group of individuals who have transitioned from passive observers to active participants in a mission, showing a dynamic change in their role."){record_delimiter}
|
||||
("relationship"{tuple_delimiter}"The team"{tuple_delimiter}"Washington"{tuple_delimiter}"The team receives communications from Washington, which influences their decision-making process."{tuple_delimiter}"decision-making, external influence"{tuple_delimiter}7){record_delimiter}
|
||||
("relationship"{tuple_delimiter}"The team"{tuple_delimiter}"Operation: Dulce"{tuple_delimiter}"The team is directly involved in Operation: Dulce, executing its evolved objectives and activities."{tuple_delimiter}"mission evolution, active participation"{tuple_delimiter}9){completion_delimiter}
|
||||
("content_keywords"{tuple_delimiter}"mission evolution, decision-making, active participation, cosmic significance"){completion_delimiter}
|
||||
#############################
|
||||
Example 3:
|
||||
|
||||
Entity_types: [person, role, technology, organization, event, location, concept]
|
||||
Text:
|
||||
their voice slicing through the buzz of activity. "Control may be an illusion when facing an intelligence that literally writes its own rules," they stated stoically, casting a watchful eye over the flurry of data.
|
||||
|
||||
"It's like it's learning to communicate," offered Sam Rivera from a nearby interface, their youthful energy boding a mix of awe and anxiety. "This gives talking to strangers' a whole new meaning."
|
||||
|
||||
Alex surveyed his team—each face a study in concentration, determination, and not a small measure of trepidation. "This might well be our first contact," he acknowledged, "And we need to be ready for whatever answers back."
|
||||
|
||||
Together, they stood on the edge of the unknown, forging humanity's response to a message from the heavens. The ensuing silence was palpable—a collective introspection about their role in this grand cosmic play, one that could rewrite human history.
|
||||
|
||||
The encrypted dialogue continued to unfold, its intricate patterns showing an almost uncanny anticipation
|
||||
#############
|
||||
Output:
|
||||
("entity"{tuple_delimiter}"Sam Rivera"{tuple_delimiter}"person"{tuple_delimiter}"Sam Rivera is a member of a team working on communicating with an unknown intelligence, showing a mix of awe and anxiety."){record_delimiter}
|
||||
("entity"{tuple_delimiter}"Alex"{tuple_delimiter}"person"{tuple_delimiter}"Alex is the leader of a team attempting first contact with an unknown intelligence, acknowledging the significance of their task."){record_delimiter}
|
||||
("entity"{tuple_delimiter}"Control"{tuple_delimiter}"concept"{tuple_delimiter}"Control refers to the ability to manage or govern, which is challenged by an intelligence that writes its own rules."){record_delimiter}
|
||||
("entity"{tuple_delimiter}"Intelligence"{tuple_delimiter}"concept"{tuple_delimiter}"Intelligence here refers to an unknown entity capable of writing its own rules and learning to communicate."){record_delimiter}
|
||||
("entity"{tuple_delimiter}"First Contact"{tuple_delimiter}"event"{tuple_delimiter}"First Contact is the potential initial communication between humanity and an unknown intelligence."){record_delimiter}
|
||||
("entity"{tuple_delimiter}"Humanity's Response"{tuple_delimiter}"event"{tuple_delimiter}"Humanity's Response is the collective action taken by Alex's team in response to a message from an unknown intelligence."){record_delimiter}
|
||||
("relationship"{tuple_delimiter}"Sam Rivera"{tuple_delimiter}"Intelligence"{tuple_delimiter}"Sam Rivera is directly involved in the process of learning to communicate with the unknown intelligence."{tuple_delimiter}"communication, learning process"{tuple_delimiter}9){record_delimiter}
|
||||
("relationship"{tuple_delimiter}"Alex"{tuple_delimiter}"First Contact"{tuple_delimiter}"Alex leads the team that might be making the First Contact with the unknown intelligence."{tuple_delimiter}"leadership, exploration"{tuple_delimiter}10){record_delimiter}
|
||||
("relationship"{tuple_delimiter}"Alex"{tuple_delimiter}"Humanity's Response"{tuple_delimiter}"Alex and his team are the key figures in Humanity's Response to the unknown intelligence."{tuple_delimiter}"collective action, cosmic significance"{tuple_delimiter}8){record_delimiter}
|
||||
("relationship"{tuple_delimiter}"Control"{tuple_delimiter}"Intelligence"{tuple_delimiter}"The concept of Control is challenged by the Intelligence that writes its own rules."{tuple_delimiter}"power dynamics, autonomy"{tuple_delimiter}7){record_delimiter}
|
||||
("content_keywords"{tuple_delimiter}"first contact, control, communication, cosmic significance"){completion_delimiter}
|
||||
#############################
|
||||
-Real Data-
|
||||
######################
|
||||
Entity_types: {entity_types}
|
||||
Text: {input_text}
|
||||
######################
|
||||
Output:
|
||||
"""
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
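As a rough illustration (not part of the repository), the delimited records the extraction prompt asks for can be split back into tuples. The concrete delimiter strings below are hypothetical; the real values are substituted into the template at runtime.

```python
# Hypothetical delimiter values for illustration only; the real ones are
# injected into the prompt template at runtime.
TUPLE_DELIM = "<|>"
RECORD_DELIM = "##"
COMPLETION_DELIM = "<|COMPLETE|>"


def parse_extraction_output(raw: str):
    """Split a completion into ("entity", ...) / ("relationship", ...) tuples."""
    records = []
    raw = raw.replace(COMPLETION_DELIM, "")
    for chunk in raw.split(RECORD_DELIM):
        chunk = chunk.strip()
        # Each record is wrapped in parentheses, per the prompt's format spec.
        if not (chunk.startswith("(") and chunk.endswith(")")):
            continue
        fields = [f.strip().strip('"') for f in chunk[1:-1].split(TUPLE_DELIM)]
        records.append(tuple(fields))
    return records


demo = (
    '("entity"<|>"Alex"<|>"person"<|>"A team member.")##'
    '("relationship"<|>"Alex"<|>"Taylor"<|>"Colleagues."<|>"teamwork"<|>7)<|COMPLETE|>'
)
```

A parser along these lines is how the delimited output is typically consumed downstream; the exact helper in MiniRAG may differ.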
PROMPTS[
    "summarize_entity_descriptions"
] = """You are a helpful assistant responsible for generating a comprehensive summary of the data provided below.
You are given one or two entities, and a list of descriptions, all related to the same entity or group of entities.
Please concatenate all of these into a single, comprehensive description. Make sure to include information collected from all the descriptions.
If the provided descriptions are contradictory, please resolve the contradictions and provide a single, coherent summary.
Make sure it is written in the third person, and include the entity names so we have the full context.

#######
-Data-
Entities: {entity_name}
Description List: {description_list}
#######
Output:
"""

PROMPTS[
    "entiti_continue_extraction"
] = """MANY entities were missed in the last extraction. Add them below using the same format:
"""

PROMPTS[
    "entiti_continue_extraction_mini"
] = """MANY entities were missed in the last extraction.
Comparing the information extracted so far with the original text, the following information was mainly omitted:
{omit}

The types of the entities to be added can be taken from Entity_types, or you can add types yourself.

Entity_types: {entity_types}

Add them below using the same format:
"""

PROMPTS["minirag_query2kwd"] = """---Role---

You are a helpful assistant tasked with identifying both answer-type and low-level keywords in the user's query.

---Goal---

Given the query, list both answer-type and low-level keywords.
answer_type_keywords focus on the type of the answer to the given query, while low-level keywords focus on specific entities, details, or concrete terms.
The answer_type_keywords must be selected from the Answer type pool.
This pool is in the form of a dictionary, where each key is a Type you should choose from and each value gives example samples.

---Instructions---

- Output the keywords in JSON format.
- The JSON should have two keys:
  - "answer_type_keywords" for the types of the answer. In this list, the types with the highest likelihood should be placed at the forefront. No more than 3.
  - "entities_from_query" for specific entities or details. These must be extracted from the query.

######################
-Examples-
######################
Example 1:

Query: "How does international trade influence global economic stability?"
Answer type pool: {{
 'PERSONAL LIFE': ['FAMILY TIME', 'HOME MAINTENANCE'],
 'STRATEGY': ['MARKETING PLAN', 'BUSINESS EXPANSION'],
 'SERVICE FACILITATION': ['ONLINE SUPPORT', 'CUSTOMER SERVICE TRAINING'],
 'PERSON': ['JANE DOE', 'JOHN SMITH'],
 'FOOD': ['PASTA', 'SUSHI'],
 'EMOTION': ['HAPPINESS', 'ANGER'],
 'PERSONAL EXPERIENCE': ['TRAVEL ABROAD', 'STUDYING ABROAD'],
 'INTERACTION': ['TEAM MEETING', 'NETWORKING EVENT'],
 'BEVERAGE': ['COFFEE', 'TEA'],
 'PLAN': ['ANNUAL BUDGET', 'PROJECT TIMELINE'],
 'GEO': ['NEW YORK CITY', 'SOUTH AFRICA'],
 'GEAR': ['CAMPING TENT', 'CYCLING HELMET'],
 'EMOJI': ['🎉', '🚀'],
 'BEHAVIOR': ['POSITIVE FEEDBACK', 'NEGATIVE CRITICISM'],
 'TONE': ['FORMAL', 'INFORMAL'],
 'LOCATION': ['DOWNTOWN', 'SUBURBS']
}}
################
Output:
{{
  "answer_type_keywords": ["STRATEGY", "PERSONAL LIFE"],
  "entities_from_query": ["Trade agreements", "Tariffs", "Currency exchange", "Imports", "Exports"]
}}
#############################
Example 2:

Query: "When was SpaceX's first rocket launch?"
Answer type pool: {{
 'DATE AND TIME': ['2023-10-10 10:00', 'THIS AFTERNOON'],
 'ORGANIZATION': ['GLOBAL INITIATIVES CORPORATION', 'LOCAL COMMUNITY CENTER'],
 'PERSONAL LIFE': ['DAILY EXERCISE ROUTINE', 'FAMILY VACATION PLANNING'],
 'STRATEGY': ['NEW PRODUCT LAUNCH', 'YEAR-END SALES BOOST'],
 'SERVICE FACILITATION': ['REMOTE IT SUPPORT', 'ON-SITE TRAINING SESSIONS'],
 'PERSON': ['ALEXANDER HAMILTON', 'MARIA CURIE'],
 'FOOD': ['GRILLED SALMON', 'VEGETARIAN BURRITO'],
 'EMOTION': ['EXCITEMENT', 'DISAPPOINTMENT'],
 'PERSONAL EXPERIENCE': ['BIRTHDAY CELEBRATION', 'FIRST MARATHON'],
 'INTERACTION': ['OFFICE WATER COOLER CHAT', 'ONLINE FORUM DEBATE'],
 'BEVERAGE': ['ICED COFFEE', 'GREEN SMOOTHIE'],
 'PLAN': ['WEEKLY MEETING SCHEDULE', 'MONTHLY BUDGET OVERVIEW'],
 'GEO': ['MOUNT EVEREST BASE CAMP', 'THE GREAT BARRIER REEF'],
 'GEAR': ['PROFESSIONAL CAMERA EQUIPMENT', 'OUTDOOR HIKING GEAR'],
 'EMOJI': ['📅', '⏰'],
 'BEHAVIOR': ['PUNCTUALITY', 'HONESTY'],
 'TONE': ['CONFIDENTIAL', 'SATIRICAL'],
 'LOCATION': ['CENTRAL PARK', 'DOWNTOWN LIBRARY']
}}
################
Output:
{{
  "answer_type_keywords": ["DATE AND TIME", "ORGANIZATION", "PLAN"],
  "entities_from_query": ["SpaceX", "Rocket launch", "Aerospace", "Power Recovery"]
}}
#############################
Example 3:

Query: "What is the role of education in reducing poverty?"
Answer type pool: {{
 'PERSONAL LIFE': ['MANAGING WORK-LIFE BALANCE', 'HOME IMPROVEMENT PROJECTS'],
 'STRATEGY': ['MARKETING STRATEGIES FOR Q4', 'EXPANDING INTO NEW MARKETS'],
 'SERVICE FACILITATION': ['CUSTOMER SATISFACTION SURVEYS', 'STAFF RETENTION PROGRAMS'],
 'PERSON': ['ALBERT EINSTEIN', 'MARIA CALLAS'],
 'FOOD': ['PAN-FRIED STEAK', 'POACHED EGGS'],
 'EMOTION': ['OVERWHELM', 'CONTENTMENT'],
 'PERSONAL EXPERIENCE': ['LIVING ABROAD', 'STARTING A NEW JOB'],
 'INTERACTION': ['SOCIAL MEDIA ENGAGEMENT', 'PUBLIC SPEAKING'],
 'BEVERAGE': ['CAPPUCCINO', 'MATCHA LATTE'],
 'PLAN': ['ANNUAL FITNESS GOALS', 'QUARTERLY BUSINESS REVIEW'],
 'GEO': ['THE AMAZON RAINFOREST', 'THE GRAND CANYON'],
 'GEAR': ['SURFING ESSENTIALS', 'CYCLING ACCESSORIES'],
 'EMOJI': ['💻', '📱'],
 'BEHAVIOR': ['TEAMWORK', 'LEADERSHIP'],
 'TONE': ['FORMAL MEETING', 'CASUAL CONVERSATION'],
 'LOCATION': ['URBAN CITY CENTER', 'RURAL COUNTRYSIDE']
}}
################
Output:
{{
  "answer_type_keywords": ["STRATEGY", "PERSON"],
  "entities_from_query": ["School access", "Literacy rates", "Job training", "Income inequality"]
}}
#############################
Example 4:

Query: "Where is the capital of the United States?"
Answer type pool: {{
 'ORGANIZATION': ['GREENPEACE', 'RED CROSS'],
 'PERSONAL LIFE': ['DAILY WORKOUT', 'HOME COOKING'],
 'STRATEGY': ['FINANCIAL INVESTMENT', 'BUSINESS EXPANSION'],
 'SERVICE FACILITATION': ['ONLINE SUPPORT', 'CUSTOMER SERVICE TRAINING'],
 'PERSON': ['ALBERTA SMITH', 'BENJAMIN JONES'],
 'FOOD': ['PASTA CARBONARA', 'SUSHI PLATTER'],
 'EMOTION': ['HAPPINESS', 'SADNESS'],
 'PERSONAL EXPERIENCE': ['TRAVEL ADVENTURE', 'BOOK CLUB'],
 'INTERACTION': ['TEAM BUILDING', 'NETWORKING MEETUP'],
 'BEVERAGE': ['LATTE', 'GREEN TEA'],
 'PLAN': ['WEIGHT LOSS', 'CAREER DEVELOPMENT'],
 'GEO': ['PARIS', 'NEW YORK'],
 'GEAR': ['CAMERA', 'HEADPHONES'],
 'EMOJI': ['🏢', '🌍'],
 'BEHAVIOR': ['POSITIVE THINKING', 'STRESS MANAGEMENT'],
 'TONE': ['FRIENDLY', 'PROFESSIONAL'],
 'LOCATION': ['DOWNTOWN', 'SUBURBS']
}}
################
Output:
{{
  "answer_type_keywords": ["LOCATION"],
  "entities_from_query": ["capital of the United States", "Washington", "New York"]
}}
#############################

-Real Data-
######################
Query: {query}
Answer type pool: {TYPE_POOL}
######################
Output:
"""

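A short sketch (not repository code) of how the JSON this prompt produces might be validated before use: parse the reply, drop any answer types that are not actually in the pool, and enforce the cap of three. The pool below is illustrative, not the one MiniRAG builds from its graph.

```python
import json

# Illustrative pool; the real one is generated from the indexed graph.
type_pool = {"LOCATION": ["DOWNTOWN"], "PERSON": ["JANE DOE"]}


def parse_kwd_response(text: str, pool: dict) -> dict:
    """Parse the model's keyword JSON and sanity-check it against the pool."""
    data = json.loads(text)
    # The prompt requires answer types to come from the pool, at most 3.
    data["answer_type_keywords"] = [
        t for t in data.get("answer_type_keywords", []) if t in pool
    ][:3]
    return data


reply = '{"answer_type_keywords": ["LOCATION", "FOOD"], "entities_from_query": ["capital"]}'
```

Filtering like this guards against the model inventing a type that the retrieval stage would not recognize.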
PROMPTS[
    "entiti_if_loop_extraction"
] = """It appears some entities may still have been missed. Answer YES | NO if there are still entities that need to be added.
"""

PROMPTS["fail_response"] = "Sorry, I'm not able to provide an answer to that question."

PROMPTS["rag_response"] = """---Role---

You are a helpful assistant responding to questions about data in the tables provided.

---Goal---

Generate a response of the target length and format that responds to the user's question, summarizing all information in the input data tables appropriate for the response length and format, and incorporating any relevant general knowledge.
If you don't know the answer, just say so. Do not make anything up.
Do not include information where the supporting evidence for it is not provided.

---Target response length and format---

{response_type}

---Data tables---

{context_data}

Add sections and commentary to the response as appropriate for the length and format. Style the response in markdown.
"""

PROMPTS["keywords_extraction"] = """---Role---

You are a helpful assistant tasked with identifying both high-level and low-level keywords in the user's query.

---Goal---

Given the query, list both high-level and low-level keywords. High-level keywords focus on overarching concepts or themes, while low-level keywords focus on specific entities, details, or concrete terms.

---Instructions---

- Output the keywords in JSON format.
- The JSON should have two keys:
  - "high_level_keywords" for overarching concepts or themes.
  - "low_level_keywords" for specific entities or details.

######################
-Examples-
######################
Example 1:

Query: "How does international trade influence global economic stability?"
################
Output:
{{
  "high_level_keywords": ["International trade", "Global economic stability", "Economic impact"],
  "low_level_keywords": ["Trade agreements", "Tariffs", "Currency exchange", "Imports", "Exports"]
}}
#############################
Example 2:

Query: "What are the environmental consequences of deforestation on biodiversity?"
################
Output:
{{
  "high_level_keywords": ["Environmental consequences", "Deforestation", "Biodiversity loss"],
  "low_level_keywords": ["Species extinction", "Habitat destruction", "Carbon emissions", "Rainforest", "Ecosystem"]
}}
#############################
Example 3:

Query: "What is the role of education in reducing poverty?"
################
Output:
{{
  "high_level_keywords": ["Education", "Poverty reduction", "Socioeconomic development"],
  "low_level_keywords": ["School access", "Literacy rates", "Job training", "Income inequality"]
}}
#############################
-Real Data-
######################
Query: {query}
######################
Output:
"""

PROMPTS["naive_rag_response"] = """---Role---

You are a helpful assistant responding to questions about the documents provided.

---Goal---

Generate a response of the target length and format that responds to the user's question, summarizing all information in the provided documents appropriate for the response length and format, and incorporating any relevant general knowledge.
If you don't know the answer, just say so. Do not make anything up.
Do not include information where the supporting evidence for it is not provided.

---Target response length and format---

{response_type}

---Documents---

{content_data}

Add sections and commentary to the response as appropriate for the length and format. Style the response in markdown.
"""

354
minirag/storage.py
Normal file
@@ -0,0 +1,354 @@
import asyncio
import copy
import html
import os
from dataclasses import dataclass
from typing import Any, Union, cast

import networkx as nx
import numpy as np
from nano_vectordb import NanoVectorDB

from .utils import (
    logger,
    load_json,
    write_json,
    compute_mdhash_id,
    merge_tuples,
)

from .base import (
    BaseGraphStorage,
    BaseKVStorage,
    BaseVectorStorage,
)


@dataclass
class JsonKVStorage(BaseKVStorage):
    def __post_init__(self):
        working_dir = self.global_config["working_dir"]
        self._file_name = os.path.join(working_dir, f"kv_store_{self.namespace}.json")
        self._data = load_json(self._file_name) or {}
        logger.info(f"Load KV {self.namespace} with {len(self._data)} data")

    async def all_keys(self) -> list[str]:
        return list(self._data.keys())

    async def index_done_callback(self):
        write_json(self._data, self._file_name)

    async def get_by_id(self, id):
        return self._data.get(id, None)

    async def get_by_ids(self, ids, fields=None):
        if fields is None:
            return [self._data.get(id, None) for id in ids]
        return [
            (
                {k: v for k, v in self._data[id].items() if k in fields}
                if self._data.get(id, None)
                else None
            )
            for id in ids
        ]

    async def filter_keys(self, data: list[str]) -> set[str]:
        return set(s for s in data if s not in self._data)

    async def upsert(self, data: dict[str, dict]):
        # Insert-only: keys that already exist are left untouched.
        left_data = {k: v for k, v in data.items() if k not in self._data}
        self._data.update(left_data)
        return left_data

    async def drop(self):
        self._data = {}

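The insert-only semantics of `JsonKVStorage.upsert` (existing keys are never overwritten; only the newly added entries are returned) can be demonstrated with a minimal synchronous stand-in; this is a sketch for illustration, not the repository class.

```python
# Minimal stand-in mirroring JsonKVStorage's upsert/filter_keys semantics.
class DictKV:
    def __init__(self):
        self._data = {}

    def filter_keys(self, keys):
        # Which of these keys are not yet stored?
        return {k for k in keys if k not in self._data}

    def upsert(self, data):
        # Insert-only: keep existing values, return only what was added.
        left = {k: v for k, v in data.items() if k not in self._data}
        self._data.update(left)
        return left


kv = DictKV()
kv.upsert({"a": 1})
added = kv.upsert({"a": 99, "b": 2})  # "a" is already present, so only "b" lands
```

This design lets the indexing pipeline call `filter_keys` first to skip chunks that were already ingested, and treat the return value of `upsert` as the set of genuinely new items.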
@dataclass
class NanoVectorDBStorage(BaseVectorStorage):
    cosine_better_than_threshold: float = 0.2

    def __post_init__(self):
        self._client_file_name = os.path.join(
            self.global_config["working_dir"], f"vdb_{self.namespace}.json"
        )
        self._max_batch_size = self.global_config["embedding_batch_num"]
        self._client = NanoVectorDB(
            self.embedding_func.embedding_dim, storage_file=self._client_file_name
        )
        self.cosine_better_than_threshold = self.global_config.get(
            "cosine_better_than_threshold", self.cosine_better_than_threshold
        )

    async def upsert(self, data: dict[str, dict]):
        logger.info(f"Inserting {len(data)} vectors into {self.namespace}")
        if not len(data):
            logger.warning("You are inserting empty data into the vector DB")
            return []
        list_data = [
            {
                "__id__": k,
                **{k1: v1 for k1, v1 in v.items() if k1 in self.meta_fields},
            }
            for k, v in data.items()
        ]
        contents = [v["content"] for v in data.values()]
        # Embed contents in batches of at most _max_batch_size, concurrently.
        batches = [
            contents[i : i + self._max_batch_size]
            for i in range(0, len(contents), self._max_batch_size)
        ]
        embeddings_list = await asyncio.gather(
            *[self.embedding_func(batch) for batch in batches]
        )
        embeddings = np.concatenate(embeddings_list)
        for i, d in enumerate(list_data):
            d["__vector__"] = embeddings[i]
        results = self._client.upsert(datas=list_data)
        return results

    async def query(self, query: str, top_k=5):
        embedding = await self.embedding_func([query])
        embedding = embedding[0]
        results = self._client.query(
            query=embedding,
            top_k=top_k,
            better_than_threshold=self.cosine_better_than_threshold,
        )
        results = [
            {**dp, "id": dp["__id__"], "distance": dp["__metrics__"]} for dp in results
        ]
        return results

    @property
    def client_storage(self):
        return getattr(self._client, "_NanoVectorDB__storage")

    async def delete_entity(self, entity_name: str):
        try:
            entity_id = [compute_mdhash_id(entity_name, prefix="ent-")]

            if self._client.get(entity_id):
                self._client.delete(entity_id)
                logger.info(f"Entity {entity_name} has been deleted.")
            else:
                logger.info(f"No entity found with name {entity_name}.")
        except Exception as e:
            logger.error(f"Error while deleting entity {entity_name}: {e}")

    async def delete_relation(self, entity_name: str):
        try:
            relations = [
                dp
                for dp in self.client_storage["data"]
                if dp["src_id"] == entity_name or dp["tgt_id"] == entity_name
            ]
            ids_to_delete = [relation["__id__"] for relation in relations]

            if ids_to_delete:
                self._client.delete(ids_to_delete)
                logger.info(
                    f"All relations related to entity {entity_name} have been deleted."
                )
            else:
                logger.info(f"No relations found for entity {entity_name}.")
        except Exception as e:
            logger.error(
                f"Error while deleting relations for entity {entity_name}: {e}"
            )

    async def index_done_callback(self):
        self._client.save()

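The batching pattern in `NanoVectorDBStorage.upsert` (chunk the contents to `_max_batch_size`, embed the chunks concurrently with `asyncio.gather`, then flatten) can be sketched in isolation. `fake_embed` below is a stand-in for the real embedding function, not part of MiniRAG.

```python
import asyncio


# Stand-in embedding function: one 1-d "vector" per input string.
async def fake_embed(batch):
    return [[float(len(s))] for s in batch]


async def embed_all(contents, max_batch=2):
    # Chunk into batches of at most max_batch items.
    batches = [
        contents[i : i + max_batch] for i in range(0, len(contents), max_batch)
    ]
    # Embed all batches concurrently, then flatten in order.
    results = await asyncio.gather(*[fake_embed(b) for b in batches])
    return [vec for batch in results for vec in batch]


vectors = asyncio.run(embed_all(["a", "bb", "ccc"]))
```

Because `asyncio.gather` preserves argument order, the flattened vectors line up with the original contents, which is what lets `upsert` zip them back onto `list_data` by index.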
@dataclass
|
||||
class NetworkXStorage(BaseGraphStorage):
|
||||
@staticmethod
|
||||
def load_nx_graph(file_name) -> nx.Graph:
|
||||
if os.path.exists(file_name):
|
||||
return nx.read_graphml(file_name)
|
||||
return None
|
||||
|
||||
@staticmethod
|
||||
def write_nx_graph(graph: nx.Graph, file_name):
|
||||
logger.info(
|
||||
f"Writing graph with {graph.number_of_nodes()} nodes, {graph.number_of_edges()} edges"
|
||||
)
|
||||
nx.write_graphml(graph, file_name)
|
||||
|
||||
@staticmethod
|
||||
def stable_largest_connected_component(graph: nx.Graph) -> nx.Graph:
|
||||
"""Refer to https://github.com/microsoft/graphrag/index/graph/utils/stable_lcc.py
|
||||
Return the largest connected component of the graph, with nodes and edges sorted in a stable way.
|
||||
"""
|
||||
from graspologic.utils import largest_connected_component
|
||||
|
||||
graph = graph.copy()
|
||||
graph = cast(nx.Graph, largest_connected_component(graph))
|
||||
node_mapping = {
|
||||
node: html.unescape(node.upper().strip()) for node in graph.nodes()
|
||||
} # type: ignore
|
||||
graph = nx.relabel_nodes(graph, node_mapping)
|
||||
return NetworkXStorage._stabilize_graph(graph)
|
||||
|
||||
@staticmethod
|
||||
def _stabilize_graph(graph: nx.Graph) -> nx.Graph:
|
||||
"""Refer to https://github.com/microsoft/graphrag/index/graph/utils/stable_lcc.py
|
||||
Ensure an undirected graph with the same relationships will always be read the same way.
|
||||
"""
|
||||
fixed_graph = nx.DiGraph() if graph.is_directed() else nx.Graph()
|
||||
|
||||
sorted_nodes = graph.nodes(data=True)
|
||||
sorted_nodes = sorted(sorted_nodes, key=lambda x: x[0])
|
||||
|
||||
fixed_graph.add_nodes_from(sorted_nodes)
|
||||
edges = list(graph.edges(data=True))
|
||||
|
||||
if not graph.is_directed():
|
||||
|
||||
def _sort_source_target(edge):
|
||||
source, target, edge_data = edge
|
||||
if source > target:
|
||||
temp = source
|
||||
source = target
|
||||
target = temp
|
||||
return source, target, edge_data
|
||||
|
||||
edges = [_sort_source_target(edge) for edge in edges]
|
||||
|
||||
def _get_edge_key(source: Any, target: Any) -> str:
|
||||
return f"{source} -> {target}"
|
||||
|
||||
edges = sorted(edges, key=lambda x: _get_edge_key(x[0], x[1]))
|
||||
|
||||
fixed_graph.add_edges_from(edges)
|
||||
return fixed_graph
|
||||
|
    def __post_init__(self):
        self._graphml_xml_file = os.path.join(
            self.global_config["working_dir"], f"graph_{self.namespace}.graphml"
        )
        preloaded_graph = NetworkXStorage.load_nx_graph(self._graphml_xml_file)
        if preloaded_graph is not None:
            logger.info(
                f"Loaded graph from {self._graphml_xml_file} with {preloaded_graph.number_of_nodes()} nodes, {preloaded_graph.number_of_edges()} edges"
            )
        self._graph = preloaded_graph or nx.Graph()
        self._node_embed_algorithms = {
            "node2vec": self._node2vec_embed,
        }

    async def index_done_callback(self):
        NetworkXStorage.write_nx_graph(self._graph, self._graphml_xml_file)

    async def has_node(self, node_id: str) -> bool:
        return self._graph.has_node(node_id)

    async def has_edge(self, source_node_id: str, target_node_id: str) -> bool:
        return self._graph.has_edge(source_node_id, target_node_id)

    async def get_node(self, node_id: str) -> Union[dict, None]:
        return self._graph.nodes.get(node_id)

    async def get_types(self) -> tuple[list, dict]:
        all_entity_type = []
        all_type_w_name = {}
        for node_id, attr in self._graph.nodes(data=True):
            key = attr["entity_type"].strip('"')
            all_entity_type.append(key)
            if key not in all_type_w_name:
                all_type_w_name[key] = [node_id.strip('"')]
            elif len(all_type_w_name[key]) <= 1:
                all_type_w_name[key].append(node_id.strip('"'))
        return list(set(all_entity_type)), all_type_w_name

    async def get_node_from_types(self, type_list) -> Union[list, None]:
        node_list = []
        for name, attr in self._graph.nodes(data=True):
            node_type = attr.get("entity_type", "").strip('"')
            if node_type in type_list:
                node_list.append(name)
        node_datas = await asyncio.gather(
            *[self.get_node(name) for name in node_list]
        )
        node_datas = [
            {**n, "entity_name": k}
            for k, n in zip(node_list, node_datas)
            if n is not None
        ]
        return node_datas

    async def get_neighbors_within_k_hops(self, source_node_id: str, k):
        if await self.has_node(source_node_id):
            source_edge = list(self._graph.edges(source_node_id))
        else:
            logger.warning(f"No such node id: {source_node_id}")
            return []
        count = 1
        while count < k:
            count = count + 1
            sc_edge = copy.deepcopy(source_edge)
            source_edge = []
            for pair in sc_edge:
                append_edge = list(self._graph.edges(pair[-1]))
                for tuples in merge_tuples([pair], append_edge):
                    source_edge.append(tuples)
        return source_edge

    async def node_degree(self, node_id: str) -> int:
        return self._graph.degree(node_id)

    async def edge_degree(self, src_id: str, tgt_id: str) -> int:
        return self._graph.degree(src_id) + self._graph.degree(tgt_id)

    async def get_edge(
        self, source_node_id: str, target_node_id: str
    ) -> Union[dict, None]:
        return self._graph.edges.get((source_node_id, target_node_id))

    async def get_node_edges(self, source_node_id: str):
        if self._graph.has_node(source_node_id):
            return list(self._graph.edges(source_node_id))
        return None

    async def upsert_node(self, node_id: str, node_data: dict[str, str]):
        self._graph.add_node(node_id, **node_data)

    async def upsert_edge(
        self, source_node_id: str, target_node_id: str, edge_data: dict[str, str]
    ):
        self._graph.add_edge(source_node_id, target_node_id, **edge_data)

    async def delete_node(self, node_id: str):
        """Delete a node from the graph based on the specified node_id.

        :param node_id: The node_id to delete
        """
        if self._graph.has_node(node_id):
            self._graph.remove_node(node_id)
            logger.info(f"Node {node_id} deleted from the graph.")
        else:
            logger.warning(f"Node {node_id} not found in the graph for deletion.")

    async def embed_nodes(self, algorithm: str) -> tuple[np.ndarray, list[str]]:
        if algorithm not in self._node_embed_algorithms:
            raise ValueError(f"Node embedding algorithm {algorithm} not supported")
        return await self._node_embed_algorithms[algorithm]()

    # @TODO: NOT USED
    async def _node2vec_embed(self):
        from graspologic import embed

        embeddings, nodes = embed.node2vec_embed(
            self._graph,
            **self.global_config["node2vec_params"],
        )

        nodes_ids = [self._graph.nodes[node_id]["id"] for node_id in nodes]
        return embeddings, nodes_ids
427
minirag/utils.py
Normal file
@@ -0,0 +1,427 @@
import asyncio
import html
import io
import csv
import json
import logging
import os
import re
from dataclasses import dataclass
from functools import wraps
from hashlib import md5
from typing import Any, Union, List
import xml.etree.ElementTree as ET
import copy

import numpy as np
import tiktoken

ENCODER = None

logger = logging.getLogger("minirag")


def set_logger(log_file: str):
    logger.setLevel(logging.DEBUG)

    file_handler = logging.FileHandler(log_file)
    file_handler.setLevel(logging.DEBUG)

    formatter = logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    )
    file_handler.setFormatter(formatter)

    if not logger.handlers:
        logger.addHandler(file_handler)


@dataclass
class EmbeddingFunc:
    embedding_dim: int
    max_token_size: int
    func: callable

    async def __call__(self, *args, **kwargs) -> np.ndarray:
        return await self.func(*args, **kwargs)

def locate_json_string_body_from_string(content: str) -> Union[str, None]:
    """Locate the JSON string body from a string"""
    maybe_json_str = re.search(r"{.*}", content, re.DOTALL)
    if maybe_json_str is not None:
        return maybe_json_str.group(0)
    else:
        return None


def convert_response_to_json(response: str) -> dict:
    json_str = locate_json_string_body_from_string(response)
    assert json_str is not None, f"Unable to parse JSON from response: {response}"
    try:
        data = json.loads(json_str)
        return data
    except json.JSONDecodeError as e:
        logger.error(f"Failed to parse JSON: {json_str}")
        raise e from None


def compute_args_hash(*args):
    return md5(str(args).encode()).hexdigest()


def compute_mdhash_id(content, prefix: str = ""):
    return prefix + md5(content.encode()).hexdigest()
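Because small LLMs often wrap JSON in conversational text, the locator above pulls out the outermost brace-delimited span before parsing. A minimal re-sketch, with an illustrative response string:

```python
import json
import re

def locate_json_string_body_from_string(content):
    # Grab the outermost {...} span; DOTALL lets it cross newlines.
    maybe_json_str = re.search(r"{.*}", content, re.DOTALL)
    return maybe_json_str.group(0) if maybe_json_str else None

raw = 'Sure! Here is the result:\n{"answer": 42}\nHope that helps.'
body = locate_json_string_body_from_string(raw)
assert json.loads(body) == {"answer": 42}
```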

def limit_async_func_call(max_size: int, waiting_time: float = 0.0001):
    """Limit the maximum number of concurrent calls to an async function."""

    def final_decro(func):
        """Not using asyncio.Semaphore to avoid nest-asyncio."""
        __current_size = 0

        @wraps(func)
        async def wait_func(*args, **kwargs):
            nonlocal __current_size
            while __current_size >= max_size:
                await asyncio.sleep(waiting_time)
            __current_size += 1
            result = await func(*args, **kwargs)
            __current_size -= 1
            return result

        return wait_func

    return final_decro


def wrap_embedding_func_with_attrs(**kwargs):
    """Wrap a function with attributes"""

    def final_decro(func) -> EmbeddingFunc:
        new_func = EmbeddingFunc(**kwargs, func=func)
        return new_func

    return final_decro
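The busy-wait limiter above can be exercised with a fake async workload. A sketch (note the `try`/`finally`, an addition so the slot is released even when the wrapped call raises, which the version above does not do; `fake_call` and the counters are illustrative):

```python
import asyncio
from functools import wraps

def limit_async_func_call(max_size: int, waiting_time: float = 0.0001):
    """Cap the number of concurrently running invocations of an async func."""
    def final_decro(func):
        current_size = 0

        @wraps(func)
        async def wait_func(*args, **kwargs):
            nonlocal current_size
            while current_size >= max_size:
                await asyncio.sleep(waiting_time)
            current_size += 1
            try:
                return await func(*args, **kwargs)
            finally:
                current_size -= 1

        return wait_func
    return final_decro

peak = running = 0

@limit_async_func_call(2)
async def fake_call(i):
    global peak, running
    running += 1
    peak = max(peak, running)
    await asyncio.sleep(0.01)
    running -= 1
    return i

async def main():
    return await asyncio.gather(*[fake_call(i) for i in range(6)])

results = asyncio.run(main())
assert results == list(range(6))
assert peak <= 2  # never more than two calls in flight
```

The check-then-increment in `wait_func` is safe only because asyncio is single-threaded and there is no `await` between the two steps.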

def load_json(file_name):
    if not os.path.exists(file_name):
        return None
    with open(file_name, encoding="utf-8") as f:
        return json.load(f)


def write_json(json_obj, file_name):
    with open(file_name, "w", encoding="utf-8") as f:
        json.dump(json_obj, f, indent=2, ensure_ascii=False)


def encode_string_by_tiktoken(content: str, model_name: str = "gpt-4o"):
    global ENCODER
    if ENCODER is None:
        ENCODER = tiktoken.encoding_for_model(model_name)
    tokens = ENCODER.encode(content)
    return tokens


def decode_tokens_by_tiktoken(tokens: list[int], model_name: str = "gpt-4o"):
    global ENCODER
    if ENCODER is None:
        ENCODER = tiktoken.encoding_for_model(model_name)
    content = ENCODER.decode(tokens)
    return content

def pack_user_ass_to_openai_messages(*args: str):
    roles = ["user", "assistant"]
    return [
        {"role": roles[i % 2], "content": content} for i, content in enumerate(args)
    ]


def split_string_by_multi_markers(content: str, markers: list[str]) -> list[str]:
    """Split a string by multiple markers"""
    if not markers:
        return [content]
    results = re.split("|".join(re.escape(marker) for marker in markers), content)
    return [r.strip() for r in results if r.strip()]
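`split_string_by_multi_markers` builds one alternation regex out of all delimiters, which matters when extraction records mix separators. A quick check (the `<|>` / `##` delimiters are illustrative, in the style of GraphRAG extraction prompts):

```python
import re

def split_string_by_multi_markers(content, markers):
    if not markers:
        return [content]
    results = re.split("|".join(re.escape(marker) for marker in markers), content)
    return [r.strip() for r in results if r.strip()]

record = '"entity"<|>"LiHua"<|>"person"##"entity"<|>"Gym"<|>"location"'
assert split_string_by_multi_markers(record, ["<|>", "##"]) == [
    '"entity"', '"LiHua"', '"person"', '"entity"', '"Gym"', '"location"',
]
```

`re.escape` is what lets regex metacharacters like `|` and `#` be used as literal delimiters.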

# Refer to the utility functions of the official GraphRAG implementation:
# https://github.com/microsoft/graphrag
def clean_str(input: Any) -> str:
    """Clean an input string by removing HTML escapes, control characters, and other unwanted characters."""
    # If we get non-string input, just give it back
    if not isinstance(input, str):
        return input

    result = html.unescape(input.strip())
    # https://stackoverflow.com/questions/4324790/removing-control-characters-from-a-string-in-python
    return re.sub(r"[\x00-\x1f\x7f-\x9f]", "", result)


def is_float_regex(value):
    return bool(re.match(r"^[-+]?[0-9]*\.?[0-9]+$", value))

def truncate_list_by_token_size(list_data: list, key: callable, max_token_size: int):
    """Truncate a list of data by token size"""
    if max_token_size <= 0:
        return []
    tokens = 0
    for i, data in enumerate(list_data):
        tokens += len(encode_string_by_tiktoken(key(data)))
        if tokens > max_token_size:
            return list_data[:i]
    return list_data
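The greedy cutoff in `truncate_list_by_token_size` is easiest to see with a pluggable counter; here plain `len` over characters stands in for the tiktoken encoder (the `count_tokens` parameter is an addition for this sketch):

```python
def truncate_list_by_token_size(list_data, key, max_token_size, count_tokens=len):
    # Keep items until the running token total would exceed the budget.
    if max_token_size <= 0:
        return []
    tokens = 0
    for i, data in enumerate(list_data):
        tokens += count_tokens(key(data))
        if tokens > max_token_size:
            return list_data[:i]
    return list_data

chunks = [{"content": "aaaa"}, {"content": "bbbb"}, {"content": "cccc"}]
kept = truncate_list_by_token_size(chunks, key=lambda d: d["content"], max_token_size=9)
assert kept == chunks[:2]
```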

def list_of_list_to_csv(data: List[List[str]]) -> str:
    output = io.StringIO()
    writer = csv.writer(output)
    writer.writerows(data)
    return output.getvalue()


def csv_string_to_list(csv_string: str) -> List[List[str]]:
    output = io.StringIO(csv_string)
    reader = csv.reader(output)
    return [row for row in reader]
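Using the `csv` module instead of naive `join`/`split` is what makes this round trip lossless when fields contain commas or quotes:

```python
import csv
import io

def list_of_list_to_csv(data):
    output = io.StringIO()
    csv.writer(output).writerows(data)
    return output.getvalue()

def csv_string_to_list(csv_string):
    return [row for row in csv.reader(io.StringIO(csv_string))]

table = [["id", "entity", "description"], ["0", "LiHua", 'likes the "gym", swimming']]
assert csv_string_to_list(list_of_list_to_csv(table)) == table
```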

def save_data_to_file(data, file_name):
    with open(file_name, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=4)


def xml_to_json(xml_file):
    try:
        tree = ET.parse(xml_file)
        root = tree.getroot()

        # Print the root element's tag and attributes to confirm the file has been correctly loaded
        print(f"Root element: {root.tag}")
        print(f"Root attributes: {root.attrib}")

        data = {"nodes": [], "edges": []}

        # Use the GraphML namespace
        namespace = {"": "http://graphml.graphdrawing.org/xmlns"}

        for node in root.findall(".//node", namespace):
            node_data = {
                "id": node.get("id").strip('"'),
                "entity_type": node.find("./data[@key='d0']", namespace).text.strip('"')
                if node.find("./data[@key='d0']", namespace) is not None
                else "",
                "description": node.find("./data[@key='d1']", namespace).text
                if node.find("./data[@key='d1']", namespace) is not None
                else "",
                "source_id": node.find("./data[@key='d2']", namespace).text
                if node.find("./data[@key='d2']", namespace) is not None
                else "",
            }
            data["nodes"].append(node_data)

        for edge in root.findall(".//edge", namespace):
            edge_data = {
                "source": edge.get("source").strip('"'),
                "target": edge.get("target").strip('"'),
                "weight": float(edge.find("./data[@key='d3']", namespace).text)
                if edge.find("./data[@key='d3']", namespace) is not None
                else 0.0,
                "description": edge.find("./data[@key='d4']", namespace).text
                if edge.find("./data[@key='d4']", namespace) is not None
                else "",
                "keywords": edge.find("./data[@key='d5']", namespace).text
                if edge.find("./data[@key='d5']", namespace) is not None
                else "",
                "source_id": edge.find("./data[@key='d6']", namespace).text
                if edge.find("./data[@key='d6']", namespace) is not None
                else "",
            }
            data["edges"].append(edge_data)

        # Print the number of nodes and edges found
        print(f"Found {len(data['nodes'])} nodes and {len(data['edges'])} edges")

        return data
    except ET.ParseError as e:
        print(f"Error parsing XML file: {e}")
        return None
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

def process_combine_contexts(hl, ll):
    header = None
    list_hl = csv_string_to_list(hl.strip())
    list_ll = csv_string_to_list(ll.strip())

    if list_hl:
        header = list_hl[0]
        list_hl = list_hl[1:]
    if list_ll:
        header = list_ll[0]
        list_ll = list_ll[1:]
    if header is None:
        return ""

    if list_hl:
        list_hl = [",".join(item[1:]) for item in list_hl if item]
    if list_ll:
        list_ll = [",".join(item[1:]) for item in list_ll if item]

    combined_sources_set = set(filter(None, list_hl + list_ll))

    combined_sources = [",\t".join(header)]

    for i, item in enumerate(combined_sources_set, start=1):
        combined_sources.append(f"{i},\t{item}")

    combined_sources = "\n".join(combined_sources)

    return combined_sources

def is_continuous_subsequence(subseq, seq):
    """Return True if subseq's first element appears in seq immediately followed by subseq's last element."""

    def find_all_indexes(tup, value):
        indexes = []
        start = 0
        while True:
            try:
                index = tup.index(value, start)
                indexes.append(index)
                start = index + 1
            except ValueError:
                break
        return indexes

    index_list = find_all_indexes(seq, subseq[0])
    for idx in index_list:
        if idx != len(seq) - 1:
            if seq[idx + 1] == subseq[-1]:
                return True
    return False
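Despite its name, `is_continuous_subsequence` only checks whether the first element of `subseq` appears in `seq` immediately followed by `subseq`'s last element. An equivalent condensed sketch:

```python
def is_continuous_subsequence(subseq, seq):
    # True when subseq[0] occurs in seq with subseq[-1] right after it.
    return any(
        seq[i + 1] == subseq[-1]
        for i in range(len(seq) - 1)
        if seq[i] == subseq[0]
    )

assert is_continuous_subsequence(("a", "b"), ("x", "a", "b", "y"))
assert not is_continuous_subsequence(("a", "b"), ("a", "x", "b"))
```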

def merge_tuples(list1, list2):
    result = []
    for tup in list1:
        last_element = tup[-1]
        # A repeated endpoint means the path already loops back on itself; keep it as-is.
        if last_element in tup[:-1]:
            result.append(tup)
        else:
            matching_tuples = [t for t in list2 if t[0] == last_element]
            already_match_flag = 0
            for match in matching_tuples:
                reversed_match = (match[1], match[0])
                # Skip extensions that immediately retrace an edge already in the path.
                if is_continuous_subsequence(match, tup) or is_continuous_subsequence(reversed_match, tup):
                    continue
                already_match_flag = 1
                merged_tuple = tup + match[1:]
                result.append(merged_tuple)
            if not already_match_flag:
                result.append(tup)
    return result
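`merge_tuples` grows each candidate path by one hop using the neighbor edge list, skipping extensions that would immediately retrace an edge already on the path. A self-contained sketch (with a condensed `is_continuous_subsequence` inlined):

```python
def is_continuous_subsequence(subseq, seq):
    return any(
        seq[i + 1] == subseq[-1]
        for i in range(len(seq) - 1)
        if seq[i] == subseq[0]
    )

def merge_tuples(list1, list2):
    result = []
    for tup in list1:
        last = tup[-1]
        if last in tup[:-1]:  # the path already loops back on itself
            result.append(tup)
            continue
        extended = False
        for match in (t for t in list2 if t[0] == last):
            # Skip edges that retrace a step already taken, forward or backward.
            if is_continuous_subsequence(match, tup) or is_continuous_subsequence(match[::-1], tup):
                continue
            extended = True
            result.append(tup + match[1:])
        if not extended:
            result.append(tup)
    return result

# The path ("a", "b") extends to "c" but does not walk back to "a".
assert merge_tuples([("a", "b")], [("b", "c"), ("b", "a")]) == [("a", "b", "c")]
```

This is the expansion step behind `get_neighbors_within_k_hops`: each while-loop iteration merges the current paths with the outgoing edges of their endpoints.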

def count_elements_in_tuple(tuple_elements, list_elements):
    sorted_list = sorted(list_elements)
    tuple_elements = sorted(tuple_elements)
    count = 0
    list_index = 0

    for elem in tuple_elements:
        while list_index < len(sorted_list) and sorted_list[list_index] < elem:
            list_index += 1
        if list_index < len(sorted_list) and sorted_list[list_index] == elem:
            count += 1
            list_index += 1
    return count
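`count_elements_in_tuple` is a sorted two-pointer overlap count that consumes each list element at most once; in `cal_path_score_list` it scores how many entities on a reasoning path also appear in the candidate-answer list. A re-sketch with concrete (illustrative) inputs:

```python
def count_elements_in_tuple(tuple_elements, list_elements):
    # Sort both sides, then advance two pointers so each list element
    # can match at most one tuple element.
    sorted_list = sorted(list_elements)
    count, j = 0, 0
    for elem in sorted(tuple_elements):
        while j < len(sorted_list) and sorted_list[j] < elem:
            j += 1
        if j < len(sorted_list) and sorted_list[j] == elem:
            count += 1
            j += 1
    return count

path = ("LiHua", "Gym", "Coach")
assert count_elements_in_tuple(path, ["Gym", "Coach", "Pool"]) == 2
assert count_elements_in_tuple(("a", "a"), ["a"]) == 1  # each list element used once
```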

def cal_path_score_list(candidate_reasoning_path, maybe_answer_list):
    scored_reasoning_path = {}
    for k, v in candidate_reasoning_path.items():
        score = v["Score"]
        paths = v["Path"]
        scores = {}
        for p in paths:
            scores[p] = [count_elements_in_tuple(p, maybe_answer_list)]
        scored_reasoning_path[k] = {"Score": score, "Path": scores}
    return scored_reasoning_path

def edge_vote_path(path_dict, edge_list):
    return_dict = copy.deepcopy(path_dict)
    EDGELIST = []
    pairs_append = {}
    for i in edge_list:
        EDGELIST.append((i["src_id"], i["tgt_id"]))
    for i in return_dict.items():
        for j in i[1]["Path"].items():
            if j[1]:
                count = 0
                for pairs in EDGELIST:
                    if is_continuous_subsequence(pairs, j[0]):
                        count = count + 1
                        if j[0] not in pairs_append:
                            pairs_append[j[0]] = [pairs]
                        else:
                            pairs_append[j[0]].append(pairs)
                # Append the edge-vote count as an extra score for this path.
                j[1].append(count)
    return return_dict, pairs_append

from nltk.metrics import edit_distance
from rouge import Rouge


def calculate_similarity(sentences, target, method="levenshtein", n=1, k=1):
    target_tokens = target.lower().split()
    similarities_with_index = []

    if method == "jaccard":
        for i, sentence in enumerate(sentences):
            sentence_tokens = sentence.lower().split()
            intersection = set(sentence_tokens).intersection(set(target_tokens))
            union = set(sentence_tokens).union(set(target_tokens))
            jaccard_score = len(intersection) / len(union) if union else 0
            similarities_with_index.append((i, jaccard_score))

    elif method == "levenshtein":
        for i, sentence in enumerate(sentences):
            distance = edit_distance(target_tokens, sentence.lower().split())
            similarities_with_index.append(
                (i, 1 - (distance / max(len(target_tokens), len(sentence.split()))))
            )

    elif method == "rouge":
        rouge = Rouge()
        for i, sentence in enumerate(sentences):
            scores = rouge.get_scores(sentence, target)
            rouge_score = scores[0].get(f"rouge-{n}", {}).get("f", 0)
            similarities_with_index.append((i, rouge_score))

    else:
        raise ValueError("Unsupported method. Choose 'jaccard', 'levenshtein', or 'rouge'.")

    similarities_with_index.sort(key=lambda x: x[1], reverse=True)
    return [index for index, score in similarities_with_index[:k]]
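The Jaccard branch of `calculate_similarity` can be exercised without `nltk` or `rouge`; a stdlib-only re-sketch (`jaccard_topk` is a hypothetical name, the sample sentences are illustrative):

```python
def jaccard_topk(sentences, target, k=1):
    # Rank sentences by Jaccard overlap of lowercased token sets with the
    # target, returning the indices of the top-k (mirrors method='jaccard').
    target_tokens = set(target.lower().split())
    scored = []
    for i, sentence in enumerate(sentences):
        tokens = set(sentence.lower().split())
        union = tokens | target_tokens
        score = len(tokens & target_tokens) / len(union) if union else 0
        scored.append((i, score))
    scored.sort(key=lambda x: x[1], reverse=True)
    return [i for i, _ in scored[:k]]

sentences = ["LiHua went to the gym", "The weather is nice", "gym schedule for LiHua"]
assert jaccard_topk(sentences, "LiHua gym", k=1) == [2]
```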
89
reproduce/Step_0_index.py
Normal file
@@ -0,0 +1,89 @@
# from huggingface_hub import login
# your_token = "INPUT YOUR TOKEN HERE"
# login(your_token)

import sys
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import csv
import argparse
from datetime import datetime

from tqdm import trange
from transformers import AutoModel, AutoTokenizer

from minirag import MiniRAG, QueryParam
from minirag.llm import gpt_4o_mini_complete, hf_model_complete, hf_embedding, openai_embedding
from minirag.utils import EmbeddingFunc

EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"


def get_args():
    parser = argparse.ArgumentParser(description="MiniRAG")
    parser.add_argument('--model', type=str, default='PHI')
    parser.add_argument('--outputpath', type=str, default='./logs/Default_output.csv')
    parser.add_argument('--workingdir', type=str, default='./LiHua-World')
    parser.add_argument('--datapath', type=str, default='./dataset/LiHua-World/data/')
    parser.add_argument('--querypath', type=str, default='./dataset/LiHua-World/qa/query_set.csv')
    args = parser.parse_args()
    return args


args = get_args()

if args.model == 'PHI':
    LLM_MODEL = "microsoft/Phi-3.5-mini-instruct"
elif args.model == 'GLM':
    LLM_MODEL = "THUDM/glm-edge-1.5b-chat"
elif args.model == 'MiniCPM':
    LLM_MODEL = "openbmb/MiniCPM3-4B"
elif args.model == 'qwen':
    LLM_MODEL = "Qwen/Qwen2.5-3B-Instruct"
else:
    print("Invalid model name")
    exit(1)

WORKING_DIR = args.workingdir
DATA_PATH = args.datapath
QUERY_PATH = args.querypath
OUTPUT_PATH = args.outputpath
print("USING LLM:", LLM_MODEL)
print("USING WORKING DIR:", WORKING_DIR)

if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

rag = MiniRAG(
    working_dir=WORKING_DIR,
    # llm_model_func=hf_model_complete,
    llm_model_func=gpt_4o_mini_complete,
    llm_model_max_token_size=200,
    llm_model_name=LLM_MODEL,
    embedding_func=EmbeddingFunc(
        embedding_dim=384,
        max_token_size=1000,
        func=lambda texts: hf_embedding(
            texts,
            tokenizer=AutoTokenizer.from_pretrained(EMBEDDING_MODEL),
            embed_model=AutoModel.from_pretrained(EMBEDDING_MODEL),
        ),
    ),
)


# Now indexing
def find_txt_files(root_path):
    txt_files = []
    for root, dirs, files in os.walk(root_path):
        for file in files:
            if file.endswith('.txt'):
                txt_files.append(os.path.join(root, file))
    return txt_files


WEEK_LIST = find_txt_files(DATA_PATH)
for id, WEEK in enumerate(WEEK_LIST):
    print(f"{id}/{len(WEEK_LIST)}")
    with open(WEEK, encoding="utf-8") as f:
        rag.insert(f.read())
121
reproduce/Step_1_QA.py
Normal file
@@ -0,0 +1,121 @@
# from huggingface_hub import login
# your_token = "INPUT YOUR TOKEN HERE"
# login(your_token)

import sys
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import csv
import argparse
from datetime import datetime

from tqdm import trange
from transformers import AutoModel, AutoTokenizer

from minirag import MiniRAG, QueryParam
from minirag.llm import gpt_4o_mini_complete, hf_model_complete, hf_embedding, openai_embedding
from minirag.utils import EmbeddingFunc

EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"


def get_args():
    parser = argparse.ArgumentParser(description="MiniRAG")
    parser.add_argument('--model', type=str, default='PHI')
    parser.add_argument('--outputpath', type=str, default='./logs/Default_output.csv')
    parser.add_argument('--workingdir', type=str, default='./LiHua-World')
    parser.add_argument('--datapath', type=str, default='./dataset/LiHua-World/data/')
    parser.add_argument('--querypath', type=str, default='./dataset/LiHua-World/qa/query_set.csv')
    args = parser.parse_args()
    return args


args = get_args()

if args.model == 'PHI':
    LLM_MODEL = "microsoft/Phi-3.5-mini-instruct"
elif args.model == 'GLM':
    LLM_MODEL = "THUDM/glm-edge-1.5b-chat"
elif args.model == 'MiniCPM':
    LLM_MODEL = "openbmb/MiniCPM3-4B"
elif args.model == 'qwen':
    LLM_MODEL = "Qwen/Qwen2.5-3B-Instruct"
else:
    print("Invalid model name")
    exit(1)

WORKING_DIR = args.workingdir
DATA_PATH = args.datapath
QUERY_PATH = args.querypath
OUTPUT_PATH = args.outputpath
print("USING LLM:", LLM_MODEL)
print("USING WORKING DIR:", WORKING_DIR)

if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

rag = MiniRAG(
    working_dir=WORKING_DIR,
    llm_model_func=hf_model_complete,
    # llm_model_func=gpt_4o_mini_complete,
    llm_model_max_token_size=200,
    llm_model_name=LLM_MODEL,
    embedding_func=EmbeddingFunc(
        embedding_dim=384,
        max_token_size=1000,
        func=lambda texts: hf_embedding(
            texts,
            tokenizer=AutoTokenizer.from_pretrained(EMBEDDING_MODEL),
            embed_model=AutoModel.from_pretrained(EMBEDDING_MODEL),
        ),
    ),
)

# Now QA
QUESTION_LIST = []
GA_LIST = []
with open(QUERY_PATH, mode='r', encoding='utf-8') as question_file:
    reader = csv.DictReader(question_file)
    for row in reader:
        QUESTION_LIST.append(row['Question'])
        GA_LIST.append(row['Gold Answer'])


def run_experiment(output_path):
    headers = ['Question', 'Gold Answer', 'minirag']

    # Resume support: skip questions already present in the output file.
    q_already = []
    if os.path.exists(output_path):
        with open(output_path, mode='r', encoding='utf-8') as question_file:
            reader = csv.DictReader(question_file)
            for row in reader:
                q_already.append(row['Question'])

    row_count = len(q_already)
    print('row_count', row_count)

    with open(output_path, mode='a', newline='', encoding='utf-8') as log_file:
        writer = csv.writer(log_file)
        if row_count == 0:
            writer.writerow(headers)

        for QUESTIONid in trange(row_count, len(QUESTION_LIST)):
            QUESTION = QUESTION_LIST[QUESTIONid]
            Gold_Answer = GA_LIST[QUESTIONid]
            print()
            print('QUESTION', QUESTION)
            print('Gold_Answer', Gold_Answer)

            try:
                minirag_answer = rag.query(
                    QUESTION, param=QueryParam(mode="mini")
                ).replace("\n", "").replace("\r", "")
            except Exception as e:
                print('Error in minirag_answer', e)
                minirag_answer = ""  # keep the output row aligned even on failure

            writer.writerow([QUESTION, Gold_Answer, minirag_answer])

    print(f'Experiment data has been recorded in the file: {output_path}')


if __name__ == "__main__":
    run_experiment(OUTPUT_PATH)
21
requirements.txt
Normal file
@@ -0,0 +1,21 @@
aioboto3==13.3.0
aiohttp==3.10.5
graspologic==3.4.1
json_repair==0.30.2
lmdeploy==0.6.5
nano_vectordb==0.0.4
neo4j==5.26.0
nest_asyncio==1.6.0
networkx==3.2.1
nltk==3.9.1
numpy==2.2.1
ollama==0.4.5
openai==1.59.3
oracledb==2.5.0
pydantic==2.10.4
rouge==1.0.1
tenacity==8.5.0
tiktoken==0.7.0
torch==2.5.0
tqdm==4.66.5
transformers==4.47.0.dev0