Add more benchmarks and pipelines

Ruslan Bel'kov
2025-03-08 20:34:01 +03:00
parent d41142359c
commit b135ae1f42


@@ -13,29 +13,36 @@ contents [by default](https://github.blog/changelog/2021-04-13-table-of-contents
## Comparison
| Pipeline | [OmniDocBench](#omnidocbench) Overall ↓ | [Omni OCR](#omni-ocr-benchmark) Accuracy ↑ | [olmOCR](#olmoocr-eval) ELO ↑ | [Marker](#marker-benchmarks) Overall ↓ | [Mistral](#mistral-ocr-benchmarks) Overall ↑ | [dp-bench](#dp-bench) NID ↑ | [READoc](#readoc) Overall ↑ | [Actualize.pro](#actualize-pro) Overall ↑ |
|-------------------------------|-----------------------------------------|:-------------------------------------------|-------------------------------|----------------------------------------|:---------------------------------------------|-----------------------------|-----------------------------|-------------------------------------------|
| [MinerU](#MinerU) | **0.150** ⚠️ | | 1545.2 | | | | 60.17 | **8** |
| [Marker](#Marker) | 0.336 | | 1429.1 | **4.23916** ⚠️ | | | 63.57 | 6.5 |
| [DocLing](#DocLing) | 0.589 | | | 3.70429 | | | | 7.3 |
| [GOT-OCR](#GOT-OCR) | 0.289 | | 1212.7 | | | | | |
| [olmOCR](#olmOCR) | | | **1813.0** ⚠️ | | | | | |
| [MarkItDown](#MarkItDown) | | | | | | | | 7.78 |
| [Nougat](#Nougat) | 0.453 | | | | | | **81.42** | |
| [Zerox (OmniAI)](#Zerox) | | **91.7** ⚠️ | | | | | | 7.9 |
| [Unstructured](#Unstructured) | | 50.8 | | | | 91.18 | | 6.2 |
| [Pix2Text](#Pix2Text) | | | | | | | 64.39 | |
| [open-parse](#open-parse) | | | | | | | | |
| [Markdrop](#markdrop) | | | | | | | | |
| | | | | | | | | |
| Mistral OCR 2503 | | | | | **94.89** ⚠️ | | | |
| Google Document AI | | 67.8 | | | 83.42 | 90.86 | | |
| Azure OCR | | 85.1 | | | 89.52 | 87.69 | | |
| AWS Textract | | 74.3 | | | | 96.71 | | |
| [LlamaParse](#LlamaParse) | | | | 3.97619 | | 92.82 | | 7.1 |
| [Mathpix](#Mathpix) | 0.189 | | | 4.15626 | | | | |
| upstage | | | | | | **97.02** ⚠️ | | |
| | | | | | | | | |
| Gemini-1.5-Flash-002 | | | | | 90.23 | | | |
| Gemini-1.5-Pro-002 | | | | | 89.92 | | | |
| Gemini-2.0-Flash-001 | | 86.1 | | | 88.69 | | | |
| GPT-4o                        | 0.233                                   | 75.5                                        |                               |                                        | 89.77                                         |                             |                             |                                           |
| Claude Sonnet 3.5 | | 69.3 | | | | | | |
- **Bold** indicates the best result for a given metric.
- An empty cell means the pipeline was not evaluated on that benchmark.
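The olmOCR ELO column ranks pipelines from pairwise preference judgments rather than an absolute score. A minimal sketch of the standard Elo update that such rankings are built on is below; the k-factor of 32 and the 1500 starting rating are common defaults, not values taken from the olmOCR evaluation.

```python
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Update two Elo ratings after one pairwise comparison.

    score_a is 1.0 if A's output was preferred, 0.0 if B's was, 0.5 for a tie.
    """
    # Expected score of A under the logistic Elo model (400-point scale).
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# Two pipelines start at 1500; A's output is preferred once.
a, b = elo_update(1500.0, 1500.0, score_a=1.0)
print(a, b)  # → 1516.0 1484.0
```

Running many such updates over a set of judged page pairs is what separates the ratings in the table (e.g. olmOCR's 1813.0 vs GOT-OCR's 1212.7).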
@@ -694,6 +701,17 @@ that aligns text with ground truth text segments, and an LLM as a judge scoring
| GPT-4o-2024-11-20 | 89.77 | 87.55 | 86.00 | 94.58 | 91.70 |
| Mistral OCR 2503 | **94.89** | **94.29** | **89.55** | **98.96** | **96.12** |
### [dp-bench](https://huggingface.co/datasets/upstage/dp-bench)
| Source | Request date | TEDS ↑ | TEDS-S ↑ | NID ↑ | Avg. Time (secs) ↓ |
|--------------|--------------|--------|----------|-------|--------------------|
| upstage | 2024-10-24 | 93.48 | 94.16 | 97.02 | 3.79 |
| aws | 2024-10-24 | 88.05 | 90.79 | 96.71 | 14.47 |
| llamaparse | 2024-10-24 | 74.57 | 76.34 | 92.82 | 4.14 |
| unstructured | 2024-10-24 | 65.56 | 70.00 | 91.18 | 13.14 |
| google | 2024-10-24 | 66.13 | 71.58 | 90.86 | 5.85 |
| microsoft | 2024-10-24 | 87.19 | 89.75 | 87.69 | 4.44 |
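The NID column above is an edit-distance-based similarity between extracted and ground-truth text, scaled so that 100 means an exact match. A minimal sketch of that idea, using plain Levenshtein distance normalized by the longer string, is below; the exact normalization and distance variant used by dp-bench may differ.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def nid_similarity(ref: str, hyp: str) -> float:
    """Similarity in [0, 100]; 100 means the texts match exactly."""
    if not ref and not hyp:
        return 100.0
    return 100.0 * (1.0 - edit_distance(ref, hyp) / max(len(ref), len(hyp)))

print(round(nid_similarity("kitten", "sitting"), 2))  # → 57.14
```

Scores like upstage's 97.02 mean the extracted text differs from the ground truth by only a few percent of its characters.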
### [Actualize pro](https://www.actualize.pro/recourses/unlocking-insights-from-pdfs-a-comparative-study-of-extraction-tools)
[![GitHub last commit](https://img.shields.io/github/last-commit/actualize-ae/pdf-benchmarking?label=GitHub&logo=github)](https://github.com/actualize-ae/pdf-benchmarking)