Commit Graph

  • 7df157204b chore: bump version to 2.28.0 [skip ci] main v2.28.0 github-actions[bot] 2025-03-19 15:18:10 +00:00
  • d0245e9bf2 Deployed 1c26769 with MkDocs version: 1.6.1 gh-pages 2025-03-19 14:40:59 +00:00
  • 1c26769785 feat(SmolDocling): Support MLX acceleration in VLM pipeline (#1199) Maxim Lysak 2025-03-19 15:38:54 +01:00
  • b454aa1551 feat: Add PPTX notes slides (#474) Maciej Wieczorek 2025-03-19 14:52:09 +01:00
  • f5adfb9724 fix: Determine correct page size in DoclingParseV4Backend (#1196) Christoph Auer 2025-03-19 11:05:42 +01:00
  • d5f7798763 test(html): fix regression test after docling-core update (#1197) Cesar Berrospi Ramis 2025-03-19 11:03:46 +01:00
  • 0b707d0882 fix(msword): Fixing function return in equations handling (#1194) Rafael Teixeira de Lima 2025-03-19 10:34:25 +01:00
  • 1d680b0a32 docs: Linux Foundation AI & Data (#1183) Michele Dolfi 2025-03-19 09:05:57 +01:00
  • 54a78c307d docs: move apify to docs (#1182) Michele Dolfi 2025-03-18 16:43:55 +01:00
  • 2f72167ff6 feat: updated vlm pipeline (with latest changes from docling-core) (#1158) Maxim Lysak 2025-03-18 15:44:51 +01:00
  • 1a2a9e4eff chore: bump version to 2.27.0 [skip ci] v2.27.0 github-actions[bot] 2025-03-18 13:37:45 +00:00
  • 6eaae3cba0 feat: add factory for ocr engines via plugins (#1010) Michele Dolfi 2025-03-18 13:58:05 +01:00
  • 3960b199d6 feat: Add DoclingParseV4 backend, using high-level docling-parse API (#905) Christoph Auer 2025-03-18 10:38:19 +01:00
  • 772487f9c9 feat(actor): Docling Actor on Apify infrastructure (#875) Václav Vančura 2025-03-18 10:17:44 +01:00
  • 75a03c4257 disable GT generation on test_interfaces cau/dpv4-test-updates Christoph Auer 2025-03-17 11:31:18 +01:00
  • 9359f86c6a Merge branch 'cau/docling-parse-api' of github.com:DS4SD/docling into cau/dpv4-test-updates Christoph Auer 2025-03-17 11:17:31 +01:00
  • 50ac62b5fa test_input_doc use default backend Christoph Auer 2025-03-17 11:13:42 +01:00
  • 7bce91893c Unset DPv1 backend on tests (use DPv4 default), re-generate test output Christoph Auer 2025-03-17 11:04:41 +01:00
  • eff907811a Merge branch 'main' of github.com:DS4SD/docling into cau/docling-parse-api Christoph Auer 2025-03-17 10:37:13 +01:00
  • 7e01798417 docs: fix spelling of picture in usage (#1165) serced 2025-03-17 09:33:51 +01:00
  • fe45d30942 Fixes for DPv4 backend init, better test coverage Christoph Auer 2025-03-17 09:26:31 +01:00
  • e34c0750a7 Reset all tests to use docling-parse v1 for now Christoph Auer 2025-03-14 16:39:16 +01:00
  • 412c013d95 Merge from main Christoph Auer 2025-03-14 13:52:36 +01:00
  • d654568ad9 Test all backends, fixes Christoph Auer 2025-03-14 13:32:37 +01:00
  • af18215714 Rename docling backend to v4 Christoph Auer 2025-03-14 12:35:06 +01:00
  • fa16b12316 chore: move to docling-project org (#1160) Michele Dolfi 2025-03-14 12:35:29 +01:00
  • b77f73beec Text fixes, new test data Christoph Auer 2025-03-14 11:44:09 +01:00
  • f94da44ec5 fix(html): handle nested empty lists (#1154) Cesar Berrospi Ramis 2025-03-13 16:56:58 +01:00
  • e00f362405 Update tests, use TextCell.from_ocr property Christoph Auer 2025-03-13 16:04:08 +01:00
  • 0945973b79 fix: use first table row as col headers (#1156) Panos Vagenas 2025-03-13 15:34:18 +01:00
  • 6eb718f849 feat: equations to latex in MSWord backend (with inline groups) (#1114) Rafael Teixeira de Lima 2025-03-13 15:12:22 +01:00
  • aa92a57fa9 fix: Pass tests, update docling-core to 2.22.0 (#1150) Cesar Berrospi Ramis 2025-03-13 09:45:55 +01:00
  • 6e06040da6 Fix tests Christoph Auer 2025-03-12 20:04:17 +01:00
  • f1cce8ff07 Ground-truth files updated Christoph Auer 2025-03-12 19:57:18 +01:00
  • 519bc43e47 fix: update docling-core to 2.22.0 Cesar Berrospi Ramis 2025-03-12 19:38:03 +01:00
  • 90b0f73d06 Update locks Christoph Auer 2025-03-12 16:54:23 +01:00
  • 9ebd7108f2 Add back DoclingParse v1 backend, pipeline options Christoph Auer 2025-03-12 16:28:25 +01:00
  • 8a45a2cafa update test units Christoph Auer 2025-03-12 12:07:03 +01:00
  • 15282547cb update test cases Christoph Auer 2025-03-12 11:04:48 +01:00
  • 18b4991aa4 Reset tests Christoph Auer 2025-03-11 16:34:38 +01:00
  • a5089ef8f6 Merge branch 'cau/docling-parse-api' of github.com:DS4SD/docling into cau/docling-parse-api Christoph Auer 2025-03-11 16:31:50 +01:00
  • 1b9fcf0edf Fix streams Christoph Auer 2025-03-11 16:24:49 +01:00
  • 31c86613e5 Fix streams Christoph Auer 2025-03-11 16:24:49 +01:00
  • fbcde2cdeb Merge branch 'main' of github.com:DS4SD/docling into cau/docling-parse-api Christoph Auer 2025-03-11 16:06:55 +01:00
  • f411772569 Fixes and test updates Christoph Auer 2025-03-11 16:06:28 +01:00
  • 0dd596ff09 Draft implementation of Doctag backend dev/doctag_backend Maksym Lysak 2025-03-11 14:02:34 +01:00
  • 78353f1697 Use docling-core with docling-parse types Christoph Auer 2025-03-11 13:37:24 +01:00
  • 17c5bf1242 chore: bump version to 2.26.0 [skip ci] v2.26.0 github-actions[bot] 2025-03-11 11:12:43 +00:00
  • eb97357b05 feat: Use new TableFormer model weights and default to accurate model version (#1100) Christoph Auer 2025-03-11 10:53:49 +01:00
  • 5e30381c0d perf: New revision code formula model and document picture classifier (#1140) Matteo 2025-03-11 09:15:28 +00:00
  • 099aa4da83 Updates for DoclingParseV3DocumentBackend Christoph Auer 2025-03-10 17:11:20 +01:00
  • 4d64c4c0b6 fix(CLI): fix help message for abort options (#1130) Michele Dolfi 2025-03-07 14:47:49 +01:00
  • e1c49ad727 docs: add description of DOCLING_ARTIFACTS_PATH env var (#1124) Michele Dolfi 2025-03-06 07:30:07 +01:00
  • a3c957ca6b chore: bump version to 2.25.2 [skip ci] v2.25.2 github-actions[bot] 2025-03-05 14:51:57 +00:00
  • c56ab3a66b fix: Proper handling of orphan IDs in layout postprocessing (#1118) Christoph Auer 2025-03-05 14:30:59 +01:00
  • 655e95dd72 Upgrading docling core and adding groups rtdl/docx_latex Rafael Teixeira de Lima 2025-03-04 17:18:40 +01:00
  • 5630c6b8fd Merge branch 'main' into rtdl/docx_latex Rafael Teixeira de Lima 2025-03-04 16:51:53 +01:00
  • 357d41cc47 docs: Enrichment models (#1097) Michele Dolfi 2025-03-04 14:24:38 +01:00
  • b1e79cadc7 chore: bump version to 2.25.1 [skip ci] v2.25.1 github-actions[bot] 2025-03-03 00:56:40 +00:00
  • 0c1e9391de chore: use gh cache for huggingface models (#1096) Michele Dolfi 2025-03-03 00:13:47 +01:00
  • 8dc0562542 fix: enable locks for threadsafe pdfium (#1052) Michele Dolfi 2025-03-02 20:06:44 +01:00
  • e25d557c06 refactor: add the contentlayer to html-backend (#1040) Peter W. J. Staar 2025-03-02 10:37:53 -05:00
  • 23a429b73b docs: show visual grounding on RAG show-visual-grounding Panos Vagenas 2025-02-28 17:25:26 +01:00
  • db3ceefd4a docs: improve docs on token limit warning triggered by HybridChunker (#1077) Panos Vagenas 2025-02-28 14:54:46 +01:00
  • de7b963b09 fix(html): use 'start' attribute when parsing ordered lists from HTML docs (#1062) Cesar Berrospi Ramis 2025-02-27 09:46:57 +01:00
  • 37dd8c1cc7 chore: bump version to 2.25.0 [skip ci] v2.25.0 github-actions[bot] 2025-02-26 14:16:15 +00:00
  • 3c9fe76b70 feat: [Experimental] Introduce VLM pipeline using HF AutoModelForVision2Seq, featuring SmolDocling model (#1054) Christoph Auer 2025-02-26 14:43:26 +01:00
  • ab683e4fb6 feat(cli): add option for downloading all models, refine help messages (#1061) Panos Vagenas 2025-02-26 13:27:29 +01:00
  • e197225739 fix: vlm using artifacts path (#1057) Michele Dolfi 2025-02-26 08:33:50 +01:00
  • c84b973959 docs: extend chunking docs, add FAQ on token limit (#1053) Panos Vagenas 2025-02-25 13:07:38 +01:00
  • 1c75b52f85 re-built poetry.lock mly/smol-docling-integration Maksym Lysak 2025-02-24 17:37:35 +01:00
  • 9ecec1d330 Updated poetry.lock Maksym Lysak 2025-02-24 17:27:50 +01:00
  • 923f766ada Replaced remaining strings to appropriate enums Maksym Lysak 2025-02-24 16:54:59 +01:00
  • a095a7c5b7 removing changes from base_pipeline Maksym Lysak 2025-02-24 15:13:59 +01:00
  • a7a1f32b10 Added example on how to get original predicted doctags in minimal_smol_docling Maksym Lysak 2025-02-24 14:39:18 +01:00
  • 1dbedcbb4e removed pipeline_options.generate_table_images from vlm_pipeline (deprecated in the pipelines) Maksym Lysak 2025-02-24 14:17:06 +01:00
  • 0c60ef199a Moved keep_backend = True to vlm pipeline Maksym Lysak 2025-02-13 17:53:03 +01:00
  • 853544ba11 Addressing PR comments, added enabled property to SmolDocling, and related VLM pipeline option, few other minor things Maksym Lysak 2025-02-13 17:19:53 +01:00
  • b0935daec4 Removed special html code wrapping when exporting to docling document, cleaned up comments Maksym Lysak 2025-02-13 10:29:37 +01:00
  • b12f5ba80f removed minimal_smol_docling example from CI checks Maksym Lysak 2025-02-13 09:42:45 +01:00
  • 66532eadb6 More elegant solution in removing the input prompt Maksym Lysak 2025-02-12 18:48:48 +01:00
  • e486eb1720 Cleaned up unnecessary logging Maksym Lysak 2025-02-12 17:56:37 +01:00
  • 55fa4eb4e3 Fix repo id Christoph Auer 2025-02-12 17:09:56 +01:00
  • 6f9f4f4aee Update minimal smoldocling example Christoph Auer 2025-02-12 17:07:00 +01:00
  • b1df461ca8 Added captions for the images for SmolDocling assembly code, improved provenance definition for all elements Maksym Lysak 2025-02-11 16:42:23 +01:00
  • d7abe1b1cd Updated example of Smol Docling usage Maksym Lysak 2025-02-11 13:53:19 +01:00
  • 479ee239aa New assembly code for latest model revision, updated prompt and parsing of doctags, updated logging Maksym Lysak 2025-02-11 13:34:14 +01:00
  • 7c4ab5c716 Moved artifacts_path for SmolDocling into vlm_options instead of global pipeline option Maksym Lysak 2025-01-21 18:00:05 +01:00
  • f2751e11f9 Introduced SmolDoclingOptions to configure model parameters (such as query and artifacts path) via client code, see example in minimal_smol_docling. Provisioning for other potential vlm all-in-one models. Maksym Lysak 2025-01-21 17:37:11 +01:00
  • 88b9ac6706 Fixing doctags starting tag, that broke elements on first line during assembly Maksym Lysak 2025-01-21 11:14:55 +01:00
  • 0fe12d819a Updated vlm pipeline assembly and smol docling model code to support updated doctags Maksym Lysak 2025-01-17 17:54:55 +01:00
  • f6d123a01c Flipped keep_backend to True for vlm_pipeline assembly to work Maksym Lysak 2025-01-16 16:51:27 +01:00
  • 9901729d8c Exposed "force_backend_text" as pipeline parameter Maksym Lysak 2025-01-16 14:23:59 +01:00
  • 0dc3ac43b1 Added capability for vlm_pipeline to grab text from preconfigured backend Maksym Lysak 2025-01-16 10:44:49 +01:00
  • e0929781f4 Added tokens/sec measurement, improved example Maksym Lysak 2025-01-15 10:22:48 +01:00
  • 437053572d Replaced hardcoded otsl tokens with the ones from docling-core tokens.py enum Maksym Lysak 2025-01-14 16:07:37 +01:00
  • 2a43c199d5 Cleaned up logs, added pages to vlm_pipeline, basic timing per page measurement in smol_docling models Maksym Lysak 2025-01-14 14:04:47 +01:00
  • 61bb9dbba2 Properly propagating image data per page, together with predicted tags in VLM pipeline. This enables correct figure extraction and page numbers in provenances Maksym Lysak 2025-01-13 15:21:19 +01:00
  • 01c46e24b1 Fix for table span compute in vlm_pipeline Maksym Lysak 2025-01-10 16:30:12 +01:00
  • ef079e4e78 Enabled figure support in vlm_pipeline Maksym Lysak 2025-01-10 13:56:46 +01:00