NVIDIA Ingest is an early-access set of microservices for parsing hundreds of thousands of complex, messy, unstructured PDFs and other enterprise documents into metadata and text to embed into retrieval systems.
Updated 2025-01-05 18:58:13 +03:00
Convert PDF to Markdown quickly with high accuracy
Updated 2024-06-09 20:49:02 +03:00
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Updated 2023-06-17 01:18:36 +03:00
This Python program scrapes a LinkedIn job post and creates a PDF containing the original post along with graphs of the most used language.
Updated 2022-02-19 19:59:36 +03:00