Hancom Tops Open-Source PDF Benchmarks with OpenDataLoader PDF v2.0
- Tops benchmark tests for reading order, table extraction, and title inference
- Offers a fully local, secure processing environment through a hybrid engine that combines AI with a heuristic direct-extraction engine
- Four AI functions via free add-ons: OCR, Table AI, Chart AI, and Formula AI
- Switches licensing from MPL-2.0 to Apache-2.0
The headline engineering move is a hybrid extraction engine that pairs AI-based parsing with direct extraction. The practical upside: enterprises and developers get high-accuracy PDF data extraction that runs entirely on-premise, with no data leaving the local environment. For organizations handling sensitive documents — legal, financial, medical — that's not a minor footnote.
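The release does not say how the hybrid engine decides which path a page takes. As a purely illustrative sketch, assuming a simple per-page rule, routing could look like the Python below. Every name, field, and threshold here (`Page`, `choose_engine`, `route`, `MIN_TEXT_CHARS`, `MAX_IMAGE_RATIO`) is hypothetical and is not part of OpenDataLoader PDF's API: pages with a usable embedded text layer go to fast heuristic extraction, and everything else falls through to a locally run AI model.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical page model; real PDF libraries expose much richer structures.
@dataclass
class Page:
    number: int
    text_layer: Optional[str]  # text recoverable directly from the PDF content stream, if any
    image_area_ratio: float    # fraction of the page covered by raster images (0.0 to 1.0)

# Hypothetical thresholds; a real engine would tune these empirically.
MIN_TEXT_CHARS = 20
MAX_IMAGE_RATIO = 0.8

def choose_engine(page: Page) -> str:
    """Return 'direct' when the page has a usable text layer, otherwise 'ai'."""
    text = (page.text_layer or "").strip()
    if len(text) >= MIN_TEXT_CHARS and page.image_area_ratio < MAX_IMAGE_RATIO:
        return "direct"  # deterministic heuristic extraction from the text layer
    return "ai"          # e.g. a locally hosted OCR model for scanned pages

def route(pages: list[Page]) -> dict[str, list[int]]:
    """Group page numbers by engine; all decisions are made in-process, with no network calls."""
    plan: dict[str, list[int]] = {"direct": [], "ai": []}
    for page in pages:
        plan[choose_engine(page)].append(page.number)
    return plan

if __name__ == "__main__":
    doc = [
        Page(1, "Quarterly report: revenue grew across all regions.", 0.1),
        Page(2, None, 1.0),      # scanned page with no text layer
        Page(3, "Fig. 1", 0.9),  # mostly an image with a short caption
    ]
    print(route(doc))
```

Running it prints `{'direct': [1], 'ai': [2, 3]}`: the text-rich page takes the heuristic path, while the scanned page and the image-dominated page are sent to the AI path.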
Four Free AI Add-ons, Out of the Box
OpenDataLoader PDF v2.0 includes the following four AI features as add-ons at no additional cost:
- OCR: improves text recognition on image-based and scanned PDFs
- Table Extraction: a lightweight AI model that handles merged cells and complex table structures with precision
- Formula Extraction: recognizes mathematical and scientific notation locally, without a cloud call
- Chart Analysis: converts chart visuals into natural-language descriptions
All four are built for compatibility with third-party open-source models, including Docling. Hancom is clear that no formal partnership or sponsorship is in place — the compatibility is purely technical, designed so developers can slot OpenDataLoader PDF into existing pipelines without rebuilding their stack.
Apache 2.0: Lowering the Barrier, Expanding the Ecosystem
The project has also shed its MPL 2.0 license in favor of Apache 2.0 — one of the most permissive open-source licenses available. The move directly reduces friction for commercial use, making it easier for global developers and enterprises to build on top of OpenDataLoader PDF without navigating license compatibility headaches. Hancom expects this to accelerate downstream business models including WebApp and SaaS applications built on the engine.
Ecosystem Expansion: LangChain Is In, More Integrations Coming
LangChain integration shipped in 2025. In 2026, Hancom is targeting Langflow, LlamaIndex, and Gemini CLI, plus MCP (Model Context Protocol) support for agentic AI workflows. The roadmap positions OpenDataLoader PDF as infrastructure for the autonomous AI agent era, not just a standalone parsing tool.
Later in 2026, a commercial AI add-on is planned — described as a concentration of Hancom's proprietary document AI technology.
AI-Based Auto-Tagging for Tagged PDF: Starting the Accessibility Play
Perhaps the most forward-looking item on the roadmap is PDF accessibility. With the European Accessibility Act (EAA) now in force,
What Hancom's CTO Said
"OpenDataLoader PDF v2.0 has evolved into an open PDF data platform that anyone can freely use and build upon, through its AI hybrid engine and transition to Apache 2.0," said
OpenDataLoader PDF v2.0 is available now. Source code, benchmark datasets, and documentation are published in the official OpenDataLoader PDF GitHub repository.
View original content to download multimedia: https://www.prnewswire.com/news-releases/hancom-tops-open-source-pdf-benchmarks-with-opendataloader-pdf-v2-0--302713091.html
SOURCE Hancom