Published on April 7, 2026

LightOnOCR-2 Open-Source Model Excels in Table Recognition Against Competitors

Open-Source Model LightOnOCR-2 Outperforms Claude, GPT-5, and Others in Table Recognition

LightOn has released the open-source model LightOnOCR-2, which surpasses leading commercial AI solutions in the task of extracting tables from documents.

Event Source: LightOn AI

If you ask where the most valuable information in corporate documents is hidden, the answer is almost always the same: in tables. Financial reports, technical specifications, medical data – all of this is typically structured in this way. And it's precisely tables that most AI tools have historically struggled with.

Why Tables Are Complex for AI Recognition

Tables Are More Than Just Text

Simply put, recognizing a table is harder than it looks. It's not just a collection of words; it's a structure where the row and column of each element are crucial. Merged cells, nested headers, and complex layouts – all of this turns the task into a puzzle, even for powerful models. This is why many companies still process documents manually or pay for specialized services.
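To make the structural point concrete, here is a minimal sketch in Python. It assumes a hypothetical `Cell` record with row and column spans. The names are illustrative and are not part of LightOnOCR-2 or its output format. The sketch expands merged cells into a rectangular grid, which shows why the words alone are not enough: the same text means different things depending on which rows and columns it covers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One table cell; a hypothetical structure used only for illustration."""
    text: str
    row: int
    col: int
    rowspan: int = 1
    colspan: int = 1


def to_grid(cells: list[Cell]) -> list[list[str]]:
    """Expand merged cells so every grid position holds the text that covers it."""
    n_rows = max(c.row + c.rowspan for c in cells)
    n_cols = max(c.col + c.colspan for c in cells)
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        for r in range(c.row, c.row + c.rowspan):
            for k in range(c.col, c.col + c.colspan):
                grid[r][k] = c.text
    return grid


# Made-up example: "Revenue" is merged across two columns, with nested
# year sub-headers below it; "Region" is merged across two header rows.
cells = [
    Cell("Region", 0, 0, rowspan=2),
    Cell("Revenue", 0, 1, colspan=2),
    Cell("2024", 1, 1),
    Cell("2025", 1, 2),
    Cell("EMEA", 2, 0),
    Cell("10", 2, 1),
    Cell("12", 2, 2),
]
grid = to_grid(cells)
for row in grid:
    print(row)
```

If the spans are lost during recognition, the value "12" can no longer be tied to both "Revenue" and "2025", even if every character was read correctly.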

Against this backdrop, LightOn has introduced the second version of its model: LightOnOCR-2. This is an open-source model specializing in what's known as OCR, or Optical Character Recognition, which involves recognizing characters and structures in scanned or photographed documents. But the key achievement here isn't just recognizing characters; it's the ability to accurately extract tables with all their rows, columns, and interrelationships.

LightOnOCR-2 Versus Commercial Table Extraction Models

How LightOnOCR-2 Outperformed Commercial Giants

In comparative testing, LightOnOCR-2 surpassed a whole range of well-known solutions – Claude, GPT-5, Qwen3, Mistral, and Mathpix – specifically in the task of table extraction. This is noteworthy for several reasons.

First, most of the listed models are commercial, backed by large companies with vast resources. LightOnOCR-2 is open-source, meaning its code and weights are available to everyone. Second, large general-purpose models like GPT-5 or Claude can do many things, but they often fall short against more specialized solutions where precision in a specific task is required.

It's like having a multi-tool that's good for most jobs, but when you need to do something precisely, you reach for a specialized instrument. LightOnOCR-2 is exactly that case: the model is fine-tuned for working with documents, and it's in this niche that it delivers better results than the larger "jacks-of-all-trades."

Importance of Accurate Table Extraction in Document Processing

Why This Matters for Document Processing

The task of table extraction isn't just an abstract benchmark. Behind it lies a very real need: companies work with vast numbers of documents every day where data is packed into tables. Banks process financial reports, hospitals handle medical records, and logistics companies deal with invoices. An error in a single cell can distort the entire picture.
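As a rough illustration of how one wrong cell propagates, the sketch below compares an extracted table against a reference and recomputes a column total. The helper names (`cell_mismatches`, `column_total`) and the invoice figures are invented for this example. They do not come from LightOn's benchmark or from any real document.

```python
def cell_mismatches(expected, actual):
    """Return (row, col, expected, actual) for every differing cell.

    Assumes both grids have the same shape; a real comparison would
    also need to handle missing or extra rows and columns.
    """
    diffs = []
    for r, (exp_row, act_row) in enumerate(zip(expected, actual)):
        for c, (e, a) in enumerate(zip(exp_row, act_row)):
            if e != a:
                diffs.append((r, c, e, a))
    return diffs


def column_total(grid, col):
    """Sum an integer column, skipping the header row."""
    return sum(int(row[col]) for row in grid[1:])


# Made-up invoice: the extraction misreads a single digit ("1" as "7").
reference = [["Item", "Amount"], ["Freight", "1200"], ["Customs", "340"]]
extracted = [["Item", "Amount"], ["Freight", "7200"], ["Customs", "340"]]

diffs = cell_mismatches(reference, extracted)
ref_total = column_total(reference, 1)
ext_total = column_total(extracted, 1)
print(diffs, ref_total, ext_total)
```

Five of the six cells match, yet the total derived from the extracted table is several times the true one, which is why cell-level accuracy matters more than overall character accuracy.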

Until now, automating this process was either expensive (subscription-based commercial solutions) or unreliable (general-purpose models that only "understand" tables approximately). LightOnOCR-2 offers a third option: a high-precision, open-source solution that can be deployed independently.

This is especially relevant for organizations that need to avoid sending documents to external cloud services – whether for confidentiality reasons or due to regulatory requirements. Deploying an open-source model locally solves this problem.

Open-Source Technology as an Advantage in Document Processing

Open Source as a Competitive Advantage

LightOnOCR-2 arrives at a time when open-source models are increasingly challenging commercial ones in specialized tasks. Recently, Google released the Gemma 4 family – also open-source models under the Apache 2.0 license – which compete with much larger solutions in certain scenarios. The trend is clear: open-source projects are no longer in the "second league" and are starting to set the standard in specific niches.

In the case of LightOnOCR-2, that niche is working with documents and tables. And based on the test results, the open-source model doesn't just match its commercial counterparts – it surpasses them.

Future Considerations for Table Recognition Models

What Remains an Open Question

Benchmark results are always a snapshot taken under specific conditions. How the model performs on real-world documents with non-standard layouts, in languages with different typography, or on tables with partially damaged or unreadable data – these are separate questions that can only be answered in practice, not in lab tests.

Nevertheless, the emergence of a strong open-source alternative in a niche long dominated by commercial solutions is a significant event. This is especially true for teams that are looking for a reliable document processing tool, are unwilling to depend on external APIs, and want to understand what's happening "under the hood."

Original Title: Open-Source LightOnOCR-2 Just Outscored Claude, GPT-5, Qwen3, Mistral and Mathpix at Table Extraction
Publication Date: Apr 7, 2026
LightOn AI www.lighton.ai A French company developing large language models and AI solutions for business and research.


From Source to Analysis

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role: analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Claude Sonnet 4.6 (Anthropic): Analyzing the Original Publication and Writing the Text. The neural network studies the original material and generates a coherent text.
2. Gemini 2.5 Pro (Google DeepMind): Translation into English.
3. Gemini 2.5 Flash (Google DeepMind): Text Review and Editing. Correction of errors, inaccuracies, and ambiguous phrasing.
4. DeepSeek-V3.2 (DeepSeek): Preparing the Illustration Description. Generating a textual prompt for the visual model.
5. FLUX.2 Pro (Black Forest Labs): Creating the Illustration. Generating an image based on the prepared prompt.
