About Document Processing Benchmark
Document Processing Benchmark is a tool for comparing how different AI models and OCR engines extract text from documents. Upload a document, describe what you expect to find, and see how each contestant performs — scored, ranked, and compared side-by-side.
How It Works
1. You upload a document (image or PDF) and describe what text you expect to be extracted.
2. Each contestant processes your document independently and in parallel. You can watch progress in real time via streaming updates.
3. An LLM evaluator scores each contestant's output (0–100) against your expectations, providing reasoning for each score.
4. Results appear on a leaderboard with aggregate statistics. You can drill into individual runs, view step-by-step processing details, and compare outputs side by side.
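The fan-out step above can be sketched in a few lines. This is a hypothetical illustration, not the tool's actual code: `run_contestant` stands in for the real OCR/VLM call, and `asyncio.as_completed` shows how results can be streamed in finish order rather than waiting for the slowest contestant.

```python
import asyncio

# Hypothetical sketch: each contestant processes the document
# independently and in parallel.
async def run_contestant(name: str, document: bytes) -> dict:
    # Placeholder for the real OCR/VLM call; here we just simulate work.
    await asyncio.sleep(0.01)
    return {"contestant": name, "text": f"<extracted by {name}>"}

async def run_benchmark(document: bytes, contestants: list[str]) -> list[dict]:
    tasks = [asyncio.create_task(run_contestant(n, document)) for n in contestants]
    results = []
    # as_completed yields tasks in finish order, which is what lets a UI
    # stream per-contestant progress updates as each run completes.
    for task in asyncio.as_completed(tasks):
        results.append(await task)
    return results

results = asyncio.run(run_benchmark(b"%PDF-1.7 ...", ["qwen2.5-vl", "pixtral-12b"]))
```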
Scoring Methodology
Each contestant output is scored by an LLM evaluator (currently Qwen 2.5 72B) against the user's expectations. The evaluator considers completeness, accuracy, formatting, and relevant detail extraction.
Raw scores (0–100) are divided by 10 and displayed as X.X/10 for readability.
Speed Score
Measures how fast a contestant is relative to others in the field: 100 = fastest average duration, 0 = slowest.
100 × (1 − (avg_duration − min_avg) / (max_avg − min_avg))
Efficiency Score
Estimates computational efficiency as a weighted composite:
- Average processing duration: 40%
- Model parameter count: 30%
- Estimated API cost per call: 30%
Higher = more efficient. Missing factors are excluded and their weights redistributed proportionally.
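Both metrics above can be sketched directly from their definitions. This is an illustrative implementation, assuming each efficiency factor has already been normalized to a 0–100 scale (higher = more efficient) before weighting:

```python
def speed_score(avg_duration: float, min_avg: float, max_avg: float) -> float:
    # Normalize so the fastest average duration maps to 100, the slowest to 0.
    if max_avg == min_avg:
        return 100.0  # degenerate case: everyone is equally fast
    return 100 * (1 - (avg_duration - min_avg) / (max_avg - min_avg))

def efficiency_score(duration=None, params=None, cost=None):
    # Weighted composite: duration 40%, parameter count 30%, API cost 30%.
    # Missing factors are dropped and their weight is redistributed
    # proportionally among the factors that are present.
    weighted = [(duration, 0.4), (params, 0.3), (cost, 0.3)]
    present = [(value, weight) for value, weight in weighted if value is not None]
    if not present:
        return None
    total_weight = sum(weight for _, weight in present)
    return sum(value * weight for value, weight in present) / total_weight
```

For example, with the parameter count unknown, duration (40%) and cost (30%) are renormalized to 4/7 and 3/7 of the composite.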
Contestants
Qwen 2.5 VL vision-language model for document understanding
Google Gemma 4 26B vision-language model (MoE, 4B active)
Google Gemma 4 31B vision-language model
Google Gemma 3 12B vision-language model
Google Gemma 3 4B vision-language model
OpenAI GPT-4o Mini vision-language model
Anthropic Claude 3.5 Haiku vision-language model
Meta Llama 3.2 Vision 11B instruction-tuned model
Mistral Pixtral 12B vision-language model
Supported Document Types
Images
PNG, JPEG, GIF, WebP
Documents
PDF (multi-page supported)
Max file size: 20 MB
Validation: Files are validated by magic bytes, not MIME type headers
Limitations & Caveats
- Subjective scoring: Scores depend on an LLM evaluator which may have biases or inconsistencies.
- Expectation quality matters: Vague expectations produce less meaningful scores.
- Network variability: Duration measurements include network latency for cloud-based contestants.
- No ground truth: Scores measure alignment with expectations, not absolute accuracy.
- Contestant set: Currently limited to the contestants listed above; more may be added over time.