Run Results
Expectation: The name Erika Mustermann should be contained.
Document: Identity Document
Best Score: 9.5/10 (Gemma 3 12B)
Average Score: 5.0/10
Output Scoring Prompt (default)
You are an impartial judge evaluating document processing output.
The user uploaded a document and described what they expect to be extracted:
"{{expectation}}"
A contestant produced the following output:
"{{output}}"
Score the output from 0 to 100 based on how well it matches the expectation.
Return ONLY a JSON object with this exact format:
{"score": <0-100>, "reasoning": "<brief explanation>"}
Step Scoring Prompt (default)
You are an impartial judge evaluating one processing step of a document extraction pipeline.
The user expects the final output to contain:
"{{expectation}}"
This is the output of the "{{stepName}}" step:
"{{output}}"
Score how well this step's output contributes toward the expected result, from 0 to 100.
Return ONLY a JSON object with this exact format:
{"score": <0-100>, "reasoning": "<brief explanation>"}
Total time: 00:18
| Rank | Model | Score | Time | Status |
|---|---|---|---|---|
| 🥇 | Gemma 3 12B | 9.5/10 | 2.2s | Done |
| 🥈 | Gemma 4 26B | 9.5/10 | 5.8s | Done |
| 🥉 | Gemma 4 31B | 9.5/10 | 17.6s | Done |
| #4 | Qwen Vision | 9.5/10 | 2.9s | Done |
| #5 | Tesseract OCR | 7.0/10 | 1.0s | Done |
| #6 | Llama 3.2 Vision 11B | 3.0/10 | 1.9s | Done |
| #7 | Gemma 3 4B | 2.0/10 | 1.9s | Done |
| #8 | GPT-4o Mini | 0.0/10 | 2.0s | Done |
| #9 | Claude 3.5 Haiku | - | 0.4s | Error |
| #10 | Pixtral 12B | - | 0.4s | Error |
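The summary figures can be checked against the table. The sketch below recomputes them under one assumption that is not stated on the page: the two errored runs count as 0 toward the average. That reading reproduces the reported 5.0/10, whereas averaging only the eight scored runs gives 6.25. The `summarize` helper is hypothetical.

```python
# (model, score out of 10), copied from the results table; None marks an errored run.
RESULTS = [
    ("Gemma 3 12B", 9.5),
    ("Gemma 4 26B", 9.5),
    ("Gemma 4 31B", 9.5),
    ("Qwen Vision", 9.5),
    ("Tesseract OCR", 7.0),
    ("Llama 3.2 Vision 11B", 3.0),
    ("Gemma 3 4B", 2.0),
    ("GPT-4o Mini", 0.0),
    ("Claude 3.5 Haiku", None),
    ("Pixtral 12B", None),
]


def summarize(results):
    """Return (best score, average with errors as 0, average over scored runs)."""
    scored = [score for _, score in results if score is not None]
    best = max(scored)
    # Assumption: errored runs contribute 0 and still count in the denominator.
    avg_errors_as_zero = sum(scored) / len(results)
    avg_scored_only = sum(scored) / len(scored)
    return best, avg_errors_as_zero, avg_scored_only


best, avg_all, avg_scored = summarize(RESULTS)
print(f"best={best}/10 avg(errors as 0)={avg_all}/10 avg(scored only)={avg_scored}/10")
```

Four models tie on the best score of 9.5/10. The ranking among them does not follow run time (Qwen Vision at 2.9s is listed below Gemma 4 31B at 17.6s), so the sketch does not attempt to reproduce the tie-break.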