Extraction gives wrong values

The AI extracted a value, but it's incorrect. Here's how to debug and fix it.

Debug: Is it OCR or AI?

When extraction gives wrong values, the error comes from one of two places:

Source | What went wrong | How to fix
OCR | The document text wasn't read correctly | Improve document quality
AI | Text was read correctly, but AI misinterpreted it | Adjust extraction model

How to check

Use the PDF viewer's OCR overlay to see exactly what text Moby reads from your document:

  1. Open the document in the PDF viewer
  2. Click the OCR button in the viewer toolbar (or toggle "Show OCR text")
  3. The viewer now shows the recognized text overlaid on the document
  4. Find the value that was extracted incorrectly
  5. Compare what the OCR shows vs. what the document actually says
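
If you check many documents, the comparison in step 5 can be scripted. The sketch below is illustrative only: it assumes you have copied the OCR overlay text into a string, and `normalize` and `diagnose` are hypothetical helper names, not part of Moby.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and drop whitespace so spacing differences don't matter."""
    return re.sub(r"\s+", "", text).lower()


def diagnose(expected: str, ocr_text: str, extracted: str) -> str:
    """Classify a wrong extraction as an OCR problem or an AI problem.

    expected  -- the value as printed on the document (read by a person)
    ocr_text  -- the text copied from the OCR overlay
    extracted -- the value the extraction returned
    """
    if normalize(extracted) == normalize(expected):
        return "ok"
    if normalize(expected) in normalize(ocr_text):
        # The correct value was recognized, so the AI chose or parsed it wrongly.
        return "ai"
    # The correct value never appears in the recognized text.
    return "ocr"


print(diagnose("1,250.00", "Total due: 1,250.00", "125.00"))    # ai
print(diagnose("1,250.00", "Total due: l,25O.OO", "l,25O.OO"))  # ocr
```

A result of "ocr" points to the document quality fixes, and "ai" points to the extraction model fixes described in the sections that follow.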

If OCR is wrong

The OCR layer misread the document. Common symptoms:

  • 0 read as O (zero vs. letter O)
  • 1 read as l or I
  • Numbers jumbled or in wrong order
  • Text garbled or missing characters

Solutions:

  • Use a higher-quality scan (300 DPI minimum)
  • Ensure good contrast (dark text on light background)
  • Use native PDFs when possible (instead of scanned images)
  • For stubborn documents, try OCR to Excel, which uses a different OCR engine
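
The character confusions listed under the symptoms (O for 0, l or I for 1) can also be flagged automatically in fields that should be numeric. This is a rough sketch rather than a Moby feature: the `CONFUSABLE` mapping and both function names are assumptions made for illustration.

```python
# Letters that OCR may produce in place of digits (assumed mapping,
# taken from the symptoms listed above).
CONFUSABLE = {"O": "0", "l": "1", "I": "1"}


def suspicious_chars(value: str) -> list[tuple[int, str]]:
    """Return (position, character) for letters that may be misread digits."""
    return [(i, ch) for i, ch in enumerate(value) if ch in CONFUSABLE]


def suggest_numeric(value: str) -> str:
    """Replace confusable letters with the digits they resemble."""
    return "".join(CONFUSABLE.get(ch, ch) for ch in value)


print(suspicious_chars("l,25O.00"))  # [(0, 'l'), (4, 'O')]
print(suggest_numeric("l,25O.00"))   # 1,250.00
```

Apply a check like this only to fields you expect to be numeric, and treat the suggestion as a guess to verify against the document. It does not replace a better scan.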

If OCR is correct but the AI extracted the wrong value

The text was read correctly, but the AI misinterpreted it. Common causes:

  • Ambiguous labels: multiple similar values appear in the document
  • Unclear field definitions: the AI doesn't know which value you want
  • Unusual document layout: the AI is confused by the formatting

Solutions:

  • Make your extraction model field descriptions more specific
  • Add examples to clarify which value you want
  • Use field hints like "The invoice total at the bottom of the page, not line item amounts"

Common extraction errors

Wrong amount extracted

Multiple amounts appear in the document and the AI picked the wrong one.

Fix: Add context to your field description:

Invoice total (the final amount including tax, usually at the bottom)
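
A downstream sanity check can catch a wrongly picked amount before it reaches your records. The sketch below assumes your extraction model also returns subtotal and tax fields (hypothetical here) and that amounts use a comma as the thousands separator and a period as the decimal mark.

```python
from decimal import Decimal


def to_decimal(amount: str) -> Decimal:
    """Parse an amount such as '1,210.00' into an exact Decimal."""
    return Decimal(amount.replace(",", ""))


def total_is_consistent(total: str, subtotal: str, tax: str) -> bool:
    """Return True when the total equals subtotal plus tax."""
    return to_decimal(total) == to_decimal(subtotal) + to_decimal(tax)


print(total_is_consistent("1,210.00", "1,000.00", "210.00"))  # True
print(total_is_consistent("1,000.00", "1,000.00", "210.00"))  # False
```

A False result suggests the extraction returned the subtotal or a line item amount instead of the final total.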

Date format issues

The date appears correctly in the document but is extracted in the wrong format.

Fix: Specify the expected format in your field description:

Invoice date (format: DD/MM/YYYY)
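
To confirm that extracted dates follow the requested format, you can validate them after extraction. This is a minimal sketch using Python's standard library, and `is_dd_mm_yyyy` is a hypothetical helper name.

```python
import re
from datetime import datetime


def is_dd_mm_yyyy(value: str) -> bool:
    """Return True when value is a real calendar date written as DD/MM/YYYY."""
    # Require two-digit day and month and a four-digit year.
    if not re.fullmatch(r"\d{2}/\d{2}/\d{4}", value):
        return False
    try:
        # Rejects impossible dates such as month 31.
        datetime.strptime(value, "%d/%m/%Y")
    except ValueError:
        return False
    return True


print(is_dd_mm_yyyy("31/12/2024"))  # True
print(is_dd_mm_yyyy("12/31/2024"))  # False
print(is_dd_mm_yyyy("2024-12-31"))  # False
```

A date such as 03/04/2024 passes whether it was meant as DD/MM or MM/DD, so this check catches only format errors that produce an impossible date or a different layout.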

Missing values

The field exists in the document but wasn't extracted.

Causes:

  • Value is in an image or watermark (not real text)
  • Value is in a header/footer that OCR missed
  • Field description doesn't match how the value appears

Fix: Check the OCR overlay to confirm the text is recognized, then adjust the field description.

Extracted from wrong document section

The AI pulled the value from a different part of the document than intended.

Fix: Be specific about location:

Vendor name (from the "Bill From" section, not the "Bill To" section)

When to contact support

Contact support if:

  • OCR consistently fails on similar document types
  • AI errors persist despite clear field descriptions
  • You see unexpected behavior not covered here