Extract Text from PDFs and Images

Extract clean Markdown and structured OCR JSON from PDFs and images. Run it manually, call it over HTTP, or use it as an MCP tool without bringing your own OCR provider key.

Created by Chris Moen • Version 6 • 9 steps

Integrations

  • Mistral
  • OCR API

How it works

  • 9 workflow steps run in Breyta to produce the app result.

Extract clean text from PDFs and images

Extract Text from PDFs and Images turns PDFs and images into clean Markdown plus structured OCR JSON. Use it when you need document text that can feed another Breyta flow, an HTTP integration, or an MCP-connected assistant.

What it does

  • Accepts an uploaded document resource, a public PDF URL, or a public image URL
  • Runs OCR on the document
  • Preserves tables in Markdown when the provider returns table structure
  • Returns a readable OCR summary with page, table, image, and hyperlink counts
  • Persists the extracted Markdown and raw OCR JSON as retained Breyta resources

Ways to run it

  • Run it manually from Breyta for one-off document extraction
  • Call the HTTP endpoint from another system
  • Use the MCP tool from agents and assistants
  • Chain it as a reusable step before summarization, review, classification, or data extraction
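
As a rough illustration of the HTTP option, the sketch below builds a JSON POST request with Python's standard library. This page does not publish the endpoint URL, the authentication scheme, or the request field names, so `ENDPOINT`, the bearer token header, and the `pdf_url` field are all placeholders to replace with the values Breyta shows for your installed app.

```python
import json
import urllib.request

# Hypothetical endpoint: the real URL is not given on this page.
ENDPOINT = "https://example.invalid/ocr-extract"


def build_request(pdf_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one public PDF URL."""
    # "pdf_url" is an assumed field name, used here for illustration only.
    body = json.dumps({"pdf_url": pdf_url}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Bearer auth is an assumption; use whatever scheme your endpoint requires.
            "Authorization": f"Bearer {token}",
        },
    )


req = build_request("https://example.com/report.pdf", "YOUR_TOKEN")
print(req.get_method(), req.full_url)
```

Once the placeholders are filled in, passing `req` to `urllib.request.urlopen` would send the request; the sketch stops short of that so it runs without network access.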

Inputs

Provide one of these:

  • A PDF or image file resource
  • A public PDF URL
  • A public image URL

Optional settings let you choose table format, include or skip headers and footers, include extracted image base64, and request page or word confidence scores.
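
A caller might check the inputs before sending them. The minimal sketch below reads "provide one of these" as "exactly one", which is an interpretation rather than documented behavior. The field names (`resource_id`, `pdf_url`, `image_url`) and the option names in the usage line are hypothetical, since this page does not list the actual parameter names.

```python
from typing import Optional


def build_payload(
    resource_id: Optional[str] = None,
    pdf_url: Optional[str] = None,
    image_url: Optional[str] = None,
    **options: object,
) -> dict:
    """Return a payload with exactly one document source plus optional settings."""
    # All three source keys are assumed names, not the app's documented schema.
    sources = {
        "resource_id": resource_id,
        "pdf_url": pdf_url,
        "image_url": image_url,
    }
    chosen = {key: value for key, value in sources.items() if value}
    if len(chosen) != 1:
        raise ValueError("provide exactly one of: " + ", ".join(sources))
    # Optional settings are passed through unchanged.
    return {**chosen, **options}


payload = build_payload(
    pdf_url="https://example.com/invoice.pdf",
    table_format="markdown",        # hypothetical option name
    include_headers_footers=False,  # hypothetical option name
)
print(payload)
```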

Output

Each run returns a concise result with:

  • Extracted Markdown preview
  • Page and content counts
  • Structured OCR metadata
  • Retained Markdown and JSON artifacts for downstream use
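
A downstream step could condense a result into a single log line. The sketch below assumes the result arrives as a dictionary with a `markdown_preview` string and a `counts` mapping. Those key names and the sample values are invented for illustration, because this page describes the output only in general terms.

```python
def summarize(result: dict) -> str:
    """Format the page and content counts plus a short Markdown preview."""
    # "counts" and "markdown_preview" are assumed keys, not a documented schema.
    counts = result.get("counts", {})
    parts = [
        f"{counts.get(kind, 0)} {kind}"
        for kind in ("pages", "tables", "images", "hyperlinks")
    ]
    preview = result.get("markdown_preview", "")[:60]
    return ", ".join(parts) + " | " + preview


# Made-up sample data, not real output from the app.
example = {
    "markdown_preview": "# Invoice 1042",
    "counts": {"pages": 2, "tables": 1, "images": 0, "hyperlinks": 3},
}
print(summarize(example))
```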

Notes

  • Trial users get 3 free runs on the Starter plan
  • Best results come from clear scans, native PDFs, screenshots, and document images
  • The app uses an author-provided OCR API connection, so installers do not need to bring their own OCR provider key

FAQ

What does this OCR app do?

This app extracts text from PDF documents and image files and returns it as clean Markdown and structured OCR JSON. It handles optical character recognition (OCR) so you can turn static documents into editable, structured data.

Which OCR providers and services does the tool use?

The app uses Mistral and a dedicated OCR API to process your documents. You don't need to provide your own API keys for these services, because the app runs on an author-provided OCR API connection.

How do I integrate this document extraction into my workflow?

You can run the app manually for one-off tasks, call it over HTTP for automated workflows, or use it as a Model Context Protocol (MCP) tool. This covers both non-technical users and developers who need programmatic access.

How do I set up the app and start extracting text?

There's no complex setup because the OCR provider keys are already included. You can start extracting text immediately after installation by uploading a file or connecting the app to your existing tools via the HTTP API.