OCR MCP

In one line: Let your UnleashX agent read text from images, screenshots, business cards, and scanned documents.
Category: AI & Media
Authentication: Platform-managed
Setup time: ~1 minute
Difficulty: Easy
Best for: Reading documents, business cards, receipts, and screenshots inside an agent flow

1. Overview

OCR (Optical Character Recognition) turns pictures of text into actual, usable text. Point it at a photo of a receipt, a scanned contract, or a screenshot and it returns the words it finds — preserving structure, line breaks, and tables as much as possible. Once connected, your agent can read text from a URL, a local file, or a base64 image, batch-process a whole folder of images, and even return word-level positions and confidence scores when you need structured output. It runs OpenAI Vision under the hood, so it is multilingual by default. Connecting OCR to UnleashX means your voice and automation agents can act on what they “see” — extracting a phone number from a business card, reading an invoice total, or capturing a form field — without any manual data entry.
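As a minimal sketch of the base64 input option mentioned above, the snippet below encodes raw image bytes as a base64 string with Python's standard library. The helper names `image_to_base64` and `base64_to_image` are illustrative assumptions, and the 8-byte PNG signature only stands in for a real image. This page does not specify the exact argument format the OCR tools expect (plain base64 string versus data URI), so treat the output as a starting point.

```python
import base64


def image_to_base64(image_bytes: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(image_bytes).decode("ascii")


def base64_to_image(encoded: str) -> bytes:
    """Decode a base64 string back into the original bytes."""
    return base64.b64decode(encoded)


# Placeholder input: the 8-byte PNG file signature, not a full image.
png_signature = b"\x89PNG\r\n\x1a\n"
encoded = image_to_base64(png_signature)
print(encoded)
```

For a real file you would read the bytes first, for example with `pathlib.Path("card.png").read_bytes()`, and pass them to `image_to_base64`.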

2. What you’ll need

OCR is built into UnleashX. There is no third-party account to create and no API key for you to manage.
  • An active UnleashX account.
  • The OCR feature enabled on your workspace/plan.
  • Permission to edit the agent you want to add OCR to (admin or editor role). If you don’t have that access, ask a workspace admin to enable it for you.

3. Get your credentials

There are no credentials to create. OCR is a platform-managed integration — UnleashX provisions and rotates the underlying OpenAI Vision keys for you. You never see or handle an API key.
If you are a workspace admin, the only optional platform-side configuration is the vision model used for extraction (UnleashX sets a sensible default, e.g. a GPT-4o-class vision model). Most teams never need to change this.
Platform-managed setting | Plain-English reason it exists
Vision model | Controls quality/speed of text extraction. UnleashX picks a good default.
API key | Provisioned and rotated by UnleashX so every agent can read images securely.

4. Connect on UnleashX

Step 1. Open your agent
Go to https://www.tryunleashx.com and open the agent you want to give OCR to.

Step 2. Open Data Connectors
In the agent, click Data Connectors.

Step 3. Find OCR and add it
Locate OCR in the list and click Connect / Add. Because it’s platform-managed, there is no key to paste, and it activates immediately.

Step 4. Confirm it’s connected
The OCR tool should now show a Connected badge. Your agent can start reading images right away.

5. Available tools

OCR tools

Tool | What it does | Changes data?
Extract Text | Extract plain text from an image given by path, URL, or base64 | No
From URL | Download an image from a URL and extract its text | No
From File | Extract text from a local image file | No
Extract With Boxes | Extract text plus bounding boxes and confidence details | No
Batch Extract | Extract text from every image in a directory | No
Supported Languages | List the languages OCR can read | No
Get Info | Report OCR server status and configuration | No
Preprocess Image | Clean up an image (resize, grayscale, threshold, blur) before extraction | No
Every OCR tool is read-only — it reads pixels and returns text. Preprocess Image writes a temporary working file but does not modify your original image or any stored data.
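Extract With Boxes returns bounding boxes and confidence details alongside the text. One way an agent flow might use them is to drop low-confidence words before acting on the result. The sketch below assumes a hypothetical result shape (a list of words with `text`, `confidence`, and `box` keys); this page does not document the real output format, so adjust the field names to whatever the tool actually returns.

```python
from typing import Dict, List

# Hypothetical result shape: the real Extract With Boxes output may differ.
SAMPLE_WORDS: List[Dict] = [
    {"text": "Invoice", "confidence": 0.98, "box": [10, 10, 90, 30]},
    {"text": "T0tal", "confidence": 0.41, "box": [10, 40, 60, 60]},
    {"text": "$42.50", "confidence": 0.93, "box": [70, 40, 130, 60]},
]


def confident_words(words: List[Dict], min_confidence: float = 0.8) -> List[Dict]:
    """Keep only the words whose confidence meets the threshold."""
    return [w for w in words if w["confidence"] >= min_confidence]


def join_text(words: List[Dict]) -> str:
    """Join the kept words back into a single string, in order."""
    return " ".join(w["text"] for w in words)


kept = confident_words(SAMPLE_WORDS)
print(join_text(kept))
```

The 0.8 threshold is arbitrary. A stricter value trades missed words for fewer misreads, which may matter more for amounts and phone numbers than for free text.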

6. Example usage

“Read this business card and pull out the name, company, and phone number.”
→ Runs From URL (or Extract Text) to extract the text, then the agent parses the fields.

“Go through every receipt image in this folder and give me the totals.”
→ Runs Batch Extract across the directory and the agent reads each total from the returned text.
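In both examples the OCR tool only returns text; picking out the fields is a separate parsing step. As a rough sketch of that step, the snippet below pulls a phone number and a receipt total out of already-extracted text with regular expressions. The function names, the patterns, and the sample strings are illustrative assumptions, not UnleashX behavior. In practice the agent may do this parsing itself.

```python
import re
from typing import Optional

# Loose phone pattern: optional "+", then digits with spaces, dots,
# dashes, or parentheses in between. Tune it for your locale.
PHONE_RE = re.compile(r"\+?\d[\d ().-]{7,}\d")

# "Total" as a whole word (so "Subtotal" is skipped), followed by an amount.
TOTAL_RE = re.compile(r"\btotal\b[^\d\n]*(\d+(?:\.\d{2})?)", re.IGNORECASE)


def extract_phone(text: str) -> Optional[str]:
    """Return the first phone-like string in the text, or None."""
    match = PHONE_RE.search(text)
    return match.group(0) if match else None


def extract_total(text: str) -> Optional[float]:
    """Return the amount that follows the word 'Total', or None."""
    match = TOTAL_RE.search(text)
    return float(match.group(1)) if match else None


# Made-up OCR output used only for illustration.
card_text = "Jane Doe\nAcme Corp\nPhone: +1 (415) 555-0132\njane@acme.example"
receipt_text = "Subtotal 38.00\nTax 4.50\nTotal: $42.50"

print(extract_phone(card_text))
print(extract_total(receipt_text))
```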

7. Permissions & data access

UnleashX can:
  • Read images you provide (by URL, file path, or base64) and return the extracted text.
  • Optionally return word positions and confidence scores.
  • Temporarily process images in memory or in a short-lived temp file.
UnleashX cannot:
  • Edit, delete, or store your original images permanently.
  • Access images you don’t explicitly pass to a tool.
  • Browse your device or cloud storage on its own.
To disconnect: Open the agent → Data Connectors → OCR → Disconnect. Access is revoked immediately.

8. Troubleshooting

Problem | What it means | How to fix it
“Could not load image from URL” | The URL isn’t a direct image link or isn’t public | Use a direct .jpg/.png URL that’s publicly reachable
401 / credential error | The platform-managed vision key is unavailable | This is on UnleashX’s side; contact cs@unleashx.ai
403 / feature not enabled | OCR isn’t enabled on your plan | Ask a workspace admin or contact support to enable it
Empty or partial text | Image is low-resolution, blurry, or rotated | Run Preprocess Image first, or supply a clearer image
“Pillow is required” | An image library is missing on the server | Platform-side issue; contact support
For general connector issues, see /mcp/integrations.
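To catch the “Could not load image from URL” case before calling From URL, a flow could run a quick local check that the link at least looks like a direct image URL. The sketch below is an assumption-laden pre-check: the function name is hypothetical, the extension list (.jpg and .png from the table, plus .jpeg) is a guess at what counts as a direct link, and it cannot tell whether the URL is publicly reachable.

```python
from urllib.parse import urlparse

# Assumed set of extensions; the table above names only .jpg and .png.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def looks_like_direct_image_url(url: str) -> bool:
    """Return True if the URL is http(s) and its path ends in an image extension."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Only the path is checked, so query strings such as ?size=large are ignored.
    return parsed.path.lower().endswith(IMAGE_EXTENSIONS)


print(looks_like_direct_image_url("https://example.com/cards/card.png"))
print(looks_like_direct_image_url("https://example.com/view?id=3"))
```

A link to a viewer or gallery page fails this check even when the page displays an image, which matches the “direct image link” requirement in the table.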

9. Frequently asked questions

Is my image data stored?
No. Images are processed to extract text and are not retained as part of the integration. Any temp files created during preprocessing are cleaned up automatically.

Do I need an OpenAI account or key?
No. OCR is platform-managed: UnleashX provides and rotates the underlying vision keys.

Which languages are supported?
Many, automatically. Call Supported Languages for the current list (English, Spanish, French, German, Chinese, Japanese, Arabic, Hindi, and more).

Can multiple team members use it?
Yes. Once enabled on the workspace, anyone with access to the agent can use OCR.

10. References