OCR With Google AI
OCR (Optical Character Recognition)
OCR (Optical Character Recognition) with world-class Google Cloud AI
Extract text and data from images and documents, turn unstructured content into business-ready structured data, and unlock valuable insights.
Integrate OCR functionalities into your applications through APIs.
New customers get $300 in free credits on signup to apply towards document summarizing OCR solutions.
Overview
What is OCR?
Optical Character Recognition (OCR) is a foundational technology behind the conversion of typed, handwritten or printed text from images into machine-encoded text.
What types of OCR does Google Cloud offer?
Google Cloud offers two types of OCR: OCR for documents and OCR for images and videos.
While they share a foundational technology, Document AI is a document understanding platform optimized for document processing. Its Custom Extractor is powered by generative AI and processes both generic and domain-specific documents faster and with higher accuracy, without the need to choose a specialized processor.
Cloud Vision is commonly used to detect text, handwriting and a wide range of objects in images and videos.
How does OCR work at Google Cloud?
Google Cloud powers OCR with best-in-class AI. It goes beyond traditional text recognition by understanding, organizing and enriching data, ultimately generating business-ready insights.
It gives you the flexibility to either use the OCR tools as a unified suite for streamlined efficiency (e.g. Document AI), or simply call the relevant APIs directly available in Google Cloud console to integrate OCR functionalities into your applications.
How do Google Cloud AI and OCR work together?
All the OCR solutions mentioned above give you access to pre-trained ML models that you can deploy right away through an API, or uptrain to improve accuracy for your specific needs.
You can also train your own custom models with AutoML - no machine learning expertise needed.
Check out AutoML documentation on building custom ML models.
Which OCR solution is right for me?
If you are looking to analyze a document, or build an automated document processing pipeline, use Document AI - it takes care of the entire workflow in one place, from understanding documents to searching, storing, governing and managing them alongside the extracted data.
If you want to analyze and process images, use Cloud Vision alongside other Google Cloud products for best results - check the Common Uses section for details and quickstart guides.
Both APIs are free to try with a Google Cloud account.
Compare OCR offerings
OCR offering | Best for | Key features
---|---|---
Cloud Vision API | General text-extraction use cases that require low latency and high capacity. | Pre-built features like image labeling, face & landmark detection, OCR, safe search.
Document AI: Enterprise Document OCR | Digitize text from documents (PDFs, scanned documents as images, or Microsoft DOCX files). | Extract text in 200+ languages, 50 handwritten languages. Add-ons to recognize math formulas, styles, etc.
Document AI: Document AI Workbench | Extract, classify and split any documents with generative AI (foundational models). | Custom Extractor: uses foundational models to quickly create parsers without extensive data labeling or training. Custom classifier and document splitter for efficient processing.
Document AI: Pretrained models | Text and field extraction from domain-specific documents. | Text extraction and digitization across a variety of procurement, lending, identity and contractual documents.
How It Works
To understand and process documents, use Document AI.
For images, we recommend using Cloud Vision.
Both give you access to pre-trained ML models that you can deploy as-is through APIs or uptrain. You can also train your own custom models from scratch with AutoML - no ML expertise needed.
The first 1,000 units every month are free when you use Cloud Vision or Document OCR - try it with a simple API call.
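As a rough illustration of what such a "simple API call" to Cloud Vision might look like, the sketch below builds the JSON body for the REST `images:annotate` method and pulls the recognized text out of a response. The helper names (`build_text_detection_request`, `extract_full_text`) are hypothetical and introduced only for this example; authentication and the HTTP call itself are left out, so treat it as a starting point rather than a complete client.

```python
import base64

# REST endpoint of the Cloud Vision annotate method.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_text_detection_request(image_bytes: bytes, dense_text: bool = False) -> dict:
    """Build the JSON body for a single-image OCR request.

    Hypothetical helper: TEXT_DETECTION suits sparse text in photos, while
    DOCUMENT_TEXT_DETECTION suits dense text and handwriting.
    """
    feature = "DOCUMENT_TEXT_DETECTION" if dense_text else "TEXT_DETECTION"
    return {
        "requests": [
            {
                # Image bytes are sent base64-encoded in the "content" field.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": feature}],
            }
        ]
    }


def extract_full_text(response: dict) -> str:
    """Return the recognized text for the first image, or "" if none was found."""
    first = (response.get("responses") or [{}])[0]
    full = first.get("fullTextAnnotation", {}).get("text")
    if full:
        return full
    # Fall back to the first text annotation, which holds the whole text block.
    annotations = first.get("textAnnotations") or []
    return annotations[0].get("description", "") if annotations else ""


if __name__ == "__main__":
    body = build_text_detection_request(b"\x89PNG placeholder bytes")
    print(body["requests"][0]["features"])
```

The request body would be sent as a POST to `VISION_ENDPOINT` with credentials for your project; each request in the list is counted per feature, which is how the free monthly units are consumed.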
How Cloud Vision recognizes and classifies images
Demo
See Document OCR in action with your own documents
Try the Document AI API with a simple drag-and-drop.
Common Uses
Build an end-to-end document solution
How-tos
Build a document processing and understanding pipeline
Powered by GenAI, Document AI delivers great accuracy in extracting data from documents of varying layouts and quality. You can connect it with Cloud Storage so your unstructured documents have enterprise-grade compliance. BigQuery helps batch process and analyze the extracted data any way you like. With Looker, you can easily build visualizations based on your BigQuery tables. Vertex AI Search enables you to query and search your documents in Cloud Storage, conversationally or traditionally.
Setting up the entire pipeline takes 60-90 minutes; the Document AI portion takes 10 minutes.
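To make the Document AI step of this pipeline more concrete, here is a minimal sketch of the request and of one way to flatten the result before loading it into a table such as one in BigQuery. It assumes the v1 REST `process` method and a processor that returns entities; the helper names, the row layout, and the sample field `total_amount` are assumptions made for illustration, not part of the pipeline described above.

```python
import base64


def process_url(project_id: str, location: str, processor_id: str) -> str:
    """URL of the Document AI v1 process method for one processor."""
    return (
        f"https://{location}-documentai.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/processors/{processor_id}:process"
    )


def build_process_request(file_bytes: bytes, mime_type: str = "application/pdf") -> dict:
    """JSON body that sends one document inline as base64 content."""
    return {
        "rawDocument": {
            "content": base64.b64encode(file_bytes).decode("ascii"),
            "mimeType": mime_type,
        }
    }


def entities_to_rows(document: dict, source_uri: str) -> list:
    """Flatten the entities of a processed document into table-friendly rows.

    The row schema (source_uri, field, value, confidence) is hypothetical.
    """
    rows = []
    for entity in document.get("entities", []):
        rows.append(
            {
                "source_uri": source_uri,
                "field": entity.get("type", ""),
                "value": entity.get("mentionText", ""),
                "confidence": float(entity.get("confidence", 0.0)),
            }
        )
    return rows


if __name__ == "__main__":
    # Hand-written sample shaped like the "document" object in a response.
    sample = {"entities": [{"type": "total_amount", "mentionText": "$42.00", "confidence": 0.93}]}
    for row in entities_to_rows(sample, "gs://example-bucket/invoice.pdf"):
        print(row)
```

Keeping the source URI on every row is one simple way to link extracted fields back to the original file in Cloud Storage when querying later.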
Image tagging, processing and search
How-tos
Use Cloud Vision API and AutoML to tag and process images
Image tagging is also referred to as image labeling.
Cloud Vision API can identify and label general objects, landmarks, locations, logos, activities, animal species, products, and more in an image. Once the images are tagged with the detected labels, image search, processing and management are automated and easier.
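The tagging flow described above can be sketched roughly as follows: request `LABEL_DETECTION` through the REST `images:annotate` method, keep the labels above a confidence threshold, and invert them into a tag-to-image index for search. The helper names, the `0.7` threshold, and the in-memory index are assumptions chosen for illustration; a real deployment would likely store tags in a database or search service.

```python
import base64


def build_label_request(image_bytes: bytes, max_results: int = 10) -> dict:
    """JSON body asking Cloud Vision for up to max_results labels on one image."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


def tags_from_response(response: dict, min_score: float = 0.7) -> list:
    """Return lower-cased label descriptions whose score meets the threshold."""
    first = (response.get("responses") or [{}])[0]
    labels = first.get("labelAnnotations") or []
    return [
        label["description"].lower()
        for label in labels
        if label.get("score", 0.0) >= min_score
    ]


def build_tag_index(tags_by_image: dict) -> dict:
    """Invert {image: [tags]} into {tag: [images]} for simple tag lookup."""
    index = {}
    for image, tags in tags_by_image.items():
        for tag in tags:
            index.setdefault(tag, []).append(image)
    return index


if __name__ == "__main__":
    # Hand-written sample shaped like an images:annotate response.
    sample = {
        "responses": [
            {
                "labelAnnotations": [
                    {"description": "Dog", "score": 0.97},
                    {"description": "Grass", "score": 0.55},
                ]
            }
        ]
    }
    print(tags_from_response(sample))
```

Raising or lowering `min_score` trades precision against recall in the resulting tags, so the threshold is worth tuning on a sample of your own images.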
If you need targeted custom labels, use Cloud AutoML to train a custom ML model.
To use Google OCR technologies on premises, use OCR On-Prem, available in the Cloud Marketplace.
Deploy in console: Event-driven image processing using Cloud Functions and Cloud Vision
Skills Boost labs: Image processing
How-to guides: Cloud Vision API
Pricing example
To run a basic image processing pipeline that detects labels, your monthly cost would be $27.36.
You can check the usage assumptions made to arrive at this number in the pricing calculator.
The first 1,000 units per month are free.
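The effect of the free tier on the API portion of a bill can be worked through with a small calculation. The sketch below only covers per-unit API charges; the `price_per_1000` value used in the example is a placeholder assumption, not a quoted price, and the $27.36 estimate above also includes other products, so it is not reproduced here.

```python
def billable_units(units_used: int, free_units: int = 1000) -> int:
    """Units charged after subtracting the monthly free allowance."""
    return max(0, units_used - free_units)


def estimate_api_cost(units_used: int, price_per_1000: float, free_units: int = 1000) -> float:
    """Estimated monthly API charge in USD for a flat per-1,000-unit rate.

    price_per_1000 is supplied by the caller; check the pricing calculator
    for the actual rate that applies to your feature and volume.
    """
    return round(billable_units(units_used, free_units) / 1000 * price_per_1000, 2)


if __name__ == "__main__":
    # 15,000 monthly calls, as in the usage assumptions; the rate is a placeholder.
    print(billable_units(15_000))
    print(estimate_api_cost(15_000, price_per_1000=1.50))
```

With 15,000 calls, 14,000 units are billable once the free 1,000 are subtracted; usage at or below 1,000 units yields no API charge.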
Pricing
How much does my use case cost?
Understand your monthly cost to solve for a use case, with products you need and key usage assumptions laid out.
Use case | Products used | Usage assumptions | Estimated monthly cost (USD)
---|---|---|---
Image tagging, processing and search | Cloud Vision, Cloud Storage, Pub/Sub, Cloud Run | 1. 15,000 Cloud Vision label detection API calls monthly; 2. 100 GiB monthly storage; 3. One 1.25 GiB CPU; 4. Four GiB published daily through Pub/Sub | $27.36
Extract text and insights from documents | Document AI, Cloud Storage, BigQuery, Cloud Functions | 1. 1,000 Document AI form parser API calls monthly; 2. 100 GiB monthly storage; 3. 1 TiB monthly queries; 4. RAM: 512 MB, CPU: 800 MHz | $71.87
Extract text from images | Cloud Vision, Cloud Storage, Pub/Sub, Cloud Run | 1. 15,000 Cloud Vision OCR API calls monthly; 2. 100 GiB monthly storage; 3. One 1.25 GiB CPU; 4. Four GiB published daily through Pub/Sub | $27.36
See calculation details in the calculator.
Pricing Calculator
Estimate the cost of your project by pulling in all the tools you need in a single place.
Custom Quote
Connect with our sales team to get a custom quote for your organization's unique needs.
Start your proof of concept
New customers get up to $300 in free credits to try Google Cloud products
Have a large project?
See code samples for OCR solutions and use cases
Learn how to detect labels with Cloud Vision API
Learn how to automate a doc processing pipeline with Google AI