Features list

Vision API currently allows you to use the following features:

All feature types
Text detection	Optical character recognition (OCR) for an image; text recognition and conversion to machine-coded text. Identifies and extracts UTF-8 text in an image. Images: Optimized for sparse areas of text within a larger image. Response: Returns both a list of identified words with their text and bounding boxes (textAnnotations) and the structural hierarchy of the OCR-detected text (fullTextAnnotation). Hierarchy of extracted text structure: TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol. Each structural component from Page on may also have its own properties, such as detected languages and breaks. Languages supported: Works with currently supported, mapped, and experimental languages. Feature enum value: TEXT_DETECTION.
Document text detection (dense text / handwriting)	Optical character recognition (OCR) for a file (PDF/TIFF) or dense text image; dense text recognition and conversion to machine-coded text. Files: Optimized for document files (PDF/TIFF). Images: Optimized for dense areas of text in an image (images that are documents), and images that contain handwriting. Response: Returns the structural hierarchy of the OCR-detected text (fullTextAnnotation). Hierarchy of extracted text structure: TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol. Each structural component from Page on may also have its own properties, such as detected languages and breaks. Languages supported: Works with currently supported, mapped, and experimental languages. Feature enum value: DOCUMENT_TEXT_DETECTION. Takes precedence when both DOCUMENT_TEXT_DETECTION and TEXT_DETECTION are requested.
Landmark detection 1	Provides the name of the landmark, a confidence score, and a bounding box for the landmark in the image. Also gives geographic coordinates (latitude and longitude) for the detected entity.
Logo detection 2	Provides a textual description of the entity identified, a confidence score, and a bounding polygon for the logo in the file.
Label detection 3	Provides generalized labels for an image. For each label returns a textual description, confidence score, and topicality rating.
Image properties 4	Returns dominant colors in an image. Each color is represented in the RGBA color space, has a confidence score, and reports the fraction of pixels that the color occupies, in the range [0, 1].
Object localization 5	Provides general label and bounding box annotations for multiple objects recognized in a single image. For each object detected the following elements are returned: a textual description, a confidence score, and normalized vertices [0,1] for the bounding polygon around the object.
Crop hint detection 6	Provides a bounding polygon for the cropped image, a confidence score, and an importance fraction of this salient region with respect to the original image for each request. You can provide up to 16 image ratio values (width:height) for a single image.
Web entities and pages 7	Provides Web content related to an image. Returns the following information: Web entities: Inferred entities (labels/descriptions) from similar images on the Web. Full matching images: A list of URLs for fully matching images of any size on the Internet. Partial matching images: A list of URLs for images that share key-point features, such as a cropped version of the original image. Pages with matching images: A list of Web pages (identified by page URL, page title, matching image URL) with an image that satisfies the conditions described above. Visually similar images: A list of URLs for images that share some features with the original image. Best guess label: A best guess as to the topic of the requested image, inferred from similar images on the Internet.
Explicit content detection (SafeSearch)	Provides likelihood ratings for the following explicit content categories: adult, spoof, medical, violence, and racy. Each likelihood rating is expressed as one of six values: UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, or VERY_LIKELY.
Face detection	Locates faces with bounding polygons, and identifies specific facial "landmarks" such as eyes, ears, nose, and mouth, along with their corresponding confidence values. Returns likelihood ratings for emotion (joy, sorrow, anger, surprise) and general image properties (underexposed, blurred, headwear present). Each likelihood rating is expressed as one of six values: UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, or VERY_LIKELY. Facial recognition of specific individuals is not supported.
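
As a rough illustration of how the feature list above maps onto a request, the following Python sketch builds a JSON body in the shape used by the REST images:annotate method. It is a minimal sketch, not an official sample. Only TEXT_DETECTION and DOCUMENT_TEXT_DETECTION are named as enum values on this page; the other enum names, the field names (requests, image.content, features, maxResults, imageContext.cropHintsParams.aspectRatios), and the helper name build_annotate_request are assumptions drawn from the public API reference or invented for the example. The sketch only builds the body and does not send it.

```python
import base64
import json

# Feature enum values. Only TEXT_DETECTION and DOCUMENT_TEXT_DETECTION appear
# on this page; the rest are assumed from the public API reference.
FEATURE_TYPES = {
    "TEXT_DETECTION", "DOCUMENT_TEXT_DETECTION", "LANDMARK_DETECTION",
    "LOGO_DETECTION", "LABEL_DETECTION", "IMAGE_PROPERTIES",
    "OBJECT_LOCALIZATION", "CROP_HINTS", "WEB_DETECTION",
    "SAFE_SEARCH_DETECTION", "FACE_DETECTION",
}

MAX_CROP_ASPECT_RATIOS = 16  # limit stated in the crop hint row above


def build_annotate_request(image_bytes, feature_types, max_results=None,
                           crop_aspect_ratios=None):
    """Build a request body (hypothetical helper; nothing is sent)."""
    unknown = set(feature_types) - FEATURE_TYPES
    if unknown:
        raise ValueError(f"unknown feature types: {sorted(unknown)}")

    features = []
    for feature_type in feature_types:
        feature = {"type": feature_type}
        if max_results is not None:
            feature["maxResults"] = max_results
        features.append(feature)

    request = {
        # Image bytes are sent base64-encoded in the JSON body.
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "features": features,
    }

    if crop_aspect_ratios is not None:
        if len(crop_aspect_ratios) > MAX_CROP_ASPECT_RATIOS:
            raise ValueError("at most 16 aspect ratios are allowed per image")
        # Each ratio is width divided by height, for example 16 / 9.
        request["imageContext"] = {
            "cropHintsParams": {"aspectRatios": list(crop_aspect_ratios)}
        }

    return {"requests": [request]}


if __name__ == "__main__":
    body = build_annotate_request(
        b"placeholder image bytes",
        ["LABEL_DETECTION", "CROP_HINTS"],
        max_results=5,
        crop_aspect_ratios=[1.0, 16 / 9],
    )
    print(json.dumps(body, indent=2))
```

Several features can be combined in one request. Because DOCUMENT_TEXT_DETECTION takes precedence when both text features are requested, it is usually enough to pick one of the two.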

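The text hierarchy described in the two OCR rows (Page -> Block -> Paragraph -> Word -> Symbol) can be walked with nested loops. The sketch below does this over a fullTextAnnotation given as a plain Python dict, as it might look after parsing a JSON response. The key names (pages, blocks, paragraphs, words, symbols, text, property.detectedLanguages, languageCode) are assumptions based on the public API reference, the SAMPLE data is invented for illustration, and words_in and page_languages are hypothetical helper names.

```python
# Invented sample data in the assumed shape of a fullTextAnnotation.
SAMPLE = {
    "text": "Hi all\n",
    "pages": [{
        "property": {"detectedLanguages": [{"languageCode": "en"}]},
        "blocks": [{
            "paragraphs": [{
                "words": [
                    {"symbols": [{"text": "H"}, {"text": "i"}]},
                    {"symbols": [{"text": "a"}, {"text": "l"}, {"text": "l"}]},
                ],
            }],
        }],
    }],
}


def words_in(full_text_annotation):
    """Return every word, rebuilt by joining the text of its symbols."""
    words = []
    for page in full_text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    symbols = word.get("symbols", [])
                    words.append("".join(s.get("text", "") for s in symbols))
    return words


def page_languages(full_text_annotation):
    """Return the detected language codes for each page, in page order."""
    result = []
    for page in full_text_annotation.get("pages", []):
        languages = page.get("property", {}).get("detectedLanguages", [])
        result.append([lang.get("languageCode") for lang in languages])
    return result


if __name__ == "__main__":
    print(words_in(SAMPLE))
    print(page_languages(SAMPLE))
```

Missing levels are treated as empty lists, so a response without text yields an empty result instead of raising an error.
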
1. Image credit: Nikolay Vorobyev on Unsplash (annotations added).

2. Image credit: Robert Scoble (CC BY 2.0, annotation added).

3. Image credit: Alex Knight on Unsplash.

4. Image credit: Jeremy Bishop on Unsplash.

5. Image credit: Bogdan Dada on Unsplash (annotations added).

6. Image credit: Yasmin Dangor on Unsplash (original and cropped image shown).

7. Image credit: Quinten de Graaf on Unsplash.

Last updated 2025-06-12 UTC.