
Features list

The Vision API currently supports the following features:

All feature types

Text detection
Sample image: road sign.
Optical character recognition (OCR) for an image: recognizes text and converts it to machine-coded text. Identifies and extracts UTF-8 text in an image.
Images: Optimized for sparse areas of text within a larger image.
Response: Returns both a list of the identified words, with their text and bounding boxes (textAnnotations), and the structural hierarchy of the OCR-detected text (fullTextAnnotation).
Hierarchy of extracted text structure: TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol. Each structural component from Page downward may have its own properties, such as detected languages and breaks.
Languages supported: Works with currently supported, mapped, and experimental languages.
Feature enum value: TEXT_DETECTION.
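
The feature enum value is what goes into a request. As a minimal sketch, assuming you call the REST method `images:annotate` (`https://vision.googleapis.com/v1/images:annotate`) directly rather than through a client library, a TEXT_DETECTION request body could be assembled as follows. The helper `build_annotate_request` is hypothetical; it only builds the JSON body and leaves out authentication and sending.

```python
import base64
import json


def build_annotate_request(image_bytes: bytes, feature_types: list) -> dict:
    """Assemble an images:annotate request body for a single image.

    Hypothetical helper: it only builds JSON and does not authenticate
    or send anything.
    """
    # Inline image bytes are sent base64-encoded in the "content" field.
    content = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": content},
                # One entry per requested feature enum value.
                "features": [{"type": t} for t in feature_types],
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file.
    body = build_annotate_request(b"not-a-real-image", ["TEXT_DETECTION"])
    print(json.dumps(body, indent=2))
```

Changing the `type` string requests a different feature from this page; for TEXT_DETECTION the response carries `textAnnotations` and `fullTextAnnotation`.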

Document text detection (dense text / handwriting)
Sample images: dense text image with annotations; handwriting image.
Optical character recognition (OCR) for a file (PDF/TIFF) or a dense text image: recognizes dense text and converts it to machine-coded text.
Files: Optimized for document files (PDF/TIFF).
Images: Optimized for dense areas of text in an image (images that are documents) and for images that contain handwriting.
Response: Returns the structural hierarchy of the OCR-detected text (fullTextAnnotation).
Hierarchy of extracted text structure: TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol. Each structural component from Page downward may have its own properties, such as detected languages and breaks.
Languages supported: Works with currently supported, mapped, and experimental languages.
Feature enum value: DOCUMENT_TEXT_DETECTION. If both DOCUMENT_TEXT_DETECTION and TEXT_DETECTION are requested, DOCUMENT_TEXT_DETECTION takes precedence.
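
To show how the hierarchy nests, the sketch below walks a parsed `fullTextAnnotation` object (REST JSON, with `pages`, `blocks`, `paragraphs`, `words`, and `symbols` arrays) from Page down to Symbol and rebuilds each word from its symbols. The function name is hypothetical and the sample input is synthetic, not real API output; the sketch also ignores per-component properties such as detected breaks, so it returns words without spacing information.

```python
def words_from_full_text(full_text_annotation: dict) -> list:
    """Rebuild words by walking Page -> Block -> Paragraph -> Word -> Symbol."""
    words = []
    for page in full_text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    # A word's text is the concatenation of its symbols' text.
                    symbols = word.get("symbols", [])
                    words.append("".join(s.get("text", "") for s in symbols))
    return words


# Synthetic fullTextAnnotation with one page, one block, and one paragraph.
sample = {
    "pages": [{
        "blocks": [{
            "paragraphs": [{
                "words": [
                    {"symbols": [{"text": c} for c in "NO"]},
                    {"symbols": [{"text": c} for c in "EXIT"]},
                ]
            }]
        }]
    }]
}

print(words_from_full_text(sample))  # ['NO', 'EXIT']
```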

Landmark detection
Sample image: St Basil's Cathedral [1].
Provides the name of the landmark, a confidence score, and a bounding box for the landmark in the image. Also gives the geographic coordinates of the detected entity.
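
A sketch of reading landmark results, assuming the REST JSON layout in which each entry of `landmarkAnnotations` has `description`, `score`, `boundingPoly.vertices`, and `locations[].latLng`. The helper `best_landmark` is hypothetical, and the sample score, box, and coordinates are illustrative values rather than API output.

```python
def best_landmark(response: dict):
    """Return (name, score, (lat, lng), box) for the top-scoring landmark, or None."""
    landmarks = response.get("landmarkAnnotations", [])
    if not landmarks:
        return None
    top = max(landmarks, key=lambda lm: lm.get("score", 0.0))

    # Geographic coordinates: take the first location, if any is present.
    coords = None
    locations = top.get("locations", [])
    if locations:
        lat_lng = locations[0].get("latLng", {})
        coords = (lat_lng.get("latitude"), lat_lng.get("longitude"))

    # Bounding box in the image; a zero coordinate may be omitted in JSON.
    vertices = top.get("boundingPoly", {}).get("vertices", [])
    box = None
    if vertices:
        xs = [v.get("x", 0) for v in vertices]
        ys = [v.get("y", 0) for v in vertices]
        box = (min(xs), min(ys), max(xs), max(ys))

    return top.get("description"), top.get("score"), coords, box


# Illustrative response fragment (not real API output).
sample = {
    "landmarkAnnotations": [{
        "description": "Saint Basil's Cathedral",
        "score": 0.92,
        "boundingPoly": {"vertices": [
            {"x": 40, "y": 10}, {"x": 300, "y": 10},
            {"x": 300, "y": 400}, {"x": 40, "y": 400},
        ]},
        "locations": [{"latLng": {"latitude": 55.7525, "longitude": 37.6231}}],
    }]
}

print(best_landmark(sample))
```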

Logo detection
Sample image: annotated logo [2].
Provides a textual description of the entity identified, a confidence score, and a bounding polygon for the logo in the file.
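
As a sketch, assuming `logoAnnotations` entries in the REST JSON carry `description`, `score`, and `boundingPoly.vertices`, you might keep only confident logos and reduce each bounding polygon to an axis-aligned box. The 0.5 cut-off, the helper name, and the brand names in the sample are hypothetical.

```python
def confident_logos(response: dict, min_score: float = 0.5) -> list:
    """Return (description, score, box) for logos scoring at least min_score."""
    results = []
    for logo in response.get("logoAnnotations", []):
        score = logo.get("score", 0.0)
        if score < min_score:
            continue
        # Reduce the bounding polygon to (x0, y0, x1, y1); None if absent.
        vertices = logo.get("boundingPoly", {}).get("vertices", [])
        box = None
        if vertices:
            xs = [v.get("x", 0) for v in vertices]
            ys = [v.get("y", 0) for v in vertices]
            box = (min(xs), min(ys), max(xs), max(ys))
        results.append((logo.get("description", ""), score, box))
    # Highest-confidence logos first.
    results.sort(key=lambda item: item[1], reverse=True)
    return results


# Synthetic response fragment with hypothetical brand names.
sample = {"logoAnnotations": [
    {"description": "ExampleCo", "score": 0.82,
     "boundingPoly": {"vertices": [{"x": 12, "y": 30}, {"x": 210, "y": 30},
                                   {"x": 210, "y": 95}, {"x": 12, "y": 95}]}},
    {"description": "OtherBrand", "score": 0.31, "boundingPoly": {"vertices": []}},
]}

print(confident_logos(sample))  # [('ExampleCo', 0.82, (12, 30, 210, 95))]
```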

Label detection
Sample image: Shanghai street [3].
Provides generalized labels for an image. For each label, returns a textual description, a confidence score, and a topicality rating.
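
Because each label carries both a confidence score and a topicality rating, a caller can rank by either. The sketch below assumes `labelAnnotations` entries with `description`, `score`, and `topicality` fields; the helper name and the sample numbers are made up for illustration and say nothing about how the API computes either value.

```python
def top_labels(response: dict, k: int = 3, by: str = "topicality") -> list:
    """Return the descriptions of the k labels ranked highest by `by`."""
    if by not in ("score", "topicality"):
        raise ValueError("by must be 'score' or 'topicality'")
    labels = response.get("labelAnnotations", [])
    # Sort descending on the chosen field; missing values rank last.
    ranked = sorted(labels, key=lambda lab: lab.get(by, 0.0), reverse=True)
    return [lab.get("description", "") for lab in ranked[:k]]


# Synthetic labels where the two rankings differ.
sample = {"labelAnnotations": [
    {"description": "Street", "score": 0.95, "topicality": 0.80},
    {"description": "Metropolis", "score": 0.85, "topicality": 0.92},
    {"description": "Car", "score": 0.90, "topicality": 0.60},
]}

print(top_labels(sample, k=2))              # ['Metropolis', 'Street']
print(top_labels(sample, k=2, by="score"))  # ['Street', 'Car']
```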

Image properties
Sample image: Bali, with image properties [4].
Returns the dominant colors in an image. Each color is represented in the RGBA color space and has a confidence score and the fraction of the image's pixels that it occupies, in the range [0, 1].
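
A sketch of turning the dominant colors into hex codes, assuming the REST JSON path `imagePropertiesAnnotation.dominantColors.colors[]`, where each entry has `color`, `score`, and `pixelFraction`. This page does not state the numeric range of the color channels, so the sketch assumes 0-255 values and ignores alpha; verify that assumption against a real response. The helper name and sample values are hypothetical.

```python
def dominant_colors(response: dict) -> list:
    """Return (hex_code, score, pixel_fraction) tuples, largest pixel share first."""
    colors = (
        response.get("imagePropertiesAnnotation", {})
        .get("dominantColors", {})
        .get("colors", [])
    )
    results = []
    for entry in colors:
        rgb = entry.get("color", {})
        # Assumption: red/green/blue are on a 0-255 scale; an omitted channel
        # (a zero value left out of the JSON) defaults to 0. Alpha is ignored.
        r, g, b = (int(round(rgb.get(ch, 0))) for ch in ("red", "green", "blue"))
        hex_code = f"#{r:02x}{g:02x}{b:02x}"
        results.append((hex_code, entry.get("score", 0.0), entry.get("pixelFraction", 0.0)))
    # Order by the fraction of pixels each color occupies.
    results.sort(key=lambda item: item[2], reverse=True)
    return results


# Synthetic response fragment.
sample = {"imagePropertiesAnnotation": {"dominantColors": {"colors": [
    {"color": {"red": 145, "green": 193, "blue": 254}, "score": 0.6, "pixelFraction": 0.12},
    {"color": {"red": 20, "green": 20, "blue": 20}, "score": 0.2, "pixelFraction": 0.4},
]}}}

print(dominant_colors(sample))
# [('#141414', 0.2, 0.4), ('#91c1fe', 0.6, 0.12)]
```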

Object localization
Sample image: objects with bounding boxes [5].
Provides general label and bounding box annotations for multiple objects recognized in a single image. For each detected object, the following are returned: a textual description, a confidence score, and normalized vertices, in the range [0, 1], for the bounding polygon around the object.
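
Normalized vertices are fractions of the image width and height, so drawing them requires scaling by the image's pixel dimensions. The sketch below assumes you already know those dimensions and that each `localizedObjectAnnotations` entry in the REST JSON has `name`, `score`, and `boundingPoly.normalizedVertices`; the helper name and sample object are hypothetical.

```python
def objects_in_pixels(response: dict, width: int, height: int) -> list:
    """Return (name, score, pixel_polygon) for each localized object."""
    results = []
    for obj in response.get("localizedObjectAnnotations", []):
        polygon = []
        for v in obj.get("boundingPoly", {}).get("normalizedVertices", []):
            # Normalized coordinates lie in [0, 1]; a zero coordinate may be
            # omitted from the JSON, so default to 0.0.
            x = v.get("x", 0.0)
            y = v.get("y", 0.0)
            polygon.append((round(x * width), round(y * height)))
        results.append((obj.get("name", ""), obj.get("score", 0.0), polygon))
    return results


# Synthetic response fragment for a 640x480 image.
sample = {"localizedObjectAnnotations": [{
    "name": "Bicycle",
    "score": 0.88,
    "boundingPoly": {"normalizedVertices": [
        {"x": 0.1, "y": 0.2}, {"x": 0.5, "y": 0.2},
        {"x": 0.5, "y": 0.9}, {"x": 0.1, "y": 0.9},
    ]},
}]}

print(objects_in_pixels(sample, width=640, height=480))
# [('Bicycle', 0.88, [(64, 96), (320, 96), (320, 432), (64, 432)])]
```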

Crop hint detection
Sample image: original and cropped version [6].
For each request, provides a bounding polygon for the suggested crop, a confidence score, and an importance fraction of this salient region with respect to the original image. You can provide up to 16 aspect ratios (width:height) for a single image.
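
Aspect ratios are passed in the request's image context rather than as separate features. A minimal sketch, assuming the REST field `imageContext.cropHintsParams.aspectRatios` (each ratio sent as one float, width divided by height) and an image referenced by `image.source.imageUri`; the helper name and the URL are hypothetical. It enforces the 16-ratio limit from the description above.

```python
import json


def crop_hints_request(image_uri: str, ratios: list) -> dict:
    """Build a CROP_HINTS request body from aspect ratios written as 'W:H'."""
    if len(ratios) > 16:
        raise ValueError("at most 16 aspect ratios are allowed per image")
    aspect_ratios = []
    for ratio in ratios:
        width, height = (float(part) for part in ratio.split(":"))
        # Each ratio is sent as a single float: width divided by height.
        aspect_ratios.append(width / height)
    return {
        "requests": [{
            "image": {"source": {"imageUri": image_uri}},
            "features": [{"type": "CROP_HINTS"}],
            "imageContext": {"cropHintsParams": {"aspectRatios": aspect_ratios}},
        }]
    }


body = crop_hints_request("https://example.com/photo.jpg", ["16:9", "1:1", "4:3"])
print(json.dumps(body, indent=2))
```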

Web entities and pages
Sample image: image with web entities table [7].
Provides a series of Web content related to an image. Returns the following information:
Web entities: Inferred entities (labels/descriptions) from similar images on the Web.
Full matching images: A list of URLs for fully matching images of any size on the Internet.
Partial matching images: A list of URLs for images that share key-point features, such as a cropped version of the original image.
Pages with matching images: A list of Webpages (identified by page URL, page title, and matching image URL) with an image that satisfies the conditions described above.
Visually similar images: A list of URLs for images that share some features with the original image.
Best guess label: A best guess as to the topic of the requested image, inferred from similar images on the Internet.
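
The sketch below collects each of these result groups from a `webDetection` object, assuming the REST JSON field names `webEntities`, `fullMatchingImages`, `partialMatchingImages`, `pagesWithMatchingImages`, `visuallySimilarImages`, and `bestGuessLabels`. The helper name is hypothetical, treating entity descriptions as optional is a defensive choice of this sketch, and the sample uses placeholder `example.com` URLs.

```python
def summarize_web_detection(response: dict) -> dict:
    """Collect the main webDetection fields into plain Python lists."""
    web = response.get("webDetection", {})
    return {
        "best_guess": [g.get("label", "") for g in web.get("bestGuessLabels", [])],
        # Descriptions are treated as optional here; entities without one are skipped.
        "entities": [e["description"] for e in web.get("webEntities", []) if e.get("description")],
        "full_matches": [i.get("url", "") for i in web.get("fullMatchingImages", [])],
        "partial_matches": [i.get("url", "") for i in web.get("partialMatchingImages", [])],
        "pages": [
            (p.get("url", ""), p.get("pageTitle", ""))
            for p in web.get("pagesWithMatchingImages", [])
        ],
        "similar": [i.get("url", "") for i in web.get("visuallySimilarImages", [])],
    }


# Synthetic response fragment with placeholder URLs.
sample = {"webDetection": {
    "webEntities": [{"description": "Canal", "score": 0.7}, {"score": 0.2}],
    "fullMatchingImages": [{"url": "https://example.com/full.jpg"}],
    "partialMatchingImages": [{"url": "https://example.com/crop.jpg"}],
    "pagesWithMatchingImages": [{"url": "https://example.com/post", "pageTitle": "Travel notes"}],
    "visuallySimilarImages": [{"url": "https://example.com/similar.jpg"}],
    "bestGuessLabels": [{"label": "canal houses"}],
}}

summary = summarize_web_detection(sample)
print(summary["best_guess"], summary["entities"])  # ['canal houses'] ['Canal']
```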

Explicit content detection (SafeSearch)
Provides likelihood ratings for the following explicit content categories: adult, spoof, medical, violence, and racy. Likelihood ratings are expressed as 6 different values: UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, or VERY_LIKELY.
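
Because the likelihood values form an ordered scale, threshold-based policies are straightforward. The sketch below assumes you pass the `safeSearchAnnotation` object from a REST response (one string field per category); the helper name is hypothetical, treating UNKNOWN as the lowest rating is a choice made here, and the default threshold of LIKELY is an example policy, not a recommendation from this page.

```python
# Likelihood values in increasing order; UNKNOWN is treated as the lowest here.
LIKELIHOOD_ORDER = ["UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
SAFE_SEARCH_CATEGORIES = ("adult", "spoof", "medical", "violence", "racy")


def flagged_categories(safe_search: dict, threshold: str = "LIKELY") -> list:
    """Return the SafeSearch categories rated at or above `threshold`."""
    limit = LIKELIHOOD_ORDER.index(threshold)
    flagged = []
    for category in SAFE_SEARCH_CATEGORIES:
        # A missing category is treated as UNKNOWN, so it is never flagged
        # unless the threshold itself is UNKNOWN.
        rating = safe_search.get(category, "UNKNOWN")
        if LIKELIHOOD_ORDER.index(rating) >= limit:
            flagged.append(category)
    return flagged


# Synthetic safeSearchAnnotation.
sample = {"adult": "VERY_UNLIKELY", "spoof": "POSSIBLE", "medical": "UNLIKELY",
          "violence": "LIKELY", "racy": "VERY_LIKELY"}

print(flagged_categories(sample))                        # ['violence', 'racy']
print(flagged_categories(sample, threshold="POSSIBLE"))  # ['spoof', 'violence', 'racy']
```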

Face detection
Sample image: face detection.
Locates faces with bounding polygons and identifies specific facial "landmarks", such as eyes, ears, nose, and mouth, along with their corresponding confidence values. Returns likelihood ratings for emotion (joy, sorrow, anger, surprise) and for general image properties (underexposed, blurred, headwear present). Likelihood ratings are expressed as 6 different values: UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, or VERY_LIKELY.
Recognition of specific individuals (facial recognition) is not supported.
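
A sketch of reading the per-face emotion ratings, assuming each `faceAnnotations` entry in the REST JSON has `joyLikelihood`, `sorrowLikelihood`, `angerLikelihood`, and `surpriseLikelihood` fields. These ratings describe expressions, not identity. The helper name and the LIKELY default are hypothetical choices, and the sample faces are synthetic.

```python
LIKELIHOOD_ORDER = ["UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]

# Emotion names mapped to their likelihood fields in a face annotation.
EMOTION_FIELDS = {
    "joy": "joyLikelihood",
    "sorrow": "sorrowLikelihood",
    "anger": "angerLikelihood",
    "surprise": "surpriseLikelihood",
}


def likely_emotions(response: dict, threshold: str = "LIKELY") -> list:
    """For each detected face, list the emotions rated at or above `threshold`."""
    limit = LIKELIHOOD_ORDER.index(threshold)
    per_face = []
    for face in response.get("faceAnnotations", []):
        # Missing fields count as UNKNOWN, the lowest rating in this ordering.
        emotions = [
            name
            for name, field in EMOTION_FIELDS.items()
            if LIKELIHOOD_ORDER.index(face.get(field, "UNKNOWN")) >= limit
        ]
        per_face.append(emotions)
    return per_face


# Synthetic response fragment with two faces.
sample = {"faceAnnotations": [
    {"joyLikelihood": "VERY_LIKELY", "sorrowLikelihood": "VERY_UNLIKELY",
     "angerLikelihood": "VERY_UNLIKELY", "surpriseLikelihood": "UNLIKELY"},
    {"joyLikelihood": "UNLIKELY", "sorrowLikelihood": "POSSIBLE",
     "angerLikelihood": "VERY_UNLIKELY", "surpriseLikelihood": "LIKELY"},
]}

print(likely_emotions(sample))  # [['joy'], ['surprise']]
```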

1. Image credit: Nikolay Vorobyev on Unsplash (annotations added).

2. Image credit: Robert Scoble (CC BY 2.0, annotation added).

3. Image credit: Alex Knight on Unsplash.

4. Image credit: Jeremy Bishop on Unsplash.

5. Image credit: Bogdan Dada on Unsplash (annotations added).

6. Image credit: Yasmin Dangor on Unsplash (original and cropped image shown).

7. Image credit: Quinten de Graaf on Unsplash.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-06-12 UTC.