Features list
Vision API currently allows you to use the following features:
Text detection
Optical character recognition (OCR) for an image: recognizes text and converts it to machine-coded text. Identifies and extracts UTF-8 text in an image.
- Images: Optimized for sparse areas of text within a larger image.
- Response: Returns a list of the identified words with their text and bounding boxes (textAnnotations), as well as the structural hierarchy of the OCR-detected text (fullTextAnnotation).
- Hierarchy of extracted text structure: TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol. Each structural component from Page down can have its own properties, such as detected languages and breaks.
- Languages supported: Works with currently supported, mapped, and experimental languages.
- Feature enum value: TEXT_DETECTION.
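The Page -> Block -> Paragraph -> Word -> Symbol hierarchy can be walked with plain nested loops. This is a minimal sketch that assumes the fullTextAnnotation has already been parsed from the REST JSON response into a Python dict with the keys `pages`, `blocks`, `paragraphs`, `words`, `symbols`, and `text`; the helper name and the sample data are hypothetical.

```python
from typing import Any


def words_from_full_text(full_text: dict[str, Any]) -> list[str]:
    """Rebuild each word by walking Page -> Block -> Paragraph -> Word -> Symbol."""
    words: list[str] = []
    for page in full_text.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    # The word's text is the concatenation of its Symbols' text.
                    symbols = word.get("symbols", [])
                    words.append("".join(s.get("text", "") for s in symbols))
    return words


# Hand-written fragment shaped like a fullTextAnnotation (not real API output).
sample = {
    "pages": [{
        "blocks": [{
            "paragraphs": [{
                "words": [
                    {"symbols": [{"text": "H"}, {"text": "i"}]},
                    {"symbols": [{"text": "!"}]},
                ]
            }]
        }]
    }]
}
print(words_from_full_text(sample))  # ['Hi', '!']
```

Per-component properties such as detected languages and breaks sit alongside these keys and are ignored here.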
Document text detection (dense text / handwriting)
Optical character recognition (OCR) for a file (PDF/TIFF) or a dense-text image: recognizes dense text and converts it to machine-coded text.
- Files: Optimized for document files (PDF/TIFF).
- Images: Optimized for dense areas of text in an image (images that are documents) and for images that contain handwriting.
- Response: Returns the structural hierarchy of the OCR-detected text (fullTextAnnotation).
- Hierarchy of extracted text structure: TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol. Each structural component from Page down can have its own properties, such as detected languages and breaks.
- Languages supported: Works with currently supported, mapped, and experimental languages.
- Feature enum value: DOCUMENT_TEXT_DETECTION. Takes precedence when both DOCUMENT_TEXT_DETECTION and TEXT_DETECTION are requested.
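Because DOCUMENT_TEXT_DETECTION takes precedence, a client gains nothing by sending both text features. The sketch below builds a JSON body in the shape used by the REST `images:annotate` method (`requests`, base64 `image.content`, `features[].type`) and drops the redundant TEXT_DETECTION. The helper name and the choice to de-duplicate client-side are illustrative assumptions, and the sketch covers image input only; file (PDF/TIFF) input uses a different request shape not shown here.

```python
import base64


def build_text_request(image_bytes: bytes, feature_types: list[str]) -> dict:
    """Build an images:annotate-style JSON body for text detection features."""
    types = list(dict.fromkeys(feature_types))  # de-duplicate, keep order
    # DOCUMENT_TEXT_DETECTION wins when both are requested, so drop the
    # redundant TEXT_DETECTION before sending (illustrative client-side choice).
    if "DOCUMENT_TEXT_DETECTION" in types and "TEXT_DETECTION" in types:
        types.remove("TEXT_DETECTION")
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": t} for t in types],
        }]
    }


body = build_text_request(b"abc", ["TEXT_DETECTION", "DOCUMENT_TEXT_DETECTION"])
print(body["requests"][0]["features"])  # [{'type': 'DOCUMENT_TEXT_DETECTION'}]
```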
Landmark detection [1]
Provides the name of the landmark, a confidence score, and a bounding box for the landmark in the image. Also gives the geographic (latitude/longitude) coordinates of the detected landmark.
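A landmark result pairs a name and score with location data. This sketch picks the highest-scoring landmark and reads its first coordinate pair. It assumes each annotation is a dict with `description`, `score`, and `locations[].latLng.{latitude, longitude}` as in the REST JSON response; the helper and the sample values are hypothetical.

```python
from typing import Any, Optional

Coords = Optional[tuple[float, float]]


def top_landmark(annotations: list[dict[str, Any]]) -> Optional[tuple[str, float, Coords]]:
    """Return (name, score, (lat, lng)) for the highest-scoring landmark, or None."""
    if not annotations:
        return None
    best = max(annotations, key=lambda a: a.get("score", 0.0))
    coords: Coords = None
    for loc in best.get("locations", []):
        lat_lng = loc.get("latLng")
        if lat_lng is not None:
            # Use the first location that carries a latitude/longitude pair.
            coords = (lat_lng.get("latitude", 0.0), lat_lng.get("longitude", 0.0))
            break
    return best.get("description", ""), best.get("score", 0.0), coords


# Hand-written sample, not real API output.
sample = [
    {"description": "Example Tower", "score": 0.62,
     "locations": [{"latLng": {"latitude": 48.85, "longitude": 2.29}}]},
    {"description": "Example Bridge", "score": 0.41},
]
print(top_landmark(sample))  # ('Example Tower', 0.62, (48.85, 2.29))
```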
Logo detection [2]
Provides a textual description of the identified entity, a confidence score, and a bounding polygon for the logo in the file.
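A bounding polygon is often easier to use as an axis-aligned box. The sketch below collapses a logo's `boundingPoly.vertices` (pixel coordinates) to `(left, top, right, bottom)`. It assumes the REST JSON shape and treats a missing `x` or `y` as 0, on the assumption that zero-valued fields may be omitted from the JSON; the helper and sample are hypothetical.

```python
from typing import Any


def poly_to_box(bounding_poly: dict[str, Any]) -> tuple[int, int, int, int]:
    """Collapse a pixel-space bounding polygon to (left, top, right, bottom)."""
    vertices = bounding_poly.get("vertices", [])
    if not vertices:
        raise ValueError("bounding polygon has no vertices")
    # A missing x or y is read as 0 (assumed: zero values may be omitted).
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)


# Hand-written sample; the left edge sits at x = 0, so "x" is omitted there.
logo = {"description": "ExampleCo", "score": 0.9,
        "boundingPoly": {"vertices": [{"y": 10}, {"x": 120, "y": 10},
                                      {"x": 120, "y": 60}, {"y": 60}]}}
print(logo["description"], poly_to_box(logo["boundingPoly"]))  # ExampleCo (0, 10, 120, 60)
```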
Label detection [3]
Provides generalized labels for an image. For each label, returns a textual description, a confidence score, and a topicality rating.
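Label results are typically filtered by confidence before use. This sketch keeps labels whose `score` meets a threshold and orders them by score. The 0.7 cutoff is an arbitrary illustrative value, not an API default, and the dict keys (`description`, `score`, `topicality`) are assumed from the REST JSON response.

```python
from typing import Any


def confident_labels(labels: list[dict[str, Any]], min_score: float = 0.7) -> list[str]:
    """Return label descriptions with score >= min_score, highest score first."""
    kept = [lab for lab in labels if lab.get("score", 0.0) >= min_score]
    kept.sort(key=lambda lab: lab.get("score", 0.0), reverse=True)
    return [lab.get("description", "") for lab in kept]


# Hand-written sample, not real API output.
labels = [
    {"description": "Cloud", "score": 0.88, "topicality": 0.88},
    {"description": "Sky", "score": 0.95, "topicality": 0.95},
    {"description": "Vehicle", "score": 0.52, "topicality": 0.52},
]
print(confident_labels(labels))  # ['Sky', 'Cloud']
```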
Image properties [4]
Returns the dominant colors in an image. Each color is represented in the RGBA color space and comes with a confidence score and the fraction of the image's pixels it occupies, in the range [0, 1].
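Dominant colors can be converted to hex codes and ranked by pixel fraction. This sketch assumes the `imagePropertiesAnnotation` JSON shape (`dominantColors.colors[].color.{red, green, blue}`, `score`, `pixelFraction`) and assumes channel values from 0 to 255; it reads missing channels as 0, clamps out-of-range values, and ignores alpha. The helper and sample are hypothetical.

```python
from typing import Any


def dominant_hex_colors(props: dict[str, Any]) -> list[tuple[str, float]]:
    """Return (#rrggbb, pixelFraction) pairs, largest pixel fraction first."""
    result = []
    for entry in props.get("dominantColors", {}).get("colors", []):
        color = entry.get("color", {})
        # Assumed 0-255 channels: missing ones read as 0, others clamped.
        r, g, b = (min(255, max(0, round(color.get(ch, 0))))
                   for ch in ("red", "green", "blue"))
        result.append((f"#{r:02x}{g:02x}{b:02x}", entry.get("pixelFraction", 0.0)))
    result.sort(key=lambda pair: pair[1], reverse=True)
    return result


# Hand-written sample; "blue" is omitted in the first color to show the default.
props = {"dominantColors": {"colors": [
    {"color": {"red": 250, "green": 128}, "score": 0.6, "pixelFraction": 0.2},
    {"color": {"red": 10, "green": 20, "blue": 30}, "score": 0.3, "pixelFraction": 0.5},
]}}
print(dominant_hex_colors(props))  # [('#0a141e', 0.5), ('#fa8000', 0.2)]
```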
Object localization [5]
Provides general label and bounding box annotations for multiple objects recognized in a single image. For each detected object, returns a textual description, a confidence score, and normalized vertices in the range [0, 1] for the bounding polygon around the object.
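Because the vertices are normalized to [0, 1], drawing them requires scaling by the image's width and height. This sketch assumes each object dict has `boundingPoly.normalizedVertices[].{x, y}` and reads a missing coordinate as 0.0 (assumed: zero values may be omitted from the JSON). The helper, the `name` key in the sample, and the 640x480 size are illustrative.

```python
from typing import Any


def to_pixel_vertices(obj: dict[str, Any], width: int, height: int) -> list[tuple[int, int]]:
    """Scale normalized [0, 1] vertices to integer pixel coordinates."""
    verts = obj.get("boundingPoly", {}).get("normalizedVertices", [])
    # A missing x or y is read as 0.0 (assumed: zero values may be omitted).
    return [(round(v.get("x", 0.0) * width), round(v.get("y", 0.0) * height))
            for v in verts]


# Hand-written sample, not real API output.
bike = {"name": "Bicycle", "score": 0.91,
        "boundingPoly": {"normalizedVertices": [
            {"x": 0.25, "y": 0.5}, {"x": 0.75, "y": 0.5},
            {"x": 0.75, "y": 1.0}, {"x": 0.25, "y": 1.0}]}}
print(to_pixel_vertices(bike, 640, 480))  # [(160, 240), (480, 240), (480, 480), (160, 480)]
```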
Crop hint detection [6]
Provides a bounding polygon for the cropped image, a confidence score, and an importance fraction of this salient region relative to the original image, for each request. You can provide up to 16 aspect ratio values (width:height) for a single image.
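The width:height ratios are supplied with the request. This sketch assumes they go in `imageContext.cropHintsParams.aspectRatios` as floats (width divided by height) and enforces the 16-value limit stated above before building the body. The helper, the requirement of at least one ratio, and the `gs://example-bucket/photo.jpg` URI are hypothetical.

```python
MAX_ASPECT_RATIOS = 16  # upper limit stated in the crop hint description


def crop_hints_context(ratios: list[tuple[int, int]]) -> dict:
    """Build an imageContext that requests crop hints for width:height ratios."""
    if not 1 <= len(ratios) <= MAX_ASPECT_RATIOS:
        raise ValueError(f"expected 1 to {MAX_ASPECT_RATIOS} ratios, got {len(ratios)}")
    if any(w <= 0 or h <= 0 for w, h in ratios):
        raise ValueError("ratio components must be positive")
    # Each ratio is sent as a float, width / height (so 16:9 becomes about 1.78).
    return {"cropHintsParams": {"aspectRatios": [w / h for w, h in ratios]}}


request = {
    # Hypothetical Cloud Storage URI, for illustration only.
    "image": {"source": {"imageUri": "gs://example-bucket/photo.jpg"}},
    "features": [{"type": "CROP_HINTS"}],
    "imageContext": crop_hints_context([(1, 1), (16, 9)]),
}
print(request["imageContext"])
```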
Web entities and pages [7]
Provides Web content related to an image. Returns the following information:
- Web entities: Inferred entities (labels/descriptions) from similar images on the Web.
- Full matching images: A list of URLs for fully matching images of any size on the Internet.
- Partial matching images: A list of URLs for images that share key-point features, such as a cropped version of the original image.
- Pages with matching images: A list of Webpages (identified by page URL, page title, and matching image URL) with an image that satisfies the conditions described above.
- Visually similar images: A list of URLs for images that share some features with the original image.
- Best guess label: A best guess as to the topic of the requested image, inferred from similar images on the Internet.
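Most of these lists can be reduced to plain strings for display. This sketch assumes the `webDetection` JSON keys `bestGuessLabels[].label`, `webEntities[].description`, `pagesWithMatchingImages[].url`, and `fullMatchingImages[].url`; the helper, the output keys, and the example.com sample are hypothetical. Entities without a description are skipped.

```python
from typing import Any


def summarize_web(web: dict[str, Any]) -> dict[str, list[str]]:
    """Collect best-guess labels, entity descriptions, and matching URLs."""
    return {
        "best_guess": [g.get("label", "") for g in web.get("bestGuessLabels", [])],
        # Some entities may lack a description; skip those.
        "entities": [e["description"] for e in web.get("webEntities", [])
                     if e.get("description")],
        "pages": [p.get("url", "") for p in web.get("pagesWithMatchingImages", [])],
        "full_matches": [i.get("url", "") for i in web.get("fullMatchingImages", [])],
    }


# Hand-written sample, not real API output.
web = {
    "webEntities": [{"entityId": "/m/example", "score": 0.8, "description": "Sailboat"},
                    {"entityId": "/m/other", "score": 0.2}],
    "fullMatchingImages": [{"url": "https://example.com/a.jpg"}],
    "pagesWithMatchingImages": [{"url": "https://example.com/page", "pageTitle": "Example"}],
    "bestGuessLabels": [{"label": "sailboat", "languageCode": "en"}],
}
print(summarize_web(web))
```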
Explicit content detection (SafeSearch)
Provides likelihood ratings for the following explicit content categories: adult, spoof, medical, violence, and racy. Likelihood ratings are expressed as one of six values: UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, or VERY_LIKELY.
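Since the six likelihood values are ordered strings, a threshold check only needs their rank. This sketch flags every category rated at or above a chosen level. Treating UNKNOWN as the lowest rank and defaulting the threshold to LIKELY are illustrative policy choices, not API behavior; the category keys are assumed from the `safeSearchAnnotation` JSON.

```python
LIKELIHOOD_ORDER = ["UNKNOWN", "VERY_UNLIKELY", "UNLIKELY",
                    "POSSIBLE", "LIKELY", "VERY_LIKELY"]
CATEGORIES = ("adult", "spoof", "medical", "violence", "racy")


def flagged_categories(safe_search: dict[str, str], threshold: str = "LIKELY") -> list[str]:
    """Return the SafeSearch categories rated at or above `threshold`."""
    limit = LIKELIHOOD_ORDER.index(threshold)
    # UNKNOWN ranks lowest here, so it only passes a threshold of UNKNOWN.
    return [c for c in CATEGORIES
            if LIKELIHOOD_ORDER.index(safe_search.get(c, "UNKNOWN")) >= limit]


# Hand-written sample, not real API output.
rating = {"adult": "VERY_UNLIKELY", "spoof": "POSSIBLE", "medical": "UNLIKELY",
          "violence": "LIKELY", "racy": "VERY_LIKELY"}
print(flagged_categories(rating))  # ['violence', 'racy']
```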
Face detection
Locates faces with bounding polygons and identifies specific facial "landmarks", such as eyes, ears, nose, and mouth, along with their corresponding confidence values. Returns likelihood ratings for emotion (joy, sorrow, anger, surprise) and general image properties (underexposed, blurred, headwear present). Likelihood ratings are expressed as one of six values: UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, or VERY_LIKELY. Recognition of specific individuals (facial recognition) is not supported.
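Facial landmarks come back as typed points, so simple geometry is possible without identifying anyone. This sketch measures the 2D distance between the eyes, ignoring `z`. It assumes each face dict has `landmarks[].type` values such as `LEFT_EYE` and `RIGHT_EYE` with `position.{x, y, z}` in pixels, as in the REST JSON; the helper and sample are hypothetical.

```python
import math
from typing import Any, Optional


def eye_distance(face: dict[str, Any]) -> Optional[float]:
    """2D pixel distance between LEFT_EYE and RIGHT_EYE, or None if either is missing."""
    points = {lm.get("type"): lm.get("position", {}) for lm in face.get("landmarks", [])}
    left, right = points.get("LEFT_EYE"), points.get("RIGHT_EYE")
    if left is None or right is None:
        return None
    # z (depth) is ignored; only the in-image x/y offset is measured.
    return math.hypot(left.get("x", 0.0) - right.get("x", 0.0),
                      left.get("y", 0.0) - right.get("y", 0.0))


# Hand-written sample, not real API output.
face = {
    "joyLikelihood": "VERY_LIKELY",
    "landmarks": [
        {"type": "LEFT_EYE", "position": {"x": 100.0, "y": 200.0, "z": 0.0}},
        {"type": "RIGHT_EYE", "position": {"x": 160.0, "y": 280.0, "z": 0.0}},
    ],
}
print(face["joyLikelihood"], eye_distance(face))  # VERY_LIKELY 100.0
```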
[1] Image credit: Nikolay Vorobyev on Unsplash (annotations added).
[2] Image credit: Robert Scoble (CC BY 2.0, annotation added).
[3] Image credit: Alex Knight on Unsplash.
[4] Image credit: Jeremy Bishop on Unsplash.
[5] Image credit: Bogdan Dada on Unsplash (annotations added).
[6] Image credit: Yasmin Dangor on Unsplash (original and cropped image shown).
[7] Image credit: Quinten de Graaf on Unsplash.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-06-12 UTC.