Block - Amazon Textract

A Block represents items that are recognized in a document within a group of pixels close to each other. The information returned in a Block object depends on the type of operation. In text detection for documents (for example DetectDocumentText), you get information about the detected words and lines of text. In text analysis (for example AnalyzeDocument), you can also get information about the fields, tables, and selection elements that are detected in the document.

An array of Block objects is returned by both synchronous and asynchronous operations. In synchronous operations, such as DetectDocumentText, the array of Block objects is the entire set of results. In asynchronous operations, such as GetDocumentAnalysis, the array is returned over one or more responses.
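Because asynchronous results can span several responses, a caller typically has to concatenate the Blocks arrays before using them. The following is a minimal sketch, assuming each response is a dict with a Blocks list and an optional NextToken key. The names collect_blocks and fetch_page are hypothetical, and the fake pages stand in for real service responses.

```python
from typing import Callable, Optional


def collect_blocks(fetch_page: Callable[[Optional[str]], dict]) -> list:
    """Concatenate the Blocks arrays from every response page.

    fetch_page is a hypothetical callable: given a NextToken (or None for
    the first request), it returns one response dict.
    """
    blocks = []
    token = None
    while True:
        response = fetch_page(token)
        blocks.extend(response.get("Blocks", []))
        token = response.get("NextToken")
        if not token:
            return blocks


# Fake two-page result, standing in for real service responses.
_pages = {
    None: {"Blocks": [{"Id": "a", "BlockType": "PAGE"}], "NextToken": "t1"},
    "t1": {"Blocks": [{"Id": "b", "BlockType": "LINE"}]},
}
all_blocks = collect_blocks(lambda token: _pages[token])
```

With boto3, fetch_page could wrap a call to get_document_analysis for a given JobId, passing NextToken only when a token is present.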

For more information, see How Amazon Textract Works.

Contents

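BlockType

The type of text item that's recognized. In operations for text detection, the following types are returned: PAGE, LINE, and WORD.

In text analysis operations, the following types are returned: PAGE, KEY_VALUE_SET, WORD, LINE, TABLE, TABLE_TITLE, TABLE_FOOTER, CELL, MERGED_CELL, SELECTION_ELEMENT, SIGNATURE, QUERY, and QUERY_RESULT.

The following BlockTypes are only returned for Amazon Textract Layout: LAYOUT_TITLE, LAYOUT_HEADER, LAYOUT_FOOTER, LAYOUT_SECTION_HEADER, LAYOUT_PAGE_NUMBER, LAYOUT_LIST, LAYOUT_FIGURE, LAYOUT_TABLE, LAYOUT_KEY_VALUE, and LAYOUT_TEXT.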

Type: String

Valid Values: KEY_VALUE_SET | PAGE | LINE | WORD | TABLE | CELL | SELECTION_ELEMENT | MERGED_CELL | TITLE | QUERY | QUERY_RESULT | SIGNATURE | TABLE_TITLE | TABLE_FOOTER | LAYOUT_TEXT | LAYOUT_TITLE | LAYOUT_HEADER | LAYOUT_FOOTER | LAYOUT_SECTION_HEADER | LAYOUT_PAGE_NUMBER | LAYOUT_LIST | LAYOUT_FIGURE | LAYOUT_TABLE | LAYOUT_KEY_VALUE

Required: No
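As an illustration of how the BlockType value is typically used, the sketch below tallies blocks by type. The sample blocks are invented for the example, and count_block_types is a hypothetical helper name.

```python
from collections import Counter

# Invented sample data; a real response contains many more fields.
sample_blocks = [
    {"Id": "1", "BlockType": "PAGE"},
    {"Id": "2", "BlockType": "LINE"},
    {"Id": "3", "BlockType": "WORD"},
    {"Id": "4", "BlockType": "WORD"},
]


def count_block_types(blocks):
    """Count how many blocks of each BlockType are present."""
    # BlockType is not a required field, so skip blocks that lack it.
    return Counter(b["BlockType"] for b in blocks if "BlockType" in b)


type_counts = count_block_types(sample_blocks)
```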

ColumnIndex

The column in which a table cell appears. The first column position is 1. ColumnIndex isn't returned by DetectDocumentText and GetDocumentTextDetection.

Type: Integer

Valid Range: Minimum value of 0.

Required: No

ColumnSpan

The number of columns that a table cell spans. ColumnSpan isn't returned by DetectDocumentText and GetDocumentTextDetection.

Type: Integer

Valid Range: Minimum value of 0.

Required: No

Confidence

The confidence score that Amazon Textract has in the accuracy of the recognized text and the accuracy of the geometry points around the recognized text.

Type: Float

Valid Range: Minimum value of 0. Maximum value of 100.

Required: No
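One common use of the confidence score is to flag low-confidence items for review. The sketch below assumes a threshold of 90, which is an arbitrary choice for illustration and not a recommended value. The helper name low_confidence and the sample blocks are hypothetical.

```python
def low_confidence(blocks, threshold=90.0):
    """Return blocks whose Confidence is present and below the threshold."""
    return [
        b for b in blocks
        if b.get("Confidence") is not None and b["Confidence"] < threshold
    ]


# Invented sample data on the documented 0 to 100 scale.
scored_blocks = [
    {"Id": "w1", "BlockType": "WORD", "Text": "Invoice", "Confidence": 99.2},
    {"Id": "w2", "BlockType": "WORD", "Text": "Tota1", "Confidence": 61.5},
    {"Id": "p1", "BlockType": "PAGE"},  # no Confidence field at all
]
needs_review = low_confidence(scored_blocks)
```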

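EntityTypes

The type of entity.

The following entity types can be returned by FORMS analysis: KEY and VALUE.

The following entity types can be returned by TABLES analysis: COLUMN_HEADER, TABLE_TITLE, TABLE_SECTION_TITLE, TABLE_FOOTER, TABLE_SUMMARY, STRUCTURED_TABLE, and SEMI_STRUCTURED_TABLE.

EntityTypes isn't returned by DetectDocumentText and GetDocumentTextDetection.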

Type: Array of strings

Valid Values: KEY | VALUE | COLUMN_HEADER | TABLE_TITLE | TABLE_FOOTER | TABLE_SECTION_TITLE | TABLE_SUMMARY | STRUCTURED_TABLE | SEMI_STRUCTURED_TABLE

Required: No
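For FORMS analysis, the entity type is what distinguishes the key half of a KEY_VALUE_SET block from the value half. The following is a minimal sketch of that split. The helper name split_key_value_blocks and the sample blocks are hypothetical.

```python
def split_key_value_blocks(blocks):
    """Separate KEY_VALUE_SET blocks into key blocks and value blocks."""
    keys, values = [], []
    for b in blocks:
        if b.get("BlockType") != "KEY_VALUE_SET":
            continue
        # EntityTypes is an array of strings and may be absent.
        entity_types = b.get("EntityTypes") or []
        if "KEY" in entity_types:
            keys.append(b)
        elif "VALUE" in entity_types:
            values.append(b)
    return keys, values


# Invented sample data.
form_blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"]},
    {"Id": "l1", "BlockType": "LINE"},
]
key_blocks, value_blocks = split_key_value_blocks(form_blocks)
```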

Geometry

The location of the recognized text on the image. It includes an axis-aligned, coarse bounding box that surrounds the text, and a finer-grain polygon for more accurate spatial information.

Type: Geometry object

Required: No
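To draw the coarse bounding box on the source image, the box has to be converted to pixel coordinates. The sketch below assumes, per the Geometry and BoundingBox data types, that the box is given as Left, Top, Width, and Height values expressed as ratios of the page dimensions. The helper name bounding_box_to_pixels and the sample values are hypothetical.

```python
def bounding_box_to_pixels(block, image_width, image_height):
    """Convert a block's ratio-based bounding box to pixel edges.

    Returns (left, top, right, bottom) in pixels.
    Assumes Geometry.BoundingBox holds ratios of the page size.
    """
    box = block["Geometry"]["BoundingBox"]
    left = round(box["Left"] * image_width)
    top = round(box["Top"] * image_height)
    right = round((box["Left"] + box["Width"]) * image_width)
    bottom = round((box["Top"] + box["Height"]) * image_height)
    return left, top, right, bottom


# Invented sample block on a 1000 x 800 pixel image.
word_block = {
    "Id": "w1",
    "BlockType": "WORD",
    "Geometry": {
        "BoundingBox": {"Left": 0.25, "Top": 0.5, "Width": 0.5, "Height": 0.25}
    },
}
pixel_box = bounding_box_to_pixels(word_block, 1000, 800)
```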

Id

The identifier for the recognized text. The identifier is only unique for a single operation.

Type: String

Pattern: .*\S.*

Required: No

Page

The page on which a block was detected. Page is returned by synchronous and asynchronous operations. Page values greater than 1 are only returned for multipage documents that are in PDF or TIFF format. A scanned image (JPEG/PNG) provided to an asynchronous operation, even if it contains multiple document pages, is considered a single-page document. This means that for scanned images the value of Page is always 1.

Type: Integer

Valid Range: Minimum value of 0.

Required: No
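For multipage documents, the page value makes it possible to regroup the flat Blocks array by page. The following is a minimal sketch. It treats a missing Page value as page 1, which is an assumption made for the example, and lines_by_page is a hypothetical helper name.

```python
from collections import defaultdict


def lines_by_page(blocks):
    """Group the text of LINE blocks by the page they were detected on."""
    pages = defaultdict(list)
    for b in blocks:
        if b.get("BlockType") == "LINE":
            # Assumption for this sketch: a missing Page means page 1.
            pages[b.get("Page", 1)].append(b.get("Text", ""))
    return dict(pages)


# Invented sample data for a two-page document.
paged_blocks = [
    {"Id": "l1", "BlockType": "LINE", "Text": "First page", "Page": 1},
    {"Id": "l2", "BlockType": "LINE", "Text": "Second page", "Page": 2},
    {"Id": "l3", "BlockType": "LINE", "Text": "More text", "Page": 2},
    {"Id": "w1", "BlockType": "WORD", "Text": "First", "Page": 1},
]
page_lines = lines_by_page(paged_blocks)
```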

Query

Type: Query object

Required: No

Relationships

A list of relationship objects that describe how blocks are related to each other. For example, a LINE block object contains a CHILD relationship type with the WORD blocks that make up the line of text. There aren't Relationship objects in the list for relationships that don't exist, such as when the current block has no child blocks.

Type: Array of Relationship objects

Required: No
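The LINE-to-WORD example can be sketched as follows. The code assumes each Relationship object carries a Type string and an Ids list, as described in the Relationship data type. The helper names child_ids and line_text_from_words and the sample blocks are hypothetical.

```python
def child_ids(block, relationship_type="CHILD"):
    """Return the Ids referenced by relationships of the given type."""
    ids = []
    # Relationships may be missing when the block has no related blocks.
    for rel in block.get("Relationships") or []:
        if rel.get("Type") == relationship_type:
            ids.extend(rel.get("Ids", []))
    return ids


def line_text_from_words(line, blocks_by_id):
    """Rebuild a line's text from its child WORD blocks."""
    words = [blocks_by_id[i] for i in child_ids(line) if i in blocks_by_id]
    return " ".join(w["Text"] for w in words if w.get("BlockType") == "WORD")


# Invented sample data: one LINE with two WORD children.
related_blocks = [
    {"Id": "l1", "BlockType": "LINE", "Text": "Hello world",
     "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Hello"},
    {"Id": "w2", "BlockType": "WORD", "Text": "world"},
]
by_id = {b["Id"]: b for b in related_blocks}
rebuilt = line_text_from_words(by_id["l1"], by_id)
```

Indexing blocks by Id first keeps each lookup constant-time, which matters because a single page can produce thousands of blocks.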

RowIndex

The row in which a table cell is located. The first row position is 1. RowIndex isn't returned by DetectDocumentText and GetDocumentTextDetection.

Type: Integer

Valid Range: Minimum value of 0.

Required: No

RowSpan

The number of rows that a table cell spans. RowSpan isn't returned by DetectDocumentText and GetDocumentTextDetection.

Type: Integer

Valid Range: Minimum value of 0.

Required: No
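Taken together, RowIndex, ColumnIndex, RowSpan, and ColumnSpan are enough to place each cell on a grid. The sketch below maps every 1-based (row, column) position to the Id of the CELL block covering it. It treats a missing span as 1, which is an assumption made for the example, and cell_grid is a hypothetical helper name.

```python
def cell_grid(blocks):
    """Map each (row, column) position to the Id of the cell covering it."""
    grid = {}
    for b in blocks:
        if b.get("BlockType") != "CELL":
            continue
        row, col = b["RowIndex"], b["ColumnIndex"]
        # Assumption for this sketch: a missing span means the cell spans 1.
        for r in range(row, row + b.get("RowSpan", 1)):
            for c in range(col, col + b.get("ColumnSpan", 1)):
                grid[(r, c)] = b["Id"]
    return grid


# Invented sample data: a 2 x 2 table whose first row is one wide cell.
table_cells = [
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "RowSpan": 1, "ColumnSpan": 2},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "RowSpan": 1, "ColumnSpan": 1},
    {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2,
     "RowSpan": 1, "ColumnSpan": 1},
]
grid = cell_grid(table_cells)
```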

SelectionStatus

The selection status of a selection element, such as an option button or check box.

Type: String

Valid Values: SELECTED | NOT_SELECTED

Required: No

Text

The word or line of text that's recognized by Amazon Textract.

Type: String

Required: No

TextType

The kind of text that Amazon Textract has detected, either handwritten or printed.

Type: String

Valid Values: HANDWRITING | PRINTED

Required: No
