Table extraction

Table extraction is the process of recognizing and separating a table from a larger document, possibly also recognizing its individual rows, columns, or cells. It may be regarded as a special form of information extraction. Table extraction from webpages can take advantage of the HTML elements that exist specifically for tables, e.g., the "table" tag, and programming libraries may implement table extraction from webpages. The Python pandas software library, for example, can extract tables from HTML webpages via its read_html function.
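To illustrate how the "table" tag makes webpage tables easy to locate, here is a minimal sketch that uses only Python's standard-library html.parser to collect each table as a list of rows of cell text. The names TableExtractor and extract_tables are hypothetical. The sketch assumes simple, well-formed markup: explicitly closed tr/td/th tags, no nested tables, and no rowspan or colspan handling. pandas' read_html is more robust and returns a list of DataFrames, but it needs a third-party parser such as lxml installed.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect each <table> as a list of rows, each row a list of cell strings.

    Sketch only: assumes explicitly closed <tr>/<td>/<th> tags, no nested
    tables, and ignores rowspan/colspan.
    """

    def __init__(self):
        super().__init__()
        self.tables = []    # finished tables
        self._table = None  # rows of the table currently being read
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table = []
        elif tag == "tr" and self._table is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join the text fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None and self._table is not None:
            self._table.append(self._row)
            self._row = None
        elif tag == "table" and self._table is not None:
            self.tables.append(self._table)
            self._table = None

    def handle_data(self, data):
        # Text outside a cell (e.g. surrounding paragraphs) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html):
    """Return every table found in `html` as nested lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


page = """
<p>Some text before the table.</p>
<table>
  <tr><th>Tool</th><th>Type</th></tr>
  <tr><td>pandas read_html</td><td>library function</td></tr>
</table>
"""
print(extract_tables(page))
# [[['Tool', 'Type'], ['pandas read_html', 'library function']]]
```

The paragraph outside the table is skipped because text is only recorded while a td or th cell is open.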

Property	Value
dbo:abstract	Table extraction is the process of recognizing and separating a table from a larger document, possibly also recognizing its individual rows, columns, or cells. It may be regarded as a special form of information extraction. Table extraction from webpages can take advantage of the HTML elements that exist specifically for tables, e.g., the "table" tag, and programming libraries may implement table extraction from webpages. The Python pandas software library, for example, can extract tables from HTML webpages via its read_html function. Table extraction from PDFs or scanned images is more challenging, because there is usually no table-specific machine-readable markup. Systems that extract data from tables in scientific PDFs have been described. Wikipedia presents some of its information in tables; for example, 3.5 million tables can be extracted from the English Wikipedia. Some of these tables have a specific format, e.g., the so-called infoboxes. Large-scale table extraction of Wikipedia infoboxes forms one of the sources for DBpedia. Commercial web services for table extraction exist, e.g., Amazon Textract, Google's Document AI, IBM Watson Discovery, and Microsoft Form Recognizer. Open-source tools also exist, e.g., PDFFigures 2.0, which has been used in Semantic Scholar. In a comparison published in 2017, researchers found that the proprietary program ABBYY FineReader yielded the best PDF table extraction performance among the six tools evaluated. (en)
dbo:wikiPageID	68522460 (xsd:integer)
dbo:wikiPageLength	2210 (xsd:nonNegativeInteger)
dbo:wikiPageRevisionID	1061383696 (xsd:integer)
dbo:wikiPageWikiLink	dbr:Python_(programming_language) dbr:Information_extraction dbc:Artificial_intelligence dbr:English_Wikipedia dbr:Google dbr:Table_(information) dbr:Web_service dbr:Webpage dbr:Wikipedia dbr:Document_AI dbr:ABBYY_FineReader dbr:Amazon_(company) dbr:DBpedia dbr:PDF dbr:Pandas_(software) dbr:HTML_element dbc:Natural_language_processing dbr:IBM dbr:Infobox dbr:Microsoft dbr:Semantic_Scholar dbr:Image_scanner
dbp:wikiPageUsesTemplate	dbt:Scholia
dct:subject	dbc:Artificial_intelligence dbc:Natural_language_processing
rdfs:comment	Table extraction is the process of recognizing and separating a table from a larger document, possibly also recognizing its individual rows, columns, or cells. It may be regarded as a special form of information extraction. Table extraction from webpages can take advantage of the HTML elements that exist specifically for tables, e.g., the "table" tag, and programming libraries may implement table extraction from webpages. The Python pandas software library, for example, can extract tables from HTML webpages via its read_html function. (en)
rdfs:label	Table extraction (en)
owl:sameAs	wikidata:Table extraction https://global.dbpedia.org/id/FnUHQ
prov:wasDerivedFrom	wikipedia-en:Table_extraction?oldid=1061383696&ns=0
foaf:isPrimaryTopicOf	wikipedia-en:Table_extraction
is dbo:wikiPageWikiLink of	dbr:Information_extraction dbr:Semantic_Scholar
is foaf:primaryTopic of	wikipedia-en:Table_extraction
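The dbo:abstract notes that PDFs and scanned images usually lack table-specific markup, so column structure has to be inferred from the layout itself. As a rough illustration (a heuristic chosen for this sketch, not the method of any tool named in the abstract), the Python below assumes the page text has already been dumped as fixed-width lines that preserve horizontal alignment, and treats character positions that are blank in every line as column gaps. The names column_spans and split_table and the min_gap parameter are hypothetical. Documents with proportional fonts, OCR noise, merged cells, or wrapped text need far more robust methods.

```python
def column_spans(lines, min_gap=2):
    """Return (start, end) character ranges of the columns in `lines`.

    A run of at least `min_gap` positions that is blank in every line is
    treated as the gap between two columns.
    """
    if not lines:
        return []
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    # blank[i] is True when every line has a space at position i.
    blank = [all(line[i] == " " for line in padded) for i in range(width)]

    spans, start, i = [], 0, 0
    while i < width:
        if not blank[i]:
            i += 1
            continue
        j = i
        while j < width and blank[j]:
            j += 1
        # A wide enough blank run closes the current column.
        if j - i >= min_gap and i > start:
            spans.append((start, i))
            start = j
        i = j
    if start < width:
        spans.append((start, width))
    return spans


def split_table(lines, min_gap=2):
    """Cut each line at the detected column ranges and strip the cells."""
    spans = column_spans(lines, min_gap)
    return [[line[a:b].strip() for a, b in spans] for line in lines]


# Hypothetical layout-preserving text, as a PDF text layer might yield it.
lines = [
    "Tool              Kind",
    "ABBYY FineReader  proprietary",
    "PDFFigures 2.0    open source",
]
for row in split_table(lines):
    print(row)
```

For these sample lines, positions 16 and 17 form the only run of at least two positions that is blank in every line, so the detected spans are (0, 16) and (18, 29), and each line is split into two cells such as ['ABBYY FineReader', 'proprietary']. Single spaces inside a cell, as in "PDFFigures 2.0", are not mistaken for column gaps because of the min_gap threshold.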