Document-term matrix

A document-term matrix is a mathematical matrix that describes the frequency of terms occurring in a collection of documents. In a document-term matrix, rows correspond to documents in the collection and columns correspond to terms. There are various schemes for determining the value of each matrix element; one of them is TF-IDF. Such matrices are useful in natural language processing, especially in latent semantic analysis methods.
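
As an illustration of the layout described above (rows are documents, columns are terms), the following minimal sketch builds a raw-count matrix and then applies one possible TF-IDF weighting. The two-sentence corpus, the function names (tokenize, document_term_matrix, tf_idf) and the tokenization rule are assumptions made for this example. TF-IDF has several variants; this one uses relative term frequency times the natural logarithm of N / document frequency, which is only one common choice.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation (unigrams)."""
    tokens = (word.strip(".,;:!?\"'()") for word in text.lower().split())
    return [t for t in tokens if t]


def document_term_matrix(documents):
    """Return (vocabulary, matrix): one row per document, one column per term."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


def tf_idf(matrix):
    """Weight raw counts: relative frequency * log(N / document frequency).

    This is one assumed variant among many TF-IDF schemes.
    """
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    # Number of documents in which each term (column) occurs at least once.
    doc_freq = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    weighted = []
    for row in matrix:
        total = sum(row)
        weighted.append([
            (row[j] / total) * math.log(n_docs / doc_freq[j]) if total else 0.0
            for j in range(n_terms)
        ])
    return weighted


# Hypothetical two-document corpus, used only for illustration.
docs = ["I like databases", "I dislike databases"]
vocab, dtm = document_term_matrix(docs)
weights = tf_idf(dtm)

print(vocab)  # column labels (terms)
for row in dtm:
    print(row)  # one row of raw counts per document
```

Under this weighting, a term that occurs in every document (here "databases" and "i") receives weight 0, while terms that distinguish documents ("like", "dislike") receive positive weights. Word order is discarded, which is why such a matrix is often described as a bag-of-words representation.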

Property	Value
dbo:abstract	A document-term matrix is a mathematical matrix that describes the frequency of terms that occur in a collection of documents. In a document-term matrix, rows correspond to documents in the collection and columns correspond to terms. This matrix is a specific instance of a document-feature matrix, where "features" may refer to properties of a document other than terms. It is also common to encounter the transpose, the term-document matrix, where documents are the columns and terms are the rows. Both are useful in the field of natural language processing and computational text analysis. While the value of each cell is commonly the raw count of a given term, there are various schemes for weighting the raw counts, such as row normalization (i.e., relative frequencies or proportions) and tf-idf. Terms are commonly single words separated by whitespace or punctuation on either side (a.k.a. unigrams). In that case, the matrix is also referred to as a "bag of words" representation, because the counts of individual words are retained but not the order of the words in the document. (en) Терм-документная матрица представляет собой математическую матрицу, описывающую частоту терминов, которые встречаются в коллекции документов. В терм-документной матрице строки соответствуют документам в коллекции, а столбцы соответствуют терминам. Существуют различные схемы для определения значения каждого элемента матрицы. Одной из таких является схема TF-IDF. Они полезны в области обработки естественного языка, особенно в методах латентно-семантического анализа. (ru) Терм-документна матриця (англ. document-term matrix, term-document matrix) — матриця, що описує частоту появи термінів у колекції документів. В терм-документній матриці рядки відповідають документам з колекції, що аналізується, а стовпці асоційовані з термінами. Існують різноманітні схеми для визначення елементів матриці. Одною з них є схема TF-IDF. Такі матриці використовуються при обробці природної мови, зокрема в методах латентно-семантичного аналізу. (uk)
dbo:wikiPageExternalLink	http://nlp.fi.muni.cz/projekty/gensim
dbo:wikiPageID	1234327 (xsd:integer)
dbo:wikiPageLength	11324 (xsd:nonNegativeInteger)
dbo:wikiPageRevisionID	1085887604 (xsd:integer)
dbo:wikiPageWikiLink	dbr:Multivariate_analysis dbr:Natural_language_processing dbr:Non-negative_matrix_factorization dbr:Probabilistic_latent_semantic_analysis dbr:Matrix_(mathematics) dbr:Gerard_Salton dbr:Collocation dbr:Zipf's_law dbr:Synonym dbr:Bag_of_words_model dbr:Trie dbr:Document dbr:Latent_Dirichlet_allocation dbr:Latent_semantic_analysis dbc:Natural_language_processing dbr:Data_clustering dbr:Indo-European_languages dbr:Syntactic_category dbr:Vector_space_model dbr:Word-sense_disambiguation dbr:Tf-idf dbr:Polysemy dbr:Term_(language) dbr:Singular-value_decomposition dbr:Computational_text_analysis dbr:Eileen_Stone dbr:Harold_Borko
dbp:wikiPageUsesTemplate	dbt:More_citations_needed dbt:Reflist dbt:Natural_Language_Processing
dcterms:subject	dbc:Natural_language_processing
gold:hypernym	dbr:Matrix
rdf:type	dbo:AnatomicalStructure
rdfs:comment	Терм-документная матрица представляет собой математическую матрицу, описывающую частоту терминов, которые встречаются в коллекции документов. В терм-документной матрице строки соответствуют документам в коллекции, а столбцы соответствуют терминам. Существуют различные схемы для определения значения каждого элемента матрицы. Одной из таких является схема TF-IDF. Они полезны в области обработки естественного языка, особенно в методах латентно-семантического анализа. (ru) Терм-документна матриця (англ. document-term matrix, term-document matrix) — матриця, що описує частоту появи термінів у колекції документів. В терм-документній матриці рядки відповідають документам з колекції, що аналізується, а стовпці асоційовані з термінами. Існують різноманітні схеми для визначення елементів матриці. Одною з них є схема TF-IDF. Такі матриці використовуються при обробці природної мови, зокрема в методах латентно-семантичного аналізу. (uk) A document-term matrix is a mathematical matrix that describes the frequency of terms that occur in a collection of documents. In a document-term matrix, rows correspond to documents in the collection and columns correspond to terms. This matrix is a specific instance of a document-feature matrix, where "features" may refer to properties of a document other than terms. It is also common to encounter the transpose, the term-document matrix, where documents are the columns and terms are the rows. Both are useful in the field of natural language processing and computational text analysis. (en)
rdfs:label	Document-term matrix (en) Терм-документная матрица (ru) Терм-документна матриця (uk)
owl:sameAs	freebase:Document-term matrix wikidata:Document-term matrix dbpedia-ru:Document-term matrix dbpedia-uk:Document-term matrix https://global.dbpedia.org/id/4iY3A
prov:wasDerivedFrom	wikipedia-en:Document-term_matrix?oldid=1085887604&ns=0
foaf:isPrimaryTopicOf	wikipedia-en:Document-term_matrix
is dbo:wikiPageRedirects of	dbr:Term-document_matrix dbr:Occurrence_matrix dbr:Occurrency_matrix
is dbo:wikiPageWikiLink of	dbr:Non-negative_matrix_factorization dbr:Matrix_(mathematics) dbr:Matrix_completion dbr:Latent_semantic_analysis dbr:Linear_classifier dbr:Bag-of-words_model dbr:BERT_(language_model) dbr:Term-document_matrix dbr:Search_engine_indexing dbr:Outline_of_natural_language_processing dbr:Word2vec dbr:Occurrence_matrix dbr:Occurrency_matrix dbr:Text_graph
is foaf:primaryTopic of	wikipedia-en:Document-term_matrix