Latent semantic analysis

Latent semantic analysis (LSA), or latent semantic indexing (LSI), is a natural language processing method within vector semantics. LSA was patented in 1988 and published in 1990. It establishes relationships between a set of documents and the terms they contain by constructing "concepts" related to the documents and terms.

Property	Value
dbo:abstract	Latent Semantic Indexing (LSI) is an information retrieval method, no longer protected by patent, that was first described in 1990 by Deerwester et al. Methods such as LSI are of particular interest for searching large data collections such as the Internet. The goal of LSI is to find the principal components of documents. These principal components (concepts) can be thought of as general terms. For example, horse is a concept that covers terms such as mare, nag, or hack. The method is therefore suited to picking out, from a very large number of documents (such as those found on the Internet), the ones that deal with the topic of cars even if the word car never appears in them explicitly. LSI can also help distinguish articles that are really about cars from those that merely mention the word car (for example, pages that advertise a car as a prize). (de)
Latent semantic analysis (LSA) is a natural language processing technique. To analyze the relationship between a set of documents and the terms that appear in them, a set of concepts is created from the documents and terms. LSA assumes that semantically similar words appear in texts with similar meaning. A term-document matrix is built from the occurrence frequencies of terms in the paragraphs of the texts (one row per term and one column per paragraph), and a mathematical technique called singular value decomposition (SVD) is used to reduce the dimension of the vector representations of terms and documents. To compute the semantic similarity of words (terms), the cosine of the angle between row vectors (or their dot product) is computed. A cosine similarity close to 1 is interpreted as the words being semantically similar, and one close to 0 as their being semantically very different. In 1988 an information retrieval technique based on latent semantic structure was patented (US Patent 4,839,853, now expired) by the researchers Scott Deerwester, Susan Dumais, George Furnas, Richard Harshman, Thomas Landauer, Karen Lochbaum, and Lynn Streeter. When the technique is used in the context of information retrieval, it is usually known as latent semantic indexing (LSI). (eu)
Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur in similar pieces of text (the distributional hypothesis). A matrix containing word counts per document (rows represent unique words and columns represent each document) is constructed from a large piece of text and a mathematical technique called singular value decomposition (SVD) is used to reduce the number of rows while preserving the similarity structure among columns. Documents are then compared by cosine similarity between any two columns. Values close to 1 represent very similar documents while values close to 0 represent very dissimilar documents. An information retrieval technique using latent semantic structure was patented in 1988 (US Patent 4,839,853, now expired) by Scott Deerwester, Susan Dumais, George Furnas, Richard Harshman, Thomas Landauer, Karen Lochbaum, and Lynn Streeter. In the context of its application to information retrieval, it is sometimes called latent semantic indexing (LSI). (en)
Latent semantic analysis (LSA), or latent semantic indexing (LSI), is a natural language processing method within vector semantics. LSA was patented in 1988 and published in 1990. It establishes relationships between a set of documents and the terms they contain by constructing "concepts" related to the documents and terms. (fr)
Latent semantic analysis (LSA) is a natural language processing technique that uses the vector space model. It analyzes the relationship between a collection of documents and the terms they contain by generating a set of concepts related to them. In Japanese it is also written 潜在的意味解析. A patent on LSA was granted in the United States in 1988. In the field of information retrieval it is also called latent semantic indexing (LSI). (ja)
Latent semantic analysis (LSA) is a method for processing natural-language information that analyzes the relationship between a collection of documents and the terms occurring in them, and identifies characteristic factors (topics) common to all the documents and terms. The method is based on the principles of factor analysis, in particular the identification of latent relationships among the phenomena or objects under study. In document classification and clustering, the method is used to extract context-dependent meanings of lexical units through statistical processing of large text corpora. (ru)
Latent semantic analysis (LSA), also called latent semantic indexing (LSI), is an indexing method in language technology that describes the relationship between terms (words) and documents in a corpus. The method places all documents in a high-dimensional vector space so that conceptually related documents are also close together in that space. One of the method's main goals is to retrieve all relevant documents for a search, including those that do not contain the exact terms used in the query. (sv)
Latent semantic analysis is a new branch of semantics. Traditional semantics usually studies the meanings of characters and words and the relations between words, such as synonymy, near-synonymy, and antonymy. Latent semantic analysis explores a kind of relationship hidden behind words, one based not on dictionary definitions but on the contexts in which words are used. The idea comes from psycholinguists, who hold that the world's hundreds of languages should share a common, simple mechanism that lets anyone who grows up in a given language environment master that language. Guided by this idea, researchers found a simple mathematical model whose input is a corpus of documents written in any language and whose output is a mathematical representation (a vector) of that language's characters and words. Relations between words, and even comparisons of meaning between arbitrary passages, are then derived from operations on these vectors. The ideas of latent semantics have also been applied to information retrieval, so it is sometimes called latent semantic indexing (LSI). (zh)
Latent semantic analysis (LSA) is a method for processing natural-language information, in particular distributional semantics, that analyzes the relationship between a set of documents and the terms occurring in them by creating a set of concepts. LSA assumes that words close in meaning will occur in similar fragments of text (the distributional hypothesis). A matrix containing word counts per paragraph is created from a large body of text (rows hold unique words, and columns hold the text of each paragraph). When analyzing a set of documents, LSA takes as input a term-document matrix whose entries reflect how often each term is used in the documents (TF-IDF). Using a mathematical technique called singular value decomposition, the number of rows of the term-document matrix is reduced while the similarity structure among the columns is preserved. Words are then compared by computing the cosine of the angle between the vectors formed by any two rows (their dot product divided by the product of their norms). Values close to 1 indicate very similar words, while values close to 0 indicate very dissimilar words. LSA was patented in 1988 by Scott Deerwester, Susan Dumais, George Furnas, Richard Harshman, Thomas Landauer, Karen Lochbaum, and Lynn Streeter. In the context of its application to information retrieval, it is sometimes called latent semantic indexing (LSI). (uk)
dbo:wikiPageExternalLink	http://lsi.research.telcordia.com/lsi/papers/JASIS90.pdf http://www.welchco.com/02/14/01/60/96/02/2901.HTM https://web.archive.org/web/20120717020428/http:/lsi.research.telcordia.com/lsi/papers/JASIS90.pdf http://infomap-nlp.sourceforge.net/ http://iv.slis.indiana.edu/sw/lsa.html http://lsa.colorado.edu/papers/dp1.LSAintro.pdf http://lsirwww.epfl.ch/courses/dis/2003ws/papers/ut-cs-94-270.pdf http://scgroup20.ceid.upatras.gr:8000/tmg/index.php/Main_Page http://www.semanticquery.com/archive/semanticsearchart/researchLSA.html https://www.inf.ethz.ch/department/faculty-profs/person-detail.html%3Fpersid=148752 https://doi.org/10.1109/TCBB.2014.2382127 http://web.mit.edu/~wingated/www/resources.html http://patft.uspto.gov/netacgi/nph-Parser%3Fpatentnumber=4839853 http://citeseer.ist.psu.edu/berry95using.html http://cran.at.r-project.org/web/packages/lsa/index.html http://www.d.umn.edu/~tpederse/senseclusters.html http://radimrehurek.com/gensim http://www.scholarpedia.org/article/Latent_semantic_analysis http://videolectures.net/slsfs05_hofmann_lsvm/ http://code.google.com/p/airhead-research/ http://code.google.com/p/semanticvectors/
dbo:wikiPageID	689427 (xsd:integer)
dbo:wikiPageLength	57724 (xsd:nonNegativeInteger)
dbo:wikiPageRevisionID	1123547216 (xsd:integer)
dbo:wikiPageWikiLink	dbr:Science_Applications_International_Corporation dbr:Multinomial_distribution dbr:N-gram dbr:Natural_language_processing dbr:Neural_network dbr:Probabilistic_latent_semantic_analysis dbr:Bellcore dbr:Patents dbr:Deep_learning dbr:Information_retrieval dbr:Electronic_Discovery dbc:Semantic_relations dbr:Computational_Linguistics dbr:Correlation dbr:Cosine_similarity dbc:Information_retrieval_techniques dbr:George_Furnas dbr:Low-rank_approximation dbr:Cognitive_Science dbr:Eigenvector dbr:Gensim dbr:Concept dbr:Context_(language_use) dbr:Contingency_table dbr:Correspondence_analysis dbr:Cross-language_information_retrieval dbr:Ergodic_hypothesis dbr:Singular_value_decomposition dbc:Latent_variable_models dbr:Frobenius_norm dbr:Bag_of_words_model dbr:Distributional_semantics dbr:Document-term_matrix dbr:Document_classification dbr:Latent_Dirichlet_allocation dbr:Latent_semantic_mapping dbr:Latent_semantic_structure_indexing dbr:Factor_analysis dbr:Normal_distribution dbr:Diagonal_matrix dbr:Graphical_model dbr:Principal_component_analysis dbr:Text_corpus dbr:Susan_Dumais dbr:Richard_Harshman dbr:Jean-Paul_Benzécri dbr:Boolean_search dbr:Lucene dbc:Natural_language_processing dbr:Synonymy dbr:Co-occurrence dbr:Coh-Metrix dbr:Higher-order_statistics dbr:Terminology dbr:Dot_product dbr:Automated_essay_scoring dbr:Automatic_summarization dbr:Physicians dbr:Poisson_distribution dbr:Sparse_matrix dbr:Free_recall dbr:Data_clustering dbr:Document_categorization dbr:Term-document_matrix dbr:Information_Retrieval dbr:Natural_Language_Processing dbr:Orthogonal_matrix dbr:Automated_document_classification dbr:Word_sense_disambiguation dbr:Tf–idf dbr:Multiple_choice_question dbr:Vector_space_model dbr:Explicit_semantic_analysis dbr:Compound_term_processing dbr:Literature-based_discovery dbr:Thomas_Landauer dbr:Prior_art dbr:Spamdexing dbr:Spam_(electronic) dbr:Tf-idf dbr:Evaluation_measures_(information_retrieval) dbr:Polysemy dbr:Scott_Deerwester 
dbr:Topic_model dbr:Word_vector dbr:Matrix_product dbr:Principal_Component_Analysis dbr:Principal_components_analysis dbr:Singular_Value_Decomposition dbr:Lanczos_method dbr:Concept_searching dbr:Probabilistic_model dbr:Locality_sensitive_hashing dbr:File:Topic_model_scheme.webm dbr:Karen_Lochbaum dbr:Lynn_Streeter dbr:Semantic_Proximity_Effect dbr:Word_Association_Spaces
dbp:date	November 2019 (en)
dbp:reason	This implies that the fastest method currently known is slower than an older method, which is impossible. (en)
dbp:wikiPageUsesTemplate	dbt:Citation_needed dbt:Cite_journal dbt:Cite_web dbt:Clarify dbt:Cleanup_bare_URLs dbt:Em dbt:Reflist dbt:Semantics dbt:Natural_Language_Processing
dcterms:subject	dbc:Semantic_relations dbc:Information_retrieval_techniques dbc:Latent_variable_models dbc:Natural_language_processing
gold:hypernym	dbr:Technique
rdf:type	dbo:TopicalConcept yago:WikicatLatentVariableModels yago:Assistant109815790 yago:CausalAgent100007347 yago:LivingThing100004258 yago:Model110324560 yago:Object100002684 yago:Organism100004475 yago:Person100007846 yago:PhysicalEntity100001930 yago:Worker109632518 yago:YagoLegalActor yago:YagoLegalActorGeo yago:Whole100003553
rdfs:comment	Latent semantic analysis (LSA), or latent semantic indexing (LSI), is a natural language processing method within vector semantics. LSA was patented in 1988 and published in 1990. It establishes relationships between a set of documents and the terms they contain by constructing "concepts" related to the documents and terms. (fr)
Latent semantic analysis (LSA) is a natural language processing technique that uses the vector space model. It analyzes the relationship between a collection of documents and the terms they contain by generating a set of concepts related to them. In Japanese it is also written 潜在的意味解析. A patent on LSA was granted in the United States in 1988. In the field of information retrieval it is also called latent semantic indexing (LSI). (ja)
Latent semantic analysis (LSA), also called latent semantic indexing (LSI), is an indexing method in language technology that describes the relationship between terms (words) and documents in a corpus. The method places all documents in a high-dimensional vector space so that conceptually related documents are also close together in that space. One of the method's main goals is to retrieve all relevant documents for a search, including those that do not contain the exact terms used in the query. (sv)
Latent semantic analysis is a new branch of semantics. Traditional semantics usually studies the meanings of characters and words and the relations between words, such as synonymy, near-synonymy, and antonymy. Latent semantic analysis explores a kind of relationship hidden behind words, one based not on dictionary definitions but on the contexts in which words are used. The idea comes from psycholinguists, who hold that the world's hundreds of languages should share a common, simple mechanism that lets anyone who grows up in a given language environment master that language. Guided by this idea, researchers found a simple mathematical model whose input is a corpus of documents written in any language and whose output is a mathematical representation (a vector) of that language's characters and words. Relations between words, and even comparisons of meaning between arbitrary passages, are then derived from operations on these vectors. The ideas of latent semantics have also been applied to information retrieval, so it is sometimes called latent semantic indexing (LSI). (zh)
Latent Semantic Indexing (LSI) is an information retrieval method, no longer protected by patent, that was first described in 1990 by Deerwester et al. Methods such as LSI are of particular interest for searching large data collections such as the Internet. The goal of LSI is to find the principal components of documents. These principal components (concepts) can be thought of as general terms. For example, horse is a concept that covers terms such as mare, nag, or hack. The method is therefore suited to picking out, from a very large number of documents (such as those found on the Internet), the ones that deal with the topic of cars even if the word car never appears in them explicitly. Furthermore, LSI can h (de)
Latent semantic analysis (LSA) is a natural language processing technique. To analyze the relationship between a set of documents and the terms that appear in them, a set of concepts is created from the documents and terms. LSA assumes that semantically similar words appear in texts with similar meaning. A term-document matrix is built from the occurrence frequencies of terms in the paragraphs of the texts (one row per term and one column per paragraph), and a mathematical technique called singular value decomposition (SVD) is used to reduce the dimension of the vector representations of terms and documents. To compute the semantic similarity of words (terms), the cosine of the angle between row vectors is calcul (eu)
Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur in similar pieces of text (the distributional hypothesis). A matrix containing word counts per document (rows represent unique words and columns represent each document) is constructed from a large piece of text and a mathematical technique called singular value decomposition (SVD) is used to reduce the number of rows while preserving the similarity structure among columns. Documents are then compared by cosine similarity between any two columns. Values close to 1 repre (en)
Latent semantic analysis (LSA) is a method for processing natural-language information, in particular distributional semantics, that analyzes the relationship between a set of documents and the terms occurring in them by creating a set of concepts. LSA assumes that words close in meaning will occur in similar fragments of text (the distributional hypothesis). A matrix containing word counts per paragraph is created from a large body of text (rows hold unique words, and columns hold the text of each paragraph). When analyzing a set of documents, LSA takes as input a term-document matrix whose entries reflect how often each term is used in the documents (TF-IDF). (uk)
Latent semantic analysis (LSA) is a method for processing natural-language information that analyzes the relationship between a collection of documents and the terms occurring in them, and identifies characteristic factors (topics) common to all the documents and terms. (ru)
rdfs:label	Latent Semantic Analysis (de) Ezkutuko semantikaren analisia (eu) Analyse sémantique latente (fr) Latent semantic analysis (en) 潜在意味解析 (ja) Латентно-семантический анализ (ru) Latent semantisk analys (sv) Латентно-семантичний аналіз (uk) 潜在语义学 (zh)
owl:sameAs	freebase:Latent semantic analysis yago-res:Latent semantic analysis wikidata:Latent semantic analysis dbpedia-de:Latent semantic analysis dbpedia-eu:Latent semantic analysis dbpedia-fa:Latent semantic analysis dbpedia-fr:Latent semantic analysis dbpedia-ja:Latent semantic analysis dbpedia-ru:Latent semantic analysis dbpedia-sv:Latent semantic analysis dbpedia-uk:Latent semantic analysis dbpedia-vi:Latent semantic analysis dbpedia-zh:Latent semantic analysis https://global.dbpedia.org/id/kRFs
prov:wasDerivedFrom	wikipedia-en:Latent_semantic_analysis?oldid=1123547216&ns=0
foaf:isPrimaryTopicOf	wikipedia-en:Latent_semantic_analysis
is dbo:wikiPageDisambiguates of	dbr:LSA
is dbo:wikiPageRedirects of	dbr:Latent_Semantic_Indexing dbr:Latent_Semantic_Analysis dbr:Latent_semantic_indexing dbr:Infoscale
is dbo:wikiPageWikiLink of	dbr:Scott_Crossley dbr:Entity_linking dbr:Probabilistic_latent_semantic_analysis dbr:Information_retrieval dbr:Linda_Harasim dbr:Social_computing dbr:George_Furnas dbr:Low-rank_approximation dbr:Similarity_learning dbr:Quantum_cognition dbr:Episodic_memory dbr:Gensim dbr:Models_of_collaborative_tagging dbr:Concept_search dbr:Confabulation_(neural_networks) dbr:Content_(Freudian_dream_analysis) dbr:Singular_value_decomposition dbr:Situation_awareness dbr:Emily_Howell dbr:Feature_extraction dbr:Persona_(user_experience) dbr:Distributional_semantics dbr:Document-term_matrix dbr:Language_acquisition dbr:Latent_and_observable_variables dbr:Latent_semantic_mapping dbr:Latent_semantic_structure_indexing dbr:Latent_space dbr:News_analytics dbr:Dimensionality_reduction dbr:Relevance_(information_retrieval) dbr:Hanson_Robotics dbr:Hierarchical_temporal_memory dbr:Similarity_search dbr:Latent_Semantic_Indexing dbr:Biclustering dbr:Symbolic_artificial_intelligence dbr:Cognition dbr:Richard_A._Harshman dbr:AutoTutor dbr:Automated_essay_scoring dbr:Automatic_summarization dbr:R/The_Donald dbr:Yebol dbr:Search_engine_indexing dbr:Tf–idf dbr:Semantic_analysis_(machine_learning) dbr:Semantic_similarity dbr:Sentiment_analysis dbr:Similarity_(psychology) dbr:Vector_space_model dbr:Latent_Semantic_Analysis dbr:Explicit_semantic_analysis dbr:List_of_statistics_articles dbr:LSA dbr:Latent_semantic_indexing dbr:Thomas_Landauer dbr:Semantic_analytics dbr:Statistical_semantics dbr:Semantic_memory dbr:Semantic_folding dbr:Semantic_space dbr:Outline_of_machine_learning dbr:Outline_of_natural_language_processing dbr:Scott_Deerwester dbr:Word_embedding dbr:Topic_model dbr:Word2vec dbr:Infoscale
is foaf:primaryTopic of	wikipedia-en:Latent_semantic_analysis
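
The English abstract describes LSA as three steps: build a word-count matrix with terms as rows and documents as columns, reduce it with singular value decomposition (SVD), and compare documents by cosine similarity. The following is a minimal sketch of those steps in Python with NumPy. The toy matrix, the function names `cosine_matrix` and `lsa_document_vectors`, and the choice of k = 2 concepts are illustrative assumptions and do not come from the source.

```python
import numpy as np

# Toy term-document count matrix (hypothetical data, not from the source).
# Rows are terms and columns are documents, as in the abstract.
terms = ["car", "automobile", "engine", "horse", "saddle"]
counts = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # horse
    [0, 0, 1, 0],  # saddle
], dtype=float)


def cosine_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # avoid dividing by zero for all-zero rows
    unit = vectors / norms
    return unit @ unit.T


def lsa_document_vectors(matrix: np.ndarray, k: int) -> np.ndarray:
    """Return one k-dimensional vector per document via a rank-k truncated SVD."""
    # matrix = U @ diag(s) @ Vt, with singular values in descending order.
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Column j of diag(s[:k]) @ vt[:k] is document j in the reduced concept space.
    return (s[:k, None] * vt[:k, :]).T


# Document similarity on raw counts versus in the 2-concept space.
raw_sim = cosine_matrix(counts.T)
lsa_sim = cosine_matrix(lsa_document_vectors(counts, k=2))

print(np.round(raw_sim, 2))
print(np.round(lsa_sim, 2))
```

In this toy corpus, documents 0 and 1 share only the term "engine", so their raw cosine similarity is 0.5. After reduction to two concepts they map to the same direction, which mirrors the abstract's point that documents can be matched on topic even when they use different words ("car" versus "automobile"). Real corpora would typically use a sparse matrix, TF-IDF weighting, and a much larger k.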