Heaps' law

Heaps' law is an empirical law in linguistics that describes the number of distinct words in a document (or set of documents) as a function of its length. It is given by the formula V_R(n) = K n^β, where V_R is the number of distinct words in a text of size n. K and β are free parameters determined empirically. For English text corpora, K typically lies between 10 and 100, and β between 0.4 and 0.6. The law is often attributed to Harold Stanley Heaps, but it was first discovered by Gustav Herdan. To some approximation, the Herdan–Heaps law is asymptotically equivalent to Zipf's law on the frequencies of individual words in a text.
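
Because the formula V_R(n) = K n^β becomes a straight line after taking logarithms (log V_R = log K + β log n), the two free parameters can be estimated with an ordinary least-squares line fit. The following is a minimal sketch of that idea, not a reference implementation: the function names `vocabulary_growth` and `fit_heaps` and the synthetic values K = 30, β = 0.5 are assumptions chosen for illustration, and real corpora would need tokenization choices that this sketch does not address.

```python
import math


def vocabulary_growth(tokens):
    """Return (n, V) pairs: number of distinct types V after the first n tokens."""
    seen = set()
    growth = []
    for n, token in enumerate(tokens, start=1):
        seen.add(token)
        growth.append((n, len(seen)))
    return growth


def fit_heaps(growth):
    """Least-squares fit of log V = log K + beta * log n; returns (K, beta)."""
    xs = [math.log(n) for n, _ in growth]
    ys = [math.log(v) for _, v in growth]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    K = math.exp(mean_y - beta * mean_x)
    return K, beta


# Synthetic check: points lying exactly on V = 30 * n^0.5 (hypothetical values).
synthetic = [(n, 30 * n ** 0.5) for n in (100, 1000, 10000, 100000)]
K_hat, beta_hat = fit_heaps(synthetic)
```

On a real text, the pairs from `vocabulary_growth(text.split())` could be passed to `fit_heaps`; fitting in log-log space weights small and large n differently from a direct nonlinear fit, so the estimates should be read as approximate.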

Property	Value
dbo:abstract	In linguistics, Heaps' law (also called Herdan's law) is an empirical law which describes the number of distinct words in a document (or set of documents) as a function of the document length (the so-called type-token relation). It can be formulated as V_R(n) = K n^β, where V_R is the number of distinct words in an instance text of size n. K and β are free parameters determined empirically. With English text corpora, K is typically between 10 and 100, and β is between 0.4 and 0.6. The law is frequently attributed to Harold Stanley Heaps, but was originally discovered by Gustav Herdan. Under mild assumptions, the Herdan–Heaps law is asymptotically equivalent to Zipf's law concerning the frequencies of individual words within a text. This is a consequence of the fact that the type-token relation (in general) of a homogeneous text can be derived from the distribution of its types. Heaps' law means that as more instance text is gathered, there will be diminishing returns in terms of discovery of the full vocabulary from which the distinct terms are drawn. Heaps' law also applies to situations in which the "vocabulary" is just some set of distinct types which are attributes of some collection of objects. For example, the objects could be people, and the types could be the country of origin of each person. If persons are selected randomly (that is, we are not selecting based on country of origin), then Heaps' law says we will quickly have representatives from most countries (in proportion to their population) but it will become increasingly difficult to cover the entire set of countries by continuing this method of sampling. Heaps' law has also been observed in single-cell transcriptomes, considering genes as the distinct objects in the "vocabulary". (en) En lingüística, la ley de Heaps (también llamada ley de Herdan) es una ley empírica que describe el número de palabras distintas en un documento (o conjunto de documentos) como una función de la longitud del documento. 
Puede formularse como V_R(n) = K n^β, donde V_R es el número de palabras distintas en un texto de tamaño n. K y β son parámetros libres que se determinan empíricamente. Con un texto en inglés, típicamente K está entre 10 y 100, y β está entre 0,4 y 0,6. La ley es frecuentemente atribuida a Harold Stanley Heaps, pero fue originalmente descubierta por Gustav Herdan (1960). Bajo suposiciones suaves, la ley de Herdan-Heaps es asintóticamente equivalente a la ley de Zipf, que concierne a las frecuencias de palabras individuales dentro de un texto. Esto es una consecuencia del hecho de que la relación tipo-token (en general) de un texto homogéneo puede derivarse de la distribución de sus tipos. La ley de Heaps significa que cuanto más texto se reúne, más tiempo costará encontrar palabras nuevas. La ley de Heaps también se aplica a las situaciones en que el «vocabulario» es algún conjunto de distintas clases de alguna colección de objetos. Por ejemplo, los objetos podrían ser personas, y las clases podrían ser los países de origen de las personas. Si las personas se seleccionan aleatoriamente (es decir, no se seleccionan en función del país de origen), entonces la ley de Heaps dice cuán rápido encontraremos representantes de los países (en proporción al número de personas seleccionadas al azar) y predice que será cada vez más difícil encontrar personas de un país no incluido en la muestra. (es) Зако́н Хи́пса — эмпирическая закономерность в лингвистике, описывающая распределение числа разных слов в документе (или наборе документов) как функцию от его длины. Описывается формулой V_R(n) = K n^β, где V_R — число разных слов в тексте размера n. K и β — свободные параметры, определяются эмпирически. Для английского корпуса текстов K обычно лежит между 10 и 100, а β между 0,4 и 0,6. Закон часто приписывается Гарольду Стэнли Хипсу, но впервые был открыт Густавом Герданом. 
С некоторым приближением закон Гердана — Хипса асимптотически эквивалентен закону Ципфа о частоте отдельных слов в тексте. (ru) Закон Гіпса (англ. Heaps' law) — емпірична закономірність у лінгвістиці, що описує розподіл числа різних слів у документі (або наборі документів) як функцію від його довжини. Описується формулою V_R(n) = K n^β, де V_R — число різних слів у тексті розміру n. K і β — вільні параметри, визначаються емпірично. Для англійського корпусу текстів K зазвичай лежить між 10 і 100, а β між 0,4 і 0,6. Закон часто приписують Гарольду Стенлі Гіпсу (Harold Stanley Heaps), але вперше його відкрив Густав Гердан (Gustav Herdan). З деяким наближенням закон Гердана — Гіпса асимптотично еквівалентний закону Ципфа про частоту окремих слів у тексті. (uk)
dbo:thumbnail	wiki-commons:Special:FilePath/Heaps_law_plot.png?width=300
dbo:wikiPageID	436287 (xsd:integer)
dbo:wikiPageLength	5491 (xsd:nonNegativeInteger)
dbo:wikiPageRevisionID	1062655297 (xsd:integer)
dbo:wikiPageWikiLink	dbc:Statistical_laws dbr:Menzerath's_law dbr:Principle_of_least_effort dbr:Benford's_law dbc:Empirical_laws dbr:Genes dbr:Bradford's_law dbc:Eponyms dbr:Zipf's_law dbr:Pareto_distribution dbr:Linguistics dbr:Brevity_law dbr:Text_corpus dbc:Computational_linguistics dbr:Empirical_law dbr:Rank-size_distribution dbr:Transcriptomes dbr:File:Heaps_law_plot.png dbr:Harold_Stanley_Heaps
dbp:id	3431 (xsd:integer)
dbp:title	Heaps' law (en)
dbp:wikiPageUsesTemplate	dbt:Citation dbt:Commonscatinline dbt:Div_col dbt:Div_col_end dbt:Refbegin dbt:Refend dbt:Reflist dbt:Harvs dbt:PlanetMath_attribution dbt:Comp-ling-stub
dcterms:subject	dbc:Statistical_laws dbc:Empirical_laws dbc:Eponyms dbc:Computational_linguistics
rdf:type	yago:WikicatStatisticalLaws yago:Abstraction100002137 yago:CausalAgent100007347 yago:Collection107951464 yago:Group100031264 yago:Law108441203 yago:LivingThing100004258 yago:Object100002684 yago:Organism100004475 yago:Person100007846 yago:PhysicalEntity100001930 yago:YagoLegalActor yago:YagoLegalActorGeo yago:Whole100003553 yago:WikicatEmpiricalLaws yago:WikicatEthnicGermanPeople
rdfs:comment	Зако́н Хи́пса — эмпирическая закономерность в лингвистике, описывающая распределение числа разных слов в документе (или наборе документов) как функцию от его длины. Описывается формулой V_R(n) = K n^β, где V_R — число разных слов в тексте размера n. K и β — свободные параметры, определяются эмпирически. Для английского корпуса текстов K обычно лежит между 10 и 100, а β между 0,4 и 0,6. Закон часто приписывается Гарольду Стэнли Хипсу, но впервые был открыт Густавом Герданом. С некоторым приближением закон Гердана — Хипса асимптотически эквивалентен закону Ципфа о частоте отдельных слов в тексте. (ru) In linguistics, Heaps' law (also called Herdan's law) is an empirical law which describes the number of distinct words in a document (or set of documents) as a function of the document length (the so-called type-token relation). It can be formulated as V_R(n) = K n^β, where V_R is the number of distinct words in an instance text of size n. K and β are free parameters determined empirically. With English text corpora, K is typically between 10 and 100, and β is between 0.4 and 0.6. (en) En lingüística, la ley de Heaps (también llamada ley de Herdan) es una ley empírica que describe el número de palabras distintas en un documento (o conjunto de documentos) como una función de la longitud del documento. Puede formularse como V_R(n) = K n^β, donde V_R es el número de palabras distintas en un texto de tamaño n. K y β son parámetros libres que se determinan empíricamente. Con un texto en inglés, típicamente K está entre 10 y 100, y β está entre 0,4 y 0,6. La ley de Heaps significa que cuanto más texto se reúne, más tiempo costará encontrar palabras nuevas. (es) Закон Гіпса (англ. Heaps' law) — емпірична закономірність у лінгвістиці, що описує розподіл числа різних слів у документі (або наборі документів) як функцію від його довжини. Описується формулою V_R(n) = K n^β, де V_R — число різних слів у тексті розміру n. K і β — вільні параметри, визначаються емпірично. 
Для англійського корпусу текстів, K зазвичай лежить між 10 і 100, а β між 0.4 і 0.6. (uk)
rdfs:label	Ley de Heaps (es) Heaps' law (en) Закон Хипса (ru) Закон Гіпса (uk)
owl:sameAs	freebase:Heaps' law yago-res:Heaps' law wikidata:Heaps' law dbpedia-az:Heaps' law dbpedia-es:Heaps' law dbpedia-ru:Heaps' law dbpedia-uk:Heaps' law https://global.dbpedia.org/id/4kYua
prov:wasDerivedFrom	wikipedia-en:Heaps'_law?oldid=1062655297&ns=0
foaf:depiction	wiki-commons:Special:FilePath/Heaps_law_plot.png
foaf:isPrimaryTopicOf	wikipedia-en:Heaps'_law
is dbo:wikiPageRedirects of	dbr:Herdan's_law dbr:Heaps_law
is dbo:wikiPageWikiLink of	dbr:List_of_eponymous_laws dbr:Menzerath's_law dbr:Herdan's_law dbr:Index_of_linguistics_articles dbr:List_of_scientific_laws_named_after_people dbr:Clustering_high-dimensional_data dbr:Zipf's_law dbr:Feature_hashing dbr:Language_model dbr:Brevity_law dbr:Googlewhack dbr:Heaps_law dbr:List_of_statistics_articles dbr:Quantitative_linguistics
is foaf:primaryTopic of	wikipedia-en:Heaps'_law
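
The abstract describes diminishing returns: when objects are sampled at random, new types are discovered quickly at first and ever more slowly later. The sketch below illustrates this with a simulation, under assumptions that are not taken from the source: types are drawn independently with Zipf-like weights proportional to 1/rank, and the function name `simulate_type_discovery`, the 5000 types, the 20000 draws, and the seed are all hypothetical choices.

```python
import random


def simulate_type_discovery(num_types=5000, num_draws=20000, seed=0):
    """Draw types with weight 1/rank and track how many distinct types were seen."""
    rng = random.Random(seed)
    # Assumed Zipf-like weights: the type of rank r has weight 1/r.
    weights = [1.0 / rank for rank in range(1, num_types + 1)]
    draws = rng.choices(range(num_types), weights=weights, k=num_draws)
    seen = set()
    curve = []
    for drawn_type in draws:
        seen.add(drawn_type)
        curve.append(len(seen))  # distinct types after this many draws
    return curve


curve = simulate_type_discovery()
tenth = len(curve) // 10
# New types found in the first 10% of draws versus the last 10% of draws.
new_in_first_tenth = curve[tenth - 1]
new_in_last_tenth = curve[-1] - curve[-tenth - 1]
```

Comparing `new_in_first_tenth` with `new_in_last_tenth` shows how the discovery rate falls as the sample grows. Independent draws from a fixed distribution are a simplification of real text, so the curve is only a qualitative illustration of the effect.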