Analyzing phonetic variation in the traditional English dialects: Simultaneously clustering dialects and phonetic features (original) (raw)
Related papers
2011
In this study we use bipartite spectral graph partitioning to simultaneously cluster varieties and identify their most distinctive linguistic features in Dutch dialect data. While clustering geographical varieties with respect to their features, eg pronunciation, is not new, the simultaneous identification of the features which give rise to the geographical clustering presents novel opportunities in dialectometry. Earlier methods aggregated sound differences and clustered on the basis of aggregate differences.
A statistical method for the identification and aggregation of regional linguistic variation
Language Variation and Change, 2011
This paper introduces a method for the analysis of regional linguistic variation. The method identifies individual and common patterns of spatial clustering in a set of linguistic variables measured over a set of locations based on a combination of three statistical techniques: spatial autocorrelation, factor analysis, and cluster analysis. To demonstrate how to apply this method, it is used to analyze regional variation in the values of 40 continuously measured, high-frequency lexical alternation variables in a 26-million-word corpus of letters to the editor representing 206 cities from across the United States.
Frontiers in Artificial Intelligence, 2021
Dialectometry studies patterns of linguistic variation through correlations between geographic and aggregate measures of linguistic distance. However, aggregating smooths out the role of semantic characteristics, which have been shown to affect the distribution of lexical variants across dialects. Furthermore, although dialectologists have always been well-aware of other variables like population size, isolation and socio-demographic features, these characteristics are generally only included in dialectometric analyses afterwards for further interpretation of the results rather than as explanatory variables. This study showcases linear mixed-effects modelling as a method that is able to incorporate both language-external and language-internal factors as explanatory variables of linguistic variation in the Limburgish dialect continuum in Belgium and the Netherlands. Covering four semantic domains that vary in their degree of basic vs. cultural vocabulary and their degree of standardi...
The Geographic Distribution of Linguistic Variation
This paper builds on techniques for measuring pronunciation differences in order to analyze large quantities of dialect atlas data. We aggregate the pronunciation differences between lists of dialect atlas entries in order to assay the difference between two varieties. Given these techniques we can measure pronunciation differences for largish numbers of varieties.