Generalized Linear Gaussian Cluster-Weighted Modeling

Generalized Linear Models with Random Effects

1991

This paper evaluates count data with excess zeros and repeated observations per subject. There is an excess of zeros when the observed number of zeros in the trial substantially exceeds the number expected under a Poisson or negative binomial distribution. Hurdle and zero-inflated models with random effects are available for evaluating such data. Both approaches are presented here and applied to the number of visits to the feeder per cow per hour. The target trait was ultimately analyzed with a hurdle model with random effects based on a negative binomial distribution. This choice followed from a detailed comparison of models and from the hurdle model's simpler computer implementation. To make the results easier to interpret, the levels of the explanatory factors (for example, the lactation classes) were averaged on the response scale rather than on the link scale. The decisive explanatory variables for the pattern of visiting activity over the 24-hour cycle are the milking and cleaning times at hours 4, 7, 12 and 20. The highly significant differences in visiting frequency between first-lactation cows and cows in later lactations were explained by competition for access to the feeder, and thus to the feed.
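The three technical ideas in this abstract (an excess-zero check, a negative binomial hurdle model, and averaging on the response scale instead of the link scale) can be illustrated with a minimal sketch. It assumes an NB parameterization with mean `mu` and dispersion `k`, omits the random effects entirely, and uses hypothetical helper names (`nb_pmf`, `hurdle_nb_pmf`, `poisson_expected_zeros`, `link_vs_response_average`) and made-up counts; it is not the authors' model or data.

```python
import math

def nb_pmf(y: int, mu: float, k: float) -> float:
    """Negative binomial pmf with mean mu and dispersion (size) k."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)

def hurdle_nb_pmf(y: int, p_zero: float, mu: float, k: float) -> float:
    """Hurdle NB: zeros come from a binary part, positives from a zero-truncated NB."""
    if y == 0:
        return p_zero
    # Renormalize the NB over y >= 1 so the positive part carries mass 1 - p_zero.
    return (1.0 - p_zero) * nb_pmf(y, mu, k) / (1.0 - nb_pmf(0, mu, k))

def poisson_expected_zeros(counts: list[int]) -> float:
    """Expected number of zeros if the counts were Poisson with the sample mean."""
    mean = sum(counts) / len(counts)
    return len(counts) * math.exp(-mean)

def link_vs_response_average(etas: list[float]) -> tuple[float, float]:
    """Under a log link: exp(mean of linear predictors) vs mean of predicted counts."""
    link_avg = math.exp(sum(etas) / len(etas))
    response_avg = sum(math.exp(e) for e in etas) / len(etas)
    return link_avg, response_avg

# Hypothetical hourly visit counts for one cow over 24 hours (illustrative only).
counts = [0, 0, 0, 0, 0, 0, 3, 5, 0, 0, 0, 4, 0, 0, 0, 0, 0, 6, 0, 0, 2, 0, 0, 0]
observed_zeros = counts.count(0)
expected_zeros = poisson_expected_zeros(counts)
print(observed_zeros, round(expected_zeros, 2))

# Two factor levels with linear predictors log(1) and log(4).
print(link_vs_response_average([0.0, math.log(4.0)]))
```

With these made-up counts there are 19 observed zeros against roughly 10.4 expected under a Poisson with the same mean, which is the kind of gap the abstract calls an excess of zeros. The last line shows why the scale matters: averaging on the log-link scale gives 2.0, while averaging predicted counts on the response scale gives 2.5.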

Agglomerative Likelihood Clustering

arXiv (Cornell University), 2019

We consider the problem of fast time-series data clustering. Building on previous work modeling the correlation-based Hamiltonian of spin variables, we present an updated, fast and inexpensive Agglomerative Likelihood Clustering algorithm (ALC). The method replaces the optimized genetic-algorithm-based approach (f-SPC) with an agglomerative recursive merging framework inspired by previous work in econophysics and community detection. The method is tested on noisy synthetic correlated time-series datasets with built-in cluster structure to demonstrate that the algorithm produces meaningful, non-trivial results. We apply it to time-series datasets with as many as 20,000 assets and argue that ALC can reduce compute time and resource usage for large-scale time-series clustering while running serially, and hence has no obvious parallelization requirement. Because it requires no prior information about the number of clusters, the algorithm can be an effective choice for state detection in online learning in a fast, non-linear data environment.
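A minimal sketch of likelihood-driven agglomerative merging on a correlation matrix follows. It assumes that the correlation-based likelihood is the Giada-Marsili form, L = 1/2 Σ_s [log(n_s/c_s) + (n_s − 1) log((n_s² − n_s)/(n_s² − c_s))], where n_s is the cluster size and c_s the sum of intra-cluster correlations, and it uses a naive greedy rule (merge the pair with the largest likelihood gain, stop when no merge helps). The function names are hypothetical and this is not the authors' ALC implementation; it only shows how the number of clusters can emerge without being specified in advance.

```python
import numpy as np

def cluster_log_likelihood(C: np.ndarray, members: list[int]) -> float:
    """Assumed Giada-Marsili log-likelihood contribution of one cluster."""
    n = len(members)
    if n <= 1:
        return 0.0
    # Sum of all correlations inside the cluster, diagonal included.
    c = float(C[np.ix_(members, members)].sum())
    if c <= n:
        # No net positive correlation: the model does not favor this cluster.
        return 0.0
    c = min(c, n * n - 1e-9)  # avoid division by zero for perfectly correlated members
    return 0.5 * (np.log(n / c) + (n - 1) * np.log((n * n - n) / (n * n - c)))

def agglomerative_likelihood_clustering(C: np.ndarray) -> list[list[int]]:
    """Greedily merge the pair of clusters that most increases the total likelihood."""
    clusters = [[i] for i in range(C.shape[0])]
    scores = [0.0] * len(clusters)
    while len(clusters) > 1:
        best_gain, best_pair = 0.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                merged = clusters[a] + clusters[b]
                gain = cluster_log_likelihood(C, merged) - scores[a] - scores[b]
                if gain > best_gain:
                    best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break  # no merge increases the likelihood, so stop
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        scores[a] = cluster_log_likelihood(C, clusters[a])
        del clusters[b]  # b > a, so index a is unaffected
        del scores[b]
    return clusters

# Synthetic correlation matrix: two blocks of three series with correlation 0.8.
C = np.eye(6)
for block in ([0, 1, 2], [3, 4, 5]):
    for i in block:
        for j in block:
            if i != j:
                C[i, j] = 0.8

print(agglomerative_likelihood_clustering(C))  # expected: [[0, 1, 2], [3, 4, 5]]
```

The pairwise search here costs O(N²) likelihood evaluations per merge, so it is only suitable for small examples; scaling to the 20,000-asset case described above would need cached gains or a smarter merge order.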
