Automating Computer Bottleneck Detection with Belief Nets

2013, arXiv e-print arXiv:1302.4932

We describe an application of belief networks to the diagnosis of bottlenecks in computer systems. The technique relies on a high-level functional model of the interaction between application workloads, the Windows NT operating system, and system hardware. Given a workload description, the model predicts the values of observable system counters available from the Windows NT performance monitoring tool. Uncertainty in workloads, predictions, and counter values is characterized with Gaussian distributions. During diagnostic inference, we use observed performance monitor values to find the most probable assignment to the workload parameters. In this paper, we provide some background on automated bottleneck detection, describe the structure of the system model, and discuss empirical procedures for model calibration and verification. Part of the calibration process involves generating a dataset from which a multivariate Gaussian error model is estimated. Initial results in diagnosing bottlenecks are presented.
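The diagnostic inference step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the counters depend linearly on the workload parameters (counters = A * workload + b + noise) and that the prior and noise covariances are diagonal; the abstract does not state the form of the model, and the calibrated error model it mentions is multivariate. Under these assumptions the most probable workload is the posterior mean of a linear Gaussian model. All names here (`map_workload`, `A`, `b`, `prior_var`, `noise_var`) are hypothetical and not taken from the paper.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def map_workload(A, b, prior_mean, prior_var, noise_var, observed):
    """Most probable workload given observed counters (assumed linear model).

    Assumed model: observed = A * workload + b + noise, with
    workload[i] ~ N(prior_mean[i], prior_var[i]) and
    noise[j] ~ N(0, noise_var[j]), all independent.
    """
    n, m = len(prior_mean), len(observed)
    # Information form: precision * w = eta, starting from the prior.
    precision = [[0.0] * n for _ in range(n)]
    eta = [prior_mean[i] / prior_var[i] for i in range(n)]
    for i in range(n):
        precision[i][i] = 1.0 / prior_var[i]
    # Add the contribution of each observed counter.
    for j in range(m):
        residual = observed[j] - b[j]
        for i in range(n):
            eta[i] += A[j][i] * residual / noise_var[j]
            for k in range(n):
                precision[i][k] += A[j][i] * A[j][k] / noise_var[j]
    return solve(precision, eta)


if __name__ == "__main__":
    # Hypothetical example: two workload parameters, three counters.
    A = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    b = [0.0, 0.0, 0.0]
    estimate = map_workload(A, b, [0.0, 0.0], [4.0, 4.0],
                            [0.1, 0.1, 0.1], [2.0, 3.0, 2.4])
    print(estimate)
```

Working in information form (precision matrix and weighted mean) lets each observed counter be folded in as a simple additive update. A full multivariate error model would replace the per-counter `noise_var` values with a covariance matrix.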