Tree Induction for Probability-Based Ranking
References
Apte, C., Grossman, E., Pednault, E., Rosen, B., Tipu, F., & White, B. (1999). Probabilistic estimation-based data mining for discovering insurance risks. IEEE Intelligent Systems, 14, 49–58.
Bahl, L. R., Brown, P. F., de Souza, P. V., & Mercer, R. L. (1989). A tree-based statistical language model for natural language speech recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37:7, 1001–1008.
Bauer, E., & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting and variants. Machine Learning, 36, 105–142.
Bennett, P. (2002). Using asymmetric distributions to improve classifier probabilities: A comparison of new and standard parametric methods. Technical report CMU-CS-02-126, School of Computer Science, Carnegie Mellon University.
Blake, C., & Merz, C. J. (2000). UCI repository of machine learning databases. Machine-readable data repository, Department of Information and Computer Science, University of California at Irvine, Irvine, CA. Available at http://www.ics.uci.edu/~mlearn/MLRepository.html.
Bradford, J. P., Kunz, C., Kohavi, R., Brunk, C., & Brodley, C. E. (1998). Pruning decision trees with misclassification costs. Proceedings of the Tenth European Conference on Machine Learning (pp. 131–136). Berlin: Springer Verlag.
Bradley, A. P. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30:7, 1145–1159.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123–140.
Breiman, L. (1998). Out-of-bag estimation. Unpublished manuscript.
Breiman, L. (2000). Private communication.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Wadsworth International Group.
Buntine, W. (1991). A theory of learning classification rules. Ph.D. thesis, School of Computer Science, University of Technology, Sydney, Australia.
Cestnik, B. (1990). Estimating probabilities: A crucial task in machine learning. Proceedings of the Ninth European Conference on Artificial Intelligence (pp. 147–149). Pitman.
Clark, P., & Boswell, R. (1991). Rule induction with CN2: Some recent improvements. Proceedings of the Sixth European Working Session on Learning (pp. 151–163). Berlin: Springer.
Danyluk, A., & Provost, F. (2002). Telecommunications network diagnosis. In W. Kloesgen, & J. Zytkow (Eds.), Handbook of Knowledge Discovery and Data Mining, 897–902.
Domingos, P. (1997). Why does bagging work? A Bayesian account and its implications. Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (pp. 155–158). Menlo Park, CA: AAAI Press.
Domingos, P. (1999). MetaCost: A general method for making classifiers cost-sensitive. Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 155–164). New York: ACM Press.
Domingos, P. (1997). Knowledge acquisition from examples via multiple models. In D. H. Fisher (Ed.), Proceedings of the Fourteenth International Conference on Machine Learning (ICML-97) (pp. 98–106). San Francisco, CA: Morgan Kaufmann.
Drummond, C., & Holte, R. (2000). Exploiting the cost (in)sensitivity of decision tree splitting criteria. Proceedings of the Seventeenth International Conference on Machine Learning (pp. 239–246). San Francisco: Morgan Kaufmann.
Dzeroski, S., Cestnik, B., & Petrovski, I. (1993). Using the _m_-estimate in rule induction. Journal of Computing and Information Technology, 1, 37–46.
Friedman, N., & Goldszmidt, M. (1996). Learning Bayesian networks with local structure. Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence (pp. 252–262). San Francisco: Morgan Kaufmann.
Good, I. J. (1965). The Estimation of Probabilities: An Essay on Modern Bayesian Methods. Cambridge, MA: MIT Press.
Gordon, L., & Olshen, R. A. (1984). Almost sure consistent nonparametric regression from recursive partitioning schemes. Journal of Multivariate Analysis, 15, 147–163.
Hand, D. J. (1997). Construction and Assessment of Classification Rules. Chichester: John Wiley and Sons.
Hand, D. J., & Till, R. J. (2001). A simple generalization of the area under the ROC curve for multiple class classification problems. Machine Learning, 45:2, 171–186.
Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143, 29–36.
Hastie, T. J., & Pregibon, D. (1990). Shrinking trees. Technical report, AT&T Laboratories.
Heckerman, D., Chickering, M., Meek, C., Rounthwaite, R., & Kadie, C. (2000). Dependency networks for density estimation, collaborative filtering, and data visualization. Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence. San Francisco: Morgan Kaufmann.
Holte, R., Acker, L., & Porter, B. (1989). Concept learning and the problem of small disjuncts. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp. 813–818). San Francisco: Morgan Kaufmann.
Jelinek, F. (1997). Statistical Methods for Speech Recognition. Cambridge, MA: MIT Press.
Kohavi, R., Becker, B., & Sommerfield, D. (1997). Improving simple Bayes. The Ninth European Conference on Machine Learning (pp. 78–87).
Lim, T.-S., Loh, W.-Y., & Shih, Y.-S. (2000). A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40:3, 203–228.
Margineantu, D. D., & Dietterich, T. G. (2001). Improved class probability estimates from decision tree models. In C. Holmes (Ed.), Nonlinear Estimation and Classification. The Mathematical Sciences Research Institute, University of California, Berkeley.
McCallum, A., Rosenfeld, R., Mitchell, T., & Ng, A. Y. (1998). Improving text classification by shrinkage in a hierarchy of classes. Proceedings of the Fifteenth International Conference on Machine Learning (pp. 359–367). San Francisco: Morgan Kaufmann.
Niblett, T. (1987). Constructing decision trees in noisy domains. Proceedings of the Second European Working Session on Learning (pp. 67–78). Wilmslow, England: Sigma Press.
Pazzani, M., Merz, C., Murphy, P., Ali, K., Hume, T., & Brunk, C. (1994). Reducing misclassification costs. Proceedings of the Eleventh International Conference on Machine Learning (pp. 217–225). San Francisco: Morgan Kaufmann.
Perlich, C., Provost, F., & Simonoff, J. S. (2003). Tree induction versus logistic regression: A learning-curve analysis. Journal of Machine Learning Research. (In press).
Provost, F., & Domingos, P. (2000). Well-trained PETs: Improving probability estimation trees. CeDER Working Paper #IS-00-04, Stern School of Business, New York University, NY 10012.
Provost, F., & Fawcett, T. (1997). Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions. Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97) (pp. 43–48). Menlo Park, CA: AAAI Press.
Provost, F., & Fawcett, T. (2001). Robust classification for imprecise environments. Machine Learning, 42, 203–231.
Provost, F., Fawcett, T., & Kohavi, R. (1998). The case against accuracy estimation for comparing induction algorithms. Proceedings of the Fifteenth International Conference on Machine Learning (pp. 445–453). San Francisco: Morgan Kaufmann.
Provost, F., & Kolluri, V. (1999). A survey of methods for scaling up inductive algorithms. Data Mining and Knowledge Discovery, 3:2, 131–169.
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. San Francisco: Morgan Kaufmann.
Simonoff, J. S. (1995). Smoothing categorical data. Journal of Statistical Planning and Inference, 47, 41–69.
Smyth, P., Gray, A., & Fayyad, U. (1995). Retrofitting decision tree classifiers using kernel density estimation. Proceedings of the Twelfth International Conference on Machine Learning (pp. 506–514). San Francisco: Morgan Kaufmann.
Sobehart, J. R., Stein, R. M., Mikityanskaya, V., & Li, L. (2000). Moody's public firm risk model: A hybrid approach to modeling short term default risk. Tech rep., Moody's Investors Service, Global Credit Research. Available: http://www.moodysqra.com/research/crm/53853.asp.
Swets, J. (1988). Measuring the accuracy of diagnostic systems. Science, 240, 1285–1293.
Zadrozny, B., & Elkan, C. (2001). Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In C. Brodley, & A. Danyluk (Eds.), Proceedings of the Eighteenth International Conference on Machine Learning (pp. 609–616). San Francisco: Morgan Kaufmann.