Sally Goldman - Academia.edu
Papers by Sally Goldman
Machine Learning, 1994
This article studies self-directed learning, a variant of the on-line (or incremental) learning model in which the learner selects the presentation order for the instances. Alternatively, one can view this model as a variation of learning with membership queries in which the learner is "charged" only for membership queries whose outcome it could not predict. We give tight bounds on the complexity of self-directed learning for the concept classes of monomials, monotone DNF formulas, and axis-parallel rectangles in $\{0, 1, \ldots, n-1\}^d$. These results demonstrate that the number of mistakes under self-directed learning can be surprisingly small. We then show that learning complexity in the model of self-directed learning is less than that of all other commonly studied on-line and query learning models. Next we explore the relationship between the complexity of self-directed learning and the Vapnik-Chervonenkis (VC) dimension. We show that, in general, the VC-dimension and the self-directed learning complexity are incomparable. However, for some special cases, we show that the VC-dimension gives a lower bound for the self-directed learning complexity. Finally, we explore a relationship between Mitchell's version space algorithm and the existence of self-directed learning algorithms that make few mistakes.
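To make the self-directed protocol concrete, here is a toy sketch that is not taken from the paper: it uses one-dimensional threshold concepts, $c_t(x) = 1$ iff $x \ge t$ over $\{0, \ldots, n-1\}$, a class assumed here only because it is simpler than the ones studied above. Because the learner chooses the order, presenting instances in increasing order and predicting 0 until the first positive label costs at most one mistake, whatever the target threshold. The function name is hypothetical.

```python
from typing import Callable


def self_directed_threshold(n: int, label: Callable[[int], int]) -> int:
    """Self-directed learning of a threshold c_t(x) = 1 iff x >= t on {0, ..., n-1}.

    The learner picks the presentation order (increasing x), predicts 0 until
    it has seen a positive example, then predicts 1. Returns the mistake count.
    """
    mistakes = 0
    seen_positive = False
    for x in range(n):
        prediction = 1 if seen_positive else 0
        truth = label(x)  # the true label is revealed after the prediction
        if prediction != truth:
            mistakes += 1
        if truth == 1:
            seen_positive = True
    return mistakes


if __name__ == "__main__":
    n = 16
    worst = max(
        self_directed_threshold(n, lambda x, t=t: int(x >= t)) for t in range(n + 1)
    )
    print("worst-case mistakes over all thresholds:", worst)  # prints 1
```

For comparison, an on-line learner facing an adversarially chosen order on the same class can be forced into roughly $\log_2 n$ mistakes, which gives a feel for why the self-directed mistake counts can be so small.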
Journal of Internet Engineering, Vol. 1, No. 1, January 2007. Smartacking: Improving TCP Performance from the Receiving End. Daniel K. Blandford, Sally A. Goldman, Sergey Gorinsky, Yan Zhou, and Daniel R. Dooly ...
Proceedings of the 12th International Conference on Algorithmic Learning Theory, Nov 25, 2001
While there has been a significant amount of theoretical and empirical research on the multiple-instance learning model, most of this research has focused on concept learning. However, for the important application area of drug discovery, a real-valued classification is preferable. In this paper we initiate a theoretical study of real-valued multiple-instance learning. We prove that the problem of finding a target point consistent with a set of labeled multiple-instance examples (or bags) is NP-complete, and that the problem of learning from real-valued multiple-instance examples is as hard as learning DNF. Another contribution of our work is defining and studying a multiple-instance membership query (MI-MQ). We give a positive result on exactly learning the target point for a multiple-instance problem in which the learner is provided with an MI-MQ oracle and a single adversarially selected bag.
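As a rough illustration of the data model, the sketch below assumes one simple formalization, which is not necessarily the paper's exact definition: each bag is a set of points in $\mathbb{R}^d$, and its real-valued label is the largest similarity between any of its points and an unknown target point, with similarity taken here as $e^{-\lVert x - t \rVert^2}$. The similarity function and all names are hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]


def similarity(x: Point, target: Point) -> float:
    """Hypothetical similarity: exp(-squared Euclidean distance), in (0, 1]."""
    return math.exp(-sum((xi - ti) ** 2 for xi, ti in zip(x, target)))


def bag_label(bag: Sequence[Point], target: Point) -> float:
    """Real-valued multiple-instance label: the best match among the bag's points."""
    return max(similarity(x, target) for x in bag)


if __name__ == "__main__":
    target = (1.0, 2.0)
    near_bag = [(0.0, 0.0), (1.0, 2.1)]  # one instance lies close to the target
    far_bag = [(5.0, 5.0), (-3.0, 4.0)]  # every instance is far away
    print(bag_label(near_bag, target), bag_label(far_bag, target))
```

The key multiple-instance feature is that a single well-placed instance is enough to give the whole bag a high label, so the learner never sees which instance was responsible.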
Proceedings of the Nineteenth International Conference on Machine Learning, Jul 8, 2002
SIAM Journal on Computing, 1993
Exact Identification of Circuits Using Fixed Points of Amplification Functions (Extended Abstract). Sally A. Goldman, Michael J. Kearns, Robert E. Schapire. Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139 ...
Machine Learning, 1999
Developing the ability to recognize a landmark from a visual image of a robot's current location is a fundamental problem in robotics. We describe a way in which the landmark matching problem can be mapped to that of learning a one-dimensional geometric pattern. The first contribution of our work is an efficient noise-tolerant algorithm (designed using the statistical query model) to PAC learn the class of one-dimensional geometric patterns. The second contribution is an empirical study of our algorithm that provides some evidence that statistical query algorithms may be valuable in practice for handling noisy data.
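The reason statistical query (SQ) algorithms tolerate noise can be seen in one standard calculation, sketched here independently of the paper's algorithm. Under random classification noise at rate $\eta < 1/2$, each label is flipped with probability $\eta$, so the observed positive rate is $\eta + (1 - 2\eta)p$, where $p$ is the noise-free positive rate. Inverting that relation recovers an estimate of $p$ from noisy data. Treating $\eta$ as known is an assumption made for brevity, and the function name is hypothetical.

```python
import random
from typing import Sequence


def corrected_positive_rate(noisy_labels: Sequence[int], eta: float) -> float:
    """Estimate the noise-free rate p of label 1 from labels flipped w.p. eta.

    Uses E[noisy label] = eta + (1 - 2 * eta) * p and solves for p.
    """
    if not 0.0 <= eta < 0.5:
        raise ValueError("noise rate must satisfy 0 <= eta < 1/2")
    observed = sum(noisy_labels) / len(noisy_labels)
    return (observed - eta) / (1.0 - 2.0 * eta)


if __name__ == "__main__":
    rng = random.Random(0)
    p_true, eta, m = 0.3, 0.2, 100_000
    clean = [1 if rng.random() < p_true else 0 for _ in range(m)]
    noisy = [1 - y if rng.random() < eta else y for y in clean]
    print(round(corrected_positive_rate(noisy, eta), 3))  # approximately 0.3
```

An SQ algorithm only ever asks for such averaged quantities, so every query it makes can be corrected in the same way; an algorithm that reacts to individual labels has no comparable safeguard.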
Proceedings of the tenth annual conference on Computational learning theory - COLT '97, 1997
Proceedings of the seventh annual conference on Computational learning theory - COLT '94, 1994
Learning Unions of Boxes with Membership and Equivalence Queries. Paul W. Goldberg, Sally A. Goldman, H. David Mathias. Department 1423, Dept. of Computer Science, Dept. of Computer Science, Sandia National ...
Proceedings of the fifth annual workshop on Computational learning theory - COLT '92, 1992
Learning k-term DNF Formulas with an Incomplete Membership Oracle. Sally A. Goldman, Department of Computer Science, Washington University, St. Louis, MO 63130, sg@cs.wustl.edu; H. David Mathias, Department of Computer Science, Washington University, St. ...
Maximum-Entropy and Bayesian Methods in Science and Engineering, 1988
This paper presents a new way to compute the probability distribution with maximum entropy satisfying a set of constraints. Unlike previous approaches, our method is integrated with the planning of data collection and tabulation. We show how adding constraints and performing the associated additional tabulations can substantially speed up computation by replacing the usual iterative techniques with a straightforward computation. These extra constraints are shown to correspond to the intermediate tables used in Cheeseman's method. We also show that the class of constraint graphs that our method handles is a proper generalization of Pearl's singly-connected networks. An open problem is to determine a minimal set of constraints necessary to make a hypergraph acyclic. We conjecture that this problem is NP-complete, and discuss heuristics to approximate the optimal solution.
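The contrast between iterative fitting and a direct computation shows up already in the smallest acyclic case. The sketch below illustrates a standard fact rather than the paper's method: for three variables with pairwise constraints on $(X, Y)$ and $(Y, Z)$, the maximum-entropy joint distribution is $p(x, y, z) = p(x, y)\,p(y, z) / p(y)$. It compares that closed form with iterative proportional fitting (IPF), one common iterative technique. The marginal tables and function names are made up for illustration.

```python
from itertools import product
from typing import Dict, List, Tuple

Joint = Dict[Tuple[int, int, int], float]


def closed_form_chain(pxy: List[List[float]], pyz: List[List[float]]) -> Joint:
    """Max-entropy p(x, y, z) matching p(x, y) and p(y, z): p(x,y) p(y,z) / p(y)."""
    nx, ny, nz = len(pxy), len(pxy[0]), len(pyz[0])
    py = [sum(pxy[x][y] for x in range(nx)) for y in range(ny)]
    return {
        (x, y, z): pxy[x][y] * pyz[y][z] / py[y] if py[y] > 0 else 0.0
        for x, y, z in product(range(nx), range(ny), range(nz))
    }


def ipf_chain(pxy: List[List[float]], pyz: List[List[float]], sweeps: int = 50) -> Joint:
    """Iterative proportional fitting, starting from the uniform distribution."""
    nx, ny, nz = len(pxy), len(pxy[0]), len(pyz[0])
    keys = list(product(range(nx), range(ny), range(nz)))
    p = {k: 1.0 / len(keys) for k in keys}
    for _ in range(sweeps):
        # Rescale so the (x, y) marginal matches pxy.
        for x, y in product(range(nx), range(ny)):
            cur = sum(p[(x, y, z)] for z in range(nz))
            for z in range(nz):
                p[(x, y, z)] = p[(x, y, z)] * pxy[x][y] / cur if cur > 0 else 0.0
        # Rescale so the (y, z) marginal matches pyz.
        for y, z in product(range(ny), range(nz)):
            cur = sum(p[(x, y, z)] for x in range(nx))
            for x in range(nx):
                p[(x, y, z)] = p[(x, y, z)] * pyz[y][z] / cur if cur > 0 else 0.0
    return p


if __name__ == "__main__":
    # Example marginals (made up); both imply p(y) = [0.4, 0.6].
    pxy = [[0.3, 0.2], [0.1, 0.4]]
    pyz = [[0.25, 0.15], [0.2, 0.4]]
    direct = closed_form_chain(pxy, pyz)
    fitted = ipf_chain(pxy, pyz)
    print(max(abs(direct[k] - fitted[k]) for k in direct))  # tiny, near float precision
```

On this chain IPF happens to settle after a single sweep because the constraints are decomposable; on constraint sets whose hypergraph is not acyclic it generally needs repeated sweeps, which is the cost a direct computation avoids.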
Colt Proceedings 1990, 1990
Exact Identification of Circuits Using Fixed Points of Amplification Functions (Extended Abstract). Sally A. Goldman, Michael J. Kearns, Robert E. Schapire. Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139 ...
Message Understanding Conference, 1993
Proceedings of the 5th conference on Message understanding - MUC5 '93, 1993
The primary goal of our effort is the development of robust and portable language processing capabilities for information extraction applications. The system under evaluation here is based on language processing components that have demonstrated strong performance in previous evaluations [ ]. Having demonstrated the general viability of these techniques, we are now concentrating on the practicality of our technology by creating trainable system components to replace hand-coded data and manually engineered software.
2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 1 (CVPR'06), 2006
Image segmentation is a fundamental step in many computer vision applications. Generally, a segmentation algorithm, or the parameterization of a given algorithm, is selected at the application level and fixed for all images within that application. Our goal is to create a stand-alone method to evaluate segmentation quality. Stand-alone methods have the advantage that they do not require a manually segmented reference image for comparison, and can therefore be used for real-time evaluation. Current stand-alone evaluation methods often work well for some types of images but poorly for others. We propose a meta-evaluation method in which any set of base evaluation methods is combined by a machine learning algorithm that coalesces their evaluations based on a learned weighting function, which depends upon the image to be segmented. The training data used by the machine learning algorithm can be labeled by a human, based on similarity to a human-generated reference segmentation, or based upon system-level performance. Experimental results demonstrate that our method performs better than the existing stand-alone segmentation evaluation methods.
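One way to picture the combination step is a weighted average whose weights depend on the image. In the hypothetical sketch below, the image feature vector, the linear-plus-softmax weighting, and every name are assumptions rather than the paper's learner: each base evaluation method produces a quality score, the image features produce one weight per method, and the meta-score is the weighted average of the base scores.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def meta_evaluate(
    base_scores: Sequence[float],
    image_features: Sequence[float],
    weight_matrix: Sequence[Sequence[float]],
) -> float:
    """Combine base evaluation scores with image-dependent weights.

    weight_matrix has one row per base method; row i dotted with the image
    features gives method i's logit, and a softmax turns logits into weights.
    In a real system these rows would be fit to labeled training images.
    """
    logits = [sum(w * f for w, f in zip(row, image_features)) for row in weight_matrix]
    weights = softmax(logits)
    return sum(w * s for w, s in zip(weights, base_scores))


if __name__ == "__main__":
    scores = [0.8, 0.4, 0.6]  # hypothetical outputs of three base evaluators
    features = [1.0, 0.5]  # hypothetical image descriptors
    W = [[2.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
    print(meta_evaluate(scores, features, W))
```

With an all-zero weight matrix every method gets equal weight and the meta-score reduces to a plain average; learning the weights is what lets the combination trust different base methods on different kinds of images.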
Chapman & Hall/CRC Applied Algorithms and Data Structures series, 1998
Machine Intelligence and Pattern Recognition, 1988