John Case - Academia.edu
Papers by John Case
Theoretical Computer Science, 2006
An algorithm for learning a subclass of erasing regular pattern languages is presented. On extended regular pattern languages generated by patterns $\pi$ of the form $x_0 \alpha_1 x_1 \ldots \alpha_m x_m$, where $x_0, \ldots, x_m$ are variables and $\alpha_1, \ldots, \alpha_m$ are strings of terminals, each of length $c$, it runs with arbitrarily high probability of success using a number of examples polynomial in $m$ (and exponential in $c$). It is assumed that $m$ is unknown but $c$ is known, and that samples are drawn randomly according to some distribution, for which we only require certain natural and plausible properties. Aiming to improve this algorithm further, we also explore computer simulations of a heuristic.
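To make the pattern form concrete, here is a minimal sketch of the membership test for a single erasing (extended) regular pattern, assuming each variable $x_i$ occurs once and may be replaced by any string, including the empty one. It only decides whether a given string belongs to $L(\pi)$; it is not the paper's learning algorithm, and the name `in_erasing_pattern_language` is hypothetical.

```python
from typing import Sequence


def in_erasing_pattern_language(w: str, alphas: Sequence[str]) -> bool:
    """Decide whether w is in L(x0 alpha_1 x1 ... alpha_m xm) with erasing variables.

    Since every variable may be replaced by any string (including the empty string),
    w matches iff alpha_1, ..., alpha_m occur in w in order without overlapping.
    """
    pos = 0
    for alpha in alphas:
        idx = w.find(alpha, pos)  # leftmost occurrence at or after pos
        if idx == -1:
            return False
        pos = idx + len(alpha)  # the next constant must start after this one ends
    return True
```

For example, `in_erasing_pattern_language("aabba", ["ab", "ba"])` is `True`, while `in_erasing_pattern_language("abab", ["ba", "ab"])` is `False`. Greedy leftmost matching suffices because taking the earliest occurrence of each $\alpha_i$ never rules out a match for the remaining constant strings.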
Information and Computation, 2002
Information and Computation, 2012
Introduced is a new inductive inference paradigm, dynamic modeling. Within this learning paradigm, for example, function h learns function g iff, in the i-th iteration, h and g both produce output, h gets the sequence of all outputs from g in prior iterations as input, g gets all the outputs from h in prior iterations as input, and, from some iteration on, the sequence of h's outputs will be programs for the output sequence of g. Dynamic modeling provides an idealization of, for example, a social interaction in which h seeks to discover program models of g's behavior it sees in interacting with g, and h openly discloses to g its sequence of candidate program models to see what g says back. Sample results: every g can be so learned by some h; there are g that can only be learned by an h if g can also learn that h back; there are extremely secretive h which cannot be learned back by any g they learn, but which, nonetheless, succeed in learning infinitely many g; quadratic time learnability is strictly more powerful than linear time learnability. This latter result, as well as others, follows immediately from general correspondence theorems obtained from a unified approach to the paradigms within inductive inference. Many proofs, some sophisticated, employ machine self-reference, a.k.a. recursion theorems.
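The interaction protocol in the definition of dynamic modeling can be sketched as a simple loop. This is a toy illustration under strong assumptions: "programs" are indices into a hypothetical finite list `CANDIDATES`, `h` uses identification by enumeration, and `g` happens to ignore `h`'s outputs. None of this is the paper's construction, which concerns arbitrary computable h and g.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical toy "programs": each maps an iteration index to a predicted output of g.
CANDIDATES: List[Callable[[int], int]] = [
    lambda i: 0,
    lambda i: i,
    lambda i: i * i,
]


def h(g_history: Sequence[int]) -> Optional[int]:
    """Learner: output the index of the first candidate consistent with g's past outputs."""
    for index, program in enumerate(CANDIDATES):
        if all(program(i) == value for i, value in enumerate(g_history)):
            return index
    return None  # no candidate fits the observed behaviour


def g(h_history: Sequence[Optional[int]]) -> int:
    """Toy target: outputs i*i in iteration i; it sees h's past outputs but ignores them."""
    return len(h_history) ** 2


def interact(rounds: int) -> Tuple[List[Optional[int]], List[int]]:
    """Run the protocol: in each iteration both parties see only the other's prior outputs."""
    h_out: List[Optional[int]] = []
    g_out: List[int] = []
    for _ in range(rounds):
        next_h = h(tuple(g_out))  # computed before either output of this round is recorded
        next_g = g(tuple(h_out))
        h_out.append(next_h)
        g_out.append(next_g)
    return h_out, g_out
```

Here `interact(5)` returns `([0, 0, 1, 2, 2], [0, 1, 4, 9, 16])`: from the fourth iteration on, `h` keeps outputting the index of the candidate that computes g's output sequence.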
Theoretical Computer Science, 2001
Concept drift means that the concept about which data is obtained may shift from time to time, each time after some minimum permanence. Apart from this minimum permanence, the concept shifts need not satisfy any further requirements and may occur infinitely often. This work studies to what extent it is still possible to predict or learn values for a data sequence produced by drifting concepts. Various ways to measure the quality of such predictions, including martingale betting strategies and density and frequency of correctness, are introduced and compared with one another. For each of these measures of prediction quality and for some interesting concrete classes, (nearly) optimal bounds on permanence for attaining learnability are established. The concrete classes from which the drifting concepts are selected include regular languages accepted by finite automata of bounded size, polynomials of bounded degree, and sequences defined by recurrence relations of bounded size. Some important restricted cases of drift are also studied, for example, the case where the intervals of permanence are computable. In the case where the concepts shift only
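As a toy illustration of one quality measure, frequency of correctness, the sketch below scores a predictor on a drifting sequence. It assumes the simplest possible concept class, constant sequences, together with a hypothetical "repeat the last value" predictor; neither is one of the paper's classes or predictors.

```python
from typing import Callable, Sequence

Predictor = Callable[[Sequence[int]], int]


def frequency_of_correctness(sequence: Sequence[int], predictor: Predictor) -> float:
    """Fraction of positions t at which predictor(sequence[:t]) equals sequence[t]."""
    hits = sum(1 for t in range(len(sequence)) if predictor(sequence[:t]) == sequence[t])
    return hits / len(sequence)


def repeat_last(prefix: Sequence[int]) -> int:
    """Predict that the current (constant) concept persists; guess 0 with no data."""
    return prefix[-1] if prefix else 0


# Hypothetical drifting data: three constant concepts, each held for a permanence of 4 steps.
drifting = [1] * 4 + [0] * 4 + [1] * 4
```

Here `frequency_of_correctness(drifting, repeat_last)` is `0.75`: the predictor errs once at the start of each of the three concepts. For this toy class, a minimum permanence of p steps caps the mistakes at about one per p steps, so the long-run frequency of correctness is at least 1 - 1/p, which hints at why permanence bounds govern learnability.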
Theory of Computing Systems, 2009
Mathematical Problems from Applied Logic II
This paper begins by briefly indicating the author's principal, non-standard motivations for his decades of work in Computability Theory (CT), a.k.a. Recursive Function Theory. It then discusses proposed general directions for CT beyond those from pure mathematics. These directions are as follows. 1. Apply CT to basic sciences, for example, biology, psychology, physics, chemistry, and economics. 2. Apply the resulting insights from 1 to philosophy and, more generally, apply CT to areas of philosophy in addition to the philosophy and foundations of mathematics. 3. Apply CT for insights into engineering and other professional fields. Lastly, this paper provides a progress report on the above non-pure-mathematical directions for CT, including examples from biology, cognitive science and learning theory, philosophy of science, physics, applied machine learning, and computational complexity. Interwoven with the report are occasional remarks about the future.
Information and Computation, 2016
Lecture Notes in Computer Science, 2008
In computational function learning in the limit, an algorithmic learner tries to find a program for a computable function g given successively more values of g, each time outputting a conjectured program for g. A learner is called postdictively complete iff all available data is correctly postdicted by each conjecture. Akama and Zeugmann presented, for each choice of natural number δ, a relaxation of postdictive completeness: each conjecture is required to postdict only all except the last δ seen data points. This paper extends this notion of delayed postdictive completeness from constant delays to dynamically computed delays. On the one hand, the delays can be different for different data points. On the other hand, delays no longer need to be by a fixed finite number; any type of computable countdown is allowed, including, for example, countdown in a system of ordinal notations and in other graphs disallowing computable infinitely descending counts. We extend many of the theorems of Akama and Zeugmann and provide some feasible learnability results. Regarding fairness in feasible learning, one needs to limit the use of tricks that postpone output hypotheses until there is enough time to think about them. We see that, for polytime learning, postdictive completeness (and its delayed variants) 1. allows some but not all postponement tricks, and 2. draws a surprisingly tight boundary between what postponement is allowed and what is not. For example: 1. the set of polytime computable functions is polytime postdictively completely learnable employing some postponement, but 2. the set of exptime computable functions, while polytime learnable with a little more postponement, is not polytime postdictively completely learnable! We have that, for w a notation for ω, the set of exptime functions is polytime learnable with w-delayed postdictive completeness.
Also provided are generalizations to further, small constructive limit ordinals.
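The constant-delay relaxation of Akama and Zeugmann can be stated as a simple check. This minimal sketch assumes data arrive as (x, g(x)) pairs in the order seen and that a conjecture can be run as a Python callable; it covers only a fixed δ, not the dynamically computed or ordinal countdowns that the paper introduces. The name `postdicts` is hypothetical.

```python
from typing import Callable, Sequence, Tuple


def postdicts(conjecture: Callable[[int], int],
              data: Sequence[Tuple[int, int]],
              delta: int = 0) -> bool:
    """True iff the conjecture agrees with all seen data except the last `delta` points.

    delta == 0 is ordinary postdictive completeness.
    """
    checked = data[: max(len(data) - delta, 0)]
    return all(conjecture(x) == y for x, y in checked)


squares_seen = [(0, 0), (1, 1), (2, 4)]
```

With these data, the identity conjecture `lambda x: x` fails the plain check but passes with `delta=1`, since only the last point (2, 4) contradicts it.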
Lecture Notes in Computer Science, 2013
Regarding learning in the limit from positive data, a major concern is which classes of languages are learnable with respect to a given learning criterion. We are particularly interested herein in the reasons for a class of languages to be unlearnable. We consider two types of reasons. One type is called topological, where it does not help if the learners are allowed to be uncomputable (an example of Gold's is that no class containing an infinite language and all its finite sub-languages is learnable, even by an uncomputable learner). Another type is called computational (where the learners are required to be algorithmic). In particular, two learning criteria might allow for learning different classes of languages from one another, but with dependence on whether the unlearnability is of type topological or computational. In this paper we formalize the idea of two learning criteria separating topologically in learning power. This allows us to study more closely why two learning criteria separate in learning power. For a variety of learning criteria, concerning vacillatory, monotone, (several kinds of) iterative and feedback learning, we show that certain learning criteria separate topologically, and that certain others, which are known to separate, do not separate topologically. Showing that learning criteria do not separate topologically implies that any known separation must necessarily exploit algorithmicity of the learner.
Topics in Cognitive Science, 2013
A U-shaped curve in a cognitive-developmental trajectory refers to a three-step process: good performance followed by bad performance followed by good performance once again. U-shaped curves have been observed in a wide variety of cognitive-developmental and learning contexts. U-shaped learning seems to contradict the idea that learning is a monotonic, cumulative process and thus constitutes a challenge for competing theories of cognitive development and learning. U-shaped behaviour in language learning (in particular in learning the English past tense) has become a central topic in the Cognitive Science debate about learning models. Antagonist models (e.g., connectionism vs. nativism) are often judged on their ability to model or account for U-shaped behaviour. The prior literature is mostly occupied with explaining how U-shaped behaviour occurs. Instead, we are interested in the necessity of this kind of apparently inefficient strategy. We present and discuss a body of results in the abstract mathematical setting of (extensions of) Gold-style computational learning theory addressing a mathematically precise version of the following question: Are there learning tasks that require U-shaped behaviour? All notions considered are learning in the limit from positive data. We present results about the necessity of U-shaped learning in classical models of learning as well as in models with bounds on the memory of the learner. The pattern emerges that, for parameterized, cognitively relevant learning criteria, beyond very few initial parameter values, U-shapes are necessary for full learning power! We discuss the possible relevance of the above results for the Cognitive Science debate about learning models as well as directions for future research.
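In Gold-style terms, a U-shape on a text is a run of conjectures that is correct, later incorrect, and later correct again. Assuming one is simply told, for each conjecture, whether it is a correct grammar for the target (in general this is undecidable), detecting the pattern is a one-pass scan. The helper `has_u_shape` is a hypothetical illustration, not taken from the paper.

```python
from typing import Iterable


def has_u_shape(correct: Iterable[bool]) -> bool:
    """True iff some correct conjecture is later followed by an incorrect one and then a correct one.

    The t-th entry says whether the t-th conjecture is a correct grammar for the target.
    """
    seen_correct = False  # a correct conjecture has appeared
    seen_drop = False     # ... and has later been abandoned for an incorrect one
    for ok in correct:
        if ok and seen_drop:
            return True   # relearned: correct, incorrect, correct
        if ok:
            seen_correct = True
        elif seen_correct:
            seen_drop = True
    return False
```

For example, `[False, True, False, True]` contains a U-shape (learn, unlearn, relearn), whereas `[False, True, True]` does not.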
Theoretical Computer Science, 2010
It is investigated for which choice of a parameter q, denoting the number of contexts, the class of simple external contextual languages is iteratively learnable. On the one hand, the class admits, for all values of q, polynomial time learnability provided an adequate choice of the hypothesis space is given. On the other hand, additional constraints like consistency and conservativeness or the use of a one-one hypothesis space change the picture: iterative learning limits the long-term memory of the learner to the current hypothesis, and these constraints further hinder storage of information via padding of this hypothesis. It is shown that if q > 3, then simple external contextual languages are not iteratively learnable using a class preserving one-one hypothesis space, while for q = 1 the class is iteratively learnable, even in polynomial time. It is also investigated for which choice of the parameters the simple external contextual languages can be learnt by a consistent and conservative iterative learner. 1 Supported by a Marie Curie International Fellowship within the 6th European Community Framework Programme. 2 Supported in part by NUS grant number R252-000-308-112. 3 Supported in part by NUS grant numbers R252-000-308-112 and R146-000-114-112.
Theoretical Computer Science, 2009
Iterative learning (It-learning) is a Gold-style learning model in which each of a learner's output conjectures may depend only upon the learner's current conjecture and the current input element. Two extensions of the It-learning model are considered, each of which involves parallelism. The first is to run, in parallel, distinct instantiations of a single learner on each input element. The second is to run, in parallel, n individual learners incorporating the first extension, and to allow the n learners to communicate their results. In most contexts, parallelism is only a means of improving efficiency. However, as shown herein, learners incorporating the first extension are more powerful than It-learners, and collective learners resulting from the second extension increase in learning power as n increases. Attention is paid to how one would actually implement a learner incorporating each extension. Parallelism is the underlying mechanism employed.
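The It-learning restriction means the learner is a function of just (current conjecture, current datum). A minimal sketch, assuming the toy class of initial segments $\{0, 1, \ldots, n\}$ with conjecture $n$ (both hypothetical choices), shows such an update rule and how it is run over a text; it does not implement the parallel extensions studied in the paper.

```python
from typing import Iterable, List, Optional


def it_update(conjecture: Optional[int], datum: int) -> int:
    """Iterative update: the new conjecture depends only on the old one and the new datum.

    Conjecture n stands for the (hypothetical) language {0, 1, ..., n}.
    """
    return datum if conjecture is None else max(conjecture, datum)


def run_iterative(text: Iterable[int]) -> List[int]:
    """Feed a text to the learner one element at a time, recording each conjecture."""
    conjecture: Optional[int] = None  # no conjecture before any data
    history: List[int] = []
    for datum in text:
        conjecture = it_update(conjecture, datum)
        history.append(conjecture)
    return history
```

On the text `[2, 0, 3, 1, 3, 0]` for $\{0, 1, 2, 3\}$, `run_iterative` yields `[2, 2, 3, 3, 3, 3]`. For this class the current conjecture happens to carry all the memory the learner needs; the papers above concern classes where it does not.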
Theoretical Computer Science, 2013
The Journal of Symbolic Logic, 1994
A generator program for a computable function (by definition) generates an infinite sequence of programs all but finitely many of which compute that function. Machine learning of generator programs for computable functions is studied. Partly to motivate these studies, it is shown that, in some cases, interesting global properties of computable functions can be proved from suitable generator programs which cannot be proved from any ordinary programs for them. The power (for variants of various learning criteria from the literature) of learning generator programs is compared with the power of learning ordinary programs. The learning power in these cases is also compared to that of learning limiting programs, i.e., programs allowed finitely many mind changes about their correct outputs.
SIAM Journal on Computing, 2006
The present work studies clustering from an abstract point of view and investigates its properties in the framework of inductive inference. Any class S considered is given by a hypothesis space, i.e., a numbering, $A_0, A_1, \ldots$ of nonempty recursively enumerable (r.e.) subsets of $\mathbb{N}$ or $\mathbb{Q}^k$. A clustering task is a finite and nonempty set of r.e. indices of pairwise disjoint such sets. The class S is said to be clusterable if there is an algorithm which, for every clustering task I, converges in the limit on any text for $\bigcup_{i \in I} A_i$ to a finite set J of indices of pairwise disjoint clusters such that $\bigcup_{j \in J} A_j = \bigcup_{i \in I} A_i$. A class is called semiclusterable if there is such an algorithm which finds a J with the last condition relaxed to $\bigcup_{j \in J} A_j \supseteq \bigcup_{i \in I} A_i$. The relationship between natural topological properties and clusterability is investigated. Topological properties can provide sufficient or necessary conditions for clusterability, but they cannot characterize it. On the one hand, many interesting conditions make use of both the topological structure of the class and a well-chosen numbering. On the other hand, the clusterability of a class does not depend on which numbering of the class is used as a hypothesis space for the clusterer. These ideas are demonstrated in the context of naturally geometrically defined classes. Besides the text for the clustering task, clustering many of these classes requires additional information: for example, the class of convex hulls of finitely many points in a rational vector space can be clustered with the number of clusters as additional information. Interestingly, the class of polygons (together with their interiors) is clusterable if the number of clusters and the overall number of vertices of these clusters are given to the clusterer as additional information.
Intriguingly, this additional information is not sufficient for classes including figures with holes. While some classes are unclusterable due to their topological structure, others are only computationally intractable. An oracle might permit clustering all computationally intractable clustering tasks but fail on some classes which are topologically difficult. It is shown that an oracle E permits clustering all computationally difficult classes iff $E \geq_T K$ and $E' \geq_T K''$. Furthermore, no 1-generic oracle below K and no 2-generic oracle permits clustering any class which is not clusterable without an oracle.
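For finite sets, the success conditions for clustering and semiclustering can be checked directly. This sketch assumes the sets are given explicitly as finite Python sets rather than by r.e. indices, so it only illustrates the two criteria, not a clusterer that learns from a text; `is_valid_clustering` is a hypothetical name.

```python
from typing import List, Set


def is_valid_clustering(task: List[Set[int]], answer: List[Set[int]], semi: bool = False) -> bool:
    """Check the clustering (or, with semi=True, semiclustering) success condition.

    task:   the pairwise disjoint sets whose union the clusterer sees as a text
    answer: the clusters the clusterer proposes (they need not be the task sets themselves)
    """
    # The proposed clusters must be pairwise disjoint.
    for i in range(len(answer)):
        for j in range(i + 1, len(answer)):
            if answer[i] & answer[j]:
                return False
    target = set().union(*task)
    found = set().union(*answer)
    # Clustering demands equality of unions; semiclustering only a superset.
    return found >= target if semi else found == target
```

For the task `[{1, 2}, {5}]`, the answer `[{1, 2, 3}, {5}]` is acceptable only when `semi=True`, while the overlapping answer `[{1, 2}, {2, 5}]` is rejected under both criteria.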
SIAM Journal on Computing, 1999
Some extensions of Gold's influential model of language learning by machine from positive data are considered. Studied are criteria of successful learning featuring convergence in the limit to vacillation between several alternative correct grammars. The main theorem of this paper is that there are classes of languages that can be learned if convergence in the limit to up to (n + 1) exactly correct grammars is allowed but which cannot be learned if convergence in the limit is to no more than n grammars, where the no more than n grammars can each make finitely many mistakes. This contrasts sharply with results of Barzdin and Podnieks and, later, Case and Smith for learnability from both positive and negative data. A subset principle from a 1980 paper of Angluin is extended to the vacillatory and other criteria of this paper. This principle provides a necessary condition for avoiding overgeneralization in learning from positive data. It is applied to prove another theorem to the effect that one can optimally eliminate half of the mistakes from final programs for vacillatory criteria if one is willing to converge in the limit to infinitely many different programs instead. Child language learning may be sensitive to the order or timing of data presentation. It is shown, though, that for the vacillatory success criteria of this paper, there is no loss of learning power for machines which are insensitive to order in several ways simultaneously. For example, partly set-driven machines attend only to the set and length of sequence of positive data, not the actual sequence itself.
A machine M is, by definition, weakly n-ary order independent if and only if for each language L on which, for some ordering of the positive data about L, M converges in the limit to a finite set of grammars, there is a finite set of grammars D (of cardinality ≤ n) such that M converges to a subset of this same D for each ordering of the positive data for L. The theorem most difficult to prove in the paper implies that machines which are simultaneously partly set-driven and weakly n-ary order independent do not lose learning power for converging in the limit to up to n grammars. Several variants of this theorem are obtained by modifying its proof, and some of these variants have application in this and other papers. Along the way it is also shown, for the vacillatory criteria, that learning power is not increased if one restricts the sequence of positive data presentation to be computable. Some of these results are nontrivial lifts of prior work for the n = 1 case done by the Blums; Wiehagen; Osherson, Stob, and Weinstein; Schäfer; and Fulk.
Machine Learning, 2008
This paper solves an important problem left open in the literature by showing that U-shapes are unnecessary in iterative learning from positive data. A U-shape occurs when a learner first learns, then unlearns, and, finally, relearns, some target concept. Iterative learning is a Gold-style learning model in which each of a learner's output conjectures depends only upon the learner's most recent conjecture and input element. Previous results had shown, for example, that U-shapes are unnecessary for explanatory learning, but are necessary for behaviorally correct learning. Work on the aforementioned problem led to the consideration of an iterative-like learning model, in which each of a learner's conjectures may, in addition, depend upon the number of elements so far presented to the learner. Learners in this new model are strictly more powerful than traditional iterative learners, yet not as powerful as full explanatory learners. Can any class of languages learnable in this new model be learned without U-shapes? For now, this problem is left open.
Logical Methods in Computer Science, 2013
The present work determines the exact nature of linear time computable notions which characterise automatic functions (those whose graphs are recognised by a finite automaton). The paper also determines which type of linear time notions permit full learnability for learning in the limit of automatic classes (families of languages which are uniformly recognised by a finite automaton). In particular it is shown that a function is automatic iff there is a one-tape Turing machine with a left end which computes the function in linear time where the input before the computation and the output after the computation both start at the left end. It is known that learners realised as automatic update functions are restrictive for learning. In the present work it is shown that one can overcome the problem by providing work tapes additional to a resource-bounded base tape while keeping the update-time to be linear in the length of the largest datum seen so far. In this model, one additiona...
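As an illustration of the definition of an automatic function (not of the paper's Turing-machine characterisation), the graph of increment $x \mapsto x + 1$ on binary numerals written least-significant bit first can be recognised by a finite automaton reading the convolution of $x$ and $y$, in which the shorter string is padded with a blank. The sketch simulates such an automaton; its finitely many states are the carry bit, whether $x$ has ended, and a rejecting state. The padding symbol, the length convention for $y$, and the function names are assumptions made for the example.

```python
from typing import List, Tuple

PAD = "#"  # blank used to pad the shorter string in the convolution


def convolution(x: str, y: str) -> List[Tuple[str, str]]:
    """Pair up x and y symbol by symbol, padding the shorter one with PAD."""
    n = max(len(x), len(y))
    return list(zip(x.ljust(n, PAD), y.ljust(n, PAD)))


def accepts_increment(pairs: List[Tuple[str, str]]) -> bool:
    """Simulate a finite automaton accepting convolutions of (x, x + 1), LSB first.

    State = (carry, x_ended); returning False models the rejecting state.
    Convention: y uses as many digits as x, plus one more only for a final carry.
    """
    carry, x_ended = 1, False  # adding one: start with a pending carry
    for a, b in pairs:
        if b == PAD:
            return False  # y never runs out before x under this convention
        if a == PAD:
            if carry == 0:
                return False  # an extra digit of y is allowed only to absorb a carry
            x_ended, digit = True, 0
        elif x_ended:
            return False  # x cannot resume after its padding
        else:
            digit = int(a)
        total = digit + carry
        if int(b) != total % 2:
            return False
        carry = total // 2
    return carry == 0
```

For instance, `accepts_increment(convolution("11", "001"))` is `True` (3 + 1 = 4), whereas `accepts_increment(convolution("11", "00"))` is `False`.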
Journal of Experimental & Theoretical Artificial Intelligence, 1992
Suppose $LC_1$ and $LC_2$ are two machine learning classes, each based on a criterion of success. Suppose, for every machine which learns a class of functions according to the $LC_1$ criterion of success, there is a machine which learns this class according to the $LC_2$ criterion. In the case where the converse does not hold, $LC_1$ is said to be separated from $LC_2$. It is shown that for many such separated learning classes from the literature a much stronger separation holds: $(\forall \mathcal{C} \in LC_1)(\exists \mathcal{C}' \in (LC_2 - LC_1))[\mathcal{C}' \supset \mathcal{C}]$. It is also shown that there is a pair of separated learning classes from the literature for which the stronger separation just above does not hold. A philosophical heuristic toward the design of artificially intelligent learning programs is presented with each strong separation result.
Journal of Computer and System Sciences, 1995
Gold-style language learning is a formal theory of learning from examples by algorithmic devices called learning machines. Originally motivated by child language learning, it features the algorithmic synthesis (in the limit) of grammars for formal languages from information about those languages. In traditional Gold-style language learning, learning machines are not provided with negative information, i.e., information about the complements of the input languages. We investigate two approaches to providing small amounts of negative information and demonstrate in each case a strong resulting increase in learning power. Finally, we show that small packets of negative information also lead to increased speed of learning. This result agrees with a psycholinguistic hypothesis of McNeill correlating the availability of parental expansions with the speed of child language development.
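A classic way to see why negative data can help is Gold's identification by enumeration over an indexed family, which outputs the first hypothesis consistent with all data seen so far. The sketch is that textbook technique, not either of the paper's approaches to supplying small amounts of negative information; the family of multiples of k (for k = 1 to 5) and all names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def multiples_of(k: int) -> Callable[[int], bool]:
    """Membership test for the language of nonnegative multiples of k."""
    return lambda n: n % k == 0


# Hypothetical indexed family: index i is the language of multiples of i + 1.
HYPOTHESES: List[Callable[[int], bool]] = [multiples_of(k) for k in range(1, 6)]


def enumeration_learner(data: Sequence[Tuple[int, bool]]) -> Optional[int]:
    """Return the first index consistent with all labelled data seen so far.

    Each datum is (n, True) for "n is in the language" or (n, False) for "n is not".
    """
    for index, member in enumerate(HYPOTHESES):
        if all(member(n) == label for n, label in data):
            return index
    return None
```

On the positive data 4, 8, 12 alone, the first consistent hypothesis is index 0 ("all multiples of 1"), an overgeneralization; adding the single negative datum "2 is not in the language" moves the conjecture to index 3 (multiples of 4).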
Theoretical Computer Science, 2006
An algorithm for learning a subclass of erasing regular pattern languages is presented. On extend... more An algorithm for learning a subclass of erasing regular pattern languages is presented. On extended regular pattern languages generated by patterns π of the form x 0 α 1 x 1. .. α m x m , where x 0 ,. .. , x m are variables and α 1 ,. .. , α m strings of terminals of length c each, it runs with arbitrarily high probability of success using a number of examples polynomial in m (and exponential in c). It is assumed that m is unknown, but c is known and that samples are randomly drawn according to some distribution, for which we only require that it has certain natural and plausible properties. Aiming to improve this algorithm further we also explore computer simulations of a heuristic.
Information and Computation, 2002
Information and Computation, 2012
Introduced is a new inductive inference paradigm, dynamic modeling. Within this learning paradigm... more Introduced is a new inductive inference paradigm, dynamic modeling. Within this learning paradigm, for example, function h learns function g iff, in the i-th iteration, h and g both produce output, h gets the sequence of all outputs from g in prior iterations as input, g gets all the outputs from h in prior iterations as input, and, from some iteration on, the sequence of h's outputs will be programs for the output sequence of g. Dynamic modeling provides an idealization of, for example, a social interaction in which h seeks to discover program models of g's behavior it sees in interacting with g, and h openly discloses to g its sequence of candidate program models to see what g says back. Sample results: every g can be so learned by some h; there are g that can only be learned by an h if g can also learn that h back; there are extremely secretive h which cannot be learned back by any g they learn, but which, nonetheless, succeed in learning infinitely many g; quadratic time learnability is strictly more powerful than linear time learnability. This latter result, as well as others, follows immediately from general correspondence theorems obtained from a unified approach to the paradigms within inductive inference. Many proofs, some sophisticated, employ machine self-reference, a.k.a., recursion theorems.
Theoretical Computer Science, 2001
Concept drift means that the concept about which data is obtained may shift from time to time, ea... more Concept drift means that the concept about which data is obtained may shift from time to time, each time after some minimum permanence. Except for this minimum permanence, the concept shifts may not have to satisfy any further requirements and may occur infinitely often. Within this work is studied to what extent it is still possible to predict or learn values for a data sequence produced by drifting concepts. Various ways to measure the quality of such predictions, including martingale betting strategies and density and frequency of correctness, are introduced and compared with one another. For each of these measures of prediction quality, for some interesting concrete classes, (nearly) optimal bounds on permanence for attaining learnability are established. The concrete classes, from which the drifting concepts are selected, include regular languages accepted by finite automata of bounded size, polynomials of bounded degree, and sequences defined by recurrence relations of bounded size. Some important, restricted cases of drifts are also studied, for example, the case where the intervals of permanence are computable. In the case where the concepts shift only
Theory of Computing Systems, 2009
Mathematical Problems from Applied Logic II
This paper begins by briefly indicating the principal, non-standard motivations of the author for... more This paper begins by briefly indicating the principal, non-standard motivations of the author for his decades of work in Computability Theory (CT), a.k.a. Recursive Function Theory. Then it discusses its proposed, general directions beyond those from pure mathematics for CT. These directions are as follows. 1. Apply CT to basic sciences, for example, biology, psychology, physics, chemistry, and economics. 2. Apply the resultant insights from 1 to philosophy and, more generally, apply CT to areas of philosophy in addition to the philosophy and foundations of mathematics. 3. Apply CT for insights into engineering and other professional fields. Lastly, this paper provides a progress report on the above non-pure mathematical directions for CT, including examples for biology, cognitive science and learning theory, philosophy of science, physics, applied machine learning, and computational complexity. Interweaved with the report are occasional remarks about the future.
Information and Computation, 2016
Die Deutsche Nationalbibliothek verzeichnet diese Publikation in der Deutschen Nationalbibliograf... more Die Deutsche Nationalbibliothek verzeichnet diese Publikation in der Deutschen Nationalbibliografie; detaillierte bibliografische Daten sind im Internet über http://dnb.d-nb.de abrufbar. The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.d-nb.de ISBN 978-3-8487-3688-1 (Print) 978-3-8452-8040-0 (ePDF) British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library. ISBN 978-3-8487-3688-1 (Print) 978-3-8452-8040-0 (ePDF) Library of Congress Cataloging-in-Publication Data Griech-Polelle, Beth A. The Nuremberg War Crimes Trial and its Policy Consequences Today 2 nd , revised and extended edition Beth A. Griech-Polelle (ed.) 274 pp. Includes bibliographic references and index. ISBN 978-3-8487-3688-1 (Print) 978-3-8452-8040-0 (ePDF) 2 nd , revised and extended edition, 2020 © Nomos Verlagsgesellschaft, Baden-Baden 2020. Gedruckt in Deutschland. Alle Rechte, auch die des Nachdrucks von Auszügen, der fotomechanischen Wiedergabe und der Übersetzung, vorbehalten. Gedruckt auf alterungsbeständigem Papier. This work is subject to copyright. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage or retrieval system, without prior permission in writing from the publishers. Under § 54 of the German Copyright Law where copies are made for other than private use a fee is payable to "Verwertungs gesellschaft Wort", Munich. No responsibility for loss caused to any individual or organization acting on or refraining from action as a result of the material in this publication can be accepted by Nomos or the editor.
Lecture Notes in Computer Science, 2008
In computational function learning in the limit, an algorithmic learner tries to find a program for a computable function g given successively more values of g, each time outputting a conjectured program for g. A learner is called postdictively complete iff all available data is correctly postdicted by each conjecture. Akama and Zeugmann presented, for each choice of natural number δ, a relaxation of postdictive completeness: each conjecture is required to postdict only all except the last δ seen data points. This paper extends this notion of delayed postdictive completeness from constant delays to dynamically computed delays. On the one hand, the delays can be different for different data points. On the other hand, delays no longer need to be by a fixed finite number; any type of computable countdown is allowed, including, for example, countdown in a system of ordinal notations and in other graphs disallowing computable infinitely descending counts. We extend many of the theorems of Akama and Zeugmann and provide some feasible learnability results. Regarding fairness in feasible learning, one needs to limit the use of tricks that postpone output hypotheses until there is enough time to think about them. We see that, for polytime learning, postdictive completeness (and its delayed variants) 1. allows some but not all postponement tricks, and 2. marks a surprisingly tight boundary between what postponement is allowed and what is not. For example: 1. the set of polytime computable functions is polytime postdictively completely learnable employing some postponement, but 2. the set of exptime computable functions, while polytime learnable with a little more postponement, is not polytime postdictively completely learnable! We have that, for w a notation for ω, the set of exptime functions is polytime learnable with w-delayed postdictive completeness.
Also provided are generalizations to further, small constructive limit ordinals.
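The constant-delay notion of Akama and Zeugmann, which the paper generalizes, can be made concrete with a small checker. This is only a minimal sketch under stated assumptions. Conjectures are modeled as Python callables instead of program indices, and only a finite prefix of g is examined. The names `is_delta_postdictive`, `table_learner` and `lagging_learner` are hypothetical. The sketch does not cover the paper's dynamically computed or ordinal-notation countdown delays.

```python
from typing import Callable, Sequence

# Assumption: a conjecture is a Python callable standing in for a program.
Hypothesis = Callable[[int], int]
Learner = Callable[[Sequence[int]], Hypothesis]


def is_delta_postdictive(learner: Learner, values: Sequence[int], delta: int) -> bool:
    """Check delta-delayed postdictive completeness on a finite prefix of g.

    values[x] is g(x). After seeing g(0), ..., g(n-1), the conjecture must
    reproduce every seen value except possibly the last `delta` ones.
    delta = 0 is ordinary postdictive completeness.
    """
    for n in range(len(values) + 1):
        h = learner(values[:n])
        for x in range(max(0, n - delta)):
            if h(x) != values[x]:
                return False
    return True


def table_learner(data: Sequence[int]) -> Hypothesis:
    """Hypothetical learner: a lookup table of the data seen, default 0 elsewhere."""
    table = list(data)
    return lambda x: table[x] if x < len(table) else 0


def lagging_learner(data: Sequence[int]) -> Hypothesis:
    """Hypothetical learner that ignores the most recent datum (a postponement trick)."""
    return table_learner(data[:-1])
```

With these toy learners, `table_learner` passes with delay 0. On data whose latest value is nonzero, `lagging_learner` only passes once a delay of 1 is allowed. This mirrors, on a tiny scale, how a delay licenses postponing the use of recent data.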
Lecture Notes in Computer Science, 2013
Regarding learning in the limit from positive data, a major concern is which classes of languages are learnable with respect to a given learning criterion. We are particularly interested herein in the reasons for a class of languages to be unlearnable. We consider two types of reasons. One type is called topological: it does not help if the learners are allowed to be uncomputable (an example of Gold's is that no class containing an infinite language and all its finite sub-languages is learnable, even by an uncomputable learner). The other type is called computational (where the learners are required to be algorithmic). In particular, two learning criteria might allow for learning different classes of languages from one another, but with dependence on whether the unlearnability is of type topological or computational. In this paper we formalize the idea of two learning criteria separating topologically in learning power. This allows us to study more closely why two learning criteria separate in learning power. For a variety of learning criteria, concerning vacillatory, monotone, (several kinds of) iterative and feedback learning, we show that certain learning criteria separate topologically, and that certain others, which are known to separate, do not separate topologically. Showing that learning criteria do not separate topologically implies that any known separation must necessarily exploit algorithmicity of the learner.
Topics in Cognitive Science, 2013
A U-shaped curve in a cognitive-developmental trajectory refers to a three-step process: good performance followed by bad performance followed by good performance once again. U-shaped curves have been observed in a wide variety of cognitive-developmental and learning contexts. U-shaped learning seems to contradict the idea that learning is a monotonic, cumulative process and thus constitutes a challenge for competing theories of cognitive development and learning. U-shaped behaviour in language learning (in particular in learning the English past tense) has become a central topic in the Cognitive Science debate about learning models. Rival models (e.g., connectionism vs. nativism) are often judged on their ability to model or account for U-shaped behaviour. The prior literature is mostly occupied with explaining how U-shaped behaviour occurs. Instead, we are interested in the necessity of this kind of apparently inefficient strategy. We present and discuss a body of results in the abstract mathematical setting of (extensions of) Gold-style computational learning theory addressing a mathematically precise version of the following question: Are there learning tasks that require U-shaped behaviour? All notions considered are learning in the limit from positive data. We present results about the necessity of U-shaped learning in classical models of learning as well as in models with bounds on the memory of the learner. The pattern emerges that, for parameterized, cognitively relevant learning criteria, beyond very few initial parameter values, U-shapes are necessary for full learning power! We discuss the possible relevance of the above results for the Cognitive Science debate about learning models as well as directions for future research.
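The three-step pattern (correct, then incorrect, then correct again) can be stated as a simple check over a learner's sequence of conjectures. This is a minimal sketch on a finite trace only. It assumes a correctness predicate is supplied from outside; in the Gold-style theory, deciding whether a conjecture is a correct grammar is not computable in general. The function name `has_u_shape` and the past-tense strings are illustrative assumptions.

```python
from typing import Callable, Hashable, Sequence


def has_u_shape(conjectures: Sequence[Hashable],
                is_correct: Callable[[Hashable], bool]) -> bool:
    """Return True if the trace goes correct -> incorrect -> correct.

    `is_correct` is an externally supplied oracle (an assumption of this
    sketch); it is not something a learner could compute in general.
    """
    seen_correct = False  # some earlier conjecture was correct
    seen_drop = False     # a later conjecture then became incorrect
    for c in conjectures:
        ok = is_correct(c)
        if ok and seen_drop:
            return True   # relearned after unlearning: a U-shape
        if ok:
            seen_correct = True
        elif seen_correct:
            seen_drop = True
    return False


# Toy past-tense trace: memorized form, over-regularized form, then recovery.
is_went = lambda form: form == "went"
```

For example, `has_u_shape(["went", "goed", "went"], is_went)` returns `True`, while a trace that is only wrong at the start, such as `["goed", "went", "went"]`, does not count as a U-shape.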
Theoretical Computer Science, 2010
It is investigated for which choice of a parameter q, denoting the number of contexts, the class of simple external contextual languages is iteratively learnable. On the one hand, the class admits, for all values of q, polynomial time learnability provided an adequate choice of the hypothesis space is given. On the other hand, additional constraints like consistency and conservativeness, or the use of a one-one hypothesis space, change the picture: iterative learning limits the long-term memory of the learner to the current hypothesis, and these constraints further hinder storage of information via padding of this hypothesis. It is shown that if q > 3, then simple external contextual languages are not iteratively learnable using a class-preserving one-one hypothesis space, while for q = 1 they are iteratively learnable, even in polynomial time. It is also investigated for which choice of the parameters the simple external contextual languages can be learnt by a consistent and conservative iterative learner.
Footnotes: (1) Supported by a Marie Curie International Fellowship within the 6th European Community Framework Programme. (2) Supported in part by NUS grant number R252-000-308-112. (3) Supported in part by NUS grant numbers R252-000-308-112 and R146-000-114-112.
Theoretical Computer Science, 2009
Iterative learning (It-learning) is a Gold-style learning model in which each of a learner's output conjectures may depend only upon the learner's current conjecture and the current input element. Two extensions of the It-learning model are considered, each of which involves parallelism. The first is to run, in parallel, distinct instantiations of a single learner on each input element. The second is to run, in parallel, n individual learners incorporating the first extension, and to allow the n learners to communicate their results. In most contexts, parallelism is only a means of improving efficiency. However, as shown herein, learners incorporating the first extension are more powerful than It-learners, and, collective learners resulting from the second extension increase in learning power as n increases. Attention is paid to how one would actually implement a learner incorporating each extension. Parallelism is the underlying mechanism employed.
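The defining restriction of It-learning is that each new conjecture is computed from the current conjecture and the current input element only. A minimal sketch of that update shape follows. It assumes the class of all finite languages and a hypothesis space of canonical indices, modeled here directly as frozensets; this class and hypothesis space are illustrative choices, not taken from the paper. The names `it_update`, `run_iterative` and the pause symbol `"#"` are hypothetical. The parallel extensions studied in the paper are not shown.

```python
from typing import FrozenSet, Iterable, List, Optional, Union

PAUSE = "#"                  # hypothetical pause symbol: no datum at this step
Hypothesis = FrozenSet[int]  # canonical index of a finite language (assumption)


def it_update(current: Optional[Hypothesis], datum: Union[int, str]) -> Hypothesis:
    """One iterative step: the new conjecture depends only on (current, datum)."""
    base = current if current is not None else frozenset()
    if datum == PAUSE:
        return base
    return base | {datum}


def run_iterative(text: Iterable[Union[int, str]]) -> List[Hypothesis]:
    """Feed a finite text segment to the iterative learner; return all conjectures."""
    conj: Optional[Hypothesis] = None
    history: List[Hypothesis] = []
    for datum in text:
        conj = it_update(conj, datum)  # no access to earlier data, only to conj
        history.append(conj)
    return history
```

Here the learner succeeds only because the hypothesis itself stores everything it needs to remember. For classes where the hypothesis cannot carry that information, the memory restriction of It-learning starts to bite.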
Theoretical Computer Science, 2013
The Journal of Symbolic Logic, 1994
A generator program for a computable function (by definition) generates an infinite sequence of programs all but finitely many of which compute that function. Machine learning of generator programs for computable functions is studied. To motivate these studies partially, it is shown that, in some cases, interesting global properties for computable functions can be proved from suitable generator programs which cannot be proved from any ordinary programs for them. The power (for variants of various learning criteria from the literature) of learning generator programs is compared with the power of learning ordinary programs. The learning power in these cases is also compared to that of learning limiting programs, i.e., programs allowed finitely many mind changes about their correct outputs.
SIAM Journal on Computing, 2006
The present work studies clustering from an abstract point of view and investigates its properties in the framework of inductive inference. Any class S considered is given by a hypothesis space, i.e., a numbering $A_0, A_1, \ldots$ of nonempty recursively enumerable (r.e.) subsets of $\mathbb{N}$ or $\mathbb{Q}^k$. A clustering task is a finite and nonempty set of r.e. indices of pairwise disjoint such sets. The class S is said to be clusterable if there is an algorithm which, for every clustering task I, converges in the limit on any text for $\bigcup_{i \in I} A_i$ to a finite set J of indices of pairwise disjoint clusters such that $\bigcup_{j \in J} A_j = \bigcup_{i \in I} A_i$. A class is called semiclusterable if there is such an algorithm which finds a J with the last condition relaxed to $\bigcup_{j \in J} A_j \supseteq \bigcup_{i \in I} A_i$. The relationship between natural topological properties and clusterability is investigated. Topological properties can provide sufficient or necessary conditions for clusterability, but they cannot characterize it. On the one hand, many interesting conditions make use of both the topological structure of the class and a well-chosen numbering. On the other hand, the clusterability of a class does not depend on which numbering of the class is used as a hypothesis space for the clusterer. These ideas are demonstrated in the context of naturally geometrically defined classes. Besides the text for the clustering task, clustering many of these classes requires additional information. For example, the class of convex hulls of finitely many points in a rational vector space can be clustered with the number of clusters as additional information. Interestingly, the class of polygons (together with their interiors) is clusterable if the number of clusters and the overall number of vertices of these clusters is given to the clusterer as additional information.
Intriguingly, this additional information is not sufficient for classes including figures with holes. While some classes are unclusterable due to their topological structure, others are only computationally intractable. An oracle might permit clustering all computationally intractable clustering tasks but fail on some classes which are topologically difficult. It is shown that an oracle E permits clustering all computationally difficult classes iff $E \geq_T K$ and $E'' \geq_T K''$. Furthermore, no 1-generic oracle below K and no 2-generic oracle permits clustering any class which is not clusterable without an oracle.
SIAM Journal on Computing, 1999
Some extensions of Gold's influential model of language learning by machine from positive data are considered. Studied are criteria of successful learning featuring convergence in the limit to vacillation between several alternative correct grammars. The main theorem of this paper is that there are classes of languages that can be learned if convergence in the limit to up to (n + 1) exactly correct grammars is allowed but which cannot be learned if convergence in the limit is to no more than n grammars, where the no more than n grammars can each make finitely many mistakes. This contrasts sharply with results of Barzdin and Podnieks and, later, Case and Smith for learnability from both positive and negative data. A subset principle from a 1980 paper of Angluin is extended to the vacillatory and other criteria of this paper. This principle provides a necessary condition for avoiding overgeneralization in learning from positive data. It is applied to prove another theorem to the effect that one can optimally eliminate half of the mistakes from final programs for vacillatory criteria if one is willing to converge in the limit to infinitely many different programs instead. Child language learning may be sensitive to the order or timing of data presentation. It is shown, though, that for the vacillatory success criteria of this paper, there is no loss of learning power for machines which are insensitive to order in several ways simultaneously. For example, partly set-driven machines attend only to the set and length of sequence of positive data, not the actual sequence itself.
A machine M is weakly n-ary order independent $\overset{\mathrm{def}}{\Leftrightarrow}$ for each language L on which, for some ordering of the positive data about L, M converges in the limit to a finite set of grammars, there is a finite set of grammars D (of cardinality ≤ n) such that M converges to a subset of this same D for each ordering of the positive data for L. The theorem most difficult to prove in the paper implies that machines which are simultaneously partly set-driven and weakly n-ary order independent do not lose learning power for converging in the limit to up to n grammars. Several variants of this theorem are obtained by modifying its proof, and some of these variants have application in this and other papers. Along the way it is also shown, for the vacillatory criteria, that learning power is not increased if one restricts the sequence of positive data presentation to be computable. Some of these results are nontrivial lifts of prior work for the n = 1 case done by the Blums; Wiehagen; Osherson, Stob, and Weinstein; Schäfer; and Fulk.
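A partly set-driven machine sees only the set of data presented so far and the length of the presentation, never the order. One way to make that restriction structural is to build the learner from a core function of exactly those two inputs. This is a minimal sketch under stated assumptions. Grammars are modeled as integer indices, and the pause symbol `"#"` and the names `partly_set_driven` and `max_learner` are hypothetical. The toy core targets the illustrative class $\{L_n = \{0, \ldots, n\}\}$ and ignores the length, which makes it fully set-driven, a special case.

```python
from typing import Callable, FrozenSet, Sequence, Union

PAUSE = "#"  # hypothetical pause symbol: "no new datum at this step"
Datum = Union[int, str]


def partly_set_driven(
    core: Callable[[FrozenSet[int], int], int]
) -> Callable[[Sequence[Datum]], int]:
    """Turn core(content, length) into a learner on finite data sequences.

    The returned learner can only pass the content set and the sequence
    length to `core`, so reordering the data cannot change its output.
    """
    def learner(sigma: Sequence[Datum]) -> int:
        content = frozenset(x for x in sigma if x != PAUSE)
        return core(content, len(sigma))
    return learner


# Toy core for the illustrative class {L_n = {0, ..., n}}: guess index max(content).
max_learner = partly_set_driven(lambda content, length: max(content, default=0))
```

Because the wrapper discards order before the core ever runs, any two presentations with the same content and the same length necessarily receive the same conjecture.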
Machine Learning, 2008
This paper solves an important problem left open in the literature by showing that U-shapes are unnecessary in iterative learning from positive data. A U-shape occurs when a learner first learns, then unlearns, and, finally, relearns, some target concept. Iterative learning is a Gold-style learning model in which each of a learner's output conjectures depends only upon the learner's most recent conjecture and input element. Previous results had shown, for example, that U-shapes are unnecessary for explanatory learning, but are necessary for behaviorally correct learning. Work on the aforementioned problem led to the consideration of an iterative-like learning model, in which each of a learner's conjectures may, in addition, depend upon the number of elements so far presented to the learner. Learners in this new model are strictly more powerful than traditional iterative learners, yet not as powerful as full explanatory learners. Can any class of languages learnable in this new model be learned without U-shapes? For now, this problem is left open.
Logical Methods in Computer Science, 2013
The present work determines the exact nature of linear time computable notions which characterise automatic functions (those whose graphs are recognised by a finite automaton). The paper also determines which type of linear time notions permit full learnability for learning in the limit of automatic classes (families of languages which are uniformly recognised by a finite automaton). In particular it is shown that a function is automatic iff there is a one-tape Turing machine with a left end which computes the function in linear time where the input before the computation and the output after the computation both start at the left end. It is known that learners realised as automatic update functions are restrictive for learning. In the present work it is shown that one can overcome the problem by providing work tapes additional to a resource-bounded base tape while keeping the update-time to be linear in the length of the largest datum seen so far. In this model, one additiona...
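A standard example of an automatic function is the successor function on binary numerals. Its graph $\{(x, x+1)\}$ is recognised by a two-state automaton that reads the two numerals in parallel as a convolution, one pair of symbols per step. The sketch below is an illustration and is not taken from the paper. It assumes least-significant-bit-first encodings produced by the hypothetical helper `to_lsb_bits`, with the shorter track padded by a blank symbol. The state is simply the pending carry.

```python
from itertools import zip_longest

BLANK = None  # padding symbol for the shorter track of the convolution


def to_lsb_bits(n: int) -> str:
    """Canonical binary numeral of n >= 0, least significant bit first."""
    return bin(n)[2:][::-1]


def accepts_successor_pair(x_bits: str, y_bits: str) -> bool:
    """Two-state automaton over convoluted pairs: accepts iff y = x + 1.

    The only state is the pending carry, which starts at 1 because we add one.
    Assumes both inputs are canonical encodings from to_lsb_bits.
    """
    carry = 1
    for a, b in zip_longest(x_bits, y_bits, fillvalue=BLANK):
        if b is BLANK:
            # y is shorter than x: with canonical encodings, y < x + 1.
            return False
        bit_a = 0 if a is BLANK else int(a)
        if int(b) != bit_a ^ carry:  # output bit must be a XOR carry
            return False
        carry = bit_a & carry        # carry propagates only through 1-bits
    return carry == 0                # accept only if no carry is left over
```

For instance, `accepts_successor_pair(to_lsb_bits(3), to_lsb_bits(4))` is `True` and `accepts_successor_pair(to_lsb_bits(3), to_lsb_bits(5))` is `False`. The loop keeps no memory beyond the one-bit carry, which is what makes the recogniser a finite automaton.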
Journal of Experimental & Theoretical Artificial Intelligence, 1992
Suppose $LC_1$ and $LC_2$ are two machine learning classes, each based on a criterion of success. Suppose, for every machine which learns a class of functions according to the $LC_1$ criterion of success, there is a machine which learns this class according to the $LC_2$ criterion. In the case where the converse does not hold, $LC_1$ is said to be separated from $LC_2$. It is shown that for many such separated learning classes from the literature a much stronger separation holds: $(\forall \mathcal{C} \in LC_1)(\exists \mathcal{C}' \in (LC_2 - LC_1))[\mathcal{C}' \supset \mathcal{C}]$. It is also shown that there is a pair of separated learning classes from the literature for which the stronger separation just above does not hold. A philosophical heuristic toward the design of artificially intelligent learning programs is presented with each strong separation result.
Journal of Computer and System Sciences, 1995
Gold-style language learning is a formal theory of learning from examples by algorithmic devices called learning machines. Originally motivated by child language learning, it features the algorithmic synthesis (in the limit) of grammars for formal languages from information about those languages. In traditional Gold-style language learning, learning machines are not provided with negative information, i.e., information about the complements of the input languages. We investigate two approaches to providing small amounts of negative information and demonstrate in each case a strong resulting increase in learning power. Finally, we show that small packets of negative information also lead to increased speed of learning. This result agrees with a psycholinguistic hypothesis of McNeill correlating the availability of parental expansions with the speed of child language development.