Logic Programming on a Neural Network
Wan Ahmad Tajuddin Wan Abdullah
Jabatan Fizik, Universiti Malaya, 59100 Kuala Lumpur, Malaysia
Abstract
We propose a method of doing logic programming on a Hopfield neural network. Optimization of logical consistency is carried out by the network after the connection strengths are defined from the logic program; the network relaxes to neural states corresponding to a valid (or near-valid) interpretation.
I. INTRODUCTION
Neural networks or connectionist architectures have emerged as a new computational paradigm (see, e.g., Wan Abdullah$^{1}$). This paradigm departs from traditional von Neumann serial processing and is instead based on distributed processing via connections between simple elements. It is motivated by biology, and, indeed, there are efforts to construct neural networks as close to biological reality as possible, but at the very least the basic idea of processing in the connections offers new and alternative ways of computation.$^{2}$ An immediate problem is then to implement on such architectures applications which are currently implemented on serial machines. One such application is logic programming,$^{3}$ which conventionally involves a database of declared knowledge and a sequential procedure, resolution, to prove or disprove goals based on this knowledge.
There are various versions of the neural network, of which the symmetric (in its connections), densely connected Hopfield model$^{4}$ is one. This network has been shown$^{5}$ to evolve in such a way as to minimize a configurational energy function; it can thus be used for solving combinatorial optimization problems, and adequately good solutions to hard problems can be found in linear time or less. Proving or disproving goals by resolution can be hard: many cases have been shown$^{6}$ to occur where a family of clauses is unsatisfiable but the resolution proof of its unsatisfiability takes exponential time if carried out sequentially. Here we propose a method of implementing logic programming on a symmetric neural network which uses combinatorial optimization, to benefit (in principle, at least) from the faster solution afforded by the inherent parallelism of the network. There have been proposals$^{7-11}$ for inference mechanisms on neural networks which involve structures and procedures that seem artificial and constructed; we are rather interested here in "fundamental" ones in which all neurons play more or less uniform roles.
Previously,$^{12}$ we have shown how propositional logic programming can be done on the model of a single neuron. In this article, we propose a method of doing logical inference through the minimization of logical inconsistencies on a symmetric neural network. This technique is in general applicable to well-formed formulae, but we will concentrate on its application to Horn clauses.
We briefly review the relevant aspects of Hopfield neural networks in the next section and show how logic programming can be carried out in Section III. As we shall see, a network with multiconnections$^{13-18}$ is generally required. However, such a network has energy-minimizing dynamics similar to the conventional version, and indeed, the former actually has a larger storage capacity,$^{13-15}$ has improved attractivity of stored patterns,$^{16}$ and is better at pattern discrimination.$^{17}$ We conclude in the last section.
II. NEURAL NETWORKS
A neuron $i$ can formally be modeled as a two-state (binary) element $V_{i}$ whose state depends on the input from other neurons $j$ via connections $T_{ij}$ of various strengths (positive or negative):
$$V_{i}(t+1)=Q\left\{h_{i}(t)\right\}=Q\left[\sum_{j} T_{i j} V_{j}(t)-U_{i}\right]$$
where $Q$ is a step function ($Q$ is 1 if its argument is greater than zero, and 0 otherwise), $t$ is a measure of time in neural processing units, $U_{i}$ is a threshold value, and the sum is over all $N$ neurons $j$. If $T_{ij}$ is zero-diagonal and symmetric, we may write a configurational energy$^{5}$
$$E(t)=-(1/2) \sum_{i \neq j} T_{i j} V_{i}(t) V_{j}(t)+\sum_{i} U_{i} V_{i}(t)$$
which is monotone decreasing under the evolution of $V_{i}(t)$. Thus, depending on the initial configuration $V_{i}(0)$, the system evolves into a configuration for which $E$ is a minimum.
By mapping neurons to switches indicating choices in a combinatorial optimization problem, we can arrive at the combination with least cost if we equate $E$ to the cost function (thereby defining the values of the connection strengths) and allow the network to relax. Adequately good solutions to difficult problems like the traveling salesman problem have been shown$^{5}$ to be achievable in linear time or less in this way. The complexity which is spread over time in a sequential machine is spread over space in the massively connected network.
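A minimal sketch of these dynamics (ours, not from the paper; the 0/1 convention and the helper names `energy` and `relax` follow the equations above) might look as follows:

```python
import numpy as np

def energy(T, U, V):
    """E = -(1/2) sum_{i != j} T_ij V_i V_j + sum_i U_i V_i, for zero-diagonal symmetric T and binary V."""
    return -0.5 * V @ T @ V + U @ V

def relax(T, U, V, max_sweeps=100, rng=None):
    """Asynchronous updates V_i <- Q(sum_j T_ij V_j - U_i), with Q(x) = 1 if x > 0 and 0 otherwise."""
    rng = rng or np.random.default_rng()
    V = V.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(V)):          # update neurons one at a time, in random order
            new = 1 if (T[i] @ V - U[i]) > 0 else 0
            if new != V[i]:
                V[i], changed = new, True
        if not changed:                            # a full sweep without change: an energy minimum
            break
    return V
```

Each accepted update can only lower (or leave unchanged) the energy above, so the network settles into a (possibly local) minimum.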
Multiconnected neurons$^{13-18}$ may be required by, and are used in, some optimization problems.$^{19,20}$ These neurons are simply higher-order generalizations which allow some multiplicative input; for example, if third-order interactions are allowed, then
$$h_{i}(t)=\sum_{j \neq k} T_{i j k} V_{j}(t) V_{k}(t)+\sum_{j} T_{i j} V_{j}(t)-U_{i}$$
giving
$$\begin{aligned} E(t)= & -(1/3) \sum_{i \neq j \neq k} T_{i j k} V_{i}(t) V_{j}(t) V_{k}(t) \\ & -(1/2) \sum_{i \neq j} T_{i j} V_{i}(t) V_{j}(t)+\sum_{i} U_{i} V_{i}(t) \end{aligned}$$
where $T_{ijk}=T_{ikj}=T_{jki}=T_{jik}=T_{kij}=T_{kji}$. Multiconnections can actually be simulated by extra neurons: the product $V_{j} V_{k}$ is reproduced by an extra neuron $V_{l}$ if, for example, $T_{lj}=T_{lk}=1$ and $U_{l}=1.5$.
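As a small illustration (ours, not from the paper; the helper names are assumptions), the third-order field and this extra-neuron trick can be written as:

```python
import numpy as np

def field_third_order(T3, T2, U, V, i):
    """h_i = sum_{j,k} T_ijk V_j V_k + sum_j T_ij V_j - U_i  (T3 symmetric, zero whenever indices repeat)."""
    return np.einsum("jk,j,k->", T3[i], V, V) + T2[i] @ V - U[i]

def and_neuron(Vj, Vk):
    """An auxiliary neuron l with T_lj = T_lk = 1 and U_l = 1.5:
    V_l = Q(V_j + V_k - 1.5) equals the product (logical AND) V_j V_k for binary inputs."""
    return 1 if (Vj + Vk - 1.5) > 0 else 0
```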
We show below how logic programming can be interpreted as a problem of optimization and implemented on a neural network.
III. LOGIC PROGRAMMING
A logic program consists of clauses with implied conjunction between them (for a mathematical treatment of logic programming, see Lloyd$^{21}$). A propositional example may be
$$\begin{aligned} P =\ & A \leftarrow B, C \\ \wedge\ & D \leftarrow B \\ \wedge\ & C \leftarrow \end{aligned}$$
Given the goal
$$\leftarrow G$$
we require to show that $P \wedge \neg G$ is inconsistent in order to prove the goal. Alternatively, we require to find an interpretation of the Herbrand base of the problem which is consistent with $P$ (i.e., which yields $P$ true) and examine the truth of $G$ in such an interpretation. If we assign the value 1 to true and 0 to false, then $\neg P=0$ indicates a consistent interpretation, while $\neg P=1$ reveals that at least one of the clauses in the program is not satisfied. Therefore, looking for a consistent interpretation is a combinatorial minimization (over assignments of truth values to ground atoms) of the inconsistency, the value of $\neg P$.
Since in our example
$$\neg P=(\neg A \wedge B \wedge C) \vee(\neg D \wedge B) \vee(\neg C)$$
we may write a cost function to be minimized as follows:
$$E_{P}=\left(1-V_{A}\right) V_{B} V_{C}+\left(1-V_{D}\right) V_{B}+\left(1-V_{C}\right)$$
where the neurons $V_{A}$, etc., represent the truth values of $A$, etc. Notice that for binary $V_{A}$, and so forth, the minimum value of $E_{P}$ is 0, corresponding to the satisfaction of all the clauses. As we have chosen arithmetic addition to represent logical disjunction, the value of $E_{P}$ depends on the number of clauses satisfied by the interpretation: the more clauses left unsatisfied, the larger the value of $E_{P}$.
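As a quick illustration of this counting property (our own sketch, not part of the paper; the helper name `E_P` is just illustrative), the following enumerates all sixteen truth assignments and prints those with $E_P = 0$, i.e., the models of $P$:

```python
from itertools import product

def E_P(VA, VB, VC, VD):
    """One unit of cost per unsatisfied clause of P = {A <- B,C ;  D <- B ;  C <-}."""
    return (1 - VA) * VB * VC + (1 - VD) * VB + (1 - VC)

for VA, VB, VC, VD in product((0, 1), repeat=4):
    if E_P(VA, VB, VC, VD) == 0:
        print(dict(A=VA, B=VB, C=VC, D=VD))   # the consistent interpretations (models of P)
```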
Minimum $E_{P}$ corresponds to the most consistent selection of truth value assignments. The cost function yields
$$\begin{aligned} T_{ijk} &= 1/2 && \text{if } i, j, k = A, B, C \text{ in any order} \\ &= 0 && \text{otherwise} \\ T_{ij} &= -1 && \text{if } i, j = B, C \text{ in any order} \\ &= 1 && \text{if } i, j = D, B \text{ in any order} \\ &= 0 && \text{otherwise} \\ U_{i} &= 1 && \text{if } i = B \\ &= -1 && \text{if } i = C \\ &= 0 && \text{otherwise} \end{aligned}$$
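These values can be checked against $E_P$ by inserting them into the third-order energy of Section II and comparing the two expressions over all truth assignments; a small sketch of such a check (ours, with hypothetical variable names, not from the paper) follows. The two should agree up to the constant left over from expanding $(1-V_C)$.

```python
from itertools import permutations, product
import numpy as np

idx = {"A": 0, "B": 1, "C": 2, "D": 3}
T3, T2, U = np.zeros((4, 4, 4)), np.zeros((4, 4)), np.zeros(4)

# connection strengths read off from the example above
for i, j, k in permutations((idx["A"], idx["B"], idx["C"])):
    T3[i, j, k] = 0.5                                   # T_ABC = 1/2 in any order
T2[idx["B"], idx["C"]] = T2[idx["C"], idx["B"]] = -1    # T_BC = -1
T2[idx["D"], idx["B"]] = T2[idx["B"], idx["D"]] = 1     # T_DB = +1
U[idx["B"]], U[idx["C"]] = 1, -1                        # thresholds

def E_network(V):
    """Third-order energy: -(1/3) sum T_ijk V_i V_j V_k - (1/2) sum T_ij V_i V_j + sum U_i V_i."""
    return (-(1 / 3) * np.einsum("ijk,i,j,k->", T3, V, V, V)
            - 0.5 * np.einsum("ij,i,j->", T2, V, V)
            + U @ V)

def E_P(V):
    VA, VB, VC, VD = V
    return (1 - VA) * VB * VC + (1 - VD) * VB + (1 - VC)

for bits in product((0, 1), repeat=4):
    V = np.array(bits, dtype=float)
    assert abs(E_network(V) - (E_P(V) - 1)) < 1e-9      # equal up to the constant from (1 - V_C)
```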
There are several things to notice here. First, with respect to the cost function, the addition of more rules or facts to the database is simply additive in $T_{ij}$, and so forth. This eases knowledge acquisition or learning. Second, the contributions to $T_{ijk}$, and so forth, are as expected from a straightforward logical interpretation of the neural connections; for example, the rule
$$A \leftarrow B, C$$
results in $T_{ABC}$ being increased, which can be envisaged as a strengthening of the three-way connection among the neurons representing $A$, $B$, and $C$ (while $T_{BC}$ and $T_{CB}$ are decreased to offset the symmetrical increases in $T_{BCA}$, etc.). Lastly, notice also that multiconnections are needed when there are long clauses like the first one in the example logic program.
So far we have not mentioned variables. One way of dealing with universal quantification is to replace every clause containing variables with the corresponding clauses obtained from all possible instantiations of the variables. This exercise terminates and is possible for the limited number of clauses and terms in the given logic program, though the resulting number of neurons necessary can be very large if, for example, natural numbers (up to some upper limit) are dealt with (conventional logic programming gets away with this by having built-in predicates for these numbers). On a neural network we have the advantage, in principle at least, of having as many neurons as we want without causing the solution time to deteriorate. Neurons can then represent whole (variable-free) relations like rel1(arg1,arg2), rel1(arg1,arg3), . . . , rel2(arg1,arg2), and so forth. Variables in functions are dealt with similarly. In principle the set of ground instances can be infinite if functors are used recursively, but when we limit ourselves to only the occurrences in the particular logic program that we are dealing with, the instantiation terminates, though, again, the resulting set can be very large.
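A minimal sketch of this grounding step (ours, with hypothetical helper names), assuming a function-free program so that the Herbrand universe is just the set of constants appearing in it:

```python
from itertools import product

def ground_instances(clause_template, variables, constants):
    """Replace a clause containing variables by all of its ground instances
    over a finite Herbrand universe (here: just the constants of the program)."""
    for values in product(constants, repeat=len(variables)):
        yield clause_template.format(**dict(zip(variables, values)))

# e.g. a rule over the constants arg1, arg2 appearing in the program
for clause in ground_instances("rel1({x},{y}) <- rel2({y},{x})", ["x", "y"], ["arg1", "arg2"]):
    print(clause)   # four ground clauses; one neuron is assigned to each distinct ground atom
```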
The above example illustrates how we may do logic programming on a neural network. The procedure below is followed:
(1) List all the clauses given in the program. Clauses with variables are replaced by all possible instantiations over the Herbrand base.
(2) Assign a neuron to each ground atom.
(3) Initialize all connection strengths to zero. Multiconnections of up to order $p$ are needed, where $p$ is the length (number of ground atoms) of the longest clause. For each clause, modify the appropriate connections as demonstrated in the example above. Note that for each clause two kinds of modifications are made: the connections corresponding to the conjunction of all the atoms (antecedent and consequent) are strengthened, and negative values are added to the connections corresponding to the conjunction of atoms from the antecedent only.
(4) Initialize the neural states to probable values where known (e.g., when the truth of a ground atom is given by an assertion), or to random values otherwise. (Appropriate neurons may also be "clamped" to the known values; one way is to increase or decrease their thresholds by large amounts.)
(5) Let the neural network evolve until an energy minimum is reached. The neural states then provide a solution interpretation for the logic program, and the truth of a ground atom in this interpretation may be checked. If the goal sought involves variables, then the set of appropriate ground atoms is checked. (An end-to-end sketch of these steps for the propositional example is given after this list.)
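The following sketch (ours, not from the paper; all helper names are hypothetical) runs steps (1)-(5) for the earlier propositional program $P$. Rather than building the connection tensors explicitly, it performs the equivalent asynchronous single-neuron updates directly on the inconsistency count $E_P$, which is what those connections implement.

```python
import random

# step (1): the clauses of P, each as (consequent, antecedents); facts have empty antecedents
clauses = [("A", ("B", "C")), ("D", ("B",)), ("C", ())]
# step (2): one neuron per ground atom
atoms = sorted({a for head, body in clauses for a in (head, *body)})

def inconsistency(V):
    """E_P: number of clauses whose antecedents are all true but whose consequent is false."""
    return sum(1 for head, body in clauses if all(V[b] for b in body) and not V[head])

def relax(V, sweeps=20):
    """Step (5): asynchronous updates, each neuron taking the value that lowers E_P."""
    for _ in range(sweeps):
        for a in random.sample(atoms, len(atoms)):
            V[a] = min((0, 1), key=lambda v: inconsistency({**V, a: v}))
    return V

# step (4): random initial states (known atoms could instead be clamped here)
V = relax({a: random.randint(0, 1) for a in atoms})
print(V, "E_P =", inconsistency(V))   # E_P == 0 means a model of P; the goal atom, e.g. D, can now be read off
```

Ties are broken towards 0, mirroring the convention that $Q$ is zero when its argument is zero.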
One thing to note is that the neural network may settle into local minima as well as the global minimum, depending on the initial configuration. One way to escape local minima in order to find the global minimum is to employ simulated annealing,$^{22,23}$ where we resort to stochastic dynamics for the neurons:
$$V_{i}(t+1)=1 \text{ with probability } 1 /\left\{1+\exp \left(-h_{i}(t) / T\right)\right\}$$
starting with a finite value of $T$ and slowly decreasing it to zero (notice that the dynamics used previously correspond to the $T=0$ special case). This allows the system to climb over energy barriers around local minima and continue the search for a more global minimum. This method has been found$^{20}$ to yield, even for multiconnected neurons, near-optimal solutions in very short times. There are also other methods$^{24,25}$ for global minimum search. The ease of escape from local minima to more global ones corresponding to better solutions can perhaps be associated with creativity.
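A minimal sketch of such an annealing schedule (ours; the `field` callback, cooling rate, and other names are assumptions, since the text only asks for $T$ to be decreased slowly):

```python
import math
import random

def anneal(field, n, T0=2.0, cooling=0.95, sweeps=200, rng=random):
    """Stochastic dynamics V_i = 1 with probability 1/(1 + exp(-h_i/T)), with T lowered towards zero."""
    V = [rng.randint(0, 1) for _ in range(n)]
    T = T0
    for _ in range(sweeps):
        for i in range(n):
            h = field(V, i)                     # h_i(t), computed from the (multi)connections of Sec. II
            V[i] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-h / T)) else 0
        T = max(T * cooling, 1e-6)              # geometric cooling: one of many possible schedules
    return V
```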
As a simple experiment, the following logic program was executed on a neural network:
$$\begin{aligned} & \mathrm{Fly}(x) \leftarrow \mathrm{Bird}(x), \mathrm{Have\_feathers}(x) \\ \wedge\ & \mathrm{Have\_feathers}(x) \leftarrow \mathrm{Bird}(x) \\ \wedge\ & \mathrm{Bird(Tweety)} \leftarrow \end{aligned}$$
The network, albeit simple, relaxes to the global minimum in 11 out of 16 trials without simulated annealing. There is a local minimum where Have_feathers(Tweety) is incorrectly taken to be false. The relaxation times are typically 2 updates per neuron. The definition of the step function $Q$ as being zero if its argument is zero discourages ground atoms from being true and provides interpretations rather similar to those obtained with negation as failure.$^{26}$
IV. CONCLUSION
We have proposed a method of doing logic programming on a neural network. Although the formalism sacrifices the completeness of solutions obtained by exhaustive conventional resolution-in-time, as well as the procedural interpretation of clauses which enables built-in predicates useful for numbers, it can easily handle recursive clauses, which could cause endless loops for resolution-in-time.
This formalism also introduces several extra features. First, it can easily be generalized to non-Horn clauses and to higher-order logics. Also, some kind of nonmonotonicity is allowed, in the sense that even when the clauses in the logic program are inconsistent with one another, the neural network still proposes a solution: an interpretation which gives the least logical inconsistency as defined by the energy function. The clauses can be given different weightings in the energy function to reflect some degree of truth associated with each of them; clauses with larger weightings tend to be satisfied more often.
Closely associated with neural networks is the concept of learning,$^{27}$ whereby the strengths of useful or correct connections are increased, and vice versa. This, when viewed together with the logic programming mechanism proposed here, allows logical rules to be learned.$^{28}$
We have attempted to integrate a fundamental symbolic processing mechanism, logical inference, with a biologically favored structure, neural networks, at a level where we look for basic underlying principles. This suggests revised ideas on inference (see also Caianiello$^{29}$ for a general discussion of inference on neural networks). We do not claim that this is in reality what goes on in the human brain, but, on the other hand, neither do we reject the possibility that it may give clues to it.$^{30}$ At the very least, it provides a method for the parallel implementation of logic programs with adequately good solutions in nonexponential time.
This research was partially supported by MPKSN Grant R&D 4/41/01.
References
1. W.A.T. Wan Abdullah, "The connectionist paradigm," Proc. 1st Natl. Comput. Sci. Conf., Kuala Lumpur, Jan. 1989, pp. 95-111.
2. W.A.T. Wan Abdullah, "Computations with neural networks and neural-network-like structures," In Computational Techniques and Applications: CTAC-87, J. Noye and C. Fletcher (Eds.), Elsevier/North-Holland, Amsterdam, 1988.
3. R. Kowalski, Logic for Problem Solving, North-Holland, New York, 1979.
4. J.J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc. Natl. Acad. Sci. USA, 79, 2554-2558 (1982).
5. J.J. Hopfield and D.W. Tank, "'Neural' computation of decisions in optimization problems," Biol. Cybern., 52, 141-152 (1985).
6. V. Chvatal and E. Szemeredi, "Many hard examples for resolution," J. ACM, 35, 759-768 (1988).
7. D.H. Ballard, Parallel Logical Inference and Energy Minimization, Technical Report TR 142, Computer Science Dept., Univ. Rochester, March 1986.
8. D.S. Touretzky and G.E. Hinton, A Distributed Connectionist Production System, Technical Report CMU-CS-86-172, Dept. of Computer Science, Carnegie-Mellon Univ., Dec. 1986.
9. D.S. Touretzky and M.A. Derthick, "Symbol structures in connectionist networks: Five properties and two architectures," Proc. IEEE COMPCON Spring 1987, San Francisco, Feb. 1987.
10. M. Derthick, "A connectionist architecture for representing and reasoning about structured knowledge," Proc. Ninth Annual Conf. Cognitive Science Soc., Lawrence Erlbaum, 1987, pp. 131-142.
11. L. Shastri, "Default reasoning in semantic networks: A formalization of recognition and inheritance," Artificial Intelligence, 39, 283-355 (1989).
12. W.A.T. Wan Abdullah, "Biologic," Cybernetica, 31, 245-251 (1988).
13. P. Baldi and S.S. Venkatesh, "Number of stable points for spin-glasses and neural networks of higher orders," Phys. Rev. Lett., 58, 913-916 (1987).
14. E. Gardner, "Multiconnected neural network models," J. Phys. A: Math. Gen., 20, 3453-3464 (1987).
15. L.F. Abbott and Y. Arian, "Storage capacity of generalized networks," Phys. Rev. A, 36, 5091-5094 (1987).
16. L. Personnaz, I. Guyon, and G. Dreyfus, "High-order neural networks: Information storage without errors," Europhys. Lett., 4, 863-867 (1987).
17. G.A. Kohring, "Neural networks with many-neuron interactions," J. Phys. France, 51, 145-155 (1990).
18. R.M.C. de Almeida and J.R. Iglesias, "An alternative model for neural networks," Phys. Lett. A, 146, 239-244 (1990).
19. W.A.T. Wan Abdullah, "Dendritic trees and nonquadratic combinatorial optimisation," Malaysian J. Sci., 9, 105-109 (1987).
20. H. Mueller-Krumbhaar, "Fuzzy logic, $m$-spin glasses and 3-SAT," Europhys. Lett., 7, 479-484 (1988).
21. J.W. Lloyd, Foundations of Logic Programming, Springer-Verlag, Berlin, 1984.
22. S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi, "Optimization by simulated annealing," Science, 220, 671-680 (1983).
23. S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Machine Intell., PAMI-6, 721-741 (1984).
24. P. Rujan, "Searching for optimal configurations by simulated tunneling," Z. Phys. B: Condensed Matter, 73, 391-416 (1988).
25. G. Dueck and T. Scheuer, "Threshold accepting: A general purpose optimization algorithm appearing superior to simulated annealing," J. Comput. Phys., 90, 161-175 (1990).
26. K.L. Clark, "Negation as failure," In Logic and Data Bases, H. Gallaire and J. Minker (Eds.), Plenum Press, New York, 1978, pp. 293-322.
27. D.O. Hebb, The Organization of Behavior, Wiley, New York, 1949.
28. W.A.T. Wan Abdullah, "The logic of neural networks," work in preparation.
29. P. Caianiello, "Neural models, structure and learning," In Parallel Architectures and Neural Networks, E.R. Caianiello (Ed.), World Scientific, Singapore, 1989.
30. W.A.T. Wan Abdullah, "A connectionist epistemology," Cybernetica, 34, 75-83 (1991).