Pigeons can learn a difficult discrimination if reinforcement is delayed following choice

Discriminative stimuli that follow a delay have added value for pigeons

Psychonomic Bulletin & Review, 2004

Clement, Feltus, Kaiser, and Zentall (2000) reported that pigeons prefer discriminative stimuli that require greater effort (more pecks) to obtain over those that require less effort. In the present experiment, we examined two variables associated with this phenomenon. First, we asked whether delay of reinforcement, presumably a relatively aversive event similar to effort, would produce similar effects. Second, we asked whether the stimulus preference produced by a prior relatively aversive event depends on its anticipation. Anticipation of delay was accomplished by signaling its occurrence. Results indicated that delays can produce preferences similar to those produced by increased effort, but only if the delays are signaled.

Pigeons Prefer Conditional Stimuli Over Their Absence: A Comment on Roberts et al. (2009)

2010

Recently, Roberts et al. (2009) suggested that pigeons performing delayed matching-to-sample appear unwilling to request to see the sample again (or even for the first time) prior to choice, even if that choice would result in an increase in matching accuracy. In each of their four experiments, however, presentation (Experiments 3 and 4) or re-presentation of the sample (Experiments 1 and 2) resulted in an added delay to reinforcement. Thus, the pigeons had to choose between an immediate reinforcer on about 50% of the trials and a delayed reinforcer on a significantly higher percentage of the trials. In the present research, when we equated the two alternatives for delay to reinforcement, we found that pigeons generally showed a significant preference for trials with a relevant sample over trials with an irrelevant sample. When the contingencies were reversed, most of the pigeons reversed their preference. Although these results do not present evidence for metacognition, they do show that pigeons are sensitive to the potential for a higher probability of reinforcement when delay to reinforcement is controlled.

Pigeons prefer discriminative stimuli independently of the overall probability of reinforcement and of the number of presentations of the conditioned reinforcer

Journal of Experimental Psychology: Animal Behavior Processes, 2012

When pigeons are given a choice between two alternatives, one leading 20% of the time to a stimulus that always signals reinforcement (S+) and 80% of the time to another stimulus that signals the absence of reinforcement (S−), and the other alternative leading to one of two stimuli each signaling reinforcement 50% of the time, the 20% reinforcement alternative is preferred although it provides only 40% as much reinforcement. In Phase 1 of the present experiment, we tested the hypothesis that pigeons compare the S+ associated with each alternative and ignore the S− by giving them a choice between two pairs of discriminative stimuli (20% S+, 80% S− and 50% S+, 50% S−). Reinforcement theory suggests that the alternative associated with more reinforcement should be preferred, but the pigeons showed indifference. In Phase 2, the pigeons were divided into two groups. For one group, the discriminative function was removed from the 50% reinforcement alternative, and a strong preference for the 20% reinforcement alternative was found. For the other group, the discriminative function was removed from both alternatives, and a strong preference was found for the 50% reinforcement alternative. Thus, the indifference found in Phase 1 was not due to the absence of discriminability of the differential reinforcement associated with the two alternatives (20% vs. 50% reinforcement); rather, it can be attributed to the pigeons' insensitivity to the differential frequency of the two S+ and two S− stimuli. The relevance to human gambling behavior is discussed.
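As a quick check on the arithmetic behind the "40% as much reinforcement" claim, here is a minimal sketch (the alternative labels A and B are ours, not from the original procedure):

```python
# Alternative A: 20% of choices lead to S+ (always reinforced),
# 80% lead to S- (never reinforced).
p_splus_A, p_sminus_A = 0.20, 0.80
overall_A = p_splus_A * 1.0 + p_sminus_A * 0.0   # overall p(reinforcement) = 0.20

# Alternative B: one of two stimuli appears, each reinforced 50% of the time.
overall_B = 0.5 * 0.5 + 0.5 * 0.5                # overall p(reinforcement) = 0.50

# The suboptimal alternative provides 0.20 / 0.50 = 40% as much reinforcement.
ratio = overall_A / overall_B
print(overall_A, overall_B, ratio)
```

The point of the sketch is that preference for alternative A tracks the value of its S+, not the overall reinforcement probabilities that the last two lines compute.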

The nature of discrimination learning in pigeons

Learning & Behavior, 2008

Spence's ideas remain influential. For example, he rejected the noncontinuity view that during a discrimination, animals try out a succession of hypotheses about which stimulus will signal a reward until they discover the correct one and the problem is solved (e.g., Krechevsky, 1932; Lashley, 1929). Instead, he advocated a continuity position, whereby learning at the outset of a discrimination was assumed to progress incrementally with all the stimuli that were perceived (Spence, 1940). The importance of this proposal is apparent, because it has been incorporated into the majority of currently influential theories of associative learning (e.g., Rescorla & Wagner, 1972). Another enduring influence of the theory has been the proposal that after subjects have received a discrimination between two stimuli from the same dimension, responding to those stimuli, as well as to others from the same dimension, is determined by the interaction between excitatory and inhibitory generalization gradients (see Figure 1). The importance of this proposal is apparent: The majority of current theories assume that the solution of many discriminations involves the acquisition of excitatory and inhibitory associative strength. Furthermore, they assume that the algebraic interaction between these opposing strengths will determine the magnitude of a conditioned response (e.g., Pearce, 1994; Rescorla & Wagner, 1972). Other aspects of Spence's theory have not withstood the test of time. The theory has been described as nonselective because the increment in associative strength to a stimulus on a trial is assumed to be unaffected by the associative properties of the stimuli that accompany it. This assumption leads to a number of incorrect predictions about a variety of experimental findings. One such finding is the relative-validity effect, which was first demonstrated by Wagner, Logan, Haberlandt, and Price (1968).
An experimental group received a true discrimination, AX+/BX−, in which a compound of two stimuli signaled reward (AX) and a compound of two stimuli signaled the absence of reward (BX). A control group received a pseudodiscrimination, AX±/BX±, in which both compounds signaled food on half the trials. Stimulus X was thus paired with food according to the same intermittent reinforcement schedule for both groups. It follows from the nonselective principle that these different treatments will result in similar rates of responding during X, if it should subsequently be presented by itself. In contrast to this prediction, Wagner et al. discovered that the rate of responding during a test with X was stronger after training with the pseudo- than with the true discrimination. A variety of explanations have been developed for the relative-validity effect, and they reveal the different directions that theorizing about the nature of discrimination learning in animals has taken since Spence put forward his ideas. Some theories have replaced Spence's nonselective principle with the idea that the growth of associative strength by a stimulus is affected by the associative properties of the stimuli that accompany it. The most influential theory to make this assumption was proposed by

Choice and percentage reinforcement in pigeons

Animal Learning & Behavior, 1976

Pigeons responded on a two-key concurrent-chains choice procedure with the same level of percentage reinforcement on each key. During the initial links, a choice response on either key occasionally produced either a conditioned reinforcer, which on one key was associated with a 15-sec and on the other key with a 30-sec interreinforcement interval, or an extinction stimulus. In Part 1, the initial links were equal. With successive decreases in the probability of a reinforcer, choice shifted from preference for the 15-sec terminal link toward indifference. In Part 2, the initial links were unequal and were arranged so that the shorter initial link preceded the 30-sec terminal link. At a high probability of a reinforcer, the pigeons again preferred the 15-sec terminal link. However, at a low probability, the pigeons reversed and preferred the alternate key. It was concluded that the conditioned reinforcers tended to become functionally equivalent at a low probability of a reinforcer, despite the nominally different interreinforcement intervals, with the result that choice was then modulated by the relative size of the initial links. The data are inconsistent with the view that choice and the strength of conditioned reinforcers are isomorphic with the reduction in delay to reward correlated with terminal-link stimuli.

Pigeons may not remember the stimuli that reinforced their recent behavior

Journal of the Experimental Analysis of Behavior, 2000

In two experiments the conditioned reinforcing and delayed discriminative stimulus functions of stimuli that signal delays to reinforcement were studied. Pigeons' pecks to a center key produced delayed-matching-to-sample trials according to a variable-interval 60-s (or 30-s in 1 pigeon) schedule (Experiment 1) or a multiple variable-interval 20-s variable-interval 120-s schedule (Experiment 2). The trials consisted of a 2-s illumination of one of two sample key colors followed by delays ranging across phases from 0.1 to 27.0 s followed in turn by the presentation of matching and nonmatching comparison stimuli on the side keys. Pecks to the key color that matched the sample were reinforced with 4-s access to grain. Under some conditions of Experiment 1, pecks to nonmatching comparison stimuli produced a 4-s blackout and the start of the next interval. Under other conditions of Experiment 1 and each condition of Experiment 2, pecks to nonmatching stimuli had no effect and trials ended only when pigeons pecked the other, matching stimulus and received food. The functions relating pretrial response rates to delays differed markedly from those relating matching-to-sample accuracy to delays. Specifically, response rates remained relatively high until the longest delays (15.0 to 27.0 s) were arranged, at which point they fell to low levels. Matching accuracy was high at short delays, but fell to chance at delays between 3.0 and 9.0 s. In Experiment 2, both matching accuracy and response rates remained high over a wider range of delays in the variable-interval 120-s component relative to the variable-interval 20-s component. The difference in matching accuracy between the components was not due to an increased tendency in the variable-interval 20-s component toward proactive interference following short intervals. 
Thus, under these experimental conditions the conditioned reinforcing and the delayed discriminative functions of the sample stimulus depended on the same variables (delay and variable-interval value), but were nevertheless dissociated.

Choice and the Initial Delay to a Reinforcer

The Psychological Record, 2008

Pigeons were trained in two experiments that used the concurrent-chains procedure. These experiments sought to identify the variables controlling the preference of pigeons for a constant duration over a variable duration of exposure to an aperiodic, time-based, terminal-link schedule. The results indicated that two variables correlated with the constant-duration terminal link combined to control preference: (a) a shorter initial delay to a reinforcer, and (b) the probabilistic occurrence of multiple reinforcers. Grace and Nevin (2000) trained pigeons on a concurrent-chains procedure with equal variable-interval (VI) schedules in the initial links and equal VI schedules in the terminal links. The terminal links differed in that one ended after a single reinforcer, which they called the "variable-duration" terminal link, whereas the other ended after a fixed period of exposure equal to the average interreinforcement interval (IRI) of the schedule, which they called the "constant-duration" terminal link. As Grace and Nevin identified, and as discussed at some length below, an important feature of the constant-duration terminal link is that it probabilistically yielded 0, 1, or multiple reinforcers per entry, although it provided the same average rate of reinforcement overall as the variable-duration terminal link. Grace and Nevin (2000) found that three of four pigeons clearly preferred the constant-duration terminal link. In their words, the data of the fourth pigeon "demonstrated a consistent right-key bias" (p. 178), and the present conclusion is that its data are more difficult to interpret. In any case, an important question is what variables caused the preference. Ordinarily, one would have expected the pigeons to be indifferent, since the schedules in effect during the alternatives were identical and each alternative yielded the same overall rate of reinforcement.
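The "0, 1, or multiple reinforcers per entry" property of the constant-duration terminal link can be illustrated with a small simulation. This is an idealized sketch, not the authors' procedure: it assumes the VI schedule behaves like exponentially distributed interreinforcement intervals (a constant-probability schedule), which actual VI tapes only approximate, and the duration T and entry count are arbitrary.

```python
import random

def reinforcers_per_entry(T=30.0, n_entries=100_000, seed=0):
    """Simulate a constant-duration terminal link whose fixed exposure T
    equals the mean IRI of the (idealized, exponential) VI schedule.
    Returns the relative frequency of each reinforcer count per entry."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_entries):
        t = rng.expovariate(1.0 / T)  # time of first scheduled reinforcer
        k = 0
        while t <= T:                 # count reinforcers before exposure ends
            k += 1
            t += rng.expovariate(1.0 / T)
        counts[k] = counts.get(k, 0) + 1
    return {k: v / n_entries for k, v in sorted(counts.items())}

dist = reinforcers_per_entry()
# Under these assumptions the count is roughly Poisson(1): about 37% of
# entries yield 0 reinforcers, 37% yield exactly 1, and 26% yield 2 or more,
# while the average rate of reinforcement matches the variable-duration link.
```

The sketch makes the confound concrete: even though the overall rate is equated, roughly a quarter of constant-duration entries deliver multiple reinforcers, which is one of the two variables the paper identifies as controlling preference.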
Grace and Nevin (2000) initially pondered the role of multiple reinforcers in the constant-duration terminal link, because research has shown that subjects may well prefer a choice alternative associated with multiple reinforcers rather than a single reinforcer per terminal-link entry (e.g.,

Do pigeons prefer information in the absence of differential reinforcement?

Learning & Behavior, 2012

Prior research has indicated that pigeons do not prefer an alternative that provides a sample (for matching to sample) over an alternative that does not provide a sample (i.e., there is no indication of which comparison stimulus is correct). However, Zentall and Stagner (Journal of Experimental Psychology: Animal Behavior Processes, 36, 506-509, 2010) showed that when delay of reinforcement was controlled, pigeons had a strong preference for matching over pseudomatching (i.e., there was a sample, but it did not indicate which comparison stimulus was correct). Experiment 1 of the present study replicated and extended the results of the Zentall and Stagner (2010) study by including an identity relation between the sample and one of the comparison stimuli in both the matching and pseudomatching tasks. In Experiment 2, in which we asked whether the pigeons would still prefer matching if we equated the two tasks for probability of reinforcement, we found no systematic preference for matching over pseudomatching. Thus, it appears that in the absence of differential reinforcement, the information provided by a sample that signals which of the two comparison stimuli is correct is insufficient to produce a preference for that alternative.

Responding of pigeons under variable-interval schedules of signaled-delayed reinforcement: effects of delay-signal duration

Journal of the Experimental Analysis of Behavior, 1990

Two experiments with pigeons examined the relation of the duration of a signal for delay ("delay signal") to rates of key pecking. The first employed a multiple schedule comprised of two components with equal variable-interval 60-s schedules of 27-s delayed food reinforcement. In one component, a short (0.5-s) delay signal, presented immediately following the key peck that began the delay, was increased in duration across phases; in the second component the delay signal initially was equal to the length of the programmed delay (27 s) and was decreased across phases. Response rates prior to delays were an increasing function of delay-signal duration. As the delay signal was decreased in duration, response rates were generally higher than those obtained under identical delay-signal durations as the signal was increased in duration. In Experiment 2 a single variable-interval 60-s schedule of 27-s delayed reinforcement was used. Delay-signal durations were again increased gradually across phases. As in Experiment 1, response rates increased as the delay-signal duration was increased. Following the phase during which the signal lasted the entire delay, shorter delay-signal-duration conditions were introduced abruptly, rather than gradually as in Experiment 1, to determine whether the gradual shortening of the delay signal accounted for the differences observed in response rates under identical delay-signal conditions in Experiment 1. Response rates obtained during the second exposures to the conditions with shorter signals were higher than those observed under identical conditions as the signal duration was increased, as in Experiment 1. In both experiments, rates and patterns of responding during delays varied greatly across subjects and were not systematically related to delay-signal durations. 
The effects of the delay signal may be related to the signal's role as a discriminative stimulus for adventitiously reinforced intradelay behavior, or the delay signal may have served as a conditioned reinforcer by virtue of the temporal relation between it and presentation of food.