Choice in a variable environment: every reinforcer counts
Related papers
Some effects of reinforcer availability on the pigeon’s responding in 24-hour sessions
Animal Learning & Behavior, 1981
Restrictions on food availability produced by schedules of reinforcement were examined in three homing pigeons continuously housed in operant chambers. Total daily access to food was free to vary and depended on the subject's contact with the schedule in effect. Experiment 1 varied reinforcer duration within a continuous reinforcement schedule in order to provide a description of the pigeon's feeding pattern under minimal constraints. In Experiments 2 and 3, access to food was contingent on responding in fixed-interval schedules, and limits on availability of food were varied by changing the duration of reinforcement (Experiment 2) or the frequency of reinforcement (Experiment 3). In all three experiments, a decline in the scheduled availability of food produced an increase in both the overall response rate and the local response rate. In addition, the distribution of responding across the day followed a diurnal rhythm typical of the pigeon's unconstrained pattern of food intake. These effects are consistent with previous studies showing an inverse relationship between instrumental response rate and reinforcer availability in the absence of fixed deprivation, and support the interpretation that this inverse relationship results from constraints imposed on preferred patterns of intake. The data on the local distribution of responses were consistent with an extension of the response-deprivation hypothesis to local response patterning.
Choice and percentage reinforcement in pigeons
Animal Learning & Behavior, 1976
Pigeons responded on a two-key concurrent-chains choice procedure with the same level of percentage reinforcement on each key. During the initial links, a choice response on either key occasionally produced either a conditioned reinforcer (associated on one key with a 15-sec, and on the other key with a 30-sec, interreinforcement interval) or an extinction stimulus. In Part 1, the initial links were equal. With successive decreases in the probability of a reinforcer, choice shifted from preference for the 15-sec terminal link toward indifference. In Part 2, the initial links were unequal and were arranged so that the shorter initial link preceded the 30-sec terminal link. At a high probability of a reinforcer, the pigeons again preferred the 15-sec terminal link. However, at a low probability, the pigeons reversed and preferred the alternate key. It was concluded that the conditioned reinforcers tended to become functionally equivalent at a low probability of a reinforcer, despite the nominally different interreinforcement intervals, with the result that choice was then modulated by the relative size of the initial links. The data are inconsistent with the view that choice and the strength of conditioned reinforcers are isomorphic with the reduction in delay to reward correlated with terminal-link stimuli.
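The delay-reduction account that this abstract argues against can be made concrete with a small numerical sketch. The function below is a minimal illustration of the delay-reduction hypothesis, not the paper's own analysis; the overall average time to food (60 s) is an illustrative assumption, while the 15-s and 30-s terminal-link delays come from the abstract.

```python
def drh_choice(t_left, t_right, T):
    """Predicted choice proportion for the left key under the
    delay-reduction hypothesis: the value of each terminal-link
    stimulus is the reduction in expected delay to food it signals.
    t_left, t_right: terminal-link delays to food (s)
    T: overall average time to food from the start of a cycle (s)
    """
    red_left = T - t_left    # delay reduction signalled by the left stimulus
    red_right = T - t_right  # delay reduction signalled by the right stimulus
    return red_left / (red_left + red_right)

# Hypothetical values: 15-s vs. 30-s terminal links, T = 60 s overall.
p = drh_choice(15, 30, 60)  # (60-15) / ((60-15) + (60-30)) = 0.6
```

On this account preference for the 15-sec key should track the delay reductions alone; the abstract's finding is that low reinforcer probability made choice deviate from any such fixed prediction.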
Learning & Behavior, 2020
In the midsession reversal task, pigeons are trained on a simultaneous two-alternative discrimination in which S1 is correct for the first half of the session and S2 is correct for the second half of the session. Optimally, pigeons should choose S1 until it stops being correct and choose S2 afterward. Instead, pigeons anticipate S2 too early and continue choosing S1 even after the reversal. Research suggests that they attempt to time the reversal rather than use the feedback from the preceding response(s). Recently, there is evidence that performance is almost optimized by generating an asymmetry between S1 and S2. For example, pigeons' accuracy improves if correct S1 responses are reinforced 100% of the time, but correct S2 responses are reinforced only 20% of the time. Similarly, accuracy improves if S1 requires one peck but S2 requires 10 pecks. Accuracy does not improve, however, if the value of S1 is less than the value of S2. In the current experiment, we manipulated the magnitude of reinforcement. For the experimental group, correct responses to S1 were reinforced with five pellets of food and correct responses to S2 were reinforced with one pellet. For the control group, all correct responses were reinforced with three pellets. Consistent with the earlier findings, results indicated that there was a significant reduction in anticipatory errors in the experimental group compared with the control, and there was no significant increase in perseverative errors.

Keywords: Midsession reversal; Win-stay/lose-shift; Timing; Magnitude of reinforcement; Pigeons

One measure of intelligence is the ability to use past experience when one encounters new learning. Harlow (1949) referred to this as learning to learn. In a variation of this principle, Mackintosh, McGonigle, Holgate, and Vanderver (1968) trained rats on a simple discrimination and then repeatedly reversed that discrimination.
They found that the more reversals that were trained, the faster the rats acquired them. Rayburn-Reeves, Molet, and Zentall (2011; see also Cook & Rosen, 2010) trained pigeons on a version of the multiple-reversal task, in which on each session the same stimulus (S1) is correct for the first half of the session, and the other stimulus (S2) is correct for the last half of each session. Following a large number of training sessions, several strategies may be used to perform this task nearly optimally. Animals could learn to count the number of trials to the reversal, but as most research has used an 80-trial session, that would be beyond the ability of most animals. Alternatively, one could choose S1
Journal of the Experimental Analysis of Behavior, 2013
Six pigeons worked on concurrent exponential variable-interval schedules in which the relative frequency of food deliveries for responding on the two alternatives reversed at a fixed time after each food delivery. Across conditions, the point of food-ratio reversal was varied from 10 s to 30 s, and the overall reinforcer rate was varied from 1.33 to 4 per minute. The effect of rate of food delivery and food-ratio-reversal time on choice and response rates was small. In all conditions, postfood choice was toward the locally richer key, regardless of the last-food location. Unlike the local food ratio, which changed in a stepwise fashion, local choice changed according to a decelerating monotonic function, becoming substantially less extreme than the local food ratio soon after food delivery. This deviation in choice appeared to result from the birds' inaccurate discrimination of the time of food deliveries; local choice was described well by a model that assumed that log response ratios matched food ratios that were redistributed across surrounding time bins with mean time t and a constant coefficient of variation. We suggest that local choice is controlled by the likely availability of food in time, and that choice matches the discriminated log of the ratio of food rates across time since the last food delivery.
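One possible reading of the temporal-smearing model described in this abstract can be sketched as follows. This is not the authors' code: the kernel shape (normal), the coefficient of variation (0.3), and the example food times are all illustrative assumptions; only the general idea (each food delivery at time t is redistributed over surrounding time bins with mean t and constant coefficient of variation, and local choice matches the smeared food-rate ratio) comes from the abstract.

```python
import math

def smeared_density(food_times, x, cv=0.3):
    """Discriminated food rate at time x after the last food delivery:
    each delivery at time t is spread by a normal kernel with mean t
    and sd = cv * t (constant coefficient of variation)."""
    total = 0.0
    for t in food_times:
        sd = cv * t
        total += math.exp(-0.5 * ((x - t) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
    return total

def local_log_choice_ratio(left_times, right_times, bins, cv=0.3):
    """Predicted log (left/right) response ratio at each time bin,
    assuming matching to the ratio of smeared food rates."""
    return [math.log10(smeared_density(left_times, x, cv)
                       / smeared_density(right_times, x, cv))
            for x in bins]

# Hypothetical example: left key pays before a 20-s reversal, right key after.
ratios = local_log_choice_ratio([5, 10, 15], [25, 30, 35], range(2, 40, 4))
# Predicted preference declines smoothly across the reversal rather than
# stepwise, as the abstract reports for obtained choice.
```

The smearing is what turns the stepwise programmed food ratio into the decelerating monotonic choice function the abstract describes.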
Animal Learning & Behavior, 1976
Four pigeons pecked for food reinforcement on variable-interval 1-min schedules and on the variable-interval 1-min components of multiple, concurrent, and pseudoconcurrent schedules. The pseudoconcurrent schedule provided only one schedule of reinforcement, but any reinforcer could be collected by responding on either of two keys. The rate of responding generated by the variable-interval schedule was not greater than the rates of responding generated by the components of the complex schedules. But the rate of reinforcement obtained from the variable-interval schedule was greater than the rates of reinforcement obtained from the components of the multiple schedule. These results may contradict the equation proposed by Herrnstein (1970). The equation predicts that the rate of responding generated by a schedule of reinforcement will be greater when the schedule appears alone than when it appears as one component of a complex schedule.
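The prediction attributed to Herrnstein's (1970) equation can be sketched numerically. This is the standard textbook hyperbola rather than a fit to the present data, and the parameter values (k, Re) are illustrative assumptions:

```python
def herrnstein_alone(R, k=100.0, Re=20.0):
    """Predicted response rate when the schedule appears alone.
    R: obtained reinforcement rate; k: asymptotic response rate;
    Re: rate of background ('extraneous') reinforcement."""
    return k * R / (R + Re)

def herrnstein_component(R, R_other, k=100.0, Re=20.0):
    """Predicted response rate when the same schedule is one component
    of a complex schedule: reinforcement from the other component
    (R_other) enters the denominator and lowers the predicted rate."""
    return k * R / (R + R_other + Re)

# Illustrative values: 60 reinforcers/hr on the target schedule.
alone = herrnstein_alone(60)              # 100 * 60 / 80 = 75.0
component = herrnstein_component(60, 60)  # 100 * 60 / 140 ≈ 42.9
```

Because the alone-schedule prediction always exceeds the component prediction for the same R, the abstract's finding of equal response rates is what puts the equation in question.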
Choice and the Initial Delay to a Reinforcer
The Psychological Record, 2008
Pigeons were trained in two experiments that used the concurrent-chains procedure. These experiments sought to identify the variables controlling the preference of pigeons for a constant duration over a variable duration of exposure to an aperiodic, time-based, terminal-link schedule. The results indicated that two variables correlated with the constant-duration terminal link combined to control preference: (a) a shorter initial delay to a reinforcer; and (b) the probabilistic occurrence of multiple reinforcers. Grace and Nevin (2000) trained pigeons on a concurrent-chains procedure with equal variable-interval (VI) schedules in the initial links and equal VI schedules in the terminal links. The terminal links differed in that one ended after a single reinforcer, which they called the "variable-duration" terminal link, whereas the other ended after a fixed period of exposure equal to the average interreinforcement interval (IRI) of the schedule, which they called the "constant-duration" terminal link. As Grace and Nevin identified, and as discussed at some length below, an important feature of the constant-duration terminal link is that it probabilistically yielded 0, 1, or multiple reinforcers per entry, although it provided the same average rate of reinforcement overall as the variable-duration terminal link. Grace and Nevin (2000) found that three of four pigeons clearly preferred the constant-duration terminal link. In their words, the data of a fourth pigeon "demonstrated a consistent right-key bias" (p. 178), and the present conclusion is that its data are more difficult to interpret. In any case, an important question is what variables caused the preference. Ordinarily, one would have expected the pigeons to be indifferent, since the schedules in effect during the alternatives were identical, and each alternative yielded the same overall rate of reinforcement.
Grace and Nevin (2000) initially pondered the role of multiple reinforcers in the constant-duration terminal link, because research has shown that subjects may well prefer a choice alternative associated with multiple reinforcers rather than a single reinforcer per terminal-link entry (e.g.,
Choice and number of reinforcers
Journal of The Experimental Analysis of Behavior, 1979
Pigeons were exposed to the concurrent-chains procedure in two experiments designed to investigate the effects of unequal numbers of reinforcers on choice. In Experiment 1, the pigeons were indifferent between long and short durations of access to variable-interval schedules of equal reinforcement density, but preferred a short high-density terminal link over a longer, lower density terminal link, even though in both sets of comparisons there were many more reinforcers per cycle in the longer terminal link. In Experiment 2, the pigeons preferred five reinforcers, the first of which was available after 30 sec, over a single reinforcer available at 30 sec, but only when the local interval between successive reinforcers was short. The pigeons were indifferent when this local interval was sufficiently long. The pigeons' behavior appeared to be under the control of local terminal-link variables, such as the intervals to the first reinforcer and between successive reinforcers, and was not well described in terms of transformed delays of reinforcement or reductions in average delay to reinforcement.
Every reinforcer counts: reinforcer magnitude and local preference
Journal of The Experimental Analysis of Behavior, 2003
Six pigeons were trained on concurrent variable-interval schedules. Sessions consisted of seven components, each lasting 10 reinforcers, with the conditions of reinforcement differing between components. The component sequence was randomly selected without replacement. In Experiment 1, the concurrent-schedule reinforcer ratios in components were all equal to 1.0, but across components reinforcer-magnitude ratios varied from 1:7 through 7:1. Three different overall reinforcer rates were arranged across conditions. In Experiment 2, the reinforcer-rate ratios varied across components from 27:1 to 1:27, and the reinforcer-magnitude ratios for each alternative were changed across conditions from 1:7 to 7:1. The results of Experiment 1 replicated the results for changing reinforcer-rate ratios across components reported by Baum (2000, 2002): Sensitivity to reinforcer-magnitude ratios increased with increasing numbers of reinforcers in components. Sensitivity to magnitude ratio, however, fell short of sensitivity to reinforcer-rate ratio. The degree of carryover from component to component depended on the reinforcer rate. Larger reinforcers produced larger and longer postreinforcer preference pulses than did smaller reinforcers. Similar results were found in Experiment 2, except that sensitivity to reinforcer magnitude was considerably higher and was greater for magnitudes that differed more from one another. Visit durations following reinforcers measured either as number of responses emitted or time spent responding before a changeover were longer following larger than following smaller reinforcers, and were longer following sequences of same reinforcers than following other sequences. The results add to the growing body of research that informs model building at local levels.
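The sensitivity analyses this abstract reports follow the concatenated generalized matching law. The sketch below shows the assumed form; the sensitivity values (a_r = 0.8 for reinforcer rate, a_m = 0.4 for magnitude) are illustrative, chosen only to echo the qualitative finding that magnitude sensitivity fell short of rate sensitivity, and are not estimates from the paper.

```python
import math

def log_response_ratio(r1, r2, m1, m2, a_r=0.8, a_m=0.4, log_c=0.0):
    """Concatenated generalized matching law (illustrative parameters):
    log (B1/B2) = a_r * log (R1/R2) + a_m * log (M1/M2) + log c,
    where R is reinforcer rate, M is reinforcer magnitude, a_r and a_m
    are sensitivities, and log c is bias."""
    return (a_r * math.log10(r1 / r2)
            + a_m * math.log10(m1 / m2)
            + log_c)

# Equal reinforcer rates, 7:1 magnitude ratio: preference driven by
# magnitude alone, and smaller than a 7:1 rate ratio would produce.
lr_magnitude = log_response_ratio(1, 1, 7, 1)  # 0.4 * log10(7) ≈ 0.338
lr_rate = log_response_ratio(7, 1, 1, 1)       # 0.8 * log10(7) ≈ 0.676
```

With this form, the abstract's component-by-component manipulation amounts to varying the M1:M2 (Experiment 1) or R1:R2 (Experiment 2) term while estimating the corresponding sensitivity.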
Journal of Experimental Psychology: Animal Behavior Processes, 2012
When pigeons are given a choice between two alternatives, one leading to a stimulus 20% of the time that always signals reinforcement (S+) or another stimulus 80% of the time that signals the absence of reinforcement (S−) and the other alternative leading to one of two stimuli each signaling reinforcement 50% of the time, the 20% reinforcement alternative is preferred although it provides only 40% as much reinforcement. In Phase 1 of the present experiment, we tested the hypothesis that pigeons compare the S+ associated with each alternative and ignore the S− by giving them a choice between two pairs of discriminative stimuli (20% S+, 80% S− and 50% S+, 50% S−). Reinforcement theory suggests that the alternative associated with more reinforcement should be preferred but the pigeons showed indifference. In Phase 2, the pigeons were divided into two groups. For one group, the discriminative function was removed from the 50% reinforcement alternative and a strong preference for the 20% reinforcement alternative was found. For the other group, the discriminative function was removed from both alternatives and a strong preference was found for the 50% reinforcement alternative. Thus, the indifference found in Phase 1 was not due to the absence of discriminability of the differential reinforcement associated with the two alternatives (20% vs. 50% reinforcement); rather, the indifference can be attributed to the pigeons' insensitivity to the differential frequency of the two S+ and two S− stimuli. The relevance to human gambling behavior is discussed.
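The "only 40% as much reinforcement" figure follows from simple expected-value arithmetic over the stimulus probabilities given in the abstract:

```python
# Suboptimal alternative: a stimulus that always signals reinforcement
# appears 20% of the time; the other 80% of the time a stimulus signals
# no reinforcement.
p_suboptimal = 0.20 * 1.0 + 0.80 * 0.0   # overall probability = 0.2

# Other alternative: one of two stimuli, each signaling reinforcement
# 50% of the time.
p_optimal = 0.50 * 0.5 + 0.50 * 0.5      # overall probability = 0.5

ratio = p_suboptimal / p_optimal          # 0.2 / 0.5 = 0.4
```

So the preferred alternative yields 40% of the reinforcement of the alternative the pigeons forgo, which is what makes the preference suboptimal.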