Theories of probabilistic reinforcement.
Total time a learner spends in a signaled condition drives choice more than simple delay when reinforcers are uncertain.
Research in Context
What this study did
Mazur (1989) let pigeons choose between two keys. A green key gave food every time, after a delay the experimenters adjusted. A red key gave food only some of the time, after a fixed 5-s delay.
The birds saw colored lights while they waited, so they knew which key they had picked. Across conditions, the team changed how long, and on which trials, those lights stayed on.
They wanted to know if the birds cared more about delay or about the total time they spent looking at each signal.
What they found
The birds' choices tracked the total time per reinforcer spent in each signal: the more signal time a reinforcer cost, the less they valued that key. Delay alone did not explain their choice.
The simple rule that value tracks total signal time per reinforcer beat three other theories.
How this fits with other research
Mellitz et al. (1983) said pigeons pick the key with the highest momentary chance of food. Mazur (1989) shows the birds really use total time in the signal. The two papers seem to clash, but they tested different things. Momentary odds matter in quick switches; total time matters when signals stay on.
Adkins et al. (1997) went further. They showed pigeons look ahead to the next three or four reinforcers, not just the next one. The total-time rule still holds, but their model adds the future view.
Hawkes et al. (1974) and Rilling et al. (1969) set the baseline: reinforcement rate controls choice. Mazur (1989) keeps that idea but swaps "rate" for "total time in the signal" when reinforcers are uncertain.
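The total-time account can be put in numbers. A probabilistic reinforcer delivered with probability p, preceded by a signal of d seconds per trial, costs on average d/p seconds of signal time per reinforcer, and the paper treats that accumulated time like a variable delay. A minimal sketch of the idea, using an illustrative hyperbolic discounting function with made-up constants (A = 1, K = 0.2 are assumptions for the example, not values from the paper):

```python
# Sketch of the total-time account of probabilistic reinforcement.
# The discounting function and constants below are illustrative
# assumptions, chosen only to make the comparison concrete.

def total_signal_time(signal_s: float, p_reinforcer: float) -> float:
    """Expected signal time per reinforcer: each reinforcer takes on
    average 1/p trials, and each trial adds signal_s seconds."""
    return signal_s / p_reinforcer

def value(total_time_s: float, amount: float = 1.0, k: float = 0.2) -> float:
    """Hyperbolic value of a reinforcer reached after total_time_s
    of signal time (V = A / (1 + K*T))."""
    return amount / (1.0 + k * total_time_s)

# Certain key: 5-s signal, food every time -> 5 s of signal per reinforcer.
certain = value(total_signal_time(5.0, 1.0))        # 1 / (1 + 0.2*5)  = 0.50
# Probabilistic key: same 5-s signal, food half the time -> 10 s per reinforcer.
probabilistic = value(total_signal_time(5.0, 0.5))  # 1 / (1 + 0.2*10) ~ 0.33

print(certain, probabilistic)
```

On this arithmetic, halving the probability of food acts exactly like doubling the time spent waiting in the signal, which is the sense in which probabilistic reinforcers are "analogous to reinforcers delivered after variable delays."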
Why it matters
When you use probabilistic reinforcement, pair each option with a clear, distinctive signal, but treat signal time like delay: the more total signal time a learner logs per payoff, the less that option is worth. In Experiment 3, pigeons preferred the probabilistic key more when its signal appeared only on reinforced trials, so restricting the signal to trials that actually end in reinforcement should strengthen preference.
To boost a token or "mystery" signal's pulling power, present it on trials that end in reinforcement rather than stretching it across every wait.
Original abstract
In three experiments, pigeons chose between two alternatives that differed in the probability of reinforcement and the delay to reinforcement. A peck at a red key led to a delay of 5 s and then a possible reinforcer. A peck at a green key led to an adjusting delay and then a certain reinforcer. This delay was adjusted over trials so as to estimate an indifference point, or a duration at which the two alternatives were chosen about equally often. In Experiments 1 and 2, the intertrial interval was varied across conditions, and these variations had no systematic effects on choice. In Experiment 3, the stimuli that followed a choice of the red key differed across conditions. In some conditions, a red houselight was presented for 5 s after each choice of the red key. In other conditions, the red houselight was present on reinforced trials but not on nonreinforced trials. Subjects exhibited greater preference for the red key in the latter case. The results were used to evaluate four different theories of probabilistic reinforcement. The results were most consistent with the view that the value or effectiveness of a probabilistic reinforcer is determined by the total time per reinforcer spent in the presence of stimuli associated with the probabilistic alternative. According to this view, probabilistic reinforcers are analogous to reinforcers that are delivered after variable delays.
Journal of the Experimental Analysis of Behavior, 1989 · doi:10.1901/jeab.1989.51-87
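The adjusting-delay procedure in the abstract can be sketched as a simple titration. The simulation below is a rough stand-in under stated assumptions, not the paper's actual parameters: the simulated subject follows the total-time rule (it prefers whichever key costs less expected signal time per reinforcer, with Gaussian choice noise), and the green delay is nudged up whenever green is chosen and down otherwise, so it drifts toward the indifference point.

```python
import random

# Minimal sketch of an adjusting-delay titration. All parameters
# (step size, noise, trial count) are illustrative assumptions.

def titrate(p_red=0.5, red_delay=5.0, start=1.0, step=1.0,
            trials=2000, noise=0.5, seed=0):
    rng = random.Random(seed)
    green_delay = start
    for _ in range(trials):
        red_cost = red_delay / p_red   # expected signal s per (probabilistic) reinforcer
        green_cost = green_delay       # certain reinforcer after the adjusting delay
        # Noisy comparison: choose green when it looks cheaper.
        chose_green = green_cost + rng.gauss(0, noise) < red_cost
        # Titration: make the chosen option worse, the other better.
        if chose_green:
            green_delay += step
        else:
            green_delay = max(step, green_delay - step)
    return green_delay

# The total-time account predicts indifference near red_delay / p_red,
# i.e. about 10 s when food follows the red key half the time.
print(titrate())
```

When the red key pays off every time (p_red = 1.0), the same titration settles near the red key's own 5-s delay, which is the sanity check the adjusting procedure is built on.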