Choice with delayed and probabilistic reinforcers: effects of variability, time between trials, and conditioned reinforcers.
Unpredictable work requirements strengthen risky choices, but signal lights or tokens presented during the delay can pull learners back to safer options.
01Research in Context
What this study did
Researchers let pigeons peck two keys. One key always paid off. The other paid off only half the time and sometimes made the birds wait.
Sometimes the wait came after a fixed number of pecks. Other times the number changed every trial. Green houselights came on during each wait, acting as tiny conditioned reinforcers.
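The preference for the variable schedule is what a hyperbolic discounting account predicts: because short delays are discounted far less than long ones, a mix of short and long waits is worth more than a fixed wait with the same average. The sketch below is an illustration of that logic, not the paper's analysis; the discount rate `k` and the specific delay values are arbitrary choices made here for the example.

```python
# Illustrative sketch (not the paper's analysis): Mazur's hyperbolic
# discounting, V = A / (1 + k*D), predicts that a mix of short and long
# delays is worth more than a fixed delay with the same mean - matching
# the finding that pigeons preferred the variable-link schedule.
# 'k' and the delay lists below are arbitrary illustration values.

def hyperbolic_value(amount, delay, k=0.2):
    """Discounted value of a reward delivered after `delay` seconds."""
    return amount / (1 + k * delay)

def mean_value(amount, delays, k=0.2):
    """Average discounted value over a set of equiprobable delays."""
    return sum(hyperbolic_value(amount, d, k) for d in delays) / len(delays)

fixed = mean_value(1.0, [16])               # always 4 links x 4 s = 16 s
variable = mean_value(1.0, [4, 8, 16, 36])  # same 16-s mean, but variable

print(f"fixed-delay value:    {fixed:.3f}")
print(f"variable-delay value: {variable:.3f}")
# The variable schedule comes out higher because the discount curve is
# convex: occasional short delays boost value more than long ones hurt it.
```

Running this, the variable set scores higher than the fixed delay even though both average 16 s, which is the core intuition behind "probabilistic reinforcers act like variable delays."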
What they found
Birds picked the risky key more when the peck-count varied than when it stayed the same.
The green lights cut that preference. The results were messy: variability helped, but the lights pulled the birds back toward the safe key.
How this fits with other research
Anonymous (1995) ran the same 50% vs. 100% setup and got the same drop in risky choice when a gap was added. The two studies echo each other even though one is only an abstract.
Martin et al. (1997) also inserted a short gap before the food signal and saw the same pull toward safety. Together the three papers show that anything that weakens the conditioned reinforcer weakens the bad bet.
Henton (1972) looks like a contradiction: pigeons cared little whether terminal-link stimuli were present at all. The difference is task design. Henton used simple chained versus tandem schedules with no probability gamble, so the stimuli had no uncertainty to signal. Add probabilistic reinforcers and the same stimuli suddenly matter.
Why it matters
When you use token boards, points, or praise during long delays, you are using conditioned reinforcers. Vary the work requirement and those tiny reinforcers may gain or lose power. If a client keeps picking the "long shot" task, check whether your tokens arrive on a fixed or variable schedule and whether they are delayed too long. A small timing tweak can nudge better choices.
Try adding a brief, consistent token right after each response during long delays to see if preference shifts toward the safer task.
03Original abstract
In a discrete-trials procedure with pigeons, a response on a green key led to a 4-s delay (during which green houselights were lit) and then a reinforcer might or might not be delivered. A response on a red key led to a delay of adjustable duration (during which red houselights were lit) and then a certain reinforcer. The delay was adjusted so as to estimate an indifference point--a duration for which the two alternatives were equally preferred. Once the green key was chosen, a subject had to continue to respond on the green key until a reinforcer was delivered. Each response on the green key, plus the 4-s delay that followed every response, was called one "link" of the green-key schedule. Subjects showed much greater preference for the green key when the number of links before reinforcement was variable (averaging four) than when it was fixed (always exactly four). These findings are consistent with the view that probabilistic reinforcers are analogous to reinforcers delivered after variable delays. When successive links were separated by 4-s or 8-s "interlink intervals" with white houselights, preference for the probabilistic alternative decreased somewhat for 2 subjects but was unaffected for the other 2 subjects. When the interlink intervals had the same green houselights that were present during the 4-s delays, preference for the green key decreased substantially for all subjects. These results provided mixed support for the view that preference for a probabilistic reinforcer is inversely related to the duration of conditioned reinforcers that precede the delivery of food.
Journal of the Experimental Analysis of Behavior, 1992 · doi:10.1901/jeab.1992.58-513