Short-term memory in the pigeon: the previously reinforced response.
A brief pause between trials lets the last reinforcer’s pull fade—fill or shorten that gap to keep prior wins from overruling your teaching plan.
Research in Context
What this study did
Shimp (1976) tested pigeons on a simple choice task. After each peck, the bird waited 2.5, 4, or 6 seconds before the next trial began.
The question: does the last trial’s payoff still sway the pigeon’s next pick as the wait grows longer?
What they found
The birds acted like tiny statisticians. They mostly repeated the last rewarded response and switched after non-reward.
This “win-stay/lose-shift” rule faded as the gap stretched from 2.5 to 6 seconds.
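The rule and its fading with delay can be sketched in a few lines of code. The exponential recall term and the decay rate below are illustrative assumptions for the sketch, not Shimp's model; the paper only reports that the strategy weakened as the gap grew.

```python
import math
import random

def win_stay_lose_shift(last_choice, last_rewarded):
    """Repeat the last choice after a win, switch after a loss."""
    other = {"left": "right", "right": "left"}
    return last_choice if last_rewarded else other[last_choice]

def next_choice(last_choice, last_rewarded, delay_s, decay=0.15):
    """Apply the rule only if the last trial is still remembered.

    The exponential memory term is an illustrative assumption:
    the chance of recalling the last reinforced response falls
    as the intertrial gap (delay_s, in seconds) grows.
    """
    if random.random() < math.exp(-decay * delay_s):
        return win_stay_lose_shift(last_choice, last_rewarded)
    return random.choice(["left", "right"])  # memory gone: guess
```

At a 2.5-second gap the rule drives most choices; by 6 seconds the guessing branch fires more often, which is the behavioral signature the study measured.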
How this fits with other research
Nelson et al. (1978) soon replicated the decay curve using a matching-to-sample task. Same species, same slide in accuracy, just a new procedure.
Bernal et al. (1980) seemed to clash at first: longer delays hurt their birds' performance far less. But they let the birds peck during the wait; those extra pecks acted like rehearsal and softened the drop. The two studies don't fight; they show the same memory fade, plus a handy fix.
Rider et al. (1984) modeled the loss as a hyperbola, giving later workers a ready equation for predicting forgetting.
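A hyperbolic fit can be checked directly against the Group A accuracies reported in the abstract below (0.96, 0.84, 0.74 at delays of 2.5, 4.0, and 6.0 seconds). The form p(t) = p0 / (1 + k·t) and the grid search are illustrative assumptions for this sketch, not Rider et al.'s exact parameterization:

```python
# Group A data from the abstract: choice accuracy at each retention interval.
delays = [2.5, 4.0, 6.0]      # seconds
accuracy = [0.96, 0.84, 0.74]  # probability of the optimal choice

def predict(p0, k, t):
    """Hyperbolic decay: accuracy falls as the retention interval grows."""
    return p0 / (1 + k * t)

def sse(p0, k):
    """Sum of squared errors across the three observed delays."""
    return sum((predict(p0, k, t) - p) ** 2 for t, p in zip(delays, accuracy))

# A coarse grid search keeps the sketch dependency-free;
# p0 is a fit parameter, not an observed zero-delay accuracy.
p0, k = min(
    ((a / 100, b / 200) for a in range(90, 141) for b in range(1, 101)),
    key=lambda pk: sse(*pk),
)
```

Two free parameters reproduce all three points closely, which is why a ready-made hyperbolic equation was a useful handoff to later forgetting research.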
Why it matters
Your learner’s last success (or failure) drives the next response most when the inter-trial gap is short. If you must wait—while data are scored or materials reset—let the learner do a quick mediating action such as naming, touching, or sorting to “hold” the lesson in mind.
Insert a 2-second model or echoic prompt between trials to bridge the delay and lock in the new target.
Original abstract
Eighteen pigeons served in a discrete-trials short-term memory experiment in which the reinforcement probability for a peck on one of two keys depended on the response reinforced on the previous trial: either the probability of reinforcement on a trial was 0.8 for the same response reinforced on the previous trial and was 0.2 for the other response (Group A), or it was 0 or 0.2 for the same response and 1.0 or 0.8 for the other response (Group B). A correction procedure ensured that over all trials reinforcement was distributed equally across the left and right keys. The optimal strategy was either a win-stay, lose-shift strategy (Group A) or a win-shift, lose-stay strategy (Group B). The retention interval, that is, the intertrial interval, was varied. The average probability of choosing the optimal alternative reinforced 80% of the time was 0.96, 0.84, and 0.74 after delays of 2.5, 4.0, and 6.0 sec, respectively, for Group A, and was 0.87, 0.81, and 0.55 after delays of 2.5, 4.0, and 6.0 sec, respectively, for Group B. This outcome is consistent with the view that behavior approximated the optimal response strategy, but only to an extent permitted by a subject's short-term memory for the cue correlated with reinforcement, that is, its own most-recently reinforced response. More generally, this result is consistent with "molecular" analyses of operant behavior, but is inconsistent with traditional "molar" analyses holding that fundamental controlling relations may be discovered by routinely averaging over different local reinforcement contingencies. In the present experiment, the molar results were byproducts of local reinforcement contingencies involving an organism's own recent behavior.
Journal of the Experimental Analysis of Behavior, 1976 · doi:10.1901/jeab.1976.26-487