Reinforcement of least-frequent sequences of choices.
Matching can emerge even when maximizing isn’t possible—sequential reinforcement contingencies alone can produce the matching relation.
Research in Context
What this study did
Shimp (1967) tested three pigeons in an operant chamber.
Each bird chose between two keys, and every run of four successive pecks counted as a sequence.
A peck produced food only when it completed the four-peck sequence that had occurred least often.
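That contingency can be sketched in a few lines. This is a minimal illustrative simulation, not Shimp's actual procedure code, and the 50/50 random chooser is a placeholder assumption standing in for the bird's real choice rule:

```python
import random
from collections import Counter
from itertools import product

# All 16 possible four-peck sequences over two keys, L and R.
ALL_SEQS = ["".join(s) for s in product("LR", repeat=4)]

def run_session(n_pecks=5000, p_left=0.5, seed=1):
    """Reinforce a peck only if it completes the four-peck sequence
    that has occurred least often so far (ties count as least)."""
    rng = random.Random(seed)
    seq_counts = Counter({s: 0 for s in ALL_SEQS})
    history = []            # sliding window of the last four choices
    pecks = Counter()       # pecks per key
    rewards = Counter()     # reinforcers earned per key
    for _ in range(n_pecks):
        choice = "L" if rng.random() < p_left else "R"  # placeholder chooser
        pecks[choice] += 1
        history.append(choice)
        if len(history) > 4:
            history.pop(0)
        if len(history) == 4:
            seq = "".join(history)
            # Reinforce only if this sequence was (tied for) least frequent.
            if seq_counts[seq] == min(seq_counts.values()):
                rewards[choice] += 1
            seq_counts[seq] += 1
    resp_ratio = pecks["L"] / n_pecks
    reinf_ratio = rewards["L"] / max(1, sum(rewards.values()))
    return resp_ratio, reinf_ratio

resp, reinf = run_session()
```

Note the key property of the schedule: the "best-paying" sequence keeps moving, because reinforcing a sequence makes it no longer the least frequent one, so no single response pattern can be maximized.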
What they found
The birds still matched.
Their response ratios lined up with the food ratios, even though the best-paying pattern kept changing.
Matching showed up without any chance to maximize.
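"Matching" here is a simple proportion check: the share of pecks on a key approximately equals the share of food earned on that key. The counts below are hypothetical, for illustration only, not data from Shimp (1967):

```python
# Hypothetical peck and food counts (illustrative, not from the paper).
left_pecks, right_pecks = 1800, 1200
left_food, right_food = 60, 40

# Matching: relative response frequency approximately equals
# relative reinforcement frequency.
response_prop = left_pecks / (left_pecks + right_pecks)
reinforcement_prop = left_food / (left_food + right_food)
matches = abs(response_prop - reinforcement_prop) < 0.05
```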
How this fits with other research
Kunz et al. (1982) later shortened the game to two-peck strings and still saw matching.
Together, the papers show the effect holds for both shorter and longer response sequences.
Glover et al. (1976) looks like a clash: when food stopped, the birds drifted away from matching.
The gap is timing.
Shimp (1967) watched steady-state choices; Glover et al. (1976) tested what happens after the rule is turned off.
Both can be true: matching holds while the rule is on, then fades.
Why it matters
You now know matching can be built even when clients cannot pick a single best move.
If you reinforce varied play actions, social phrases, or academic responses that have been rare lately, you may see the whole response class settle into matching proportions.
Use this to boost flexibility without extra prompts.
Try it: reinforce the least-used three-step play sequence today and watch whether new patterns appear more often.
Original abstract
When a pigeon's choices between two keys are probabilistically reinforced, as in discrete trial probability learning procedures and in concurrent variable-interval schedules, the bird tends to maximize, or to choose the alternative with the higher probability of reinforcement. In concurrent variable-interval schedules, steady-state matching, which is an approximate equality between the relative frequency of a response and the relative frequency of reinforcement of that response, has previously been obtained only as a consequence of maximizing. In the present experiment, maximizing was impossible. A choice of one of two keys was reinforced only if it formed, together with the three preceding choices, the sequence of four successive choices that had occurred least often. This sequence was determined by a Bernoulli-trials process with parameter p. Each of three pigeons matched when p was 1/2 or 1/4. Therefore, steady-state matching by individual birds is not always a consequence of maximizing. Choice probability varied between successive reinforcements, and sequential statistics revealed dependencies which were adequately described by a Bernoulli-trials process with p depending on the time since the preceding reinforcement.
Journal of the Experimental Analysis of Behavior, 1967 · doi:10.1901/jeab.1967.10-57