Preference pulses and the win-stay, fix-and-sample model of choice.
Preference pulses are genuine, reinforcement-produced effects that appear in extinction, and a win-stay modification to the choice model captures them.
Research in Context
What this study did
Hachiga et al. (2015) trained rats to respond on two levers for food and then measured choice during extinction. They tracked brief shifts in preference right after each reinforcer, and they added a "win-stay" rule to a computer simulation to see whether it reproduced the rats' response patterns.
What they found
The brief post-reinforcer bursts of responding to the just-reinforced lever, called preference pulses, appeared only during extinction. When the simulation included a win-stay rule, the residual between model and data shrank greatly. This suggests the pulse reflects reinforcement history, not a measurement artifact.
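The pulse-plus-win-stay idea can be sketched in a toy simulation. Nothing below is the paper's actual model or its fitted parameters; all names and values are illustrative. After a "reinforcer," the probability of staying on the just-reinforced lever gets a transient boost that decays with each response, so the proportion of choices to that lever starts high and fades back toward baseline, tracing a preference pulse:

```python
import random
from collections import defaultdict

def preference_pulse(n=200_000, p_base=0.5, stay_boost=0.4, decay=0.6,
                     rf_prob=0.02, seed=0):
    """Toy two-lever choice stream with a transient post-reinforcer bias.

    After each reinforcer, the probability of repeating the just-reinforced
    lever is p_base + stay_boost * decay**k, where k counts responses since
    reinforcement. Returns the proportion of choices to the just-reinforced
    lever at positions 0..9 after reinforcement. Illustrative values only.
    """
    rng = random.Random(seed)
    hits, totals = defaultdict(int), defaultdict(int)
    last_side, k = None, 0
    for _ in range(n):
        if last_side is None:
            # no reinforcement yet: choose uniformly
            side = "L" if rng.random() < 0.5 else "R"
        else:
            p_stay = p_base + stay_boost * decay ** k
            stayed = rng.random() < p_stay
            side = last_side if stayed else ("R" if last_side == "L" else "L")
            if k < 10:                      # tally the first 10 positions
                totals[k] += 1
                hits[k] += side == last_side
            k += 1
        if rng.random() < rf_prob:          # occasional toy "reinforcer"
            last_side, k = side, 0
    return [hits[i] / totals[i] for i in sorted(totals)]

pulse = preference_pulse()
# pulse[0] is near p_base + stay_boost; later positions fall toward p_base
```

Plotting the returned list against response position would show the characteristic decaying pulse; setting `stay_boost=0` flattens it, which is the intuition behind testing a win-stay term in the simulation.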
How this fits with other research
Sawyer et al. (2014) argued that pulses can appear without any real reinforcement effect, as a side effect of how visits are counted. Hachiga et al. answer: pulses vanish when food keeps coming, so they are tied to reinforcement history, not just bookkeeping.
Malone (1999) showed that rats stay or switch based on local payoffs. Adding win-stay to the new model keeps that local idea while explaining why rats return briefly to the last-reinforced lever after food stops.
McSweeney et al. (1993) found that post-reinforcement pauses track food timing. Hachiga et al. add that the next choice, not just the pause, is also controlled by what happened last.
Why it matters
If you run concurrent schedules and see brief returns to the just-reinforced alternative, do not dismiss them as artifacts. Under thinning or extinction, those pulses tell you the learner is applying a win-stay rule. You can use this to check whether reinforcement history is still driving choice after reinforcers stop.
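One concrete way to do that coding is sketched below. The function and the `(alternative, reinforced)` trial format are hypothetical conveniences for illustration, not a standard from the paper: it labels the response immediately following each reinforcer as "win-stay" if it repeats the just-reinforced alternative, otherwise "shift".

```python
def code_post_reinforcer_choices(trials):
    """Label the response after each reinforcer as 'win-stay' or 'shift'.

    trials: list of (alternative, reinforced) pairs in temporal order,
    e.g. [("L", True), ("L", False), ("R", True), ("L", False)].
    This data format is an assumption made for this sketch.
    """
    labels = []
    for (alt_prev, rf_prev), (alt_next, _) in zip(trials, trials[1:]):
        if rf_prev:  # the previous response was reinforced
            labels.append("win-stay" if alt_next == alt_prev else "shift")
    return labels

# Example: a reinforced "L" followed by "L" is win-stay;
# a reinforced "R" followed by "L" is a shift.
labels = code_post_reinforcer_choices(
    [("L", True), ("L", False), ("R", True), ("L", False)]
)
# → ["win-stay", "shift"]
```

The proportion of "win-stay" labels during extinction probes gives a simple index of whether the last reinforcer is still controlling choice.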
During extinction probes, watch for a quick return to the last-reinforced option and code it as win-stay, not error.
Original abstract
Two groups of six rats each were trained to respond to two levers for a food reinforcer. One group was trained on concurrent variable-ratio 20 extinction schedules of reinforcement. The second group was trained on a concurrent variable-interval 27-s extinction schedule. In both groups, lever-schedule assignments changed randomly following reinforcement; a light cued the lever providing the next reinforcer. In the next condition, the light cue was removed and reinforcer assignment strictly alternated between levers. The next two conditions redetermined, in order, the first two conditions. Preference pulses, defined as a tendency for relative response rate to decline to the just-reinforced alternative with time since reinforcement, only appeared during the extinction schedule. Although the pulse's functional form was well described by a reinforcer-induction equation, there was a large residual between actual data and a pulse-as-artifact simulation (McLean, Grace, Pitts, & Hughes, 2014) used to discern reinforcer-dependent contributions to pulsing. However, if that simulation was modified to include a win-stay tendency (a propensity to stay on the just-reinforced alternative), the residual was greatly reduced. Additional modifications of the parameter values of the pulse-as-artifact simulation enabled it to accommodate the present results as well as those it originally accommodated. In its revised form, this simulation was used to create a model that describes response runs to the preferred alternative as terminating probabilistically, and runs to the unpreferred alternative as punctate with occasional perseverative response runs. After reinforcement, choices are modeled as returning briefly to the lever location that had been just reinforced. This win-stay propensity is hypothesized as due to reinforcer induction.
Journal of the Experimental Analysis of Behavior, 2015 · doi:10.1002/jeab.170