Preference pulses and the win-stay, fix-and-sample model of choice.
Preference pulses are genuine, reinforcement-produced effects that appear in extinction, and a win-stay modification to the choice model captures them.
Research in Context
What this study did
Hachiga et al. (2015) trained rats to respond on two levers for food and then measured choice during extinction. They tracked brief shifts in preference right after each reinforcer, and they added a "win-stay" rule to a computer simulation to see whether it reproduced the rats' response patterns.
What they found
The brief post-reinforcer bursts of responding to the just-reinforced lever, called preference pulses, appeared only during extinction. When the simulation included a win-stay rule, the residual between model and data shrank greatly. This suggests the pulse reflects reinforcement history, not a measurement artifact.
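The pulse-plus-win-stay idea can be sketched in a toy simulation. Nothing below is the paper's actual model or its fitted parameters; all names and values are illustrative. After a "reinforcer," the probability of staying on the just-reinforced lever gets a transient boost that decays with each response, so the proportion of choices to that lever starts high and fades back toward baseline, tracing a preference pulse:

```python
import random
from collections import defaultdict

def preference_pulse(n=200_000, p_base=0.5, stay_boost=0.4, decay=0.6,
                     rf_prob=0.02, seed=0):
    """Toy two-lever choice stream with a transient post-reinforcer bias.

    After each reinforcer, the probability of repeating the just-reinforced
    lever is p_base + stay_boost * decay**k, where k counts responses since
    reinforcement. Returns the proportion of choices to the just-reinforced
    lever at positions 0..9 after reinforcement. Illustrative values only.
    """
    rng = random.Random(seed)
    hits, totals = defaultdict(int), defaultdict(int)
    last_side, k = None, 0
    for _ in range(n):
        if last_side is None:
            # no reinforcement yet: choose uniformly
            side = "L" if rng.random() < 0.5 else "R"
        else:
            p_stay = p_base + stay_boost * decay ** k
            stayed = rng.random() < p_stay
            side = last_side if stayed else ("R" if last_side == "L" else "L")
            if k < 10:                      # tally the first 10 positions
                totals[k] += 1
                hits[k] += side == last_side
            k += 1
        if rng.random() < rf_prob:          # occasional toy "reinforcer"
            last_side, k = side, 0
    return [hits[i] / totals[i] for i in sorted(totals)]

pulse = preference_pulse()
# pulse[0] is near p_base + stay_boost; later positions fall toward p_base
```

Plotting the returned list against response position would show the characteristic decaying pulse; setting `stay_boost=0` flattens it, which is the intuition behind testing a win-stay term in the simulation.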
How this fits with other research
Sawyer et al. (2014) argued that pulses can appear without any real reinforcement effect, as a side effect of how visits are counted. Hachiga et al. answer: pulses vanish when food keeps coming, so they are tied to reinforcement history, not just bookkeeping.
Malone (1999) showed that rats stay or switch based on local payoffs. Adding win-stay to the new model keeps that local idea while explaining why rats return briefly to the last-reinforced lever after food stops.
McSweeney et al. (1993) found that post-reinforcement pauses track food timing. Hachiga et al. add that the next choice, not just the pause, is also controlled by what happened last.
Why it matters
If you run concurrent schedules and see brief returns to the just-reinforced alternative, do not dismiss them as artifacts. Under thinning or extinction, those pulses tell you the learner is applying a win-stay rule. You can use this to check whether reinforcement history is still driving choice after reinforcers stop.
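One concrete way to do that coding is sketched below. The function and the `(alternative, reinforced)` trial format are hypothetical conveniences for illustration, not a standard from the paper: it labels the response immediately following each reinforcer as "win-stay" if it repeats the just-reinforced alternative, otherwise "shift".

```python
def code_post_reinforcer_choices(trials):
    """Label the response after each reinforcer as 'win-stay' or 'shift'.

    trials: list of (alternative, reinforced) pairs in temporal order,
    e.g. [("L", True), ("L", False), ("R", True), ("L", False)].
    This data format is an assumption made for this sketch.
    """
    labels = []
    for (alt_prev, rf_prev), (alt_next, _) in zip(trials, trials[1:]):
        if rf_prev:  # the previous response was reinforced
            labels.append("win-stay" if alt_next == alt_prev else "shift")
    return labels

# Example: a reinforced "L" followed by "L" is win-stay;
# a reinforced "R" followed by "L" is a shift.
labels = code_post_reinforcer_choices(
    [("L", True), ("L", False), ("R", True), ("L", False)]
)
# → ["win-stay", "shift"]
```

The proportion of "win-stay" labels during extinction probes gives a simple index of whether the last reinforcer is still controlling choice.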
During extinction probes, watch for a quick return to the last-reinforced option and code it as win-stay, not error.
Original abstract
Two groups of six rats each were trained to respond to two levers for a food reinforcer. One group was trained on concurrent variable-ratio 20 extinction schedules of reinforcement. The second group was trained on a concurrent variable-interval 27-s extinction schedule. In both groups, lever-schedule assignments changed randomly following reinforcement; a light cued the lever providing the next reinforcer. In the next condition, the light cue was removed and reinforcer assignment strictly alternated between levers. The next two conditions redetermined, in order, the first two conditions. Preference pulses, defined as a tendency for relative response rate to decline to the just-reinforced alternative with time since reinforcement, only appeared during the extinction schedule. Although the pulse's functional form was well described by a reinforcer-induction equation, there was a large residual between actual data and a pulse-as-artifact simulation (McLean, Grace, Pitts, & Hughes, 2014) used to discern reinforcer-dependent contributions to pulsing. However, if that simulation was modified to include a win-stay tendency (a propensity to stay on the just-reinforced alternative), the residual was greatly reduced. Additional modifications of the parameter values of the pulse-as-artifact simulation enabled it to accommodate the present results as well as those it originally accommodated. In its revised form, this simulation was used to create a model that describes response runs to the preferred alternative as terminating probabilistically, and runs to the unpreferred alternative as punctate with occasional perseverative response runs. After reinforcement, choices are modeled as returning briefly to the lever location that had been just reinforced. This win-stay propensity is hypothesized as due to reinforcer induction.
Journal of the Experimental Analysis of Behavior, 2015 · doi:10.1002/jeab.170