Topic Guide · Practitioner

Preference Assessment in ABA: Formats, Decision Logic, and How to Pick Reinforcers That Actually Reinforce

BBC Editorial Team
★ Summary

A preference assessment is the systematic procedure behavior analysts use to identify stimuli likely to function as reinforcers, by directly measuring a learner's choice or engagement rather than relying on adult report. The formats most BCBAs and RBTs run day-to-day — free operant (FO), single stimulus (SS), paired stimulus (PS), multiple stimulus without replacement (MSWO), multiple stimulus with replacement (MSW), and brief MSWO — trade off speed against the precision of the resulting hierarchy, with paired stimulus and MSWO yielding the most stable rankings across repeated administrations Verriden & Roscoe (2016). Critically, "preferred" is not the same as "reinforcing": rank order from a standard PA can break down once response requirements increase, the social partner changes, or motivating operations shift, so a preference assessment is the first step in reinforcer selection — not the last Frank‐Crawford et al. (2018) (Morris et al., 2024). Run the format that matches your client's repertoire and risk profile, then validate the resulting top items against the actual treatment context with a brief reinforcer test.

01 What the Research Says

Preference is not the same as reinforcement

The single most important conceptual point this corpus makes is one most introductory texts under-state: a stimulus the learner chooses is not automatically a stimulus that will strengthen behavior. Frank-Crawford, Castillo, and DeLeon ranked 12-14 food items via paired-stimulus preference assessments for three children with escape-maintained problem behavior, then probed whether the top four stimuli remained equally effective when fixed-ratio response requirements increased during a demand-escape analysis; higher-preference items did not always retain their advantage at larger response costs, meaning rank order from a standard PA did not reliably predict reinforcer substitution under treatment-like work demands Frank‐Crawford et al. (2018). Morris, Conine, Slanzi, Kronfli, and Etchison's survey of 227 BCBAs delivering early intensive ABA confirms how practitioners actually behave around this gap: clinicians treat formal preference assessments as a starting point but make real-time reinforcer decisions based on momentary engagement cues, swapping items mid-session when response rates drop and running on-the-spot micro-checks rather than relying on yesterday's full MSWO data (Morris et al., 2024). The operational implication is consistent: rank a hierarchy with a standard PA, then validate the top one or two items under the actual schedule and demand context you intend to use them in Frank‐Crawford et al. (2018).

The major formats produce different rankings — and different problem-behavior risks

Verriden and Roscoe's direct comparison of four PA formats across six children and adults with autism or TBI found that paired stimulus and MSWO yielded the most stable item rankings across repeated administrations, while free operant and response restriction produced lower correspondence; reinforcement efficacy was modestly stronger for stimuli that maintained high preference across time, supporting PS and MSWO as the default formats when stability matters Verriden & Roscoe (2016). Stability is not the only dimension to optimize. Tung, Donaldson, and Kahng compared FO, PS, and MSWO formats in an 11-year-old with tangible-maintained problem behavior and showed that FO generated significantly less problem behavior than either trial-based format because it never withholds preferred items Tung et al. (2017). Herbek and colleagues replicated and extended this finding across six children with severe challenging behavior using a response-restriction free operant: rankings were comparable to PS, but rates of challenging behavior were lower in four of six cases Herbek et al. (2026). The format choice trades off precision (PS/MSWO) against evocation risk (FO/RR-FO), and the right answer depends on the client's behavior history.

Brief MSWO and classroom-friendly variants extend the procedure into real settings

Most published MSWO papers use 5- to 7-item arrays in clinic-like conditions; the field has spent the last decade adapting the procedure for classrooms, telehealth, and groups. Radley and colleagues compared individual four-choice edible MSWO with a simultaneous group Plickers-card assessment across 19 elementary students; both procedures yielded identical top edible reinforcers for every participant, meaning a teacher can identify a class-wide reinforcer in minutes without pulling each student aside Radley et al. (2019). Curiel and colleagues extended brief MSWO to web-based video preference assessments Curiel et al. (2018). Hoffmann and colleagues ran tablet-based MSWO using app-icon pictures for six adults with disabilities, then confirmed in a follow-up reinforcer test that the apps identified as highly preferred functioned as reinforcers, with the explicit caveat that learners must be able to match the icon picture to the corresponding app before this format can substitute for tangible MSWO Hoffmann et al. (2019). Chebli and Lanovaz used a tablet preference assessment to identify videos for five children with autism, then verified the assessment's predictions in a concurrent-operant reinforcer test where children spent markedly more time in a chair linked to the assessment-identified high-preference video Chebli & Lanovaz (2016). Digital and brief variants reproduce the precision of full MSWO when prerequisite skills are intact and a brief reinforcer-validation step is layered on top.

Single-stimulus and paired-stimulus formats for specific populations

Single-stimulus assessment, the procedurally simplest format (present one item at a time and record approach versus avoidance), has been overshadowed in modern training but remains the strongest option for some populations. Ford, Bayles, and Bruzek compared vocal-SS, picture-SS, MSWO, and choice-rank formats in six older-adult women with neurocognitive disorder (NCD) and found that single-stimulus assessments, particularly the verbally delivered version, produced more consistent responding and better predicted later engagement than rank-order formats; SS preferences also remained stable for most participants across a 20- to 32-week span Ford et al. (2022). Bigwood, Staples, and Sharp extended this work in adults with dementia, showing that small adaptations to standard stimulus preference assessments (SPAs), such as adding brief social interaction during item presentation and using yes/no questions for participants with limited verbal repertoires, kept the assessment usable when the standard form had become too cognitively demanding Bigwood et al. (2026). The take-home for adult and geriatric practice: do not default to MSWO because that is what graduate school taught; SS is often more valid for cognitively impaired adults Ford et al. (2022).

Paired-stimulus assessment, in turn, is the safer first move for new clients and bilingual learners. Waits and Gilroy used a two-stage culturally-informed paired-stimulus PA — caregiver RAISD interview followed by paired-stimulus choice trials with 30-second access periods — to identify Spanish/English reinforcers for two bilingual autistic children before designing a bilingual communication intervention (Waits & Gilroy, 2025). The combination of caregiver interview plus PS probe is the literature's standard answer to "I don't know what this client likes yet"; it surfaces idiosyncratic and culturally specific candidate items that closed-ended checklists miss, then ranks them with a procedure tolerant of small response repertoires.

Indirect preference assessments: useful as a starter, not a substitute

Indirect formats (caregiver and teacher interviews, the RAISD, open-ended questionnaires) generate the candidate item list that a direct PA then tests. They are fast and cost nothing in materials, but indirect data are not interchangeable with direct measurement. Cameron, Skurski, and Alligood's adaptation of paired-stimulus PA to three adult zoo-housed gorillas makes the point cleanly: caregivers were better at predicting which items were least preferred than which were most preferred, so caregiver intuition is more useful for ruling items out than ranking items in Cameron et al. (2025). Oliveira and colleagues' PRISMA-guided systematic review of 14 stimulus-stimulus pairing studies reinforces the procedural norm: 13 of 14 papers ran a formal preference assessment (RAISD, paired stimulus, MSWO, or concurrent chains) to identify preferred pairing stimuli before the intervention; the field treats the formal PA as the procedural standard (Oliveira et al., 2025). Use indirect tools as inputs to a direct PA, not as a substitute for one (Waits & Gilroy, 2025).

Concurrent-operants validation, stability, and assessor effects

A preference rank tells you what the learner picks; a concurrent-operants reinforcer test tells you whether the picked item actually changes behavior. Castelluccio and Johnson used a pictorial PA to rank break environments for two students with autism, then ran a concurrent-chains reinforcer test that confirmed the most-preferred environments functioned as stronger reinforcers than less-preferred ones Castelluccio & Johnson (2019). Walsh, Lydon, and Holloway extended the same logic to adult vocational planning by using an iPad-based MSWO of 12 short job-task videos to predict on-the-job engagement for four adults with autism and intellectual disability — a 10- to 15-minute video preference screen replaced longer in-vivo try-outs and predicted later performance better than staff opinion Walsh et al. (2020). Concurrent-operants validation is the methodological answer to the preference-isn't-reinforcement problem.

Stability matters at two timescales. Melanson and colleagues' analysis of 40 MSWO sessions with 17 autistic children quantified intra-assessment stability via Spearman rank-order correlations between consecutive rounds: preferences remained stable in only about 60% of within-assessment round comparisons, with roughly 40% showing meaningful rank shifts within the same MSWO administration Melanson et al. (2023). The first round of an MSWO does not reliably predict the third round even on the same day. Across days, Verriden and Roscoe's longitudinal comparison shows PS and MSWO produce more stable rankings than FO or response restriction, while Ford and colleagues documented preference stability over 20-32 weeks for SS in older adults with NCD — but only for highly preferred items Verriden & Roscoe (2016) Ford et al. (2022). The practical rule: re-assess any time engagement drops, motivation appears to shift, the social partner changes, or 4-8 weeks have passed.
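
The round-to-round comparison described above can be sketched in a few lines. This is a minimal illustration, assuming every item is selected in both rounds so there are no tied ranks; it uses the textbook no-ties Spearman formula, and the item names and rankings are hypothetical. It does not reproduce any particular study's stability cutoff, so the interpretation of a given rho is left to the clinician.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank-order correlation for two untied rankings of the same items.

    Each argument maps item -> rank (1 = selected first). Assumes no ties.
    """
    if ranks_a.keys() != ranks_b.keys():
        raise ValueError("both rounds must rank the same items")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("need at least two items")
    d_squared = sum((ranks_a[item] - ranks_b[item]) ** 2 for item in ranks_a)
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))


# Hypothetical selection orders from two consecutive MSWO rounds.
round_1 = {"bubbles": 1, "tablet": 2, "puzzle": 3, "ball": 4, "book": 5}
round_2 = {"bubbles": 2, "tablet": 1, "puzzle": 3, "ball": 5, "book": 4}

rho = spearman_rho(round_1, round_2)
print(f"rho = {rho:.2f}")  # prints "rho = 0.80"
```

A rho near 1 means the two rounds ordered the items almost identically; values that fall well below that are a prompt to run more rounds before trusting the hierarchy.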

The assessor matters too — especially for social reinforcers. Huntington and Schwartz repeated paired-stimulus PAs for one adult with autism across four different assessors (mother, staff member, unfamiliar researcher, and the participant himself) and found he selected different social interactions depending on the assessor; the highly-ranked, assessor-specific interactions actually generated higher response rates in a follow-up reinforcer test than those ranked lower Huntington & Schwartz (2022). Morris and Vollmer's Social Interaction Preference Assessment (SIPA) work formalized this for social stimuli: their video-modeled, free-operant SIPA reliably identified a differentially preferred and reinforcing social interaction in five children with autism, and their methods comparison showed that MSWO produces a valid social-reinforcer hierarchy only when the learner can reliably tact pictures of those interactions; without that prerequisite, SIPA remains valid where MSWO breaks down Morris & Vollmer (2019) Morris & Vollmer (2020). When the planned reinforcer is social, run the assessment with the person who will deliver it, document that assessor on the data sheet, and pick a format that doesn't assume picture-tact prerequisites unless verified.

Decision frameworks, competing-stimulus extensions, and staff training

Lill, Shriver, and Allen's Stimulus Preference Assessment Decision-Making System (SPADS) is the field's clearest written articulation of how an experienced clinician chooses among free operant, MSWO, and trial-based formats based on client characteristics, available time, and stimulus class, with embedded rules for when to extend, repeat, or terminate an assessment, though the model itself awaits empirical validation Lill et al. (2021). Where SPADS ends, Haddock and Hagopian's Competing Stimulus Assessment (CSA) review picks up: when the goal is not "identify a reinforcer for skill acquisition" but rather "identify a stimulus that, delivered noncontingently, will reduce automatically maintained problem behavior," a CSA, which directly measures whether free access to an item suppresses the target behavior, is the correct procedure, not a standard PA Haddock & Hagopian (2020). Avery and Akers's Picture-based Stimulus Demand Assessment (PSDA) covers the inverse case, quantifying aversion to demands, by recording clients' selections between two demand pictures, producing a non-intrusive demand hierarchy that complements traditional preference data when planning task sequencing Avery & Akers (2021). Confusing PA, CSA, and PSDA is a frequent error in fieldwork supervision.

The procedural-fidelity research is unusually encouraging. Bovi and colleagues trained two public-school staff to MSWO mastery using a single brief video-modeling-with-voice-over (VMVO) lesson; both staff acquired and maintained 100% correct implementation after one viewing Bovi et al. (2017). O'Handley and colleagues trained four preservice school psychology graduate students to MSWO mastery using an instructional script plus live feedback O’Handley et al. (2021). Ausenhus and Higgins used real-time telehealth feedback to bring three direct-care staff to mastery within 3-4 sessions Ausenhus & Higgins (2019). "We don't have time to train RBTs to run MSWO" is no longer defensible; an effective package is short, can run remotely, and produces high fidelity. The remaining gap is institutional adoption: Miranda and colleagues' qualitative interviews with preschool special-education teachers found mixed familiarity, infrequent use, and systemic barriers to classroom adoption, even though teachers viewed PAs as acceptable and useful, meaning the bottleneck is workload pressure, not teachability Miranda et al. (2025).

Stimulus presentation, transition planning, and component-based extensions

Stimulus presentation itself is an independent variable most practitioners under-control. Moore and colleagues had five typically developing adults complete two modified MSWOs differing only in stimulus size (uniform mass versus caregiver-reported portion); top three items matched 60% of the time, but lower-ranked items diverged considerably — if precise full-rank hierarchies matter, presentation must be standardized Moore et al. (2017). Heinicke, Carr, and Copsey's systematic review of 32 studies comparing alternative-modality SPAs (pictures, video, verbal) against tangible controls concluded that all three can approach tangible accuracy when (a) the learner has the prerequisite picture-matching, sight-reading, or video-comprehension skill, and (b) chosen items are delivered contingently after the assessment to maintain correspondence between reported and actual preference Heinicke et al. (2019). Pictorial PAs fail silently when those prerequisites are missing.

Preference assessment also feeds skill acquisition, transition planning, and FCT. Tullis and Seaman-Tullis map the SPA literature onto IDEA-mandated transition planning: SPA-derived vocational and leisure stimuli should drive IEP goals and community-based work-site selection by age 16, with quarterly re-checks tied to documented growth Tullis & Seaman-Tullis (2019). Isenhower and colleagues went further with a component-based concurrent-operants leisure assessment for two adults with ASD, decomposing leisure activities into interaction, movement, and modality components; the resulting profile predicted later activity and job choice better than item-level PA alone Isenhower et al. (2025). LaMarca and LaMarca's ADDIE-based programming guide reinforces the integration: rapid concurrent-chains preference probes belong inside instructional design, not just at intake (LaMarca et al., 2024).

02 Evidence Tier Breakdown

The preference-assessment literature is dominated by single-subject experimental designs (SCED) and methodology papers, with two systematic reviews, one decision-making model, and a small set of survey, qualitative, and theoretical contributions Heinicke et al. (2019) Haddock & Hagopian (2020).

Systematic and methodological reviews. Heinicke, Carr, and Copsey's 32-study review anchors the validity claims for picture, video, and verbal formats Heinicke et al. (2019). Haddock and Hagopian's CSA review provides the conceptual boundary between PA and competing-stimulus assessment Haddock & Hagopian (2020). Oliveira and colleagues' 14-study PRISMA review documents the procedural norm of running a formal PA before stimulus-stimulus pairing (Oliveira et al., 2025). Lill, Shriver, and Allen's SPADS is methodological and explicitly notes its lack of empirical validation Lill et al. (2021).

Single-subject experimental designs. Most of the corpus sits here. Verriden and Roscoe (n=6) anchor the format-comparison evidence Verriden & Roscoe (2016). Frank-Crawford and colleagues (n=3) provide the strongest demonstration that preference rank does not always predict reinforcer substitution under demand Frank‐Crawford et al. (2018). Tung and colleagues (n=1) and Herbek and colleagues (n=6) ground the FO and RR-FO problem-behavior arguments Tung et al. (2017) Herbek et al. (2026). Ford and colleagues (n=6) document SS validity and 20-32-week stability in older adults with NCD; Bigwood and colleagues (n=4) extend the dementia-adaptation work Ford et al. (2022) Bigwood et al. (2026). Melanson and colleagues (n=17) quantify intra-MSWO instability Melanson et al. (2023). Huntington and Schwartz (n=1), Morris and Vollmer (n=5 and n=8) cover assessor effects and social-stimulus formats Huntington & Schwartz (2022) Morris & Vollmer (2019) Morris & Vollmer (2020). Castelluccio and Johnson (n=2), Chebli and Lanovaz (n=5), Walsh and colleagues (n=4), Hoffmann and colleagues (n=6), Curiel and colleagues, Radley and colleagues (n=19), Isenhower and colleagues (n=2), Waits and Gilroy (n=2), Cameron and colleagues (n=3 gorillas), and Moore and colleagues (n=5) cover pictorial, video, tablet, group, component, bilingual, and presentation variations Castelluccio & Johnson (2019) Chebli & Lanovaz (2016) Walsh et al. (2020) Hoffmann et al. (2019) Curiel et al. (2018) Radley et al. (2019) Isenhower et al. (2025) (Waits & Gilroy, 2025) Cameron et al. (2025) Moore et al. (2017). Staff-training SCED includes Bovi et al. (n=2) Bovi et al. (2017), O'Handley et al. (n=4) O’Handley et al. (2021), and Ausenhus and Higgins (n=3) Ausenhus & Higgins (2019).

Survey and qualitative. Morris, Conine, and colleagues' survey of 227 BCBAs is the largest practice-pattern data point and complicates the assumption that formal PAs drive day-to-day reinforcer selection in EIBI (Morris et al., 2024). Miranda and colleagues' qualitative interviews with four preschool teachers describe the institutional barriers to classroom adoption Miranda et al. (2025).

Theoretical and conceptual. Tullis and Seaman-Tullis on transition planning, Avery and Akers on demand assessment, and LaMarca and LaMarca on ADDIE-based programming frame how PA fits into broader case management Tullis & Seaman-Tullis (2019) Avery & Akers (2021) (LaMarca et al., 2024).

Bottom line. The convergent picture is strong for: format selection trade-offs between PS/MSWO precision and FO/RR-FO safety Verriden & Roscoe (2016) Tung et al. (2017) Herbek et al. (2026); the preference-versus-reinforcement gap Frank‐Crawford et al. (2018); alternative-modality validity when prerequisites are intact Heinicke et al. (2019); and the trainability of staff via brief or telehealth-delivered packages Bovi et al. (2017) Ausenhus & Higgins (2019). It is weaker for any specific claim about long-term population-level reinforcer durability outside individual longitudinal SCED, and SPADS specifically still awaits empirical validation Lill et al. (2021).

03 Decision Logic

The format-selection question is not "MSWO or paired stimulus?" so much as "what is this client's repertoire, what is the safety profile, and what is the assessment for?"

  1. New client, item set not yet culturally or developmentally vetted. Start with a caregiver RAISD interview to surface candidate items, then run paired stimulus to rank the resulting set. Caregiver intuition is more useful for ruling items out than ranking them in (Waits & Gilroy, 2025) Cameron et al. (2025).
  2. History of severe problem behavior, particularly tangible- or escape-maintained. Default to free operant or response-restriction free operant; both reduce problem-behavior rates relative to trial-based formats, while RR-FO produces rankings comparable to paired stimulus Tung et al. (2017) Herbek et al. (2026).
  3. Time-pressed in-session reinforcer adjustment. Run a brief MSWO (single round, 1-3 minutes). This is what experienced clinicians actually do and what the field's largest practice-pattern survey documents as routine (Morris et al., 2024).
  4. Severe disability, limited choice repertoire, or older adult with NCD. Use single stimulus, ideally with verbal delivery; SS produces more consistent responding, better predicts engagement, and remains stable for 20-32 weeks in older adults with NCD Ford et al. (2022). Add brief social interaction during item presentation and yes/no questions when verbal repertoire is limited Bigwood et al. (2026).
  5. Client has scanning and selection prerequisites; you need a precise hierarchy. Run MSWO with at least three rounds; use the final-round hierarchy or the mode across rounds rather than round one — intra-assessment instability is about 40% Melanson et al. (2023).
  6. Whole-classroom group reinforcer identification. Run group brief MSWO via Plickers cards or a similar simultaneous-response tool Radley et al. (2019).
  7. Reinforcer is a social interaction. Run the assessment with the person who will deliver it, document the assessor on the data sheet, and pick a format that doesn't assume picture-tact prerequisites — SIPA if the learner cannot tact pictures of social interactions; MSWO only if the prerequisite is verified Huntington & Schwartz (2022) Morris & Vollmer (2019) Morris & Vollmer (2020).
  8. Items will be delivered digitally (videos, apps). Use a tablet- or web-based MSWO and validate the top items with a brief concurrent-operants reinforcer test before embedding them in token systems or self-management programs Chebli & Lanovaz (2016) Hoffmann et al. (2019).
  9. Goal is to reduce automatically maintained problem behavior with noncontingent stimulus access. Do not run a standard PA — run a Competing Stimulus Assessment, which directly measures whether free access to an item suppresses the target behavior Haddock & Hagopian (2020).
  10. Goal is to identify a hierarchy of demand aversion. Run a Picture-based Stimulus Demand Assessment rather than a preference assessment Avery & Akers (2021).
  11. Top items identified — what now? Validate the top one or two with a concurrent-operants reinforcer test under conditions resembling the actual treatment schedule before committing to them in the BIP or skill-acquisition plan; preference rank does not always predict reinforcer substitution at higher response costs Frank‐Crawford et al. (2018) Castelluccio & Johnson (2019).
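
The list above can be read as a lookup from case features to a starting format. The sketch below is one hypothetical encoding of it: the field names, the return strings, and especially the order in which the checks run are assumptions made for illustration, since the list itself does not rank the situations against each other. It is a memory aid, not a validated decision system.

```python
from dataclasses import dataclass


@dataclass
class CaseProfile:
    # Hypothetical flags, one per situation in the decision list.
    goal_reduce_automatic_behavior: bool = False
    goal_demand_hierarchy: bool = False
    severe_problem_behavior_history: bool = False
    limited_choice_repertoire_or_ncd: bool = False
    reinforcer_is_social: bool = False
    can_tact_interaction_pictures: bool = False
    whole_class: bool = False
    items_are_digital: bool = False
    new_client_unvetted_items: bool = False
    in_session_quick_check: bool = False


def suggest_format(case: CaseProfile) -> str:
    """Return a starting format; the priority order here is an assumption."""
    # Different assessment goals come first: these are not standard PAs.
    if case.goal_reduce_automatic_behavior:
        return "Competing Stimulus Assessment"
    if case.goal_demand_hierarchy:
        return "Picture-based Stimulus Demand Assessment"
    # Safety and repertoire constraints next.
    if case.severe_problem_behavior_history:
        return "Free operant or response-restriction free operant"
    if case.limited_choice_repertoire_or_ncd:
        return "Single stimulus, ideally with verbal delivery"
    if case.reinforcer_is_social:
        if case.can_tact_interaction_pictures:
            return "MSWO run by the person who will deliver the interaction"
        return "SIPA run by the person who will deliver the interaction"
    if case.whole_class:
        return "Group brief MSWO (Plickers cards or similar)"
    if case.items_are_digital:
        return "Tablet- or web-based MSWO plus a brief reinforcer test"
    if case.new_client_unvetted_items:
        return "Caregiver RAISD interview, then paired stimulus"
    if case.in_session_quick_check:
        return "Brief MSWO (single round)"
    # Default: prerequisites intact and a precise hierarchy is needed.
    return "MSWO with at least three rounds"


print(suggest_format(CaseProfile(severe_problem_behavior_history=True)))
```

Whatever format comes back, the final item in the list still applies: validate the top one or two items under treatment-like conditions before committing to them.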

04 Across Settings

Clinic and outpatient

The clinic literature is the densest section of the corpus. Standard PS, MSWO, and FO are the default formats; SPADS provides written decision rules for when to extend, repeat, or terminate an assessment under typical clinic time constraints Lill et al. (2021). Concurrent-operants reinforcer validation is procedurally cheap in clinic — two-chair setups, alternating-treatments designs, and tablet-based video delivery are all standard Chebli & Lanovaz (2016) Castelluccio & Johnson (2019). For clients with severe problem behavior, the clinic is typically where FO and RR-FO are run because the environment allows controlled item arrays without disrupting natural routines Herbek et al. (2026).

School and classroom

School practice diverges from clinic in two ways: time per student is short, and group-level identification often matters more than individual hierarchies. Group brief MSWO via Plickers cards reliably identifies class-wide top edible reinforcers without pulling each student aside Radley et al. (2019). Embedded preference checks during transition times, snack votes, or daily routines reduce the perceived "extra workload" that drives the institutional adoption barrier teachers describe Miranda et al. (2025). For individual student programming, a brief tablet- or web-based MSWO is more feasible than a 30-minute PS session; ready-made data sheets and pre-validated VMVO training materials are the lever for raising classroom adoption Bovi et al. (2017). When the planned reinforcer is teacher attention, run the assessment with the actual teacher who will deliver it Huntington & Schwartz (2022). School-based PA also supports IDEA-mandated transition planning: SPA-derived vocational and leisure preferences should drive IEP goals and community-based work-site selection by age 16, with quarterly re-checks Tullis & Seaman-Tullis (2019).

Home and parent-administered

Home-based PA leans on simpler formats and caregiver implementation. SS and FO are the practical defaults because they require minimal trial structure and tolerate small response repertoires Ford et al. (2022). The RAISD interview is itself a home-friendly tool — it surfaces idiosyncratic items in the family environment that clinic-based item arrays would never include (Waits & Gilroy, 2025). For bilingual families, pair the RAISD with a culturally informed paired-stimulus assessment using both languages so the hierarchy reflects the family's actual reinforcing environment (Waits & Gilroy, 2025). Telehealth coaching closes the loop: brief real-time feedback packages bring direct-care staff and parents to PA mastery within 3-4 sessions, with internet connectivity the only meaningful constraint Ausenhus & Higgins (2019).

Adult disability services and residential

Adult and residential settings concentrate two challenges: cognitive impairment that defeats rank-order formats, and dispersed staff who cannot all be trained the same way. SS is the format of choice for older adults with NCD; verbal delivery improves predictive validity, and adaptations like brief paired social interaction during item presentation and yes/no question prompts keep SS usable as cognition declines Ford et al. (2022) Bigwood et al. (2026). For vocational and leisure programming, component-based concurrent-operants assessments — decomposing activities into interaction, movement, and modality components — predict job and activity choice better than item-level PA alone Isenhower et al. (2025). Tablet-based video MSWO replaces longer in-vivo job try-outs in supported employment Walsh et al. (2020). VMVO and telehealth training packages give residential teams a way to install fidelity across dispersed staff without on-site travel Bovi et al. (2017) Ausenhus & Higgins (2019).

05 Case Examples

Each of the six formats below has a defined procedure, time signature, indication profile, and validity evidence drawn from the corpus Verriden & Roscoe (2016). Practitioners should be fluent in all six and reach for the one that matches the case rather than defaulting to the format taught most often in graduate school.

Free Operant (FO)

Procedure. Provide simultaneous noncontingent access to an array of items (5-10) for a fixed duration, typically 5 minutes. Record duration of engagement with each item; the hierarchy is the proportion of session time allocated to each stimulus.
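
As a worked illustration of the scoring rule, the sketch below turns per-item engagement durations into proportions of session time and sorts them. The item names and durations are hypothetical, and the function name is an assumption for this example only.

```python
def fo_hierarchy(engagement_seconds, session_seconds):
    """Rank items by the proportion of session time spent engaged with each.

    engagement_seconds maps item -> seconds of engagement. The values need not
    sum to the session length (the learner may engage with nothing for a while).
    """
    if session_seconds <= 0:
        raise ValueError("session_seconds must be positive")
    proportions = {
        item: seconds / session_seconds
        for item, seconds in engagement_seconds.items()
    }
    # Highest proportion first.
    return sorted(proportions.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical 5-minute (300-second) session.
engagement = {"tablet": 150, "blocks": 90, "book": 30, "ball": 0}
hierarchy = fo_hierarchy(engagement, session_seconds=300)

for item, proportion in hierarchy:
    print(f"{item}: {proportion:.0%}")
```

With these numbers the tablet takes half the session, blocks 30%, the book 10%, and the ball is never touched.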

Time required. 5-10 minutes per session; multiple sessions recommended for stability.

Indications. Use FO when the client has a history of severe problem behavior evoked by item removal — FO never withholds preferred items and consistently produces lower problem-behavior rates than trial-based formats Tung et al. (2017) Herbek et al. (2026); when the client has a small response repertoire that makes choice trials difficult Verriden & Roscoe (2016); or when you specifically want to confirm interview-derived candidate items before running an FA condition Herbek et al. (2026). Response-restriction free operant (RR-FO) — which removes a previously-engaged item before the next session — preserves the safety advantage while producing rankings comparable to paired stimulus Herbek et al. (2026).

Validity evidence. FO produces less stable rankings across repeated administrations than PS or MSWO, so it should not be the default when precision matters Verriden & Roscoe (2016). RR-FO closes part of this gap Herbek et al. (2026).

Single Stimulus (SS)

Procedure. Present items one at a time and record approach (reach, take, consume, engage) versus avoidance within a defined latency window. Repeat across multiple trials; the hierarchy is approach percentage per item Ford et al. (2022).

Time required. Roughly 30 seconds per trial × items × trials; a 10-item, 3-trial assessment runs about 15 minutes Ford et al. (2022).
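
The scoring and timing arithmetic for SS can be sketched as follows. The data and function names are hypothetical; the 30-second default simply mirrors the rough per-trial figure given above and should be adjusted to the actual latency window used.

```python
def approach_percentage(trials):
    """Percentage of trials on which each item was approached.

    trials maps item -> list of booleans (True = approach, False = no approach).
    """
    result = {}
    for item, outcomes in trials.items():
        if not outcomes:
            raise ValueError(f"no trials recorded for {item!r}")
        result[item] = 100 * sum(outcomes) / len(outcomes)
    return result


def estimated_minutes(n_items, trials_per_item, seconds_per_trial=30):
    """Rough session length: seconds per trial x items x trials, in minutes."""
    return n_items * trials_per_item * seconds_per_trial / 60


# Hypothetical data: three items, three trials each.
trials = {
    "music": [True, True, True],
    "lotion": [True, False, False],
    "magazine": [False, False, False],
}
percentages = approach_percentage(trials)
print(percentages)
print(estimated_minutes(10, 3))  # prints 15.0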

Indications. Use SS when the client has limited choice-making repertoires; for older adults with neurocognitive disorder, where SS — particularly verbally delivered — produces more consistent responding and better predicts later engagement than rank-order formats Ford et al. (2022); or when you need the simplest possible procedure for caregiver-administered home assessment. SS preferences in older adults with NCD also remain stable across 20-32 weeks Ford et al. (2022).

Validity evidence. Strong predictive validity for highly preferred items in cognitively impaired adults; weaker discrimination among mid- and low-preferred items Ford et al. (2022). Adapt with brief social interaction during item presentation and yes/no questions when verbal repertoire is limited Bigwood et al. (2026).

Paired Stimulus (PS)

Procedure. Present items in all possible pairs (n items → n(n-1)/2 pairs); on each trial both items are presented, the learner selects one, the chosen item is briefly accessed, and the trial is recorded (Waits & Gilroy, 2025). The hierarchy is the percentage of opportunities each item was selected — (selections / (selections + non-selections)) × 100 (Waits & Gilroy, 2025).

Time required. Longer than other formats — a 7-item PS generates 21 pairs and runs 25-35 minutes including brief access periods (Waits & Gilroy, 2025).
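
The pair count and the scoring formula are easy to check in code. The sketch below uses the standard-library `itertools.combinations` to enumerate every pair and then applies the selections / (selections + non-selections) × 100 rule; the item labels, trial data, and function names are hypothetical, and trial order and left/right position are not handled here.

```python
from collections import Counter
from itertools import combinations


def ps_pairs(items):
    """All unordered pairs of items: n items give n(n-1)/2 pairs."""
    return list(combinations(items, 2))


def ps_selection_percentages(trials):
    """Score a paired-stimulus assessment.

    trials is a list of (item_a, item_b, chosen) tuples, where chosen is one of
    the two presented items, or None if the learner made no selection.
    """
    selected = Counter()
    presented = Counter()
    for item_a, item_b, chosen in trials:
        presented[item_a] += 1
        presented[item_b] += 1
        if chosen is not None:
            if chosen not in (item_a, item_b):
                raise ValueError("chosen item was not in the presented pair")
            selected[chosen] += 1
    # presented = selections + non-selections for each item.
    return {item: 100 * selected[item] / presented[item] for item in presented}


seven_items = ["A", "B", "C", "D", "E", "F", "G"]
print(len(ps_pairs(seven_items)))  # prints 21

# Hypothetical three-item assessment: A beats B and C, B beats C.
trials = [("A", "B", "A"), ("A", "C", "A"), ("B", "C", "B")]
percentages = ps_selection_percentages(trials)
print(percentages)  # prints {'A': 100.0, 'B': 50.0, 'C': 0.0}
```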

Indications. Use PS when you need maximum precision in the resulting hierarchy — PS produces among the most stable rankings across repeated administrations Verriden & Roscoe (2016); when the client is new and the item set has not been culturally vetted (PS pairs especially well with a caregiver RAISD interview) (Waits & Gilroy, 2025); or when you need to rank a small set of very similar items where MSWO array-based discrimination may collapse.

Validity evidence. Stable rankings; small but real boost in reinforcer efficacy for items that maintain high preference across time Verriden & Roscoe (2016). Higher problem-behavior risk than FO/RR-FO for tangible-maintained populations Tung et al. (2017) Herbek et al. (2026).

Multiple Stimulus Without Replacement (MSWO)

Procedure. Present an array of items (5-7) simultaneously; the learner selects one, that item is removed from the array, and the remaining items are re-presented after a brief inter-trial interval, repeating until all items are selected or refused Melanson et al. (2023). Run multiple rounds (typically 3-5) and average rank order Melanson et al. (2023).
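
Averaging rank order across rounds can be sketched as follows. This is an illustrative sketch only: the data layout, the item names, and the helper `mswo_mean_ranks` are hypothetical, and assigning the last rank to an item refused in a round is an assumption rather than a rule stated in the sources cited here.

```python
from statistics import mean


def mswo_mean_ranks(rounds):
    """Average each item's selection position across MSWO rounds.

    `rounds` is a list of lists; each inner list holds the items in the
    order the learner selected them in that round. An item that was not
    selected in a round is assigned the last rank (an assumption).
    """
    all_items = sorted({item for order in rounds for item in order})
    last_rank = len(all_items)
    ranks = {item: [] for item in all_items}
    for order in rounds:
        for item in all_items:
            if item in order:
                ranks[item].append(order.index(item) + 1)
            else:
                ranks[item].append(last_rank)
    averaged = {item: mean(values) for item, values in ranks.items()}
    # Lower mean rank = selected earlier = more preferred
    return sorted(averaged.items(), key=lambda kv: kv[1])


# Made-up data: a 5-item, 3-round MSWO
rounds = [
    ["chips", "juice", "cracker", "raisin", "pretzel"],
    ["juice", "chips", "cracker", "pretzel", "raisin"],
    ["chips", "juice", "raisin", "cracker", "pretzel"],
]
hierarchy = mswo_mean_ranks(rounds)
```

With this made-up data, chips has the lowest mean rank and pretzel the highest, even though the first selection differs between rounds.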

Time required. A 5-item, 3-round MSWO runs 10-15 minutes.

Indications. Use MSWO as the default rank-order assessment when the client has scanning and selection prerequisites, the item class is well-defined, and you need a precise hierarchy efficiently. MSWO has the densest variation literature: web-based, tablet picture-icon, group Plickers, and video MSWO are all validated Curiel et al. (2018) Hoffmann et al. (2019) Radley et al. (2019) Chebli & Lanovaz (2016).

Validity evidence. Stable rankings comparable to PS across days Verriden & Roscoe (2016). The catch: intra-assessment stability is weaker than practitioners assume — about 40% of within-administration round comparisons show meaningful rank shifts, so run at least three rounds and use the final-round hierarchy or the mode rather than trusting round one Melanson et al. (2023). For social interactions specifically, MSWO produces a valid hierarchy only when the learner can reliably tact pictures of those interactions; otherwise switch to SIPA Morris & Vollmer (2020).
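
One way to look at round-to-round agreement within a single administration is sketched below. The metric is an assumption of this example: it uses the Spearman rank correlation between complete, tie-free round orderings plus the modal first selection, which is not necessarily how Melanson et al. (2023) defined a meaningful rank shift. The helper names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def spearman_rho(order_a, order_b):
    """Spearman rank correlation between two complete, tie-free orderings."""
    n = len(order_a)
    if n < 2 or len(set(order_a)) != n or set(order_a) != set(order_b):
        raise ValueError("orderings must rank the same distinct items (n >= 2)")
    rank_b = {item: i for i, item in enumerate(order_b)}
    d_squared = sum((i - rank_b[item]) ** 2 for i, item in enumerate(order_a))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def round_agreement(rounds):
    """Return rho for every pair of rounds and the modal first selection."""
    rhos = {
        (i + 1, j + 1): spearman_rho(rounds[i], rounds[j])
        for i, j in combinations(range(len(rounds)), 2)
    }
    modal_top = Counter(order[0] for order in rounds).most_common(1)[0][0]
    return rhos, modal_top


# Made-up data: selection order in each of three rounds
rounds = [
    ["chips", "juice", "cracker", "raisin", "pretzel"],
    ["juice", "chips", "cracker", "pretzel", "raisin"],
    ["chips", "juice", "raisin", "cracker", "pretzel"],
]
rhos, modal_top = round_agreement(rounds)
```

A low correlation between any two rounds is a cue to run additional rounds before settling on a hierarchy; what counts as "low" is left to clinical judgment here.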

Multiple Stimulus With Replacement (MSW)

Procedure. Same as MSWO except the selected item is replaced into the array after each selection rather than removed. The hierarchy is built from selection frequency across a fixed number of trials Verriden & Roscoe (2016).

Time required. Slightly longer than MSWO for the same array (selection patterns can become repetitive).

Indications. Most useful when you specifically want to identify a single dominant item rather than a full hierarchy — MSW tends to surface the strongest preference clearly, sometimes at the cost of differentiating mid-rank items. In modern practice MSWO has largely replaced MSW because the hierarchy is more informative.

Validity evidence. Less common in the recent literature than MSWO; format-comparison studies treat MSWO as the default Verriden & Roscoe (2016) Morris & Vollmer (2020).

Brief MSWO

Procedure. A single MSWO round (or a 30-60 second condensed administration) used as a rapid screen. The hierarchy is the rank order from that single round.

Time required. 1-3 minutes.

Indications. Use brief MSWO when you need an in-the-moment reinforcer check during a teaching block: clinicians report running brief micro-assessments when engagement drops rather than relying on yesterday's full MSWO (Morris et al., 2024). It also suits whole-class group preference identification (a group Plickers brief MSWO identified the same top edible reinforcers as individual four-choice MSWO across 19 elementary students) Radley et al. (2019), and rapid screening of videos, app icons, or other digital stimuli via web or tablet Curiel et al. (2018) Hoffmann et al. (2019).

Validity evidence. Brief MSWO produces less stable hierarchies than full multi-round MSWO — given that intra-assessment instability runs about 40% across rounds, a single round is by definition a noisier measurement Melanson et al. (2023). Use it for screening and rapid in-session adjustment, not for definitive treatment-stimulus selection.

06Common Pitfalls

  • Treating the preference hierarchy as the reinforcement hierarchy. Preference rank does not always predict reinforcer substitution under treatment-like response requirements; validate the top one or two items with a concurrent-operants test under conditions resembling the actual schedule before committing to them in the BIP Frank‐Crawford et al. (2018) Castelluccio & Johnson (2019).
  • One-and-done PA thinking. Preferences shift within a single MSWO administration about 40% of the time, and engagement-driven mid-session shifts are routine in EIBI; build brief micro-checks into every 30-40 minute teaching block Melanson et al. (2023) (Morris et al., 2024).
  • Ignoring satiation and motivating operations. A high-preference item identified at 9am with a hungry learner is not the same item at 11am after free-access snack time. Re-assess any time MOs meaningfully shift, the social partner changes, or 4-8 weeks have passed Verriden & Roscoe (2016) Ford et al. (2022).
  • Defaulting to MSWO for clients who can't sustain it. For older adults with NCD and learners with limited choice repertoires, single stimulus — particularly verbally delivered — produces more valid and stable hierarchies than rank-order formats Ford et al. (2022).
  • Running PS or MSWO when problem behavior is tangible-maintained. Both formats withhold preferred items between trials and reliably evoke higher problem-behavior rates than FO or RR-FO for these clients Tung et al. (2017) Herbek et al. (2026).
  • Using picture or video MSWO without verifying prerequisites. Pictorial and video formats only approach tangible accuracy when the learner can match the picture or comprehend the video; missing prerequisites produce silently invalid hierarchies Heinicke et al. (2019) Morris & Vollmer (2020).
  • Assuming the assessor doesn't matter for social stimuli. Social interactions selected in PA depend on the person delivering them; a praise type ranked high when Mom delivers it may be lower-preferred — and a weaker reinforcer — when a staff member delivers the same form Huntington & Schwartz (2022).
  • Running a PA when the question is competing-stimulus or demand-aversion. PA produces a rank order, not a behavior-reduction metric. For automatically maintained problem behavior, run a Competing Stimulus Assessment (CSA); for sequencing demands, run a PSDA Haddock & Hagopian (2020) Avery & Akers (2021).
  • Failing to standardize stimulus presentation. Stimulus size, portion, and presentation modality shift the rank of mid- and low-preferred items; if precise full-rank hierarchies matter, hold presentation parameters constant across administrations Moore et al. (2017).
  • Treating caregiver intuition as a substitute for direct PA. Caregivers are better at predicting which items are least preferred than which are most preferred; use indirect tools as inputs, not outputs Cameron et al. (2025).

07When to Refer Out

  • Suspected medical or biological substrate driving apparent preference shifts. Sudden item rejection, food refusal, or new aversion patterns in a previously stable hierarchy can mask GI, dental, or sensory-related issues; document a medical referral before re-running or re-validating the PA.
  • Severe automatically maintained problem behavior unresponsive to standard preference-derived reinforcers. When competing-stimulus assessments fail to identify suppressing items across replications, refer for specialist consultation with a team experienced in matched-stimulation and sensory-enrichment programming Haddock & Hagopian (2020).
  • Persistent inability to identify any preferred item across formats and re-assessments. If FO, SS, and brief MSWO all fail to surface any reliably approached stimulus across multiple sessions, escalate to a specialist team rather than committing to a treatment plan without an empirical reinforcer.
  • Client repertoire requirements beyond local staff training capacity. When the client population requires SIPA or component-based concurrent-operants formats and local staff cannot reach mastery via brief VMVO or telehealth packages, refer to a regional consultation team rather than running an underpowered assessment in-house Morris & Vollmer (2020) Ausenhus & Higgins (2019).

08Future Research Directions

The corpus has strong operational claims about format selection, training, and validity, but key gaps remain. SPADS is the field's clearest written decision-making model and lacks empirical validation of its effects on either reinforcer efficacy or assessment efficiency — a prospective comparison of SPADS-guided versus default-format PA is the most obvious next study Lill et al. (2021). Intra-assessment stability has been quantified at around 60% rank correspondence within MSWO, but the field lacks consensus on which specific re-assessment cadence (daily, weekly, MO-triggered) optimally balances precision against staff time Melanson et al. (2023). The preference-versus-reinforcement gap has been demonstrated at the case level, but no published prospective study compares treatment outcomes when clinicians do versus do not validate the top PA items via concurrent-operants tests before BIP implementation Frank‐Crawford et al. (2018). Cultural and linguistic adaptations remain under-studied — Waits and Gilroy's bilingual paired-stimulus work is one of very few published demonstrations (Waits & Gilroy, 2025). Component-based concurrent-operants leisure assessments show promise but have only been demonstrated with two adult participants Isenhower et al. (2025). The institutional-adoption problem identified by Miranda and colleagues — teachers who view PA as acceptable but rarely use it — needs implementation-science work to solve Miranda et al. (2025).

09Practitioner Takeaways

  1. Distinguish preference from reinforcement explicitly. A PA produces a hierarchy of choice; a reinforcer test produces evidence of behavior change Frank‐Crawford et al. (2018). Use the PA to nominate the top one or two items, then validate them with a concurrent-operants test under conditions resembling the actual treatment schedule Frank‐Crawford et al. (2018) Castelluccio & Johnson (2019).
  2. Default to PS or MSWO when stability matters; default to FO or RR-FO when safety matters. PS/MSWO produce more stable hierarchies; FO/RR-FO produce lower problem-behavior rates for tangible-maintained clients Verriden & Roscoe (2016) Tung et al. (2017) Herbek et al. (2026).
  3. Use single stimulus for older adults with NCD and any client with limited choice repertoires. SS — particularly verbally delivered — produces more consistent responding, better predicts engagement, and remains stable across 4-8 months in NCD Ford et al. (2022).
  4. Run brief MSWO as your in-session reinforcer-adjustment tool. A 1-3 minute single-round MSWO is what experienced clinicians actually do mid-session (Morris et al., 2024).
  5. Run at least three MSWO rounds and use the final-round hierarchy. Intra-assessment rank shifts occur in roughly 40% of within-administration round comparisons Melanson et al. (2023).
  6. Pair the RAISD or open-ended caregiver interview with a direct PA on a new client. Caregivers are better at ruling items out than ranking them in (Waits & Gilroy, 2025) Cameron et al. (2025).
  7. For social reinforcers, run the PA with the person who will deliver them. Document the assessor on every data sheet; the same praise form can be a strong reinforcer with one assessor and weak with another Huntington & Schwartz (2022).
  8. For social-stimulus preference, use SIPA when picture-tact prerequisites are absent, MSWO only when verified. A picture-MSWO of social interactions is invalid for learners who cannot tact those pictures Morris & Vollmer (2019) Morris & Vollmer (2020).
  9. Verify prerequisite skills before adopting picture, video, or verbal formats. Alternative modalities approach tangible accuracy only when prerequisites are intact and chosen items are delivered contingently Heinicke et al. (2019).
  10. Standardize stimulus presentation across administrations. Stimulus size, portion, and modality shift the rank of mid- and low-preferred items Moore et al. (2017).
  11. Use the right tool for the right question: PA for reinforcer identification, CSA for competing-stimulus selection, PSDA for demand sequencing. Mixing these up is one of the most common errors in fieldwork supervision Haddock & Hagopian (2020) Avery & Akers (2021).
  12. Train staff with a brief VMVO or telehealth package and document fidelity. A single VMVO session produced 100% MSWO procedural fidelity; real-time telehealth feedback brings staff to mastery in 3-4 sessions Bovi et al. (2017) Ausenhus & Higgins (2019).
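
The format-selection logic in takeaways 2 through 4 can be sketched as a small decision function. This is a hypothetical illustration: the `ClientProfile` fields and the priority order (safety, then repertoire, then speed, then precision) are assumptions of this sketch, not a validated decision model, and whatever format is suggested still needs a follow-up reinforcer test.

```python
from dataclasses import dataclass


@dataclass
class ClientProfile:
    """Hypothetical intake flags; field names are illustrative only."""
    tangible_maintained_problem_behavior: bool = False
    limited_choice_repertoire: bool = False
    older_adult_with_ncd: bool = False
    in_session_check: bool = False
    needs_maximum_precision: bool = False


def suggest_format(profile):
    """Suggest a preference-assessment format for a client profile.

    The priority order below is an assumption of this sketch.
    """
    # Safety first: avoid formats that withhold items between trials
    if profile.tangible_maintained_problem_behavior:
        return "FO or RR-FO"
    # Repertoire next: single stimulus for limited choice-making or NCD
    if profile.limited_choice_repertoire or profile.older_adult_with_ncd:
        return "SS"
    # Speed: rapid in-session reinforcer check
    if profile.in_session_check:
        return "brief MSWO"
    # Precision: paired stimulus when the hierarchy must be as exact as possible
    if profile.needs_maximum_precision:
        return "PS"
    # Otherwise, the default rank-order format
    return "MSWO"


example = suggest_format(ClientProfile(older_adult_with_ncd=True))
```

Putting the safety check first means a client with both tangible-maintained problem behavior and a need for precision is routed to FO or RR-FO; that ordering is a design choice of the sketch.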

10Frequently Asked Questions

How is a preference assessment different from a reinforcer assessment?

A preference assessment produces a rank order based on what the learner chooses; a reinforcer assessment produces evidence of behavior change based on whether contingent access to the item actually increases responding Frank‐Crawford et al. (2018). The two are correlated but not identical: rank from a standard PA does not always predict reinforcer substitution under increased response requirements Frank‐Crawford et al. (2018). The standard practitioner workflow is PA → top one or two items → concurrent-operants reinforcer validation under conditions resembling the actual treatment schedule Castelluccio & Johnson (2019) Chebli & Lanovaz (2016).

Which format should I run if my client has severe problem behavior?

Default to free operant or response-restriction free operant. Both produce lower problem-behavior rates than PS or MSWO for tangible-maintained clients because they never withhold preferred items between trials, and RR-FO closes most of the precision gap with PS Tung et al. (2017) Herbek et al. (2026). Reserve PS or MSWO for cases where the risk of problem behavior is low and you need a precise hierarchy.

How often should I re-run a preference assessment?

There is no published consensus cadence, but the corpus supports a few defaults. Re-assess any time engagement drops mid-session — this is what experienced BCBAs in EIBI actually do (Morris et al., 2024). Plan a brief micro-PA into every 30-40 minute teaching block. For long-stable populations like older adults with NCD, SS preferences remain stable across 4-8 months, so a quarterly cadence is empirically defensible Ford et al. (2022). For most pediatric ABA clients, re-run at least every 4-8 weeks, when MOs shift, when the social partner changes for social reinforcers, and any time hierarchy assumptions seem to be failing in treatment data Huntington & Schwartz (2022) Melanson et al. (2023).
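
The re-assessment triggers listed above can be collected into a simple check. This is a minimal sketch under stated assumptions: the function name, its parameters, and the default 8-week ceiling (the upper end of the 4-8 week range) are illustrative choices, not a published cadence.

```python
from datetime import date, timedelta


def reassessment_reasons(last_pa, today, *, mo_shift=False,
                         partner_changed=False, engagement_dropped=False,
                         max_interval_weeks=8):
    """List the reasons, if any, to re-run a preference assessment."""
    reasons = []
    # Time-based trigger: the interval since the last PA has elapsed
    if today - last_pa >= timedelta(weeks=max_interval_weeks):
        reasons.append("interval elapsed")
    # Event-based triggers named in the text
    if mo_shift:
        reasons.append("motivating operations shifted")
    if partner_changed:
        reasons.append("social partner changed")
    if engagement_dropped:
        reasons.append("engagement dropped")
    return reasons


# Hypothetical dates: 60 days since the last assessment
due = reassessment_reasons(date(2024, 1, 1), date(2024, 3, 1))
```

An empty list means none of the listed triggers applies; it does not replace clinical judgment about treatment data.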

Are caregiver interviews and the RAISD enough on their own?

No. Indirect tools surface candidate items — including idiosyncratic and culturally specific stimuli that closed-ended checklists miss — but they should feed a direct PA, not replace it (Waits & Gilroy, 2025). Caregivers tend to be better at predicting which items are least preferred than which are most preferred Cameron et al. (2025). The published systematic review of stimulus-stimulus pairing confirms that a formal direct PA is the procedural standard before any reinforcer-dependent intervention (Oliveira et al., 2025).

When should I use a Competing Stimulus Assessment instead of a preference assessment?

When the clinical question is about behavior reduction via noncontingent stimulus access — typically for automatically maintained problem behavior — run a CSA, which directly measures whether free access to an item suppresses the target behavior Haddock & Hagopian (2020). PA produces a rank order, which does not answer that question. A high-preference item is not necessarily a competing stimulus, and a competing stimulus is not necessarily a high-preference item.

Can I run preference assessments via telehealth or with a tablet?

Yes, with verified prerequisites. Tablet-based MSWO using app-icon pictures or video clips reliably ranks digital reinforcers when the learner can match the icon or comprehend the video Hoffmann et al. (2019) Chebli & Lanovaz (2016). Web-based brief MSWO works for video preference identification across age and population Curiel et al. (2018). For staff training, brief telehealth coaching with real-time feedback brings direct-care staff to mastery within 3-4 sessions Ausenhus & Higgins (2019). Always pair an alternative-modality assessment with a brief reinforcer-validation step using the actual digital stimulus contingent on responding Heinicke et al. (2019).

How do I run a group preference assessment in a classroom?

Use a group brief MSWO via Plickers cards or a similar simultaneous-response tool. Across 19 elementary students, a group Plickers assessment identified the same top edible reinforcers as individual four-choice MSWO for every participant Radley et al. (2019). Reserve individual assessment for cases where the group method fails to detect a particular learner's strongly preferred item.

Can RBTs and teachers be trained to run these assessments?

Yes. A single brief video-modeling-with-voice-over (VMVO) session has produced 100% MSWO procedural fidelity after one viewing for public-school staff Bovi et al. (2017). Preservice school psychology graduate students reach MSWO mastery via an instructional script plus live feedback O’Handley et al. (2021). Real-time telehealth feedback brings direct-care staff to mastery in 3-4 sessions Ausenhus & Higgins (2019). The remaining barrier is institutional adoption: preschool teachers describe PA as acceptable and useful but rarely use it because of systemic workload pressures, not because the procedure is hard to teach Miranda et al. (2025).

11References

Primary research synthesized in this guide. DOIs link to the original source.