Introduction
Teaching learners to answer questions about pictures is a common goal in ABA programming, but the best way to get there isn’t always obvious. Should you use simple picture cards with one item, or compound cards with multiple elements? This study compares both approaches to see which leads to better question-answering skills, and the findings may change how you structure your daily teaching trials.
What is the research question being asked and why does it matter?
Many learners can label a picture when you show it to them but struggle when you ask questions about that same picture. For example, they may see a picture with a teacher, an apple, and a classroom, yet be unable to answer “who,” “what,” and “where” questions about it. This matters because real-life conversation often involves more than one cue at a time: a question plus what the learner sees.
The research question was straightforward: When you teach tacts, should you use simple pictures (one thing per card) or compound pictures (several things per card) to help question-answering skills emerge later without direct teaching?
Clinically, this is a daily programming choice. If one approach helps wh-question skills emerge faster, it could save teaching time and reduce frustration for everyone involved.
The researchers also cared about “emergence”—new skills that appear after teaching something else. Emergence is helpful only if it actually happens in your setting with your learners. If it doesn’t, you still need a plan to teach those missing skills directly.
What did the researchers do to answer that question?
In Experiment 1, they worked with seven college students using made-up shapes, colors, and symbols with one-syllable names.
One condition taught tacts with simple stimuli (one feature per card). The other taught tacts with compound stimuli (a card showing a color, shape, and symbol together), with the teacher pointing to the part being taught. In both conditions, the teacher said the category name, the participant said the item name, and correct answers received praise. Errors led to an echoic model prompt.
During training, they ran brief probe trials for “intraverbal-tacts.” On these probes, the person saw a compound card, but the teacher did not point. The teacher asked about one feature at a time, and the learner had to say the correct name for that feature. Mastery was set at about 89% correct for three consecutive sessions.
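If you track probe scores in a spreadsheet or script, the mastery rule above can be checked with a few lines of Python. This is a minimal sketch, not the researchers' own procedure: the function name `meets_mastery`, the 0.89 default, and the choice to score sessions as proportions are all assumptions made for illustration.

```python
def meets_mastery(session_scores, criterion=0.89, consecutive=3):
    """Return True if the most recent sessions all meet the criterion.

    session_scores: proportions correct (0.0 to 1.0), oldest first.
    criterion: assumed cutoff; the source only says "about 89%".
    consecutive: how many sessions in a row must meet the cutoff.
    """
    if len(session_scores) < consecutive:
        # Not enough sessions yet to judge mastery.
        return False
    recent = session_scores[-consecutive:]
    return all(score >= criterion for score in recent)


# Example: the last three sessions are all at or above the cutoff.
print(meets_mastery([0.78, 0.89, 0.92, 1.0]))
```

Because the source gives the criterion only approximately, set `criterion` to match how you score. For example, 8 of 9 correct is about 0.889, which falls just under a strict 0.89 cutoff.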
They also ran pre- and post-tests of other skills without prompts or feedback. These included listener tasks (touching the correct item when asked) and speaker tasks (saying the category name).
In Experiment 2, they repeated the comparison with one 9-year-old child with autism (Lucy). They added tokens and used a prompt-delay setup for tact teaching. They also adjusted the post-test because Lucy’s behavior suggested the original probe format may not have captured her best performance.
How you can use this in your day-to-day clinical practice
If your main goal is for a learner to answer questions about a picture by looking at the right part, this study suggests compound-stimulus tact teaching isn’t automatically better than simple-stimulus tact teaching. In both experiments, learners reached the intraverbal-tact outcome after tact teaching in either format.
Practically, this means you can pick the stimulus format based on what fits your learner and your materials—not because one method clearly wins.
What seems more important is how trials are arranged. The teacher rotated the auditory cue often (different category words across trials) from the start.
In day-to-day programming, don’t block your trials so the learner can ignore the question. If you run 10 trials in a row of only “who” questions, some learners will respond based on the picture alone and stop listening. Mix question types early, even with heavy prompting at first, so the learner practices checking the question every time.
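One way to avoid blocked trials is to generate the question order ahead of time and cap how many times the same question type can appear in a row. The sketch below is only an illustration. The function names, the `max_run` cap of 2, and the question labels are assumptions, not details from the study.

```python
import random


def longest_run(items):
    """Return the length of the longest streak of identical items."""
    longest = current = 0
    previous = object()  # sentinel that never equals a real item
    for item in items:
        current = current + 1 if item == previous else 1
        longest = max(longest, current)
        previous = item
    return longest


def mixed_trial_order(question_types, trials_per_type, max_run=2, seed=None):
    """Shuffle trials so no question type repeats more than max_run times in a row."""
    if len(question_types) < 2:
        raise ValueError("Need at least two question types to mix trials.")
    if max_run < 1:
        raise ValueError("max_run must be at least 1.")
    rng = random.Random(seed)
    pool = [q for q in question_types for _ in range(trials_per_type)]
    # Reshuffle until the order respects the cap. With equal counts per type
    # a valid order always exists, though very strict caps may take many tries.
    while True:
        rng.shuffle(pool)
        if longest_run(pool) <= max_run:
            return list(pool)


order = mixed_trial_order(["who", "what", "where"], trials_per_type=4, seed=1)
print(order)
```

Printing the order before a session gives you a ready-made data sheet sequence, so the mix does not depend on in-the-moment choices.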
Also pay attention to how you vary the visual stimuli. In simple-stimulus teaching, each visual item appeared once per session, reducing “I got it because it looks familiar” responding. In compound-stimulus teaching, the combinations changed across trials, which also pushes the learner to actually look.
In practice: rotate pictures, rotate people, rotate scenes. If the learner only answers correctly with one exact picture, you may be teaching a narrow skill that won’t generalize.
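If you build compound cards from a set of features, you can plan a session so that no exact combination repeats. The following is a rough sketch under stated assumptions: the function `compound_cards` and the made-up feature names are hypothetical and are not the stimuli used in the study.

```python
import itertools
import random


def compound_cards(features, n_cards, seed=None):
    """Pick n_cards distinct feature combinations, one value per category.

    features: dict mapping a category name to its list of item names.
    """
    rng = random.Random(seed)
    categories = list(features)
    # Every possible combination of one item from each category.
    combos = list(itertools.product(*(features[c] for c in categories)))
    if n_cards > len(combos):
        raise ValueError("Not enough unique combinations for that many cards.")
    chosen = rng.sample(combos, n_cards)  # sampling without replacement
    return [dict(zip(categories, combo)) for combo in chosen]


# Hypothetical made-up labels, used only for illustration.
features = {
    "color": ["zib", "dax"],
    "shape": ["mip", "tov"],
    "symbol": ["kel", "fup"],
}
for card in compound_cards(features, n_cards=5, seed=0):
    print(card)
```

The same idea applies to real pictures: list your people, objects, and scenes as categories, then draw a fresh set of combinations for each session.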
You can copy the idea of interspersed probes, but do it carefully. The researchers ran short checks during teaching to see if the untrained intraverbal-tact skill was emerging.
Clinically, this can save time because you don’t need a separate long probe session every week. But keep probes brief and low-pressure. If probes upset the learner or lead to many errors, you may be testing too hard, too soon, or too long.
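To keep probes brief, you can slot a single probe trial into the teaching sequence at a fixed interval. This sketch assumes one probe after every fifth teaching trial. The source does not state how often probes occurred, so treat `every=5` and the function name `intersperse_probes` as placeholders to adjust.

```python
def intersperse_probes(teaching_trials, probe_trials, every=5):
    """Insert one probe after each block of `every` teaching trials.

    Returns a list of ("teach", trial) and ("probe", trial) tuples.
    Probes left over when teaching trials run out are not scheduled.
    """
    if every < 1:
        raise ValueError("every must be at least 1.")
    schedule = []
    probes = iter(probe_trials)
    for index, trial in enumerate(teaching_trials, start=1):
        schedule.append(("teach", trial))
        if index % every == 0:
            probe = next(probes, None)
            if probe is not None:
                schedule.append(("probe", probe))
    return schedule


teaching = ["tact 1", "tact 2", "tact 3", "tact 4", "tact 5", "tact 6"]
probes = ["probe A", "probe B"]
for kind, trial in intersperse_probes(teaching, probes, every=3):
    print(kind, trial)
```

A wider interval means fewer unreinforced probe trials per session, which may help if probes are leading to errors or upset.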
Don’t assume other speaker skills will emerge just because intraverbal-tacts did. In Experiment 1, listener skills were strong after training, but many speaker-category skills stayed low.
For practice: if you need the learner to say category labels (like “fruit,” “vehicle,” “emotion”), plan to teach and probe that directly. Don’t rely on it emerging from tact training alone.
When teaching compound pictures, the researchers pointed to the relevant part during tact teaching but not during intraverbal-tact probes. That shift matters.
In your sessions, fade pointing or highlighting on purpose so the learner learns to scan the picture independently. If the learner only answers when you point, the skill isn’t ready for natural conversation.
For learners like Lucy, probe format can change results. She showed signs of knowing answers but didn’t always demonstrate it on long, feedback-free tests.
In practice, if a learner’s probe data look worse than training data, don’t jump straight to “they didn’t learn it.” First check motivation, session length, response effort, and whether probe directions are clear. You may need shorter probes, breaks, or a different way to let the learner show the skill while keeping your measurement honest.
Use these findings with limits in mind. Most of the data came from college students learning made-up labels, and only one child with autism participated. The teaching also used strong prompting and dense reinforcement.
Treat this as guidance for trying an approach, not proof it will work for every learner or target. The safest takeaway: build conditional discrimination from the start by varying questions and visuals, measure whether question-answering about pictures is emerging, and be ready to teach missing speaker skills directly if they don’t emerge.
Works Cited
Halbur, M., Kodak, T., & Reidy, J. (2025). A comparison of training procedures on the emergence of intraverbal-tacts. The Analysis of Verbal Behavior, 40, 379–402. https://doi.org/10.1007/s40616-024-00214-6