A study of the effects of frequency of probe data collection and graph characteristics on teachers' visual analysis.
Collect more probe data when the trend is unclear: teachers judge progress more reliably when they see the fuller picture.
01 · Research in Context
What this study did
Malagodi et al. (1989) asked 59 teachers of students with moderate to profound handicaps to look at 16 graphs of actual student progress. Some graphs had many data points. Others had few. The team wanted to know whether the number of points changed how teachers read the graphs.
They also tested different types of trend. Some lines rose steadily. Some fell or stayed flat. Some bounced around. Teachers judged each graph by eye.
What they found
When the line clearly went up, teachers agreed; more data points did not help. When the line was flat, falling, or jumpy, teachers disagreed, and more data points brought their judgments closer together.
So, extra points matter most when the trend is messy.
How this fits with other research
Dykens et al. (1991) ran a near-replication two years later with one twist: some teachers saw the numbers only, no graph. Either way, the same three things swayed teachers: trend, variability, and how often data were recorded. The 1991 paper builds on the 1989 finding rather than fighting it.
Wolfe et al. (2023) zoomed in on agreement. They showed that steep trends and large effect sizes hurt inter-rater agreement the most. This extends the 1989 warning: messy trends need extra care.
Kahng et al. (2010) looks like a contradiction at first. Their experts agreed strongly when judging graphs. But those judges were trained BCBAs, not regular teachers; the 1989 study used teachers with little graph training. The gap suggests that training, not the graph itself, drives agreement.
Why it matters
When you eyeball a graph, add more probe days if the data are flat or jumpy; extra points cut the guesswork. If you train staff, show them how trend and variability fool the eye. Pair visual checks with simple decision rules (one possible rule is sketched below) or brief computer-based training like that in O'Grady et al. (2018) to keep teams consistent.
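To make "simple rules" concrete, here is a minimal sketch of one possible rule in Python. Nothing here comes from the study: the function name, the slope cutoff, and the variability cutoff are illustrative assumptions, and real cutoffs would have to be calibrated to your own measures.

```python
import numpy as np

def probe_check(scores, flat_slope=1.0, cv_cutoff=0.25):
    """Decide whether a probe series is clear enough to judge by eye.

    scores: probe values in session order (e.g., percent correct).
    flat_slope and cv_cutoff are arbitrary illustrative thresholds,
    not values from Malagodi et al. (1989).
    """
    y = np.asarray(scores, dtype=float)
    x = np.arange(len(y))

    # Least-squares slope: average change per session.
    slope = np.polyfit(x, y, 1)[0]

    # Coefficient of variation: bounce relative to the average level.
    cv = y.std() / y.mean() if y.mean() else float("inf")

    # The 1989 pattern: clearly improving data are judged consistently,
    # so extra probes matter most when the trend is flat, falling, or noisy.
    if slope > flat_slope and cv < cv_cutoff:
        return "clear upward trend; judge as is"
    return "flat, falling, or variable; collect more probes first"

print(probe_check([40, 55, 62, 70, 78]))  # steady gains -> judge as is
print(probe_check([50, 42, 61, 38, 52]))  # bouncy -> collect more probes
```

A rule like this is a sanity check to sit beside the graph, not a replacement for looking at it.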
Add two extra probe trials this week for any goal with flat or variable data before you ask staff to judge progress.
02 · At a glance
03 · Original abstract
Teachers often rely on visual analysis of graphed student performance data to evaluate progress and make program decisions. However, because collecting data can be time consuming and interfere with instruction, teachers would like to know how much data is necessary to make reliable judgments. To investigate the effect of frequency of data collection on teachers' judgments and decisions, this study addressed the question of whether teachers' judgments differ according to frequency of data collection, whether teachers' judgments differ according to type of trend, and whether teachers' judgments based on different types of graphs vary with frequency of data collection. A set of 16 graphs of actual student performance data was analyzed by 59 teachers of students with moderate to profound handicaps. The data were analyzed by a two-factor repeated measures design, and results indicated that when asked to evaluate student performance, teachers' judgments tended to be consistent and accurate when the graphed data represented continuous and systematic improvement in performance. However, when the data represented a decrease in performance, no change, or highly variable performance, judgments tended to differ by frequency. When asked to make program recommendations, teachers' judgments tended to differ by frequency for all types of trends.
Research in Developmental Disabilities, 1989 · doi:10.1016/0891-4222(89)90001-2