D.6. Critique and interpret data from single-case experimental designs.

Critique and Interpret Data from Single-Case Experimental Designs: A Practical Guide for Clinicians

If you work in applied behavior analysis, you’ve likely collected single-case experimental design (SCED) data—careful measurements of a client’s behavior across baseline and intervention phases. But collecting the data is only half the work. The harder part is looking at that graph and answering the question that matters most: Did my intervention actually cause the change I’m seeing, or is something else going on?

This article is for BCBAs, supervisors, clinic directors, and senior clinicians who need to confidently critique and interpret SCED data in real time. Misreading a graph can lead to continuing an ineffective intervention, abandoning a successful one, or missing a confounding factor that’s actually driving change. The stakes are real—both for your clients and for your credibility.

We’ll walk through what SCED data actually tells you, how to read it visually, what experimental control really means, and the ethical guardrails you need to keep in place. By the end, you’ll have a practical checklist to use before making any major clinical decision based on SCED results.

    What Is a Single-Case Experimental Design and Why Does It Matter?

    A single-case experimental design is a research method that tests whether an intervention caused a change in behavior by using the individual as their own control. Instead of comparing one group to another, you measure the same person’s behavior repeatedly—first without intervention (baseline), then with intervention—and look for clear changes that line up with when you made the treatment change.

    The purpose of critiquing SCED data is straightforward: you need to know whether your intervention produced the change or whether other factors might explain what you’re seeing. Did the client’s escape behavior really decrease because you introduced a new reinforcement strategy, or did it happen to drop because their home situation stabilized at the same time? Did on-task behavior increase because of your prompting protocol, or because the classroom became quieter that week? (Behavior Analysis in Practice)

    Single-case designs are especially useful in ABA because they let you test functional relationships with one individual without needing a large group. They also give you fast feedback—you can often see whether something is working within days or weeks, not months.

    The Core Elements of SCED Data

    When you look at a single-case graph, several features tell the story of whether your intervention worked.

    Repeated measurement means you collect data on the target behavior consistently across many sessions. You’re not making a judgment based on one observation or a feeling; you have a data trail. This is what lets you spot real patterns instead of random ups and downs.

    Phases are the distinct periods on your graph: baseline (before intervention), intervention (during treatment), and sometimes a withdrawal phase (turning off the intervention to test whether behavior reverses). Clear phase labels are non-negotiable. Anyone reading your data should instantly see where one phase ends and another begins.

    Within each phase, four visual features matter most:

    • Level is where the data points sit on the graph. Did behavior move up or down when the phase changed?
    • Trend is whether data are moving in a direction over time. Is the behavior steadily improving, getting worse, or staying flat?
    • Variability is how much the data bounce around. High variability can make it harder to see real changes.
    • Immediacy of effect is whether behavior changed right when you started the intervention, or gradually over time. A true causal effect often shows an immediate shift.

    Overlap refers to how much the data from one phase look like the data from the previous phase. If your baseline and intervention data almost completely overlap, it’s harder to claim the intervention caused change. Clear separation is stronger evidence.
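
    If you like to double-check your visual read with a few quick numbers, the sketch below shows one way these features could be summarized in Python. The session values are hypothetical and the function names are mine; summaries like these supplement, never replace, inspection of the raw data points.

```python
import statistics

baseline = [12, 14, 13, 12, 15, 13, 14, 12, 13, 14]  # hypothetical counts per session
intervention = [7, 6, 5, 7, 6, 5, 6, 7, 5, 6]

def level(data):
    """Level: where the data points sit (summarized here as the phase median)."""
    return statistics.median(data)

def trend(data):
    """Trend: direction of change within a phase (simple least-squares slope)."""
    n = len(data)
    mean_x, mean_y = (n - 1) / 2, statistics.fmean(data)
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(data))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def variability(data):
    """Variability: how much the data bounce around (standard deviation)."""
    return statistics.stdev(data)

def overlap(base, treat, goal="decrease"):
    """Overlap: share of intervention points that stay inside the baseline range."""
    worst = min(base) if goal == "decrease" else max(base)
    overlapping = sum((y >= worst) if goal == "decrease" else (y <= worst) for y in treat)
    return overlapping / len(treat)

print(f"Level: {level(baseline):.1f} (baseline) vs {level(intervention):.1f} (intervention)")
print(f"Baseline trend: {trend(baseline):+.2f} per session")
print(f"Variability (SD): {variability(baseline):.2f} vs {variability(intervention):.2f}")
print(f"Overlap: {overlap(baseline, intervention):.0%} of intervention points within baseline range")
```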

    Replication is perhaps the most powerful feature. When you see the same effect happen more than once—across multiple phases, settings, or participants—your confidence grows much stronger. A one-time improvement could be luck; a repeated, predictable improvement is compelling evidence of a functional relationship.

    Experimental Control Versus Correlation

    One of the most common mistakes clinicians make is confusing correlation (two things happening at the same time) with experimental control (you changing something and seeing a predictable result because of that change).

    Suppose a client’s aggressive behavior decreases the same week their new medication starts and you introduce a new behavior plan. Both changes happened at the same time. Did the medication work? The behavior plan? Both? You cannot tell from correlation alone.

    Experimental control requires systematic manipulation—you intentionally change one thing and hold everything else steady—plus consistent replication. If you introduce the intervention and behavior improves, then remove it and behavior returns to baseline, then reintroduce it and improvement occurs again, you now have replication. That pattern is much harder to explain by coincidence alone.

    In the simplest terms: experimental control means you can predict what will happen before you change the intervention, and the data confirm your prediction. Correlation means two things occurred together, but you cannot rule out other explanations.

    Common SCED Designs You’ll See in Practice

    You don’t need to memorize every design, but it helps to recognize the main ones and understand how each builds evidence differently.

    ABAB designs consist of baseline, intervention, withdrawal, and return to intervention. The withdrawal phase is where you pause treatment to see if behavior reverses. If it does, that replication strengthens your confidence. The ethical caution here is real: if the behavior is dangerous, withdrawing the intervention can be harmful. That’s why many clinicians prefer designs that don’t require withdrawal.

    Multiple baseline designs avoid the withdrawal problem entirely. Instead, you introduce an intervention at different times across multiple settings, participants, or behaviors. If behavior improves only where you introduced the intervention—and does so repeatedly as you introduce it in each new setting—you have strong evidence that your intervention, not time or history, caused the change.

    Alternating treatments designs compare two or more interventions by switching between them rapidly and looking at which data path performs better. This is useful when you want to test which method works best without long withdrawal phases.
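
    As a rough illustration of how alternating treatments data might be organized for comparison, here is a short Python sketch with made-up session values and condition names; in practice you would also plot the two data paths and look for consistent separation over time.

```python
from collections import defaultdict
from statistics import fmean

# (condition, correct responses) for each rapidly alternating session -- hypothetical values
sessions = [("prompt_delay", 4), ("modeling", 6), ("prompt_delay", 5), ("modeling", 7),
            ("prompt_delay", 4), ("modeling", 8), ("prompt_delay", 6), ("modeling", 9)]

# Group sessions by which condition ran, then compare the data paths.
by_condition = defaultdict(list)
for condition, correct in sessions:
    by_condition[condition].append(correct)

for condition, scores in by_condition.items():
    print(f"{condition}: mean {fmean(scores):.1f} correct across {len(scores)} sessions")
```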

    The key insight across all designs: the more you replicate the effect under controlled conditions, the stronger your causal claim becomes.

    How to Read and Critique a Graph

    Start by checking the basics. Do the phases have clear labels? Can you tell where baseline ends and intervention begins? Are the axes labeled? A poorly drawn graph makes interpretation nearly impossible.

    Next, look for level and trend changes at the phase boundary. Did the data shift noticeably right when the phase changed, or was the behavior already moving in that direction? If baseline already shows an upward trend, you cannot claim the intervention caused the improvement without showing that it accelerated or magnified that trend beyond what baseline predicted.
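
    One informal way to check this is to fit a line to the baseline data, project it forward, and ask whether the intervention data actually depart from that projection. The sketch below uses made-up numbers and a plain least-squares line; treat it as a rough screen, not a formal trend-analysis method.

```python
from statistics import fmean

baseline = [4, 5, 5, 6, 7, 7, 8]            # on-task intervals: already trending upward
intervention = [9, 11, 12, 13, 14, 15, 15]

# Fit a simple least-squares line to the baseline phase.
n = len(baseline)
mean_x, mean_y = (n - 1) / 2, fmean(baseline)
num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(baseline))
den = sum((x - mean_x) ** 2 for x in range(n))
slope = num / den
intercept = mean_y - slope * mean_x

# Project that trend into the intervention sessions and compare observed values against it.
projected = [intercept + slope * (n + i) for i in range(len(intervention))]
above = sum(obs > pred for obs, pred in zip(intervention, projected))

print(f"Baseline slope: {slope:+.2f} per session")
print(f"{above} of {len(intervention)} intervention points exceed the projected baseline trend")
```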

    Then assess variability within phases. If data are all over the place, it becomes harder to spot a real effect. Unstable data don’t automatically mean the intervention failed, but they do mean you need to investigate why. Is measurement inconsistent? Are conditions changing?

    Look for overlap between phases. Is there clear separation, or do baseline and intervention data overlap substantially? More separation means stronger evidence.

    Finally, ask yourself: If I had to predict what will happen next, what does the graph tell me? If you see a clear, stable pattern and you introduced an intervention, can you confidently predict that stopping it would reverse the pattern? That’s the hallmark of experimental control.

    When and Why This Matters in Your Practice

    SCED critique directly shapes clinical decisions that affect real clients. When you evaluate whether an intervention is working, you’re deciding whether to continue, modify, or stop treatment. Poor graph interpretation could mean a client receives an ineffective strategy longer than necessary. Conversely, abandoning an effective intervention based on misreading a graph deprives a client of something that works.

    Supervision is another critical use. When you train a clinician to collect single-case data, reviewing their graphs becomes a window into their understanding of measurement and behavior change. A trainee who cannot interpret overlap or variability may also struggle to spot procedural fidelity problems.

    You also use this skill when communicating with families and teams. Parents want to know: Is the intervention working? Your ability to point to a graph and explain in plain language why the data show (or don’t show) a causal effect builds trust and informs shared decisions.

    Real-World Examples

    Example 1: An ABAB Design Testing Escape-Maintained Problem Behavior

    A student engages in verbal refusal and task avoidance during math. During baseline, you record the frequency of refusal across 10 sessions. The data average around 12–14 instances per session with moderate stability.

    You introduce an intervention: the student earns a 2-minute break after completing five math problems correctly. Refusal drops immediately to 6–8 instances in the first few sessions, then stabilizes at 5–7. This shows a large level change and immediacy of effect.

    You withdraw the intervention to test whether the behavior reverses. Refusal climbs back to 10–14 instances per session. You reintroduce the intervention, and refusal drops again to 5–7.

    Why this is strong evidence: The behavior changed right when the intervention started, reversed when you removed it, and repeated the pattern when you reintroduced it. That replication across three phase changes is powerful.
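
    To make the reversal pattern concrete, here is a small Python sketch that organizes session values like those described above (the exact numbers are invented within the stated ranges) and summarizes each phase.

```python
from statistics import fmean

# Refusals per session; made-up values consistent with the ranges described above.
phases = {
    "A1 (baseline)":          [13, 12, 14, 13, 12, 14, 13, 12, 14, 13],
    "B1 (break contingency)": [8, 7, 6, 6, 5, 7, 6, 5, 7, 6],
    "A2 (withdrawal)":        [10, 12, 13, 14, 12, 13],
    "B2 (reintroduction)":    [7, 6, 5, 6, 7, 5],
}

previous = None
for name, data in phases.items():
    mean = fmean(data)
    note = "" if previous is None else f" (change of {mean - previous:+.1f} from the prior phase)"
    print(f"{name}: mean {mean:.1f} refusals per session{note}")
    previous = mean
```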

    Example 2: A Multiple Baseline Across Three Classrooms

    You design a prompting protocol to increase on-task behavior in three classrooms. In Classroom A, you introduce prompting in week 2 while Classrooms B and C remain in baseline. On-task behavior improves in Classroom A but not in B or C. In week 4, you introduce prompting in Classroom B, and behavior improves there while C remains flat. In week 6, you start prompting in Classroom C, and behavior improves there too.

    Why this is strong evidence: The staggered introduction rules out time-based confounds. Because behavior improved only in each classroom after the intervention was introduced—not before—you can be confident the intervention caused the change. Replication across three settings strengthens that conclusion further.
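
    A rough sketch of the same logic in Python: each classroom’s data should shift only after its own intervention start week. The weekly percentages below are invented for illustration.

```python
from statistics import fmean

intervention_start = {"Classroom A": 2, "Classroom B": 4, "Classroom C": 6}  # week prompting began
on_task_percent = {
    "Classroom A": [45, 70, 72, 75, 78, 80, 79, 81],
    "Classroom B": [50, 49, 51, 74, 77, 79, 80, 81],
    "Classroom C": [42, 44, 43, 45, 44, 70, 73, 75],
}

for room, data in on_task_percent.items():
    start = intervention_start[room]                 # 1-indexed week
    before, after = data[:start - 1], data[start - 1:]
    print(f"{room}: baseline mean {fmean(before):.0f}%, intervention mean {fmean(after):.0f}%")
```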

    The Role of Treatment Integrity

    You cannot interpret SCED data fairly without knowing whether the intervention was implemented correctly. Treatment integrity means the clinician delivered the intervention as designed. If your graph shows no improvement, but the intervention was only implemented 40% of the time, the real question is not “Does this intervention work?” but “What happens when we actually implement it correctly?”

    Conversely, if data show improvement but integrity was poor, you cannot be sure whether the improvement came from your planned intervention or from something off-plan. Always document your measurement method for integrity and check it regularly.
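
    Treatment integrity is commonly summarized as the percentage of planned protocol steps delivered as written. A minimal sketch with a hypothetical fidelity checklist:

```python
# Each entry marks whether one protocol step was implemented as written that session.
fidelity_checks = {
    "Mon": [True, True, False, True, True],
    "Wed": [True, False, False, True, False],
    "Fri": [True, True, True, True, True],
}

for session, steps in fidelity_checks.items():
    print(f"{session}: {sum(steps) / len(steps):.0%} of steps implemented as designed")

all_steps = [step for steps in fidelity_checks.values() for step in steps]
print(f"Overall integrity: {sum(all_steps) / len(all_steps):.0%}")
```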

    Measurement Changes and Threats to Validity

    One subtle but critical threat is measurement change. Suppose baseline data are collected by direct observation, and intervention data are collected by teacher report. Even if the numbers look different, the change might simply reflect the different method, not a real change in behavior.

    Other threats include history (an event outside your control coinciding with the intervention), maturation (the learner simply developing), and instrumentation (the measurement tool itself changing). Your job during critique is to consider whether any of these factors could plausibly explain the data pattern.

    This is where clear documentation matters. Record what measurement method you used in each phase, note any unusual events, and document procedural fidelity consistently.

    Ethical Considerations in SCED Critique

    Withdrawing an effective intervention raises real ethical concerns. If a client’s aggressive behavior is dangerous, deliberately stopping the treatment that controls it can cause harm. Informed consent is essential: the client or guardian must understand that you will intentionally remove an effective intervention to test it, and they must agree.

    Many clinicians prefer designs that avoid withdrawal altogether—like multiple baseline—to dodge this ethical tension. If you do use withdrawal, weigh the scientific value against the ethical risks and document both your decision and your safeguards.

    Selective reporting is another ethical trap. If you show only the intervention phase and hide the fact that baseline was already improving, you misrepresent what the data show. Transparency means presenting the full graph and clearly explaining what you can and cannot conclude. If your data are ambiguous, say so.

    Confidentiality also matters. Never include identifying information on graphs shared outside the clinical record.

    Visual Analysis Versus Statistics

    In SCED interpretation, visual analysis is primary. You look at the graph, inspect level, trend, variability, and overlap, and form a judgment about whether the data support a causal claim.

    Statistical tests can supplement visual analysis but should never replace it. A statistical test can miss important patterns that visual inspection catches immediately. If you use statistics, choose methods designed for single-case data. And remember: a graph that clearly shows experimental control does not need statistical support. Conversely, a graph that is ambiguous will not be rescued by statistics alone.
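
    If you do add a number, non-overlap metrics designed for single-case data are a common choice. Here is a sketch of one of the simplest, percentage of non-overlapping data (PND), with hypothetical values; treat it as a supplement to the graph, not a verdict.

```python
def pnd(baseline, intervention, goal="decrease"):
    """Percentage of intervention points beyond the most extreme baseline point."""
    if goal == "decrease":
        return sum(y < min(baseline) for y in intervention) / len(intervention)
    return sum(y > max(baseline) for y in intervention) / len(intervention)

baseline = [12, 14, 13, 12, 15, 13]
intervention = [7, 6, 5, 13, 6, 5, 6, 7]   # one point overlaps the baseline range

print(f"PND: {pnd(baseline, intervention):.0%} of intervention points fall below the lowest baseline value")
```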

    Common Mistakes in SCED Interpretation

    Overemphasizing a single data point is a classic error. One low value during intervention does not mean the intervention failed; look at the overall pattern.

    Ignoring variability can lead you astray. If baseline is highly unstable, it’s harder to claim the intervention produced change.

    Assuming improvement must be immediate is another pitfall. Some interventions produce gradual change. The key is not “does it change right away?” but “does change align with the intervention?”

    Averaging phase data without inspecting raw points obscures immediacy and trend. Always look at the individual data points first.

    Finally, confusing correlation with causation is pervasive. Just because baseline ended and intervention started at the same moment does not mean the intervention caused change. True experimental control requires replication, not just coincidence.

    A Practical Critique Checklist

    Before making a clinical decision based on SCED data, ask yourself:

    • Are phases clearly labeled and distinct? Can anyone reading this graph tell where one phase ends and the next begins?
    • Did behavior change at the phase boundary or before? Look for immediacy of effect relative to the intervention onset.
    • Is there overlap between phases? How much separation exists between baseline and intervention data?
    • Are the data stable within each phase? Or are they so variable that real effects are hard to spot?
    • Is there replication? Does the effect repeat across multiple phases, settings, or participants?
    • Is measurement consistent across phases? Did you use the same method and observer throughout?
    • Is treatment integrity documented and adequate? Was the intervention actually delivered as designed?
    • Are there plausible alternative explanations? Could history, maturation, or concurrent changes account for the pattern?

    If you answer yes to most of these questions, your data likely demonstrate experimental control. If not, investigate further before drawing causal conclusions.
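
    If it helps your team apply the checklist consistently, it can be encoded as a simple structured review. The field names below are mine, not a published instrument.

```python
# One reviewer's yes/no answers for a single graph (hypothetical).
review = {
    "phases clearly labeled and distinct": True,
    "change occurs at the phase boundary": True,
    "minimal overlap between phases": True,
    "data stable within each phase": False,
    "effect replicated across phases or settings": True,
    "measurement consistent across phases": True,
    "treatment integrity documented and adequate": True,
    "alternative explanations ruled out": True,
}

met = sum(review.values())
print(f"{met} of {len(review)} criteria met")
for criterion, ok in review.items():
    if not ok:
        print(f"Investigate before drawing causal conclusions: {criterion}")
```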

    Talking to Families and Teams About SCED Data

    Families and team members aren’t always familiar with technical language. Your job is to translate the data into plain language without losing accuracy.

    Try: “Look at where the intervention started—you can see the behavior dropped right there. When we paused the intervention, it went back up. When we started again, it came down again. That pattern tells us the intervention is what’s controlling the change.”

    Highlight practical significance: “This means your child is safer when we use this approach, and that’s what matters most.” Share limitations too: “We’re still learning about what works best in this setting, and we’ll keep monitoring.”

    Key Takeaways

    Critiquing and interpreting SCED data is a foundational skill for evidence-based ABA practice. The core insight is simple: visual patterns of level, trend, variability, overlap, and replication are your best tools for determining whether an intervention caused change. You don’t need fancy statistics; you need careful graph reading and honest reasoning about alternative explanations.

    Experimental control is not about a single perfect phase change; it’s about demonstrating that you can predict and replicate behavior change when you manipulate the intervention. That kind of evidence gives you confidence to continue, modify, or stop treatment with genuine knowledge of what is working.

    Equally important is ethical rigor. Transparent reporting, consistent measurement, adequate treatment integrity, and informed consent protect your clients and your credibility. A well-interpreted graph that honestly acknowledges limitations builds more trust than an ambiguous graph oversold as definitive.

    As you review graphs in supervision, during consultation, or in your own practice, use this checklist. Ask tough questions about measurement, fidelity, and alternative explanations. That critical eye is what separates clinical confidence from wishful thinking—and it’s what ensures your clients receive interventions that actually work.