Automated segmentation of child-clinician speech in naturalistic clinical contexts.
You can get reliable automated speech segmentation of your autism sessions with just 30 seconds of labeled audio, cutting manual coding time.
01Research in Context
What this study did
Bertamini et al. (2025) built an AI tool that splits session audio into "child" and "clinician" parts.
They trained the model with only 30 seconds of hand-labeled speech from each session.
All testing took place in real clinics with autistic preschoolers.
What they found
The model found who was talking with good accuracy.
Only half a minute of labeling was enough to make the tool work.
This means you can skip long, hand-coding of session tapes.
How this fits with other research
La Valle et al. (2024) show that counting child turns and utterances over months picks up small language gains. Giulio’s tool now gives you those counts without listening to every minute.
McGeown et al. (2013) used LENA to tally words and vocalizations in preschool rooms. Their gear needed no labels but could not tell which speaker was which. Giulio adds speaker labels with almost zero set-up time.
MHeald et al. (2020) trained AI to spot vocal stereotypy in kids with autism. Both studies reach ≥80 % match with human coders, proving AI audio works for different clinic targets.
Why it matters
You can plug this 30-second setup into your current sessions and get instant child-clinician talk time. Use the numbers to set speaking goals, track progress, or show parents clear data. No extra staff, no hours of coding.
Want CEUs on This Topic?
The ABA Clubhouse has 60+ free CEUs — live every Wednesday. Ethics, supervision & clinical topics.
Join Free →Record your next session, label 30 s of child vs adult speech, and let the free model give you child talk time for the full hour.
02At a glance
03Original abstract
BACKGROUND: Computational approaches hold significant promise for enhancing diagnosis and therapy in child and adolescent clinical practice. Clinical procedures heavily depend n vocal exchanges and interpersonal dynamics conveyed through speech. Research highlights the importance of investigating acoustic features and dyadic interactions during child development. However, observational methods are labor-intensive, time-consuming, and suffer from limited objectivity and quantification, hindering translation to everyday care. AIMS: We propose a novel AI-based system for fully automatic acoustic segmentation of clinical sessions with autistic preschool children. METHODS AND PROCEDURES: We focused on naturalistic and unconstrained clinical contexts, which are characterized by background noise and data scarcity. Our approach addresses key challenges in the field while remaining non-invasive. We carefully evaluated model performance and flexibility in diverse, challenging conditions by means of domain alignment. OUTCOMES AND RESULTS: Results demonstrated promising outcomes in voice activity detection and speaker diarization. Notably, minimal annotation efforts -just 30 seconds of target data- significantly improved model performance across all tested conditions. Our models exhibit satisfying predictive performance and flexibility for deployment in everyday settings. CONCLUSIONS AND IMPLICATIONS: Automating data annotation in real-world clinical scenarios can enable the widespread exploitation of advanced computational methods for downstream modeling, fostering precision approaches that bridge research and clinical practice.
Research in developmental disabilities, 2025 · doi:10.1016/j.ridd.2024.104906