Service Delivery

Audio-Visual Automatic Speech Recognition Towards Education for Disabilities.

Debnath et al. (2023) · Journal of Autism and Developmental Disorders, 2023
★ The Verdict

Lip-reading software hit 77% accuracy, offering a no-touch way for students with severe motor limits to write and speak.

✓ Read this if you're a BCBA serving students who have reliable speech but minimal or no hand use.
✗ Skip if your clients already type efficiently with fingers, head, or eye gaze.

01 · Research in Context

01

What this study did

The team built a computer program that reads lips and listens at the same time.

They added a picture-texture measure called GLCM to an established lip-reading feature named LBP-TOP.

They tested the new mix on recorded speech to see if it could keep up in a classroom.
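For readers who want to poke at the feature idea, here is a minimal per-frame sketch in Python using scikit-image (an assumption; the paper does not name its tooling). The full LBP-TOP feature also runs LBP over the two spatio-temporal planes of the video; this shows the single-frame building block plus the GLCM statistics the paper adds on top.

```python
import numpy as np
from skimage.feature import local_binary_pattern, graycomatrix, graycoprops

def mouth_features(frame_gray: np.ndarray) -> np.ndarray:
    """LBP histogram + GLCM statistics for one grayscale mouth-region
    frame (uint8, e.g. a 64x64 crop around the lips)."""
    # LBP: 8 neighbors on a radius-1 circle; "uniform" yields 10 pattern bins.
    lbp = local_binary_pattern(frame_gray, P=8, R=1, method="uniform")
    hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)

    # GLCM: how often gray-level pairs co-occur at distance 1, four angles.
    glcm = graycomatrix(frame_gray, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    stats = [graycoprops(glcm, prop).mean()
             for prop in ("contrast", "homogeneity", "energy", "correlation")]

    return np.concatenate([hist, stats])  # 14-dim feature for this frame
```

Per-frame vectors like this would then be pooled over the clip and fed to a classifier; the pipeline beyond the features is not described in the abstract.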

02

What they found

The lip-reading part alone scored 77 percent correct.

When the mic was on, audio hit 96 percent.

The combo could give students with no hand control a hands-free way to type and talk.
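The abstract reports the two streams separately but does not spell out how they are combined. A common AV-ASR approach is weighted score-level (late) fusion; here is a minimal sketch under that assumption, with made-up probabilities:

```python
import numpy as np

def fuse_scores(audio_probs: np.ndarray, visual_probs: np.ndarray,
                audio_weight: float = 0.8) -> int:
    """Weighted late fusion: blend per-class probabilities from the two
    recognizers and return the winning class index."""
    fused = audio_weight * audio_probs + (1.0 - audio_weight) * visual_probs
    return int(np.argmax(fused))

# Example with three candidate words; the streams disagree slightly.
audio = np.array([0.70, 0.20, 0.10])  # audio recognizer, clean mic
lips = np.array([0.30, 0.55, 0.15])   # visual (lip) recognizer
print(fuse_scores(audio, lips))       # -> 0: audio carries the decision
```

In a noisy classroom the weight could shift toward the visual stream, which is the usual argument for keeping both channels.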

03

How this fits with other research

Shih (2014), Shih et al. (2010), and Shih et al. (2009) all used tiny finger or hand moves to run a computer.

Debnath et al. (2023) keeps the same goal of computer access without fine motor control, but swaps finger pokes for voice and lip movement.

Northup et al. (1991) showed that AAC users need to learn to start conversations, not just press buttons.

The new AV-ASR system could build on that lesson by letting users initiate conversation with only their mouth, no buttons required.

04

Why it matters

If a student can’t move hands or head, current tools still ask for at least a finger twitch.

A lip-reading plus microphone combo removes that last motor demand and may run on any laptop camera.

Try filming a student's mouth while they say answers; see whether an open-source recognizer can turn the clip into text without the student touching the keyboard (a starter sketch appears under the Action step below).

→ Action — try this Monday

Record a short video of a student saying sight words, run an open-source speech-recognition script over it (see the sketch below), and count how many words it catches.
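The summary does not link the script itself, so as a stand-in, here is a short sketch using the open-source `openai-whisper` package (an assumption; it transcribes audio only, not lips). The clip filename and word list are hypothetical.

```python
# Audio-only stand-in for the AV-ASR step: transcribe the clip,
# then count which target sight words were caught.
# Requires: pip install openai-whisper (plus ffmpeg on the system).
import string
import whisper

SIGHT_WORDS = {"the", "and", "said", "have", "they"}  # hypothetical target list

model = whisper.load_model("base")            # small general-purpose model
result = model.transcribe("sight_words.mp4")  # hypothetical clip filename

# Normalize the transcript and intersect it with the target words.
spoken = {w.strip(string.punctuation) for w in result["text"].lower().split()}
caught = SIGHT_WORDS & spoken
print(f"Caught {len(caught)} of {len(SIGHT_WORDS)} sight words: {sorted(caught)}")
```

If the count is high with audio alone, the lip-reading channel matters most for students whose speech is quiet or whose room is loud.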

02 · At a glance

Intervention: augmentative and alternative communication (AAC)
Design: other
Population: other
Finding: positive

03 · Original abstract

Education is a fundamental right that enriches everyone's life. However, physically challenged people often debar from the general and advanced education system. Audio-Visual Automatic Speech Recognition (AV-ASR) based system is useful to improve the education of physically challenged people by providing hands-free computing. They can communicate to the learning system through AV-ASR. However, it is challenging to trace the lip correctly for visual modality. Thus, this paper addresses the appearance-based visual feature along with the co-occurrence statistical measure for visual speech recognition. Local Binary Pattern-Three Orthogonal Planes (LBP-TOP) and Grey-Level Co-occurrence Matrix (GLCM) is proposed for visual speech information. The experimental results show that the proposed system achieves 76.60 % accuracy for visual speech and 96.00 % accuracy for audio speech recognition.

Journal of Autism and Developmental Disorders, 2023 · doi:10.1109/TMM.2009.2030637