Audio-Visual Automatic Speech Recognition Towards Education for Disabilities.
Lip-reading software reached about 77% accuracy, offering a no-touch way for students with severe motor limitations to write and speak.
01Research in Context
What this study did
The team built a computer program that reads lips and listens at the same time.
They added a picture-texture measure called GLCM (Grey-Level Co-occurrence Matrix) to an older lip-reading feature named LBP-TOP (Local Binary Pattern on Three Orthogonal Planes).
They tested the new mix on recorded speech to see if it could keep up in a classroom.
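The GLCM idea above can be sketched in a few lines. This is a minimal single-frame illustration with numpy only; the paper applies GLCM alongside LBP-TOP over a whole video volume, and the `glcm_features` helper, the quantization level, and the chosen statistics here are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def glcm_features(img, levels=8):
    """Sketch of GLCM texture statistics for a grayscale lip crop.

    Quantizes the image to `levels` gray levels, counts how often each
    pair of levels occurs in horizontally adjacent pixels, then derives
    simple texture measures from the normalized co-occurrence matrix.
    """
    q = (img.astype(np.float64) / 256 * levels).astype(int)  # quantize
    glcm = np.zeros((levels, levels))
    # co-occurrence at offset (0, 1): each pixel paired with its right neighbor
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[a, b] += 1
    glcm /= glcm.sum()                        # joint probabilities
    i, j = np.indices(glcm.shape)
    contrast = ((i - j) ** 2 * glcm).sum()    # local intensity variation
    energy = (glcm ** 2).sum()                # texture uniformity
    homogeneity = (glcm / (1 + np.abs(i - j))).sum()
    return contrast, energy, homogeneity

# toy lip-region crop standing in for a real video frame
roi = (np.random.default_rng(0).random((32, 48)) * 256).astype(np.uint8)
contrast, energy, homogeneity = glcm_features(roi)
print(contrast, energy, homogeneity)
```

In practice these co-occurrence statistics would be concatenated with the LBP-TOP histogram to form the visual feature vector the classifier sees.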
What they found
The lip-reading part alone scored 76.60 percent correct.
When the mic was on, audio hit 96 percent.
The combo could give students with no hand control a hands-free way to type and talk.
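One common way to combine the two streams is decision-level fusion: score each candidate word with both recognizers and weight the more reliable audio stream higher. This is a hedged sketch of that general technique, not the paper's specific fusion method; `fuse_scores` and the `alpha` weight are illustrative assumptions.

```python
import numpy as np

def fuse_scores(audio_logp, visual_logp, alpha=0.8):
    """Late audio-visual fusion sketch: weighted sum of per-word
    log-probabilities from the two streams, returning the index of
    the best-scoring word. `alpha` leans on audio, which scored
    ~96% vs ~77% visual in the paper; the value 0.8 is illustrative."""
    combined = alpha * np.asarray(audio_logp) + (1 - alpha) * np.asarray(visual_logp)
    return int(np.argmax(combined))

# toy per-word scores for a 3-word vocabulary
audio = np.log([0.7, 0.2, 0.1])
visual = np.log([0.3, 0.5, 0.2])
print(fuse_scores(audio, visual))  # audio dominates -> word 0
```

In a noisy classroom the weight could shift toward the visual stream, which is the usual argument for keeping the lip-reading channel even when a microphone is available.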
How this fits with other research
Shih (2014), Shih et al. (2010), and Shih et al. (2009) all used tiny finger or hand moves to run a computer.
Debnath et al. (2023) keep the same goal, computer access without fine motor control, but swap finger pokes for voice and lip movement.
Northup et al. (1991) showed that AAC users need to learn to start conversations, not just press buttons.
The new AV-ASR system could build on that lesson by letting users start talk with only their mouth, no buttons required.
Why it matters
If a student can’t move hands or head, current tools still ask for at least a finger twitch.
A lip-reading mic combo removes that last motor demand and may run on any laptop camera.
Try filming a student’s mouth while they say answers; see if the free code can turn it into text without touching the keyboard.
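If you try that, the first step is isolating the mouth region from each frame. Here is a minimal sketch assuming you already have a face-cropped grayscale frame as a numpy array; the `mouth_roi` helper and its crop fractions are hypothetical, and a real system would use a proper lip tracker, since the paper notes that tracing the lips correctly is the hard part.

```python
import numpy as np

def mouth_roi(frame, frac=0.35):
    """Hypothetical helper: crop the lower-middle part of a face frame
    as a rough mouth region. The bottom `frac` of the face box and the
    middle half of its width are illustrative guesses, not tracked lips."""
    h, w = frame.shape[:2]
    top = int(h * (1 - frac))                    # bottom `frac` of the face
    left, right = int(w * 0.25), int(w * 0.75)   # middle half horizontally
    return frame[top:, left:right]

frame = np.zeros((120, 160), dtype=np.uint8)
print(mouth_roi(frame).shape)  # (42, 80)
```

Each cropped region would then be fed to the texture-feature stage described above.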
Record a short video of a student saying sight words, run the open AV-ASR script, and count how many words it catches.
02At a glance
03Original abstract
Education is a fundamental right that enriches everyone's life. However, physically challenged people often debar from the general and advanced education system. Audio-Visual Automatic Speech Recognition (AV-ASR) based system is useful to improve the education of physically challenged people by providing hands-free computing. They can communicate to the learning system through AV-ASR. However, it is challenging to trace the lip correctly for visual modality. Thus, this paper addresses the appearance-based visual feature along with the co-occurrence statistical measure for visual speech recognition. Local Binary Pattern-Three Orthogonal Planes (LBP-TOP) and Grey-Level Co-occurrence Matrix (GLCM) is proposed for visual speech information. The experimental results show that the proposed system achieves 76.60 % accuracy for visual speech and 96.00 % accuracy for audio speech recognition.
Journal of Autism and Developmental Disorders, 2023 · doi:10.1109/TMM.2009.2030637