Lip-Reading Technology Knows What You Said
Electronic Design | April 27, 2007
Video surveillance systems
incorporating intelligent analysis already do a good job of tracking
people, recognizing faces, and even interpreting physical gestures.
But if a British research team has its way, surveillance system
operators will have yet another new tool to use in their fight
against crime and terrorism: automatic lip recognition.
Computer-based lip-reading
technology would help video surveillance systems spot people
planning a crime or terror attack by literally watching suspects’
lips for clues. Once it finds someone speaking certain key words or
sentences, the system would automatically send an alert message to a
central console, mobile phone, or other communications device.
Police or security agents could then be dispatched to the scene to
question the individual.
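The alerting step described above can be sketched in a few lines, assuming the lip-reader has already produced a text transcript. This is only an illustration, not the project's design: the function name `find_alerts` and the placeholder watch list are hypothetical, and the article does not say which words or phrases a real system would look for.

```python
import re

# Hypothetical placeholder watch list; not taken from the article.
WATCH_PHRASES = ["example phrase", "keyword"]


def _normalize(text):
    """Lowercase the text and reduce it to words separated by single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def find_alerts(transcript, phrases=WATCH_PHRASES):
    """Return the watch-list phrases that appear as whole words in a transcript."""
    padded = " " + _normalize(transcript) + " "
    hits = []
    for phrase in phrases:
        target = _normalize(phrase)
        # Padding with spaces keeps "keyword" from matching inside "keywords".
        if target and f" {target} " in padded:
            hits.append(phrase)
    return hits
```

A non-empty result from `find_alerts` is the point at which a system like the one described would send its message to a console or phone.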
Richard Harvey, a senior lecturer
in computer vision at the University of East Anglia in Norwich,
England, is embarking on a three-year project that will collect
lip-reading data. The information will then be used to create
systems that can automatically convert lip motions into readable
text.
The Home Office, the U.K.
government department responsible for domestic security, is
interested in the project, according to Harvey. So is the U.K.
Engineering and Physical Sciences Research Council, which has
awarded the venture a £391,814 grant.
Harvey says he and his researchers
will investigate techniques for recognizing head positions, lip
shapes, and their related sounds. “We have several methods for
extracting what are called ‘features’ from the lips—sets of numbers
that vary with the lip shape, but not with anything else,” he says,
adding that the researchers won’t use any specialized cameras or
computers in their work. “Our technology is very standard,” he says.
“We are using standard speech recognition technology.”
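To illustrate what "numbers that vary with the lip shape, but not with anything else" can mean, here is a minimal sketch that turns lip-contour points into values that ignore where the mouth sits in the frame and how large it appears. It is not the researchers' method, which the article does not describe. The function name `lip_shape_features` and the assumption that contour points are already available as (x, y) pairs are hypothetical.

```python
import math


def lip_shape_features(points):
    """Convert lip-contour points [(x, y), ...] into position- and size-independent numbers."""
    if not points:
        raise ValueError("at least one contour point is required")
    n = len(points)
    # Subtract the centroid so the result does not depend on image position.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    # Divide by the RMS distance from the centroid so the result does not depend on size.
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / n)
    if scale == 0:
        raise ValueError("all contour points coincide")
    return [coord / scale for point in centered for coord in point]
```

The same mouth shape seen closer to the camera or elsewhere in the frame yields the same feature values, while a wider or more open mouth yields different ones. Handling head rotation would need a further normalization step that this sketch omits.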
Current automated lip-reading
systems, which require good lighting and static heads, are limited
and relatively inaccurate. “We can lip-read between 10 and 30
utterances at the moment, with an accuracy of around 50%,” Harvey
says. “Given the difficulty of lip-reading, that is regarded as
pretty good. But obviously there is a huge way to go before we can
handle natural speech.”
Yet Harvey believes that once the
kinks are worked out, automated lip-reading could eventually be
applied to consumer and business products as well as video
surveillance. Camera phones incorporating the technology, for
example, would let users communicate in even the noisiest
environments.
“Lip-reading is important because
normal people lip-read all the time... in cars, aircraft, parties,
offices, and so on,” Harvey says. “Therefore, lip-reading is useful
as an adjunct to normal speech recognition.”