To IEEE SPS members in Japan

        IEEE SP Society Japan Chapter
                           Chair: Keikichi Hirose (広瀬 啓吉, University of Tokyo)
                      Vice Chair: Akihiko Sugiyama (杉山 昭彦, NEC)


We are pleased to host an IEEE lecture by Prof. Mark
Hasegawa-Johnson. IEEE members and non-members alike are
welcome to attend. Attendance is free, and no advance
registration is required. We look forward to seeing many of you.

○Speaker:
  Mark Hasegawa-Johnson
  Associate Professor
  Statistical Speech Technology Group Head
  Beckman Institute/ECE Department
  University of Illinois at Urbana-Champaign
  http://www.ifp.uiuc.edu/~hasegawa/

○Title:
 Audio-Visual Speech Recognition: 
 Audio Noise, Video Noise, and Pronunciation Variability

○Date and time:
  Thursday, June 28, 2007, 13:15-14:30

○Venue:
  Room 101B1 (新領域輪講室), 3rd floor, New Engineering Bldg. 2,
  Faculty of Engineering, University of Tokyo
  http://www.u-tokyo.ac.jp/campusmap/cam01_04_18_j.html

○Contact:
  SPS Japan Chapter Chair: Keikichi Hirose (広瀬 啓吉)
 [hirose@gavo.t.u-tokyo.ac.jp]
  Department of Information and Communication Engineering
  Graduate School of Information Science and Technology
  University of Tokyo
  7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656 JAPAN
  Tel.: +81-3-5841-6667
  Fax.: +81-3-5841-6648

○Organizer:
  IEEE Signal Processing Society Japan Chapter

○Speaker biography:
 Mark Hasegawa-Johnson received his Ph.D. from MIT in 1996; 
he was a post-doctoral fellow at UCLA from 1996 to 1999, and 
has been on the faculty at the University of Illinois since 
1999. Dr. Hasegawa-Johnson is author or co-author of 4 
patents, 71 peer-reviewed journal and conference papers, 
and a chapter in the Wiley Encyclopedia of Telecommunications. 
In 2004, he ran a multi-university research workshop team at 
Johns Hopkins University, in which phonological-feature 
transformations were demonstrated for the front end of a DBN 
automatic speech recognizer.  In the 2006 workshop, similar 
ideas were applied to the task of audiovisual speech 
recognition.  
Dr. Hasegawa-Johnson's group at the Beckman Institute is the 
source of AVICAR, the world's largest (by far) freely 
available database of audiovisual speech recorded under 
real-world noise conditions.  Dr. Hasegawa-Johnson is a 
member of the Speech Technical Committee of the IEEE Signal 
Processing Society, and a Senior Member of the IEEE.

○Abstract:
  The problem of audio-visual speech recognition (AVSR: 
recognition of the words spoken by a videotaped talker) 
provides a new way of looking at some old speech problems, 
including the problems of noise and pronunciation variability.  
This talk will describe methods we have found useful for 
compensating for audio noise, video noise, and pronunciation 
variability in AVSR.  Video "noise" includes lighting 
variation, interlace, camera movement, obstructions, and 
variations of skin tone and facial features.  Video "noise" 
may be compensated for during feature extraction: the video 
recording specifies thousands of measurements per frame, of 
which only a very small number (perhaps 3-13) are useful for 
AVSR.  Classical methods work well, but better results are 
obtained by optimizing the feature extractor to best separate 
the low-dimensional "manifolds" corresponding 
to each phoneme.  Audio "noise" includes both wind noise and 
reverberation.  These are classical problems, and classical 
solutions work well: beamforming, postfiltering, and voice 
activity detection, using a time-varying noise model backed 
off to a time-invariant baseline.  Finally, pronunciation 
variability is a phonological phenomenon, and may be 
compensated for using an idea from phonological theory: a dynamic 
Bayesian network called the Articulatory Feature Model, in 
which the phonestate label of an HMM speech recognizer is 
replaced by a vector representation of the states of the lips, 
tongue, and glottis/velum.  The best word error rates are 
achieved by combining the outputs of the Articulatory Feature 
Model and of an audio-visual coupled HMM.

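○Abstract (abridged summary, translated from the Japanese):
  Audio-visual speech recognition (AVSR) offers a new perspective
on long-standing problems in speech research such as noise and
pronunciation variability. This talk presents research results on
handling audio noise, video noise, and pronunciation variability
in AVSR. The video noise considered includes lighting changes,
interlace, changes in camera position, obstructions, and
variations in skin tone and facial features. Such video noise can
be handled by exploiting the redundancy of the information. Audio
noise includes wind noise and reverberation, and is handled with
techniques such as beamforming. The talk also addresses handling
pronunciation variation, including an HMM state representation
based on the articulators. Integrating all of the above improves
recognition performance.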