To the members of the IEEE Signal Processing Society Japan Chapter

                           IEEE Signal Processing Society Japan Chapter
                                    Chair       Hitoshi Kiya (Tokyo Metropolitan University)
                                    Vice Chair  Takehiro Moriya (NTT)

The IEEE Signal Processing Society Japan Chapter will hold the lecture
meeting described below at the University of Tokyo on Tuesday, March 12.
Non-members of the IEEE are welcome to attend as well as members.
No advance registration is required, and admission is free.
We cordially invite you to join us.

                        Details

        IEEE SPS Japan Chapter Lecture Meeting

(1) Date and Place
Tuesday, March 12, 13:00-15:00
Lecture Hall, Engineering Building No. 11, The University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656


(2) Program

■13:00-13:30

・Speaker:
Dr. Emmanuel Vincent
French National Institute for Research in Computer Science and Control (INRIA)

・Talk Title:
How to integrate audio source separation and classification?

・Talk Abstract:
We consider the problem of audio classification in a wide sense:
speech recognition, speaker/singer identification, acoustic event
detection, etc. In the real world, the target signal is often
superimposed with other signals (noise, accompaniment, etc.). Although
source separation can enhance the target signal, it also introduces
distortions which adversely affect the classification accuracy. The
general framework of "uncertainty propagation" aims to address this
issue by estimating the posterior distribution of the source signals
and propagating it through the subsequent feature extraction and
classification stages.

We will introduce two novel contributions in this framework:
- a method to estimate the posterior distribution of the sources based
  on a variational Bayesian algorithm
- a method to train Gaussian mixture model-based or hidden Markov
  model-based classifiers directly from noisy data

We will present applications of these methods to speaker
identification in a noisy domestic environment and to singer
identification in polyphonic music.
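
As a rough illustration of the uncertainty propagation idea described in
the abstract, the short Python sketch below shows the standard
moment-based form of uncertainty decoding for a Gaussian mixture
classifier: the enhanced feature is treated as a Gaussian rather than a
point estimate, so its variance is simply added to each component's
variance. The function name and all numbers are illustrative and are not
taken from the talk, whose estimators are variational Bayesian.

    import numpy as np
    from scipy.stats import multivariate_normal

    def uncertain_gmm_likelihood(mu_x, var_x, weights, means, variances):
        """Likelihood of an uncertain feature vector under a diagonal GMM.
        The feature is modeled as N(mu_x, diag(var_x)); integrating it
        against each mixture component simply adds the two variances."""
        return sum(
            w * multivariate_normal.pdf(mu_x, mean=m, cov=np.diag(v + var_x))
            for w, m, v in zip(weights, means, variances))

    # Toy 2-class example with single-component "GMMs" in a 2-D feature space.
    mu_x, var_x = np.array([0.2, 1.1]), np.array([0.5, 0.3])  # posterior from separation
    for name, mean in [("class A", [0.0, 1.0]), ("class B", [2.0, -1.0])]:
        score = uncertain_gmm_likelihood(mu_x, var_x, [1.0],
                                         [np.array(mean)], [np.ones(2)])
        print(name, score)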

・Speaker Biography:

Emmanuel Vincent is a Research Scientist with the French National
Institute for Research in Computer Science and Control (Inria). He
received the Ph.D. degree in music signal processing from IRCAM in
2004 and worked as a Research Assistant with the Centre for Digital
Music at Queen Mary, University of London from 2004 to 2006. His
research focuses on probabilistic machine learning for speech and
audio signal processing, with application to real-world audio source
localization and separation, noise-robust speech recognition and music
information retrieval. He is the founding chair of the SiSEC and CHiME
challenge series.


■13:30-14:00

・Speaker:
Dr. Takehiro Moriya
NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation

・Talk Title:
On the Proposal of the “LSP Parameters” as an IEEE Milestone

・Talk Abstract:
The LSP (line spectrum pair) parameters are used in speech coding for
mobile phones around the world. They have just been proposed from Japan
as an IEEE Milestone, which honors historic technical achievements; this
talk describes the background and the course of the proposal.
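
As background on what the LSP parameters are, the short Python sketch
below converts linear prediction (LPC) coefficients into line spectral
frequencies via the standard sum and difference polynomials; the example
coefficients are purely illustrative and this is not material from the
talk itself.

    import numpy as np

    def lpc_to_lsf(a):
        """Convert LPC coefficients a = [1, a1, ..., ap] into line spectral
        frequencies (radians in (0, pi)) via the sum/difference polynomials
        P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z);
        for minimum-phase A(z) their roots interlace on the unit circle."""
        a = np.asarray(a, dtype=float)
        ext = np.concatenate([a, [0.0]])   # A(z), zero-padded to degree p+1
        rev = ext[::-1]                    # z^-(p+1) A(1/z)
        roots = np.concatenate([np.roots(ext + rev), np.roots(ext - rev)])
        angles = np.angle(roots)
        # Drop the trivial roots at z = +/-1 and keep one of each conjugate pair.
        return np.sort(angles[(angles > 1e-9) & (angles < np.pi - 1e-9)])

    print(lpc_to_lsf([1.0, -1.6, 1.2, -0.5, 0.1]))  # illustrative 4th-order LPC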

・Speaker Biography:
Takehiro Moriya received the master's degree in engineering from the
University of Tokyo in 1980 and joined the NTT laboratories in the same
year. Since then he has been engaged in research on, and standardization
of, speech and audio coding. He is currently Head of the Moriya Research
Laboratory.

■14:00-15:00

・Speaker:
Dr. Frank Soong
Microsoft Research Asia (MSRA)

・Talk Title:
Search for the “Elementary Particles” in Human Speech
– Can we render a monolingual speaker’s speech in a different language? 

・Talk Abstract:
In this talk, we will raise an interesting question:
Can we find the “elementary particles” of a person’s speech in one
language and use them for rendering his/her voice in a different language?
If the answer is “yes”, the “elementary particles” we find can enable
many useful applications, e.g., mixed-code TTS, language learning,
speech-to-speech translation, etc.
We try to answer the question by limiting ourselves first to how to
train a TTS in a different language with speech collected from a
monolingual speaker. Additionally, a speech corpus in the targeted new
language is recorded by a reference speaker. We then use our “trajectory
tiling algorithm,” invented for synthesizing high quality, unit selection
TTS, to “tile” the trajectories of all sentences in the reference
speaker’s corpus with the most appropriate speech segments in the
monolingual speaker’s data. To make the tiling proper across two different
(reference and monolingual) speakers, the difference between them needs to
be equalized with appropriate vocal tract length normalization,
e.g., a bilinear warping function or formant mapping. All tiled
sentences are then used to train a new HMM-based TTS of the monolingual
speaker, but in the reference speaker’s language. Units of different
lengths have been tried as the “elementary particles”, and label-less
frame-length (10 ms) segments have been found to yield the best TTS quality.
Some preliminary results also show that training a speech recognizer with
speech data of different languages tends to improve the ASR performance
in each individual language. In addition to the fact that audio
“elementary particles” of human speech in different languages can be
discovered as frame-level speech segments, the mouth shapes of
a monolingual speaker have also been found adequate for rendering the
lip movements of talking heads in different languages.
Various demos will be shown to illustrate our findings.
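
To make two of the ingredients above more concrete, the Python sketch
below shows a first-order all-pass (bilinear) frequency warping of the
kind commonly used for vocal tract length normalization, plus a toy
frame-level "tiling" by nearest-neighbor selection. Both are simplified
illustrations under assumed settings (the value of alpha, the toy
envelope), not the actual trajectory tiling algorithm of the talk.

    import numpy as np

    def bilinear_warp(omega, alpha):
        """First-order all-pass (bilinear) frequency warping; alpha = 0 is
        the identity, and the sign of alpha stretches or compresses the
        low-frequency region of the spectrum."""
        return omega + 2.0 * np.arctan(alpha * np.sin(omega)
                                       / (1.0 - alpha * np.cos(omega)))

    def tile_with_nearest_frames(reference_feats, donor_feats):
        """Toy 'tiling': replace every reference frame by the closest donor
        frame (Euclidean distance). The real algorithm matches whole
        trajectory segments under continuity constraints."""
        d = ((reference_feats[:, None, :] - donor_feats[None, :, :]) ** 2).sum(-1)
        return donor_feats[np.argmin(d, axis=1)]

    # Warp a toy spectral envelope of the reference speaker (alpha is illustrative).
    freqs = np.linspace(0.0, np.pi, 257)
    envelope = 1.0 / (1.0 + freqs)                        # toy envelope
    warped = np.interp(bilinear_warp(freqs, 0.1), freqs, envelope)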

・Speaker Biography:
Frank K. Soong is a Principal Researcher, Speech Group, Microsoft Research
Asia (MSRA), Beijing, China, where he works on fundamental research on
speech and its practical applications. His professional research career
spans over 30 years, first with Bell Labs, US, then with ATR, Japan,
before joining MSRA in 2004. At Bell Labs, he worked on stochastic modeling
of speech signals, optimal decoder algorithms, speech analysis and coding,
and speech and speaker recognition. He was responsible for the recognition
algorithm that was developed into voice-activated mobile phone products
rated by Mobile Office Magazine (Apr. 1993) as “outstandingly the best”.
He is a co-recipient of the Bell Labs President
Gold Award for developing the Bell Labs Automatic Speech Recognition
(BLASR) software package. He has served as a member of the Speech and
Language Technical Committee of the IEEE Signal Processing Society and
in other society roles, including as an Associate Editor of the IEEE
Transactions on Speech and Audio Processing and as an IEEE workshop
chair. He has published extensively, with more than 200 papers, and
co-edited a widely used reference book, Automatic Speech and Speaker
Recognition: Advanced Topics (Kluwer, 1996). He is a
visiting professor of the Chinese University of Hong Kong (CUHK) and a few
other top-rated universities in China. He is also the co-Director of the
MSRA-CUHK Joint Research Lab. He received his B.S., M.S., and Ph.D.
degrees from National Taiwan University, the University of Rhode Island,
and Stanford University, respectively, all in electrical engineering. He
is an IEEE Fellow for contributions to digital processing of
speech.



In addition, the final lecture of Professor Shigeki Sagayama (Graduate
School of Information Science and Technology, the University of Tokyo),
the former Chapter Chair, is scheduled on the same day at the same venue
from 15:30 on the occasion of his retirement. For details, please see the
following URL:
http://hil.t.u-tokyo.ac.jp/final-lecture/