To the members of the IEEE Signal Processing Society Japan Chapter

IEEE Signal Processing Society Japan Chapter
Chair: Hitoshi Kiya (Tokyo Metropolitan University)
Vice Chair: Takehiro Moriya (NTT)

The IEEE Signal Processing Society Japan Chapter will host the lecture meeting below at the University of Tokyo on Tuesday, March 12. Non-members of the IEEE are welcome to attend along with members. No advance registration is required, and admission is free. We look forward to your participation.

IEEE SPS Japan Chapter Lecture Meeting

(1) Date and Place
Tuesday, March 12, 13:00-15:00
Lecture Hall, Engineering Building 11, The University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656

(2) Program

■13:00〜13:30
・Speaker: Dr. Emmanuel Vincent
 French National Institute for Research in Computer Science and Control (Inria)
・Talk Title: How to integrate audio source separation and classification?
・Talk Abstract:
We consider the problem of audio classification in a wide sense: speech recognition, speaker/singer identification, acoustic event detection, etc. In the real world, the target signal is often superimposed with other signals (noise, accompaniment, etc.). Although source separation can enhance the target signal, it also introduces distortions that adversely affect classification accuracy. The general framework of "uncertainty propagation" addresses this issue by estimating the posterior distribution of the source signals and propagating it through the subsequent feature extraction and classification stages. We will introduce two novel contributions in this framework:
- a method to estimate the posterior distribution of the sources based on a variational Bayesian algorithm
- a method to train Gaussian mixture model based or hidden Markov model based classifiers directly from noisy data
We will present applications of these methods to speaker identification in a noisy domestic environment and to singer identification in polyphonic music.
・Speaker Biography:
Emmanuel Vincent is a Research Scientist with the French National Institute for Research in Computer Science and Control (Inria). He received the Ph.D.
degree in music signal processing from IRCAM in 2004 and worked as a Research Assistant with the Centre for Digital Music at Queen Mary, University of London from 2004 to 2006. His research focuses on probabilistic machine learning for speech and audio signal processing, with applications to real-world audio source localization and separation, noise-robust speech recognition, and music information retrieval. He is the founding chair of the SiSEC and CHiME challenge series.

■13:30〜14:00
・Speaker: Dr. Takehiro Moriya
 NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation
・Talk Title: On the proposal of "LSP parameters" as an IEEE Milestone
・Talk Abstract:
LSP (line spectrum pair) parameters are used in speech coding for mobile phones worldwide. They have recently been proposed from Japan for recognition as an IEEE Milestone, and this talk describes the background of the proposal.
・Speaker Biography:
Takehiro Moriya received the M.S. degree from the Faculty of Engineering, the University of Tokyo, in 1980 and joined the NTT laboratories the same year. Since then he has been engaged in research on and standardization of speech and audio coding. He currently heads the Moriya Special Research Laboratory.

■14:00〜15:00
・Speaker: Dr. Frank Soong
 Microsoft Research Asia (MSRA)
・Talk Title: Search for the "Elementary Particles" in Human Speech – Can we render a monolingual speaker's speech in a different language?
・Talk Abstract:
In this talk, we will raise an interesting question: Can we find the "elementary particles" of a person's speech in one language and use them for rendering his/her voice in a different language? A positive "yes" answer and the found "elementary particles" can then have many useful applications, e.g., mixed-code TTS, language learning, speech-to-speech translation, etc. We try to answer the question by limiting ourselves first to how to train a TTS in a different language with speech collected from a monolingual speaker. Additionally, a speech corpus in the targeted new language is recorded by a reference speaker. We then use our "trajectory tiling algorithm," invented for synthesizing high-quality, unit-selection TTS, to "tile" the trajectories of all sentences in the reference speaker's corpus with the most appropriate speech segments in the monolingual speaker's data.
To make the tiling proper across the two different speakers (reference and monolingual), the difference between them needs to be equalized with appropriate vocal tract length normalization, e.g., a bilinear warping function or formant mapping. All tiled sentences are then used to train a new HMM-based TTS of the monolingual speaker, but in the reference speaker's language. Units of different lengths have been tried as the "elementary particles," and label-less frame-length (10 ms) segments have been found to yield the best TTS quality. Some preliminary results also show that training a speech recognizer with speech data of different languages tends to improve the ASR performance in each individual language. Also, in addition to the fact that audio "elementary particles" of human speech in different languages can be discovered as frame-level speech segments, the mouth shapes of a monolingual speaker have also been found adequate for rendering the lip movements of talking heads in different languages. Various demos will be shown to illustrate our findings.
・Speaker Biography:
Frank K. Soong is a Principal Researcher in the Speech Group, Microsoft Research Asia (MSRA), Beijing, China, where he works on fundamental research on speech and its practical applications. His professional research career spans over 30 years, first with Bell Labs, US, then with ATR, Japan, before joining MSRA in 2004. At Bell Labs, he worked on stochastic modeling of speech signals, optimal decoder algorithms, speech analysis and coding, and speech and speaker recognition. He was responsible for developing the recognition algorithm that was turned into voice-activated mobile phone products rated by Mobile Office Magazine (Apr. 1993) as "outstandingly the best." He is a co-recipient of the Bell Labs President's Gold Award for developing the Bell Labs Automatic Speech Recognition (BLASR) software package.
He has served as a member of the Speech and Language Technical Committee of the IEEE Signal Processing Society and in other society functions, including as an Associate Editor of the IEEE Transactions on Speech and Audio Processing and as chair of IEEE workshops. He has published extensively, with more than 200 papers, and co-edited the widely used reference book Automatic Speech and Speaker Recognition – Advanced Topics (Kluwer, 1996). He is a visiting professor of the Chinese University of Hong Kong (CUHK) and a few other top-rated universities in China, and is also the co-Director of the MSRA-CUHK Joint Research Lab. He received his B.S., M.S., and Ph.D. degrees from National Taiwan University, the University of Rhode Island, and Stanford University, respectively, all in Electrical Engineering. He is an IEEE Fellow for contributions to digital processing of speech.

In addition, at 15:30 on the same day and at the same venue, the farewell lecture of Professor Shigeki Sagayama (Graduate School of Information Science and Technology, the University of Tokyo), the previous Chapter Chair, is scheduled on the occasion of his retirement. For details, please see:
http://hil.t.u-tokyo.ac.jp/final-lecture/
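For readers unfamiliar with the "uncertainty propagation" framework mentioned in Dr. Vincent's abstract, the core idea can be illustrated with a minimal one-dimensional toy sketch. This is not Dr. Vincent's algorithm (which uses variational Bayes and full GMM/HMM classifiers); it only shows the basic mechanism: when a separated feature comes with a Gaussian posterior rather than a point estimate, the expected class likelihood is obtained by adding the posterior variance to the class-model variance. The speaker names and model parameters below are hypothetical.

```python
import math

def gauss_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def classify_with_uncertainty(post_mean, post_var, class_models):
    """Pick the class maximizing the expected likelihood under the
    source posterior N(post_mean, post_var).  Integrating a Gaussian
    class model N(mu, var) against that posterior yields the marginal
    N(post_mean; mu, var + post_var), i.e., the variances simply add."""
    scores = {c: gauss_logpdf(post_mean, mu, var + post_var)
              for c, (mu, var) in class_models.items()}
    return max(scores, key=scores.get)

# Hypothetical 1-D class models (mean, variance) for two speakers:
# speaker_A is a sharp model, speaker_B a broad one.
models = {"speaker_A": (0.0, 0.01), "speaker_B": (1.0, 1.0)}

# With a confident point estimate, 0.25 is 2.5 standard deviations
# from speaker_A's tight model, so speaker_B wins.
print(classify_with_uncertainty(0.25, 0.0, models))   # speaker_B

# With large separation uncertainty (variance 0.5), the discrepancy is
# explained by noise and the decision flips toward speaker_A.
print(classify_with_uncertainty(0.25, 0.5, models))   # speaker_A
```

The point of the sketch is that propagating the posterior variance into the classifier can change the decision relative to plugging in the point estimate, which is exactly the failure mode the abstract describes for naive separation-then-classification pipelines.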
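The bilinear warping function mentioned in Dr. Soong's abstract as one option for vocal tract length normalization is the standard first-order all-pass frequency warp. As a brief illustrative note (a generic sketch of that transform, not MSRA's specific implementation), it maps each normalized frequency through the phase of the all-pass filter z⁻¹ → (z⁻¹ − α)/(1 − α z⁻¹):

```python
import math

def bilinear_warp(omega, alpha):
    """Warp a normalized frequency omega (radians, in [0, pi]) through
    the first-order all-pass transform z^-1 -> (z^-1 - alpha)/(1 - alpha z^-1).
    alpha in (-1, 1) controls the amount of warping; alpha = 0 is the
    identity, and the endpoints 0 and pi are always left fixed."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))
```

A single scalar α per speaker pair is enough to stretch or compress the frequency axis monotonically, which is why this form is a popular parametrization for equalizing vocal tract length differences between two speakers.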