Creating and sharing knowledge for telecommunications

SEMINAR SERIES on ICTs POLICY RESEARCH


on 01-07-2008

Speaker: Noah A. Smith
Recent Developments with Weighted Grammars in Statistical Natural Language Processing
Abstract:
This talk introduces statistical natural language processing (NLP), an exciting field that marries machine learning to computational linguistics. After a basic introduction to NLP and some of the challenging problems, I will discuss some recent work by my research group on weighted grammars: learning weighted grammars efficiently from annotated corpora, applying them to problems like question answering and translation, and, perhaps surprisingly, learning them from unannotated corpora using unsupervised methods.
Bio:
Noah Smith is an assistant professor at Carnegie Mellon University. His research has spanned statistical machine translation, parallel corpus discovery, unsupervised statistical grammar induction, efficient morphological and syntactic processing algorithms, weighted logic programming, and the formal study of weighted grammars. He is a Hertz Fellow (2001-6), the recipient of an IBM Faculty Award (2007), and a member of the DARPA Computer Science Study Panel (2007).
Tuesday, July 1st 2008, 10:00 am
Torre Norte, EA3, Instituto Superior Técnico

SEMINAR SERIES on ICTs POLICY RESEARCH


on 01-07-2008

Speaker: André T. Martins
Nonextensive entropic kernels
Abstract:
Positive definite kernels on probability measures have been recently applied in structured data classification problems. Some of these kernels are related to classic information theoretic quantities, such as mutual information and the Jensen-Shannon divergence. Meanwhile, driven by recent advances in Tsallis statistics, nonextensive generalizations of Shannon's information theory have been proposed. This paper bridges these two trends. We introduce the Jensen-Tsallis q-difference, a generalization of the Jensen-Shannon divergence. We then define a new family of nonextensive mutual information kernels, which allow weights to be assigned to their arguments, and which includes the Boolean, Jensen-Shannon, and linear kernels as particular cases. We illustrate the performance of these kernels on text categorization tasks.
Bio:
André Martins is a PhD student at IST/UTL and SCS/CMU. He is enrolled in the dual CMU-Portugal PhD program in Language Technologies, under the supervision of Mario Figueiredo, Pedro Aguiar (from IST/UTL), Noah Smith and Eric Xing (from CS/CMU). His area of research is "Kernel methods for Natural Language Processing".
Tuesday, July 1st 2008, 2:00 pm
Torre Norte, EA3, Instituto Superior Técnico