I currently hold a joint appointment at Harvard Medical School as an Instructor and at the affiliated Boston Children's Hospital as an Associate Scientific Researcher.
Over the past decade, I have made extensive contributions to computational semantics, the cornerstone of next-generation language understanding systems. I have authored articles in journals such as Machine Learning and Computational Linguistics, papers in refereed, top-tier conferences, and a few book chapters. As part of my PhD work, I published ground-breaking research on machine learning of the predicate-argument structure of a sentence, a task now commonly known as semantic role labeling. This is the next step in meaning representation after syntactic analysis. A product of this work, ASSERT (Automatic Statistical SEmantic Role Tagger), has been downloaded and cited by hundreds of users in tens of countries.
Corpora annotated with linguistic information provide the foundation for learning algorithms that tackle language understanding. Such corpora are also very expensive to build. After earning my PhD, I worked at BBN Technologies, where I led the creation of OntoNotes: the largest text corpus hand-annotated with multiple layers of syntactic, semantic and discourse information across six genres in English, Chinese and Arabic. This work was a collaborative endeavor between BBN Technologies, the University of Southern California, the University of Pennsylvania, the University of Colorado and Brandeis University. I designed an integrated relational representation that provides seamless access to the annotations.
I have fostered research in the community by organizing evaluations such as those at SemEval-2007 and the Computational Natural Language Learning (CoNLL) Shared Tasks in 2011 and 2012, and I am currently organizing the SemEval-2014 shared task on Analysis of Clinical Text.
The field of annotation science is slowly emerging. To encourage research in this field, I formed, together with other researchers, a Special Interest Group on ANNotation (SIGANN) within the Association for Computational Linguistics (ACL), and we have held Linguistic Annotation Workshops (LAW) over the past six years.
I have been invited to participate in various NSF- and DARPA-funded workshops and to serve on the guest editorial boards of journals, and I continue to review for international journals such as Computational Linguistics, Language Resources and Evaluation, Transactions on Asian Language Information Processing, Artificial Intelligence Journal, and Journal of Artificial Intelligence Research. I also regularly serve on the program committees of core conferences such as ACL, HLT/NAACL, COLING, CoNLL, EMNLP, and AAAI.
In summary, over the past decade, I have progressed from being a machine learning researcher building algorithms to help computers imitate human language to being the builder of one of the largest annotated corpora in this area and a leader in establishing a forum for the continued development of annotation science and in organizing shared tasks on important linguistic phenomena. This work will in turn inspire future machine learning algorithms, thereby fueling the evolution of language understanding systems.