Sameer Pradhan

Sameer Pradhan is an international expert in computational semantics, a subfield of natural language processing (NLP) that focuses on enriching data with semantics. His research focuses on creating corpora, as well as machine learning algorithms, models and tools, to convert unstructured information (text and speech) into searchable meaning representations.

Manually annotated corpora, among other resources, will continue to play a vital role in building next-generation language understanding systems. He played a central role in the creation of the OntoNotes corpus, the largest text corpus freely available for research that is manually annotated with multiple layers of syntactic, semantic and discourse information, spanning six genres and three languages (English, Chinese and Arabic). He founded cemantix.org as a conduit to support and promote open, repeatable and replicable research. He has organized multiple international evaluations (CoNLL 2011, 2012, 2015, 2016; SemEval 2007, 2014, 2015) on various domain-independent language understanding tasks, as well as on tasks specific to medical informatics (while at Harvard Medical School). One result of this work was the standardization of evaluation metrics for coreference resolution.

He has authored more than 40 articles in top-tier machine learning, computational linguistics and medical informatics journals, conferences and edited volumes. He is a founding member of the Special Interest Group on ANNotation (SIGANN) within the Association for Computational Linguistics (ACL). He regularly serves on program committees and as a chair at major NLP conferences such as ACL, HLT/NAACL, COLING, CoNLL, EMNLP and AAAI. He has served on the PhD committees of students at Brandeis University, the University of Colorado and Boston University.

With almost two decades of industry research experience, he has recently stepped into the position of Assistant Research Director at the Linguistic Data Consortium (LDC) at the University of Pennsylvania, and he continues to be active in the computational linguistics community (while wearing the cemantix hat).