Classification is often used as a sensemaking practice in the fields of cultural analytics and computational social science. In this talk, I'll survey the range of ways in which NLP intersects with this form of measurement and assess where large language models fit into this landscape by focusing on three broad topics: empirically assessing the performance of LLMs compared to traditional supervised methods on a new assembly of ten tasks relevant to cultural analytics, exploring the ways in which LLMs can be employed for sensemaking goals beyond mere accuracy, and measuring the threats to validity of such uses that arise from pretraining memorization.
Speaker
Prof. David Bamman,
School of Information,
UC Berkeley
David Bamman is an associate professor in the School of Information at UC Berkeley, where he works in the areas of natural language processing and cultural analytics, applying NLP and machine learning to empirical questions in the humanities and social sciences. His research focuses on improving the performance of NLP for underserved domains like literature (including LitBank and BookNLP) and exploring the affordances of empirical methods for the study of literature and culture. Before Berkeley, he received his PhD from the School of Computer Science at Carnegie Mellon University and was a senior researcher at the Perseus Project at Tufts University. Bamman's work is supported by the National Endowment for the Humanities, the National Science Foundation, an Amazon Research Award, and an NSF CAREER award.
E-mail: rihs@cuhk.edu.hk
Tel.: 3943 4786