Presentations

Michael Baiocchi presents "When black box algorithms are (not) appropriate: a principled prediction-problem ontology", at Zoom: https://harvard.zoom.us/j/99424949004?pwd=aWtPNFM3ZzFYbWxIMXNoZDlyUElVZz09, Wednesday, September 30, 2020

In the 1980s a new, extraordinarily productive way of reasoning about algorithms emerged. Though this type of reasoning has come to dominate areas of data science, it has been under-discussed and its impact under-appreciated. For example, it is the primary way we reason about "black box" algorithms. In this talk we discuss its current use (i.e., as "the common task framework") and its limitations; we find a large class of prediction-problems are inappropriate for this type of reasoning. Further, we find the common task framework does not provide a foundation for the deployment of an...

Reagan Mozer presents "Recent Adventures in Causal(ish) Inference with Text as Data.", at Zoom: https://harvard.zoom.us/j/99424949004?pwd=aWtPNFM3ZzFYbWxIMXNoZDlyUElVZz09, Wednesday, September 23, 2020
Text data have a long history in social science and education research. However, these data are notoriously high-dimensional and characterized by many nuances of language that lack plausible statistical models. As a result, analysis of text data typically involves intensive human coding tasks where particular constructs or features of the text are first defined, and then a collection of documents are inspected and coded for the presence or absence of these constructs. While this process may be feasible in studies with smaller sample sizes, the time and resources required to train and employ...
Connor Jerzak presents "Detecting and Characterizing Latent Influence Dynamics in Social Science Data Using Machine Learning", at Zoom: https://harvard.zoom.us/j/99424949004?pwd=aWtPNFM3ZzFYbWxIMXNoZDlyUElVZz09, Wednesday, September 16, 2020

Unobserved interactions between people and groups play a fundamental role in domestic and international politics. Yet, despite their importance, the vast complexity of these unobserved interactions has typically frustrated efforts to quantify them, forcing scholars to assume that the units in an analysis are independent or to study a limited range of interactions. Here, I develop a framework and machine learning model for detecting and characterizing unobserved interference dynamics using all available information: outcome, covariate, and independent variable data. Given minimal...

Matthew Blackwell presents "Noncompliance and instrumental variables for 2^K factorial experiments", at Zoom: https://harvard.zoom.us/j/99424949004?pwd=aWtPNFM3ZzFYbWxIMXNoZDlyUElVZz09, Wednesday, September 9, 2020

Factorial experiments are widely used to assess the marginal, joint, and interactive effects of multiple concurrent factors. While a robust literature covers the design and analysis of these experiments, there is less work on how to handle treatment noncompliance in this setting. To fill this gap, we introduce a new methodology that uses the potential outcomes framework for analyzing 2^K factorial experiments with noncompliance on any number of factors. This framework builds on and extends the literature on both instrumental variables and factorial experiments in several ways. First, we...

Adam Kapelner presents "Harmonizing Optimized Designs with Classic Randomization in Experiments", at CGIS Knafel Building (K354) - 12-1:30 pm, Wednesday, February 12, 2020
Abstract: There is a long debate in experimental design between the classic randomization design of Fisher, Yates, Kempthorne, Cochran, and those who advocate deterministic assignments based on notions of optimality. In nonsequential trials comparing treatment and control, covariate measurements for each subject are known in advance, and subjects can be divided into two groups based on a criterion of imbalance. With the advent of modern computing, this partition can be made nearly perfectly balanced via numerical optimization, but these...
Gary King presents "Statistically Valid Inferences from Privacy Protected Data", at CGIS Knafel Building (K354) - 12-1:30 pm, Wednesday, February 5, 2020
Abstract: Unprecedented quantities of data that could help social scientists understand and ameliorate the challenges of human society are presently locked away inside companies, governments, and other organizations, in part because of worries about privacy violations. We address this problem with a general-purpose data access and analysis system with mathematical guarantees of privacy for individuals who may be represented in the data, statistical guarantees for researchers...
Lucas Janson presents "Recent Advances in Model-X Knockoffs", at CGIS Knafel Building (K354) - 12-1:30 pm, Wednesday, November 20, 2019

Abstract: Two years ago in this workshop I presented my work on model-X knockoffs, a method for high-dimensional variable selection that provides exact (finite-sample) control of false discoveries and high power as a result of its flexibility to leverage any and all domain knowledge and tools from machine learning to search for signal. In this talk, I will discuss two recent works that significantly advance the usability and generality of model-X knockoffs. First, I will show how the original assumptions of model-X knockoffs, that the multivariate distribution of the...

Xiao-Li Meng presents "2020 Election and Privacy Protected Census: Data Quantity vs. Quality & Privacy vs. Utility", at CGIS Knafel Building (K354) - 12-1:30 pm, Wednesday, November 13, 2019
Abstract: The year 2020 will be a busy one for statisticians and more generally for data scientists; predictions about the 2020 US election are already underway. Will the lessons from the 2016 US election be learned, or will the prediction failure be repeated? How do we measure the quality of the data we rely upon for predictions? How small are our big data when we take their quality into account? The US Census Bureau has announced that the data from the 2020 Census will be released under differential privacy protection, which – in layperson’s terms – means adding some...
Nicole Pashley presents "Causal Inference for Multiple Non-Randomized Treatments using Fractional Factorial Designs", at CGIS Knafel Building (K354) - 12-1:30 pm, Wednesday, November 6, 2019
Abstract: We explore a framework for addressing causal questions in an observational setting with multiple treatments. This setting involves attempting to approximate an experiment from observational data. With multiple treatments, this experiment would be a factorial design. However, certain treatment combinations may be so rare that, for some combinations, we have no measured outcomes in the observed data. We propose to conceptualize a hypothetical fractional factorial experiment instead of a full factorial experiment and lay out a framework for analysis in this setting. We also...
