Wednesday, November 13, 2019
CGIS Knafel Building (K354) - 12-1:30 pm
Abstract: The year 2020 will be a busy one for statisticians and, more generally, for data scientists; predictions about the 2020 US election are already underway. Will the lessons from the 2016 US election be learned, or will the prediction failure be repeated? How do we measure the quality of the data we rely on for predictions? How small are our big data once we take their quality into account? The US Census Bureau has announced that data from the 2020 Census will be released under differential privacy protection, which, in layperson’s terms, means adding noise to the data to prevent the re-identification of individuals and other privacy threats. Few would argue against protecting data privacy, but what trade-offs between data privacy and data utility would be acceptable? How much information do we lose by making data differentially private? How should we analyze data protected by differential privacy? This talk invites the audience on a journey of deep statistical thinking prompted by these questions, regardless of whether they have any interest in US politics or the census.