After the Dataverse Community Meeting, a workshop on tools that help open repositories handle sensitive or private data will be held at Harvard Medical School. This workshop is a collaboration between the Dataverse team and the Privacy Tools for Sharing Research Data group.
July 13th
Room: Cannon Room, Building C, Harvard Medical School
10:30 - 10:45am Registration and Coffee
10:45 - 11:00am Welcome
Salil Vadhan, Professor of Computer Science and Applied Mathematics, Harvard University
11am - 12:30pm Panel Discussion on Privacy Risks
Moderated by Salil Vadhan, Professor of Computer Science and Applied Mathematics, Harvard University
George Church, Professor of Genetics at Harvard Medical School and Director of PersonalGenomes.org.
George Church is Professor of Genetics at Harvard Medical School and Director of PersonalGenomes.org, which provides the world's only open-access information on human Genomic, Environmental & Trait data (GET). His 1984 Harvard PhD included the first methods for direct genome sequencing, molecular multiplexing & barcoding. These led to the first genome sequence (pathogen, Helicobacter pylori) in 1994. His innovations have contributed to nearly all "next generation" DNA sequencing methods and companies (CGI-BGI, Life, Illumina, Nanopore). This plus his lab's work on chip-DNA-synthesis, gene editing and stem cell engineering resulted in founding additional application-based companies spanning fields of medical diagnostics (Knome, Alacris, AbVitro, Pathogenica, Veritas Genetics) & synthetic biology / therapeutics (Joule, Gen9, Editas, Egenesis, enEvolv, WarpDrive). He has also pioneered new privacy, biosafety, environmental & biosecurity policies. He is director of an IARPA BRAIN Project and NIH Center for Excellence in Genomic Science. His honors include election to NAS & NAE & Franklin Bower Laureate for Achievement in Science. He has coauthored 400 papers, 74 patent publications & one book (Regenesis).
Kobbi Nissim, Center for Research on Computation and Society, Harvard University.
Kobbi Nissim is a Professor of Computer Science at Ben-Gurion University and a Senior Research Fellow at the Center for Research on Computation and Society at Harvard. Kobbi's current work is focused on the mathematical formulation and understanding of privacy. His work from 2003 and 2004 with Dinur and Dwork initiated rigorous foundational research of privacy and presented a precursor of Differential Privacy, a strong definition of privacy in computation that he introduced in 2006 with Dwork, McSherry and Smith. Since 2011, Kobbi has contributed as a senior researcher to the Privacy Tools for Sharing Research Data project at Harvard. In 2013, Nissim received with Irit Dinur the Alberto O. Mendelzon Test-of-Time award for their PODS 2003 work initiating the rigorous study of privacy. In 2016 he received with Cynthia Dwork, Frank McSherry, and Adam Smith the TCC Test-of-time award for their TCC 2006 work presenting differential privacy.
Danny Weitzner, Principal Research Scientist, CSAIL, MIT
Daniel J. Weitzner is the Founding Director of the MIT Internet Policy Research Initiative and Principal Research Scientist at the MIT Computer Science and Artificial Intelligence Lab. His group studies the relationship between network architecture and public policy, and develops new Web architectures to meet policy challenges such as privacy and intellectual property rights. He teaches Internet public policy in the MIT Electrical Engineering and Computer Science Department. From 2011 to 2012, Weitzner was the United States Deputy Chief Technology Officer for Internet Policy in the White House, where he led initiatives on online privacy, cybersecurity, Internet copyright, and trade policies to promote the free flow of information. Weitzner has been a leader in the development of Internet public policy from its inception, making fundamental contributions to the successful fight for strong online free expression protection in the United States Supreme Court, crafting laws that provide protection against government surveillance of email and web browsing data.
Danny Goroff, VP and Program Director, Alfred P. Sloan Foundation.
Daniel L. Goroff is Vice President and Program Director at the Alfred P. Sloan Foundation, a private philanthropy organization that supports breakthroughs in science, technology, and economics. He is especially interested in economics, finance, mathematics, the scientific and technical work force, and education. Goroff is Professor Emeritus of Mathematics and Economics at Harvey Mudd College, where he served as Vice President for Academic Affairs and Dean of the Faculty. Before that, he was a faculty member at Harvard University for over two dozen years. Daniel Goroff has twice worked for the President's Science Advisor in the White House Office of Science and Technology Policy, most recently as Assistant Director for Social, Behavioral, and Economic Sciences.
12:30 - 1:30pm Lunch break
1:30 - 3:30pm Solutions: Software Tools for Managing Privacy
Moderated by Mercè Crosas, Chief Data Science and Technology Officer, IQSS, Harvard University
Data Handling Policies: Spaces, DataTags, and the Tags Language
Michael Bar-Sinai
Widespread sharing of scientific datasets holds great promise for new scientific discoveries and great risks for personal privacy. Dataset handling policies play the critical role of balancing privacy risks and scientific value. We propose an extensible, formal model for dataset handling policies, and some tools based on it. The presented model describes policies in a machine-executable, human-readable way. The model supports binary operators for composing and comparing data handling policies.
We further present the Tags programming language and toolset, created for working with the proposed model. Tags allows users to describe data handling policies. From this description, users can compose interactive, user-friendly questionnaires. Once ready, and given a specific dataset, these questionnaires can help researchers arrive at a data handling policy that follows legal and technical guidelines. Currently, creating such a policy is a manual process requiring access to legal and technical experts. Some of Tags’ tools, such as a web-based interview system, visualizers, development environment, and questionnaire inspectors, will also be discussed.
This is in collaboration with Latanya Sweeney and Mercè Crosas. DataTags is being integrated with Dataverse to provide a DataTags-compliant data repository to share sensitive data with confidence.
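The idea of composing and comparing handling policies with binary operators can be illustrated with a minimal Python sketch. This is not the Tags language or the DataTags model itself: the two dimensions (Encryption, Access), their levels, and the method names are hypothetical, chosen only to show how policies with ordered requirements might be combined and compared.

```python
from dataclasses import dataclass
from enum import IntEnum


class Encryption(IntEnum):
    # Hypothetical ordered levels: a higher value is a stricter requirement.
    NONE = 0
    IN_TRANSIT = 1
    IN_TRANSIT_AND_AT_REST = 2


class Access(IntEnum):
    # Hypothetical ordered levels for who may obtain the data.
    PUBLIC = 0
    REGISTERED = 1
    APPROVED = 2


@dataclass(frozen=True)
class Policy:
    encryption: Encryption
    access: Access

    def compose(self, other: "Policy") -> "Policy":
        """Binary composition: keep the stricter requirement on each dimension."""
        return Policy(
            max(self.encryption, other.encryption),
            max(self.access, other.access),
        )

    def at_least_as_strict_as(self, other: "Policy") -> bool:
        """Binary comparison: true only if no dimension is weaker than in `other`."""
        return self.encryption >= other.encryption and self.access >= other.access


# Two illustrative policies that are not comparable with each other.
survey = Policy(Encryption.IN_TRANSIT, Access.APPROVED)
genomic = Policy(Encryption.IN_TRANSIT_AND_AT_REST, Access.REGISTERED)

# A dataset combining both sources must satisfy both policies.
combined = survey.compose(genomic)
```

In this sketch the comparison is only a partial order: neither `survey` nor `genomic` is at least as strict as the other, while `combined` is at least as strict as both.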
Robot Lawyers: Automating Legal Compliance for Transferring Private Data
Steve Chong, Associate Professor of Computer Science, Harvard University
The legal and regulatory requirements on whether a repository can accept and subsequently release data are complex and often confusing. These complexities are reflected in restrictions on what data a repository may accept, under what conditions data may be retained, and on how a recipient may use released data.
There is no common set of licensing agreements that is shared across institutions and laws. Instead, current methods to create data-use agreements typically require significant efforts by a lawyer for each data transfer. The result is that different institutions duplicate effort in developing variations of agreements for identical uses. In addition, because most institutions do not have expertise in privacy law, the resulting licenses are often too generic and fail to accurately capture either the necessary restrictions on data use or the necessary protections.
Our approach addresses this problem by formalizing aspects of privacy laws, and using this formalization to automate actions within repository systems and automatically generate licenses. This enables appropriate decisions and accurate licenses, while removing the bottleneck of lawyer effort per data transfer. Our system will enable legal professionals to evaluate the legal reasoning and interpretation embodied in the formalization, and the specific rationale for a decision to accept or release a particular dataset.
This is joint work with Micah Altman and Alexandra Wood.
James Honaker, Senior Research Scientist, Harvard University
We provide an overview of PSI (Ψ): “A Private data Sharing Interface”, a system developed by the Privacy Tools research group to enable researchers in the social sciences and other fields to share and explore privacy-sensitive datasets with the strong privacy protections of differential privacy. This prototype system will allow researchers with sensitive datasets to make differentially private statistics about their data available through data repositories using the Dataverse platform. Our prototype system will allow researchers to: [1] upload private data to a secured Dataverse archive, [2] decide what statistics they would like to release about that data, and [3] release privacy-preserving versions of those statistics to the repository, [4] where they can be explored through a curator interface without releasing the raw data, including [5] through interactive queries. This system was created by the Privacy Tools for Sharing Research Data project (http://privacytools.seas.harvard.edu). Differential privacy is a mathematical framework for enabling statistical analysis of sensitive datasets while ensuring that individual-level information cannot be leaked. The project website contains resources for learning more about differential privacy. Information on our tool can be found at http://privacytools.seas.harvard.edu/psi
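As a rough illustration of what releasing a differentially private statistic can involve, the following Python sketch applies the standard Laplace mechanism to a mean. It is not PSI's implementation: the function names, the clamping bounds, and the assumption that the number of records is public are all illustrative choices made here.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_mean(values, lower, upper, epsilon, rng=None):
    """Return an epsilon-differentially private estimate of the mean.

    Assumptions (illustrative): `values` is non-empty, its length n is
    public, and neighboring datasets differ in the value of one record.
    """
    if rng is None:
        rng = random.Random()
    n = len(values)
    # Clamp each value so that one record can move the mean by at most
    # (upper - lower) / n, which is the sensitivity of the clamped mean.
    clamped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clamped) / n
    sensitivity = (upper - lower) / n
    # Laplace mechanism: noise scale is sensitivity divided by epsilon.
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


# Example: a noisy mean age, with ages clamped to [0, 100] and epsilon = 0.5.
noisy_mean_age = dp_mean([34, 29, 51, 46, 38], lower=0, upper=100, epsilon=0.5)
```

A smaller epsilon gives stronger privacy and a noisier answer. With only five records, as in this example, the noise is large relative to the statistic, so useful accuracy generally requires larger datasets.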
Web-based Multi-Party Computation with Application to Anonymous Aggregate Compensation Analytics
Azer Bestavros, Professor of Computer Science, Boston University
In this talk, I will describe the definition, design, implementation, and deployment of a simple multi-party computation protocol and supporting web-based infrastructure. The protocol and infrastructure constitute a software application that allows groups of cooperating parties, such as companies or other organizations, to perform computation over their respective data assets for the purpose of obtaining aggregate analytics without the need to communicate or share their private data sets. The application was developed specifically to support a Boston Women’s Workforce Council (BWWC) study of the gender wage gap among employers within the Greater Boston Area. The application was deployed successfully to collect aggregate statistical data pertaining to compensation levels across genders and demographics at a number of participating organizations for two years in a row (2015 & 2016). Time permitting, I will summarize our experience with the rollout of this application, lessons learned, and future plans to extend the capabilities we developed by incorporating them into a programming environment that leverages popular cloud platforms for computing analytics over large private data sets.
This work was done at Boston University in collaboration with Andrei Lapets, Nikolaj Volgushev, Mayank Varia, Eric Dunton, Kyle Holzinger, and Frederick Jansen. It was funded in part by NSF awards #1414119 and #1430145 and by the Initiative on Cities at Boston University.
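One standard way to compute an aggregate without any party revealing its own data is additive secret sharing. The abstract does not specify the protocol used in the BWWC deployment, so the Python sketch below should be read as a generic illustration of the idea rather than a description of that system; the modulus, function names, and payroll figures are hypothetical.

```python
import secrets

# Hypothetical modulus; it must exceed any possible true total.
MODULUS = 2**61 - 1


def share(value: int, n_shares: int) -> list[int]:
    """Split `value` into n additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_shares - 1)]
    # The last share is chosen so that all shares add up to the value.
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def aggregate(all_shares: list[list[int]]) -> int:
    """Combine shares from every party into the overall total.

    Column j holds the shares sent to server j. Each server adds up its own
    column, and only these subtotals are combined to reveal the total.
    """
    subtotals = [sum(column) % MODULUS for column in zip(*all_shares)]
    return sum(subtotals) % MODULUS


# Three organizations each split a private payroll figure into three shares.
payrolls = [120_000, 95_000, 143_500]
shares_per_party = [share(p, n_shares=3) for p in payrolls]
total = aggregate(shares_per_party)
```

Here `total` equals the sum of the three payroll figures, while any single share is a uniformly random number that reveals nothing about the value it came from.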
A Software Architecture Proposal to facilitate Privacy Auditing in a confidential Data Enclave
Cavan Capps, Big Data Lead, United States Census Bureau
Public use datasets protected by formal privacy techniques are critical for public policy transparency. However, there is an increasing need to do research on confidential big data. Privacy concerns have been an obstacle to facilitating such research. This proposal attempts to provide a software framework for doing research in a controlled data enclave environment that may also provide a potential facility for independent privacy audits.
3:30 - 4:00pm Coffee Break
4:00 - 5:00pm Panel Discussion on Managing Sensitive Data Use Cases
Moderated by David O'Brien, Senior Researcher at the Berkman Klein Center for Internet & Society at Harvard University
Barbara Bierer, MD, Faculty Co-Director, Multi-Regional Clinical Trials (MRCT) Center of Harvard and Brigham and Women’s Hospital
Director, Regulatory Foundations, Ethics and the Law Program, Harvard Catalyst, Harvard Medical School
Professor of Medicine, Harvard Medical School, Senior Physician, Division of Global Health Equity, Department of Medicine, Brigham and Women's Hospital
Dr. Barbara Bierer is Professor of Medicine, Harvard Medical School and Brigham and Women’s Hospital (BWH), Boston and a hematologist/oncologist. She is the faculty co-director of the Multi-Regional Clinical Trials Center at BWH and Harvard (MRCT Center), a study center whose purpose is to improve the ethical, logistical and regulatory aspects of international clinical trials, with a special focus on the emerging economies, and the Director of the Regulatory Foundations, Ethics and the Law Program of the Harvard clinical and translational sciences center. Previously she served as senior vice president, research at the Brigham and Women’s Hospital for 11 years, and as chair of the Secretary’s Advisory Committee on Human Research Protections, HHS.
Kaye Marz, Research Associate II, National Addiction and HIV Data Archive Program (NAHDAP), Inter-university Consortium for Political and Social Research
Kaye Marz is Archive Manager for NAHDAP, with responsibilities including acquisitions outreach, support for depositors, data processing plans and implementation, technical outreach, and user support.
Ms. Marz has been with ICPSR since 1991. Prior to joining NAHDAP, she was a processing supervisor for the National Archive of Criminal Justice Data (NACJD), primarily for data sponsored by the U.S. Department of Justice, National Institute of Justice. She has supervised the processing and release of data on the relationship of alcohol and other drugs and crime, domestic violence, youth and crime, victimization and victim services, corrections and prisoner reentry, policing, crime prevention, terrorism, and crime mapping and geographic information systems resources.
Ms. Marz holds an M.S. in Criminal Justice from Michigan State University and a B.A. in Psychology from the University of Michigan, Ann Arbor. Her research interests include intimate partner violence as well as the impact of exposure to violence during adolescence.
Simson Garfinkel, Senior Advisor, Information Access Division, National Institute of Standards and Technology (NIST)
5:00 - 6:00pm Panel Discussion on International Data Flows
Moderated by Deborah Hurley
Deborah Hurley
Deborah Hurley is a Fellow of the Institute for Quantitative Social Science, Harvard University. She is also Adjunct Professor of the Practice of Computer Science, Department of Computer Science, Brown University; Senior ICT Expert, Pacific Region Infrastructure Facility, Sydney, Australia; and Principal of the consulting firm she founded in 1996, which advises governments, international organizations, companies, civil society, and foundations on advanced science and technology policy. At the Organization for Economic Cooperation and Development, in Paris, France, she organized annual meetings on privacy, wrote the seminal report on information security, and launched the activities on cryptography policy. She served on boards and committees for the International Federation for Human Rights, U.S. Department of State, American Association for the Advancement of Science, National Academy of Sciences Research Council, and Electronic Privacy Information Center. Hurley received the Namur Award of the International Federation for Information Processing in recognition of outstanding contributions, with international impact, to awareness of social implications of information technology. http://scholar.harvard.edu/deborah_hurley.
Peter Doorn
Peter Doorn is director of Data Archiving and Networked Services (DANS), the Dutch national institute for long-term access to research data. He studied Human Geography at Utrecht University and received his PhD there. He taught Computing for Historians at Leiden University from 1985 to 1997. He was director of the Netherlands Historical Data Archive and head of department at the Netherlands Institute for Scientific Information Services (NIWI). He was Principal Investigator of the DARIAH preparation project (now an ESFRI ERIC) and vice-chair of CESSDA A.S. (Consortium of European Social Science Data Archives). He is a (board) member of several other organisations in the area of humanities computing, data infrastructure and digital archiving, and editor of the newly founded Research Data Journal for the Humanities and Social Sciences.
David Heiner
David A. Heiner is Vice President & Deputy General Counsel at Microsoft Corporation, where he leads the law department's Regulatory Affairs team. The team is responsible for regulatory aspects of privacy, telecommunications, finance, accessibility, human rights, online safety and data analytics. Dave joined Microsoft in 1994, leading the law department's antitrust work until 2013. For several years he was also responsible for Microsoft's work with international standard-setting organizations. Dave is a 1982 graduate of Cornell University, where he received a B.A. in Physics, and a 1985 graduate of the University of Michigan Law School, where he served on the editorial board of the law review. Following law school, Dave clerked for the Honorable Thomas P. Griesa of the U.S. District Court in New York. Before joining Microsoft in 1994, Dave practiced at Sullivan & Cromwell in New York. Dave is Chairman of the Board of Probono.net, a national non-profit that works to increase access to justice for the poor through efficient use of technology. Dave has handled a number of immigration cases through Kids in Need of Defense, an advocacy group for unaccompanied immigrant children in the United States.
6:00pm Concluding Remarks and Reception