After the Dataverse Community Meeting, a workshop on tools that help open repositories handle sensitive or private data will be held at Harvard Medical School. This workshop is a collaboration between the Dataverse team and the Privacy Tools for Sharing Research Data group.
Room: Cannon Room, Building C, Harvard Medical School
10:30 - 10:45am Registration and Coffee
10:45 - 11:00am Welcome Salil Vadhan, Professor of Computer Science and Applied Mathematics, Harvard University
11am - 12:30pm Panel Discussion on Privacy Risks Moderated by Salil Vadhan, Professor of Computer Science and Applied Mathematics, Harvard University
George Church is Professor of Genetics at Harvard Medical School and Director of PersonalGenomes.org, which provides the world's only open-access information on human Genomic, Environmental & Trait data (GET). His 1984 Harvard PhD included the first methods for direct genome sequencing, molecular multiplexing & barcoding. These led to the first genome sequence (pathogen, Helicobacter pylori) in 1994. His innovations have contributed to nearly all "next generation" DNA sequencing methods and companies (CGI-BGI, Life, Illumina, Nanopore). This plus his lab's work on chip-DNA-synthesis, gene editing and stem cell engineering resulted in founding additional application-based companies spanning fields of medical diagnostics (Knome, Alacris, AbVitro, Pathogenica, Veritas Genetics) & synthetic biology / therapeutics (Joule, Gen9, Editas, Egenesis, enEvolv, WarpDrive). He has also pioneered new privacy, biosafety, environmental & biosecurity policies. He is director of an IARPA BRAIN Project and NIH Center for Excellence in Genomic Science. His honors include election to NAS & NAE & Franklin Bower Laureate for Achievement in Science. He has coauthored 400 papers, 74 patent publications & one book (Regenesis).
Kobbi Nissim, Center for Research on Computation and Society, Harvard University.
Kobbi Nissim is a Professor of Computer Science at Ben-Gurion University and a Senior Research Fellow at the Center for Research on Computation and Society at Harvard. Kobbi's current work is focused on the mathematical formulation and understanding of privacy. His work from 2003 and 2004 with Dinur and Dwork initiated rigorous foundational research of privacy and presented a precursor of Differential Privacy, a strong definition of privacy in computation that he introduced in 2006 with Dwork, McSherry and Smith. Since 2011, Kobbi has contributed as a senior researcher to the Privacy Tools for Sharing Research Data project at Harvard. In 2013, Nissim received with Irit Dinur the Alberto O. Mendelzon Test-of-Time award for their PODS 2003 work initiating the rigorous study of privacy. In 2016 he received with Cynthia Dwork, Frank McSherry, and Adam Smith the TCC Test-of-time award for their TCC 2006 work presenting differential privacy.
Daniel J. Weitzner is the Founding Director of the MIT Internet Policy Research Initiative and Principal Research Scientist at the MIT Computer Science and Artificial Intelligence Lab. His group studies the relationship between network architecture and public policy, and develops new Web architectures to meet policy challenges such as privacy and intellectual property rights. He teaches Internet public policy in the MIT Electrical Engineering and Computer Science Department. From 2011 to 2012, Weitzner was the United States Deputy Chief Technology Officer for Internet Policy in the White House, where he led initiatives on online privacy, cybersecurity, Internet copyright, and trade policies to promote the free flow of information. Weitzner has been a leader in the development of Internet public policy from its inception, making fundamental contributions to the successful fight for strong online free expression protection in the United States Supreme Court, and crafting laws that provide protection against government surveillance of email and web browsing data.
Danny Goroff, VP and Program Director, Alfred P. Sloan Foundation.
Daniel L. Goroff is Vice President and Program Director at the Alfred P. Sloan Foundation, a private philanthropy organization that supports breakthroughs in science, technology, and economics. He is especially interested in economics, finance, mathematics, the scientific and technical work force, and education. Goroff is Professor Emeritus of Mathematics and Economics at Harvey Mudd College, where he served as Vice President for Academic Affairs and Dean of the Faculty. Before that, he was a faculty member at Harvard University for over two dozen years. Daniel Goroff has twice worked for the President's Science Advisor in the White House Office of Science and Technology Policy, most recently as Assistant Director for Social, Behavioral, and Economic Sciences.
12:30 - 1:30pm Lunch break
1:30 - 3:30pm Solutions: Software Tools for Managing Privacy Moderated by Mercè Crosas, Chief Data Science and Technology Officer, IQSS, Harvard University
Widespread sharing of scientific datasets holds great promise for new scientific discoveries and great risks for personal privacy. Dataset handling policies play the critical role of balancing privacy risks and scientific value. We propose an extensible, formal model for dataset handling policies, and some tools based on it. The presented model describes policies in a machine-executable, human-readable way. The model supports binary operators for composing and comparing data handling policies. We further present the Tags programming language and toolset, created for working with the proposed model. Tags allows users to describe data handling policies. Based on this description, users can use it to compose interactive, user-friendly questionnaires. Once ready, and given a specific dataset, these questionnaires can help researchers arrive at a data handling policy that follows legal and technical guidelines. Currently, creating such a policy is a manual process requiring access to legal and technical experts. Some of Tags’ tools, such as a web-based interview system, visualizers, development environment, and questionnaire inspectors, will also be discussed.
This is in collaboration with Latanya Sweeney and Mercè Crosas. DataTags is being integrated with Dataverse to provide a DataTags-compliant data repository to share sensitive data with confidence.
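As a rough illustration of what "binary operators for composing and comparing data handling policies" could look like, here is a minimal Python sketch. It is not the Tags language or the DataTags model: the dimensions (`storage`, `transit`, `access`), their levels, and the `Policy` class are hypothetical names chosen for this example. It assumes each policy dimension has a totally ordered set of levels, so that composing two policies takes the stricter level per dimension and comparing two policies yields a partial order.

```python
from dataclasses import dataclass

# Hypothetical policy dimensions, each listed from weakest to strictest level.
LEVELS = {
    "storage": ["clear", "encrypted", "multi_party_encrypted"],
    "transit": ["clear", "encrypted"],
    "access": ["public", "registered", "approved", "signed_agreement"],
}


@dataclass(frozen=True)
class Policy:
    storage: str = "clear"
    transit: str = "clear"
    access: str = "public"

    def rank(self, dim):
        """Position of this policy's level within the ordering for `dim`."""
        return LEVELS[dim].index(getattr(self, dim))

    def compose(self, other):
        """Binary composition: keep the stricter level in every dimension."""
        chosen = {}
        for dim in LEVELS:
            stricter = self if self.rank(dim) >= other.rank(dim) else other
            chosen[dim] = getattr(stricter, dim)
        return Policy(**chosen)

    def at_least_as_strict_as(self, other):
        """Binary comparison: true if no dimension is weaker than in `other`."""
        return all(self.rank(dim) >= other.rank(dim) for dim in LEVELS)


a = Policy(storage="encrypted", access="registered")
b = Policy(transit="encrypted", access="approved")
combined = a.compose(b)
print(combined)
print(combined.at_least_as_strict_as(a), a.at_least_as_strict_as(b))
```

Under these assumptions the composed policy satisfies both inputs, while `a` and `b` themselves are incomparable: each is stricter than the other in some dimension.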
Robot Lawyers: Automating Legal Compliance for Transferring Private Data
Steve Chong, Associate Professor of Computer Science, Harvard University
The legal and regulatory requirements on whether a repository can accept and subsequently release data are complex and often confusing. These complexities are reflected in restrictions on what data a repository may accept, under what conditions data may be retained, and on how a recipient may use released data.
There is no common set of licensing agreements that is shared across institutions and laws. Instead, current methods to create data-use agreements typically require significant efforts by a lawyer for each data transfer. The result is that different institutions duplicate effort in developing variations of agreements for identical uses. In addition, because most institutions do not have expertise in privacy law, the resulting licenses are often too generic and fail to accurately capture either the necessary restrictions on data use or the necessary protections.
Our approach addresses this problem by formalizing aspects of privacy laws, and using this formalization to automate actions within repository systems and automatically generate licenses. This enables appropriate decisions and accurate licenses, while removing the bottleneck of lawyer effort per data transfer. Our system will enable legal professionals to evaluate the legal reasoning and interpretation embodied in the formalization, and the specific rationale for a decision to accept or release a particular dataset.
This is joint work with Micah Altman and Alexandra Wood.
PSI (Ψ): A Private data Sharing Interface
James Honaker, Senior Research Scientist, Harvard University
We provide an overview of PSI (Ψ): “A Private data Sharing Interface”, a system developed by the Privacy Tools research group to enable researchers in the social sciences and other fields to share and explore privacy-sensitive datasets with the strong privacy protections of differential privacy. This prototype system will allow researchers with sensitive datasets to make differentially private statistics about their data available through data repositories using the Dataverse platform. Our prototype system will allow researchers to: (1) upload private data to a secured Dataverse archive, (2) decide what statistics they would like to release about that data, and (3) release privacy-preserving versions of those statistics to the repository, where they can be explored through a curator interface, including through interactive queries, without releasing the raw data. This system was created by the Privacy Tools for Sharing Research Data project (http://privacytools.seas.harvard.edu). Differential privacy is a mathematical framework for enabling statistical analysis of sensitive datasets while ensuring that individual-level information cannot be leaked. The project website contains resources for learning more about differential privacy. Information on our tool can be found at http://privacytools.seas.harvard.edu/psi
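To give a flavor of what a differentially private statistic is, the following Python sketch applies the Laplace mechanism, a standard textbook technique, to a count and a bounded mean. It is not PSI's implementation and the abstract does not say which mechanisms PSI uses; the function names are hypothetical, the mean assumes the number of records is public and that values are clamped to known bounds, and floating-point noise sampling like this is illustrative rather than production-grade.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records, predicate, epsilon, rng=random):
    """Epsilon-DP count: a count changes by at most 1 when one record changes."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def dp_mean(values, lower, upper, epsilon, rng=random):
    """Epsilon-DP mean of values clamped to [lower, upper].

    Assumes the number of records n is public, so replacing one record
    changes the mean by at most (upper - lower) / n.
    """
    clamped = [min(max(v, lower), upper) for v in values]
    n = len(clamped)
    sensitivity = (upper - lower) / n
    return sum(clamped) / n + laplace_noise(sensitivity / epsilon, rng)


ages = [23, 35, 41, 52, 67, 29]  # made-up example data
print(dp_count(ages, lambda a: a >= 40, epsilon=0.5))
print(dp_mean(ages, lower=0, upper=100, epsilon=0.5))
```

Smaller values of `epsilon` add more noise and give stronger privacy; larger values give more accurate but less protected statistics.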
Web-based Multi-Party Computation with Application to Anonymous Aggregate Compensation Analytics
Azer Bestavros, Professor of Computer Science, Boston University
In this talk, I will describe the definition, design, implementation, and deployment of a simple multi-party computation protocol and supporting web-based infrastructure. The protocol and infrastructure constitute a software application that allows groups of cooperating parties, such as companies or other organizations, to perform computation over their respective data assets for the purpose of obtaining aggregate analytics without the need to communicate or share their private data sets. The application was developed specifically to support a Boston Women’s Workforce Council (BWWC) study of the gender wage gap among employers within the Greater Boston Area. The application was deployed successfully to collect aggregate statistical data pertaining to compensation levels across genders and demographics at a number of participating organizations for two years in a row (2015 & 2016). Time permitting, I will summarize our experience with the rollout of this application, lessons learned, and future plans to extend the capabilities we developed by incorporating them into a programming environment that leverages popular cloud platforms for computing analytics over large private data sets.
This work was done at Boston University in collaboration with Andrei Lapets, Nikolaj Volgushev, Mayank Varia, Eric Dunton, Kyle Holzinger, and Frederick Jansen. It was funded in part by NSF awards #1414119 and #1430145 and by the Initiative on Cities at Boston University.
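The abstract does not spell out the protocol, so the sketch below should be read only as a generic illustration of how parties can obtain an aggregate without revealing their inputs. It uses additive secret sharing, a common building block for multi-party computation, and is not claimed to be the protocol deployed for the BWWC study; the modulus, the function names, and the payroll figures are all assumptions made for the example.

```python
import secrets

# Assumed modulus; it only needs to exceed any sum the parties might compute.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split `value` into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Simulate n parties computing the sum of their inputs via additive shares."""
    n = len(private_values)
    # Party i splits its value and sends share j to party j.
    all_shares = [share(v, n) for v in private_values]
    # Party j adds up the shares it received; a single partial sum reveals nothing
    # about any individual input.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Publishing only the partial sums reveals the total.
    return sum(partial_sums) % MODULUS


payrolls = [52000, 61000, 58000]  # made-up private inputs, one per organization
print(secure_sum(payrolls))
```

In this simulation all shares live in one process; in a real deployment each share would travel over a secure channel to a different party, and no party would ever hold all shares of another party's value.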
A Software Architecture Proposal to facilitate Privacy Auditing in a confidential Data Enclave
Cavan Capps, Big Data Lead, United States Census Bureau
Public use datasets protected by formal privacy techniques are critical for public policy transparency. However, there is an increasing need to do research on confidential big data. Privacy concerns have been an obstacle to facilitating such research. This proposal attempts to provide a software framework for doing research in a controlled data enclave environment that may also provide a potential facility for independent privacy audits.
3:30 - 4:00pm Coffee Break
4:00 - 5:00pm Panel Discussion on Managing Sensitive Data Use Cases Moderated by David O'Brien, Senior Researcher at the Berkman Klein Center for Internet & Society at Harvard University
Barbara Bierer, MD, Faculty Co-Director, Multi-Regional Clinical Trials (MRCT) Center of Harvard and Brigham and Women’s Hospital; Director, Regulatory Foundations, Ethics and the Law Program, Harvard Catalyst, Harvard Medical School; Professor of Medicine, Harvard Medical School; Senior Physician, Division of Global Health Equity, Department of Medicine, Brigham and Women's Hospital
Dr. Barbara Bierer is Professor of Medicine, Harvard Medical School and Brigham and Women’s Hospital (BWH), Boston and a hematologist/oncologist. She is the faculty co-director of the Multi-Regional Clinical Trials Center at BWH and Harvard (MRCT Center), a study center whose purpose is to improve the ethical, logistical and regulatory aspects of international clinical trials, with a special focus on the emerging economies, and the Director of the Regulatory Foundations, Ethics and the Law Program of the Harvard clinical and translational sciences center. Previously she served as senior vice president, research at the Brigham and Women’s Hospital for 11 years, and as chair of the Secretary’s Advisory Committee on Human Research Protections, HHS.
Kaye Marz, Research Associate II, National Addiction and HIV Data Archive Program (NAHDAP), Inter-university Consortium for Political and Social Research
Kaye Marz is Archive Manager for NAHDAP, with responsibilities including acquisitions outreach, support for depositors, data processing plans and implementation, technical outreach, and user support.
Ms. Marz has been with ICPSR since 1991. Prior to joining NAHDAP, she was a processing supervisor for the National Archive of Criminal Justice Data (NACJD), primarily for data sponsored by the U.S. Department of Justice, National Institute of Justice. She has supervised the processing and release of data on the relationship of alcohol and other drugs and crime, domestic violence, youth and crime, victimization and victim services, corrections and prisoner reentry, policing, crime prevention, terrorism, and crime mapping and geographic information systems resources.
Ms. Marz holds an M.S. in Criminal Justice from Michigan State University and a B.A. in Psychology from the University of Michigan, Ann Arbor. Her research interests include intimate partner violence as well as the impact of exposure to violence during adolescence.
Simson Garfinkel, Senior Advisor, Information Access Division, National Institute of Standards and Technology (NIST)
Simson L. Garfinkel is a Computer Scientist at the National Institute of Standards and Technology's Information Technology Laboratory. Garfinkel's research interests include big data, privacy, usability, social justice, and data fusion. He holds seven US patents and has published dozens of research articles for his work in computer security and digital forensics. He is an ACM Fellow, an IEEE Senior Member, and a member of the National Association of Science Writers.
Garfinkel is the author or co-author of fourteen books on computing. He is perhaps best known for his book Database Nation: The Death of Privacy in the 21st Century. His book Practical UNIX and Internet Security (co-authored with Gene Spafford and Alan Schwartz) has sold more than 250,000 copies and been translated into more than a dozen languages since the first edition was published in 1991.
Garfinkel is also a journalist and has written more than a thousand articles about science, technology, and technology policy in the popular press since 1983. He has won numerous national journalism awards, including the Jesse H. Neal National Business Journalism Award. Today he mostly writes for MIT's Technology Review Magazine and the technologyreview.com website.
As an entrepreneur, Garfinkel founded five companies between 1989 and 2000, including Vineyard.NET, which provided Internet service on Martha's Vineyard to more than a thousand customers from 1995 through 2005, and Sandstorm Enterprises, an early developer of computer forensic tools.
Garfinkel received three Bachelor of Science degrees from MIT in 1987, a Master's of Science in Journalism from Columbia University in 1988, and a Ph.D. in Computer Science from MIT in 2005.
5:00 - 6:00pm Panel Discussion on International Data Flows Moderated by Deborah Hurley
Deborah Hurley
Deborah Hurley is a Fellow of the Institute for Quantitative Social Science, Harvard University. She is also Adjunct Professor of the Practice of Computer Science, Department of Computer Science, Brown University; Senior ICT Expert, Pacific Region Infrastructure Facility, Sydney, Australia; and Principal of the consulting firm she founded in 1996, which advises governments, international organizations, companies, civil society, and foundations on advanced science and technology policy. At the Organization for Economic Cooperation and Development, in Paris, France, she organized annual meetings on privacy, wrote the seminal report on information security, and launched the activities on cryptography policy. She served on boards and committees for the International Federation for Human Rights, U.S. Department of State, American Association for the Advancement of Science, National Academy of Sciences Research Council, and Electronic Privacy Information Center. Hurley received the Namur Award of the International Federation for Information Processing in recognition of outstanding contributions, with international impact, to awareness of social implications of information technology. http://scholar.harvard.edu/deborah_hurley.
Peter Doorn
Peter Doorn is director of Data Archiving and Networked Services (DANS), the Dutch national institute for long-term access to research data. He studied Human Geography at Utrecht University and received his PhD there. He taught Computing for Historians at Leiden University from 1985 to 1997. He was director of the Netherlands Historical Data Archive and head of department at the Netherlands Institute for Scientific Information Services (NIWI). He was Principal Investigator of the DARIAH preparation project (now an ESFRI ERIC) and vice-chair of CESSDA A.S. (Consortium of European Social Science Data Archives). He is a (board) member of several other organisations in the area of humanities computing, data infrastructure and digital archiving, and editor of the newly founded Research Data Journal for the Humanities and Social Sciences.
David Heiner
David A. Heiner is Vice President & Deputy General Counsel at Microsoft Corporation, where he leads the law department's Regulatory Affairs team. The team is responsible for regulatory aspects of privacy, telecommunications, finance, accessibility, human rights, online safety and data analytics. Dave joined Microsoft in 1994, leading the law department's antitrust work until 2013. For several years he was also responsible for Microsoft's work with international standard-setting organizations. Dave is a 1982 graduate of Cornell University, where he received a B.A. in Physics, and a 1985 graduate of the University of Michigan Law School, where he served on the editorial board of the law review. Following law school, Dave clerked for the Honorable Thomas P. Griesa of the U.S. District Court in New York. Before joining Microsoft in 1994, Dave practiced at Sullivan & Cromwell in New York. Dave is Chairman of the Board of Probono.net, a national non-profit that works to increase access to justice for the poor through efficient use of technology. Dave has handled a number of immigration cases through Kids in Need of Defense, an advocacy group for unaccompanied immigrant children in the United States.