ReSourcing BigData

A Symposium & Collaboration Opportunity

March 23 - 24, 2015

Day 1: Symposium
Extant data is an inexhaustible resource that is still poorly understood and underutilized. This symposium explored the area from several perspectives: privacy and security, policy, open clinical trial data, systems and disease-oriented synthetic efforts, and individually provided, aggregated crowd-sourced data. The goal was to engage our biomedical and public health research community in a more nuanced appreciation of these and similar issues. Topics included data aggregation, access, annotation, refocusing on novel or unanticipated questions, and recombination with diverse demographic and epidemiologic data (e.g., resource use or disease patterns combined with residential, socioeconomic, and educational datasets). The one-day symposium was followed by half-day, topic-specific workshops on various aspects of big data use, reuse, integration, and collaboration.

Day 1 Symposium Speakers

Paul Avillach

Paul Avillach, MD, PhD, Assistant Professor, Pediatrics, Harvard Medical School, Boston Children's Hospital

Paul Avillach holds an MD in public health and epidemiology, and a PhD in biomedical informatics. An assistant professor of pediatrics at HMS, he is based at the Center for Biomedical Informatics and is on the faculty of Boston Children's Hospital as part of the Children's Hospital Informatics Program. His research focuses on the development of novel methods and techniques for the integration of multiple heterogeneous clinical cohorts, electronic health records data, and multiple types of genomics data to encompass biological observations.
Slides to Dr. Avillach's presentation

Stephen Friend

Stephen Friend, MD, PhD, Former President, Co-Founder, and Director, Sage Bionetworks

Stephen Friend is the former president of Sage Bionetworks, a non-profit organization that provides the tools and environment to conduct dynamic, large-scale collaborative biomedical research. He is an authority in the field of cancer biology and a pioneer in the field of the genetics of gene expression, integrating systems biology approaches to complex diseases. Friend believes that successful biomedical research requires active participation from all stakeholders. He is reimagining the role of citizens in the research process and is building tools to empower them to contribute both their data and expertise as they see fit. He also believes in the importance of iteratively generating and testing novel hypotheses transparently and collaboratively. Under his leadership, Sage Bionetworks has developed an open-source technology platform, called Synapse, for data-intensive analysis, sharing, and reuse, enabling researchers to perform cutting-edge computational biology and research. Friend is engaging the community with crowd-sourced solutions to complex biomedical questions through targeted DREAM challenges. Previously, he was senior vice president and franchise head for oncology research at Merck & Co., Inc., where he led Merck's basic cancer research efforts. Before that, Friend, along with Lee Hartwell, PhD, founded and co-led the Fred Hutchinson Cancer Research Center's "Seattle Project," an advanced institute for drug discovery, and later they co-founded Rosetta Inpharmatics with Leroy Hood, MD, PhD. Friend also held faculty positions at HMS from 1987 to 1995 and at Massachusetts General Hospital from 1990 to 1995. He received his MD/PhD from Indiana University. Friend was named an Ashoka Fellow for his work at Sage Bionetworks.
Slides to Dr. Friend's presentation

Steven Hyman

Steven Hyman, MD, Harvard University Distinguished Service Professor, Stem Cell and Regenerative Biology, Director, Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard

Steven Hyman, MD, is director of the Stanley Center for Psychiatric Research at the Broad Institute of MIT and Harvard, as well as Harvard University Distinguished Service Professor of Stem Cell and Regenerative Biology. From 2001 to 2011, he served as provost of Harvard University, the University's chief academic officer. As provost, he had a special focus on developing collaborative scientific initiatives that span multiple disciplines and institutions. In that role he helped shape the Broad Institute and Harvard's Wyss Institute for Biologically Inspired Engineering. From 1996 to 2001, he served as director of the US National Institute of Mental Health (NIMH), where he emphasized investment in neuroscience and emerging genetic technologies, as well as the establishment of DNA collections to facilitate genetic studies at large scale. He also initiated a series of large clinical trials with the goal of informing practice. Hyman is president-elect of the Society for Neuroscience, editor of the Annual Review of Neuroscience, and was founding president of the International Neuroethics Society. Hyman received his BA summa cum laude from Yale College, a BA and MA from the University of Cambridge, which he attended as a Mellon fellow, and an MD cum laude from HMS.
Slides to Dr. Hyman's Presentation

Tariq Khokhar

Tariq Khokhar, MA, Data Scientist, World Bank

Tariq Khokhar, MA, is the World Bank's first data scientist and open data evangelist. He is a mathematician and computer scientist by training whose interests lie where technology, design, and data meet. Khokhar works on developing new methods for creating, analyzing, and visualizing data, and on making data openly accessible for public re-use. He also supports the World Bank's client countries with open data and statistics programs at the national, regional, and sectoral levels. Khokhar holds degrees from the University of Cambridge and currently lives in Washington, DC.
Slides to Mr. Khokhar's Presentation



Steven McCarroll

Steven McCarroll, PhD, Professor of Genetics, Harvard Medical School, Director of Genetics, Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard

Steve McCarroll's goal is to use genetics to reveal the molecular basis of mental illnesses, generate new ideas for therapeutics, and understand how genome variation gives rise to variation in human biology. McCarroll has worked to find and better understand genetic risk factors for diseases such as schizophrenia and bipolar disorder by combining genome-wide data, collected from large cohorts of patients, with focused molecular biological experiments in neurons and the brain. Having helped find many genetic influences on risk of these disorders, McCarroll and his group seek to understand what biological perturbations arise from these genetic variants: what genes and proteins are affected, in what populations of cells, and how the molecular biology of these cells, especially neurons, varies under the influence of these genetic differences. McCarroll also studies large-scale variation in the human genome, including the deletion, duplication, and rearrangement of long genomic segments consisting of hundreds of thousands of base pairs. He has developed widely used molecular tools for identifying such "structural" variation in genomic DNA, and computational approaches for identifying such variation from genome-wide sequence data sets. McCarroll and his colleagues have found that several of these large-scale structural variants are strong risk factors for schizophrenia, autism, and other clinical phenotypes. McCarroll received his PhD from the University of California, San Francisco, and completed his postdoctoral fellowship at Massachusetts General Hospital and the Broad Institute.

Sally Okun

Sally Okun, RN, MMHS, Vice President for Advocacy, Policy and Patient Safety, PatientsLikeMe

Sally Okun, RN, MMHS, is the vice president for Advocacy, Policy and Patient Safety at PatientsLikeMe in Cambridge, Massachusetts. She is responsible for the company's patient advocacy initiatives; participates in and contributes to health policy discussions at the national and global level; and is the company's liaison with government and regulatory agencies. Okun joined the company in 2008 as the manager of Health Data Integrity and Patient Safety, overseeing the site's medical ontology, including the curation of patient-reported health data and an ever-evolving patient vocabulary. Okun also developed the PatientsLikeMe Drug Safety and Pharmacovigilance Platform. Okun participates on the Institute of Medicine's Roundtable on Value and Science Driven Healthcare as a member of the Clinical Effectiveness Research Innovation Collaborative, the Evidence Communication Innovation Collaborative, and the Best Practices Innovation Collaborative, and is a member of the Committee on Core Metrics for Better Health at Lower Cost. Okun received her nursing diploma from the Hospital of St. Raphael School of Nursing, a baccalaureate degree in nursing from Southern Connecticut State University, and a master's degree from The Heller School for Social Policy & Management at Brandeis University. She completed her study of palliative care and ethics at Memorial Sloan-Kettering Cancer Center, and was a fellow at the National Library of Medicine Program in Biomedical Informatics and the Salzburg Global Summit on integrating behavioral health into primary care.
Slides to Ms. Okun's Presentation

Joanne Waldstreicher

Joanne Waldstreicher, MD, Chief Medical Officer, Johnson & Johnson

As CMO of Johnson & Johnson, Waldstreicher has cross-sector oversight for safety of all Johnson & Johnson products worldwide. In addition, Waldstreicher plays a leadership role in epidemiology, internal and external partnerships and collaborations, and development of the corporate science, technology, and R&D policies, including those related to clinical trial transparency. Waldstreicher also chairs the pharmaceuticals R&D Development Committee, which reviews all late-stage development programs in the pharmaceutical pipeline. Under Waldstreicher's leadership in her prior role as both CMO of the pharmaceutical sector and head of Asia Pacific Medical Sciences, four legacy safety groups were integrated into one independent global medical safety organization. In addition, Waldstreicher reshaped and realigned the R&D and medical affairs groups across Asia Pacific, resulting in an industry-leading drug pipeline in Japan and the company's first-ever international drug approval from a team based in China. Previously, Waldstreicher served as head of Global Drug Development for the Johnson & Johnson Pharmaceutical Research & Development, L.L.C. (J&JPRD) CNS/Internal Medicine business unit. In this role she was responsible for late-stage development of the CNS/Internal Medicine pipeline, spanning the areas of psychiatry, neurology, pain, infectious disease, cardiovascular medicine, urology, metabolism, and other emerging areas. Prior to joining J&JPRD in 2002, Waldstreicher was head of the Endocrinology and Metabolism clinical research group at Merck Research Laboratories, where she was responsible for overseeing clinical development of Mevacor®, Zocor®, Proscar®, and Propecia®, and for clinical development programs in atherosclerosis, obesity, diabetes, urology, dermatology, and oncology. During that time, she received numerous distinctions, including the Merck Research Laboratory Key Innovator Award.
Waldstreicher received both the Jonas Salk and Belle Zeller scholarships from the City University of New York and graduated summa cum laude from Brooklyn College, and cum laude from HMS.
Slides to Dr. Waldstreicher's Presentation

Marsha Wilcox

Marsha Wilcox, EdD, ScD, Scientific Director and Fellow, Janssen Pharmaceutical Research & Development

Marsha Wilcox, EdD, ScD, joined the Epidemiology group at Janssen Pharmaceutical Research & Development in 2008. She was named a Janssen Fellow in 2013. Wilcox is the project leader for the Open Translational Science in Schizophrenia effort, the OPTICS Project. At Janssen, she shares her expertise in methods, clinical epidemiology of psychiatric conditions, and biomarker epidemiology, in addition to working with both internal and external collaborators to address key questions in design, natural history of diseases, disease subtypes, prediction of treatment response or adverse effects in subpopulations, and drug safety. Prior to joining J&J, she was an epidemiologist at Ingenix, and served on the faculties of Medicine, Genetics, Epidemiology, and Biostatistics at Boston University and as a lecturer in psychiatry at Harvard. Wilcox holds degrees in education (BMus, MA, EdM, EdD) and statistics/epidemiology (MS, MS, ScD) from Columbia and Harvard Universities, and completed a post-doctoral fellowship in psychiatric genetics at HMS. She also holds an MPS in digital photography from the School of Visual Arts. Wilcox's experience includes private sector marketing and social science research, health care consulting, and teaching. She has several academic honors, a significant record of publications, and is widely regarded for her expertise in the application of various analytic and data mining methods to large patient-level observational and biomarker data.
Slides to Dr. Wilcox's Presentation

Day 2: Working Sessions
Four concurrent breakout/working sessions were held on the morning of March 24 in the Countway Library conference and meeting space for those interested in learning about these specific data sets and discussing potential collaborations around them. Harvard Catalyst intends to use the outcomes of these discussions to structure funding opportunities centered on collaborative new uses of these big data sets.

Day 2 Working Sessions

Session 1: The Human Oral Microbiome Database

Floyd E. Dewhirst, PhD, DDS, The Forsyth Institute & Harvard School of Dental Medicine

The Human Oral Microbiome Database (HOMD) is a taxonomic and genomic database for the approximately 700 bacterial taxa present in the human oral cavity. The database contains full genomes for over half of the oral taxa and visualization tools for examining the genomes. The website has a 16S rRNA reference set and BLAST tools for placing sequence libraries into HOMD-defined oral taxa. The oral genomes may also be searched by annotation and examined by BLAST. Protein sets are available from all genomes for transcriptome and proteome analyses.

Slides to Dr. Dewhirst's presentation

Session 2: The National Sleep Research Resource

Susan Redline, MD, MPH, Brigham & Women's Hospital and Beth Israel Deaconess Medical Center

The presentation and discussions will cover an overview of the resource and how to sign up and request data. We will browse dataset documentation, data dictionaries, and tools. As an example, we will present a basic use case: identifying appropriate dataset(s) through online query tools, downloading data, and moving forward with analysis. Additionally, we will describe EDF/signal files and annotation files, and show available online/MATLAB tools.

National Sleep Research Resource (NSRR)
The National Sleep Research Resource (NSRR), funded by the NHLBI, offers free web access to "big data" collections of deidentified physiological signals and data elements collected in well-characterized research cohorts and clinical trials. Access is intended to spur hypothesis generation and testing, cross-cohort analysis of risk factors and outcomes, and development of new signal processing tools, and to enhance training in clinical analysis, signal processing, epidemiological analysis, and understanding of brain and cardiopulmonary function in health and disease. Users can query and search across thousands of data elements, identify those of most relevance, and explore their statistical distributions. There are currently 11,078 polysomnography recordings available in downloadable EDF (European Data Format) files, and users can also download standard annotations and derived summary measures. New data releases are scheduled every quarter. The NSRR also provides open-source software for viewing and analyzing these data.

MyApnea.Org is a patient-powered research network (PPRN) funded by the Patient-Centered Outcomes Research Institute as a part of its National Patient-Centered Clinical Research Network (PCORnet) initiative. The goal of MyApnea.Org and other PPRNs is to transform clinical research by engaging 50,000 patients, care providers, and health systems in collaborative partnerships to improve healthcare and advance medical knowledge. By bringing research and patient care together, this innovative health data network will be able to explore the questions that matter most to patients and their families. By combining the knowledge and insights of patients, caregivers, and researchers using a Common Data Model, each disease-specific network will be able to contribute to cost-effectiveness research in a new and effective way. In addition, combining data across networks will create a revolutionary network, with carefully controlled access to rich sources of health data, that can respond to patients' priorities and speed the creation of new knowledge to guide treatment on a national scale.

Slides to Dr. Redline's presentation

Session 3: The OPTICS Project: Open Translational Science

The OPTICS Project: Open Translational Science in Schizophrenia, Using Clinical Trial and NIH Data Together

Marsha A. Wilcox, EdD, ScD, Janssen Pharmaceutical R & D
Adam Savitz, MD, PhD, Janssen Pharmaceutical R & D
Michelle Williams, ScD, SM, MS, Harvard T.H. Chan School of Public Health


The goal of this project is to demonstrate the value of an open-science approach for fostering collaborations leading to insights about the disease and therapeutic safety and efficacy. Disease understanding may be extended by the identification of subtypes, disease course(s) and/or etiologies. A parallel goal is the development of novel analytic and design methods to be used with disparate data types.

This effort is distinct in that it is a time-limited proof of concept for an open-science analytic collaboration based on both clinical trial and observational data sources. It does not establish a data repository with a common data model for use in perpetuity. It is the antithesis: an open-science analytic collaboration designed to leverage the strengths and mitigate the weaknesses of different types of data, and to find synergies among different data sources.

Access to clinical trial data will be provided through the Yale Open Data Access (YODA) Project.

Intellectual property generated from this project will be dedicated to the public and free for everyone to use.

Slides to Session 3

Session 4: Sage Bionetworks: Alternate Approaches to Explore Multi-Dimensional Clinical Spaces

Alternate Approaches to Explore Multi-Dimensional Clinical Spaces

Stephen Friend, MD, PhD, President, Co-Founder and Director of Sage Bionetworks

When we take on pressing new problems, much of what we do is guided by extensions of past thinking. It is beneficial to step back and ask more fundamental questions about which approaches are best to take. It is helpful to look at our current roles, at why some people stay healthy, and at what drives the day-to-day variations in what are called chronic diseases.

Slides to Session 4