by Sana Sharma
From the rich history of Ifá divination to the ‘shoe-leather’ epidemiology of John Snow, prediction has longstanding and diverse roots in healthcare and health research. Predictive methods have been used for centuries to assess how diseases will spread, the likelihood of contracting an illness, how that illness will affect patients over time, and what the best treatments might be. Driven by new sources of data and computational techniques, modern predictive methods – including simulation, modeling, and machine learning – provide valuable new tools for medicine and health research. The public primarily experiences the effects of these health-related predictions through conversations with doctors about their own personal health, and through reports on new findings from healthcare research that advance our understanding of the human body. However, these predictive tools also come with their own uncertainties and risks that should be better understood by experts, healthcare workers, and the general public.
A few motivating questions can help us uncover how modern predictive methods impact healthcare, and how we should interpret their recommendations:
How are predictions made in a particular area, and with what data?
How are uncertainty and risk evaluated?
How should predictions, uncertainty, and risk be explained, and to whom?
Given the diversity of reasons that predictive methods are used in healthcare, two other lenses are also worth considering: whether a prediction is made for short- or long-term timeframes, and whether a prediction should be used to inform personal decision making or healthcare policy. The following sections will discuss a few health disciplines where predictive methods are being used today – epidemiology, medical research, and personal genomics – and what impact prediction has had, from the societal and long-term to the deeply personal and short-term.
Epidemiology and Population-Level Health
Among the oldest and most famous uses of prediction in health research is the field of epidemiology, or the study of how diseases spread within a population and what interventions may help control an outbreak. While the field has come a long way since John Snow’s early epidemiological fieldwork in the 1850s, there is still a great deal of uncertainty and challenge in modeling disease spread, identifying potential causes, and determining the best courses of action. As outlined by Megan Murray, professor of epidemiology and global health at Harvard’s T.H. Chan School of Public Health, modern epidemiology can broadly be broken down into two approaches: traditional fieldwork epidemiology and more modern infectious disease modeling. Both approaches have their strengths, and both can be used effectively to better understand how a disease spreads and what may be causing that spread. Fieldwork requires trained professionals to go out into the field at the first sign of an outbreak and speak with both infected and healthy individuals, in order to map the disease as it spreads and identify potential methods of transmission. Disease modeling relies on already-collected data to produce models that vary in complexity and can take into account more features than traditional practices. Some epidemiologists prefer the more qualitative fieldwork approach, as certain characteristics of disease are easier to capture through conversation than through hard data, but disease modeling, used effectively, can generate very accurate and useful predictions. Uncertainty is introduced at various steps of both processes: in the data collected, in what data is missing or impossible to collect, in the identification of relevant parameters, and in the analytical or statistical methods used.
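To make the idea of disease modeling concrete, the sketch below implements a minimal SIR (Susceptible–Infected–Recovered) compartmental model, one of the standard textbook forms of infectious disease modeling. It is not drawn from any specific study mentioned above; the population size and the `beta` and `gamma` parameters are invented for illustration, and in practice epidemiologists fit such parameters to collected data, which is one place uncertainty enters.

```python
# Minimal SIR compartmental model, stepped forward one day at a time.
# All parameter values are illustrative, not fitted to real data.

def simulate_sir(population, initial_infected, beta, gamma, days):
    """Simulate a simple epidemic.

    beta  - transmission rate (average infectious contacts per person per day)
    gamma - recovery rate (1 / average infectious period in days)
    Returns a list of (susceptible, infected, recovered) tuples, one per day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population  # contacts between S and I
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history

history = simulate_sir(population=10_000, initial_infected=10,
                       beta=0.3, gamma=0.1, days=160)
peak_infected = max(i for _, i, _ in history)
```

Even this toy model shows why parameter identification matters: small changes to `beta` or `gamma` shift the predicted peak and final size of the outbreak considerably, so uncertainty in those inputs propagates directly into the predictions.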
Both traditional fieldwork and computational modeling produce predictions at the population level, more suited for informing policy than for individual health decisions. This makes the communication of risk challenging, as well as the identification of ‘modifiable’ risk factors. What may be broadly true for a population may not be true for individuals in a community, particularly more vulnerable individuals. It is an ongoing challenge to translate analyses and predictions from epidemiology into effective health policy that accounts for and safeguards as many people as possible.
Predictive Insights in Medical Research
Medical research also relies upon data from large groups of people, but rather than investigating the spread of disease in an outbreak, the research focuses on how disease affects the body, who may be most at risk for a disease, and what treatments might be effective. A diversity of research falls into this category, but prediction is a relevant component of all of it: when evaluating and treating patients, doctors make predictions about what is likely to happen to them based on the best understanding available, which is informed and updated by a variety of research activities.
While the field is too diverse to summarize completely, some common methods employed in medical research include longitudinal studies, randomized clinical trials, and more recently, the use of machine learning and other big data techniques to find patterns and correlations that may be clinically meaningful. Longitudinal studies collect environmental and demographic data from a group of people over a long period of time, sometimes years or decades, in order to learn more about what the risk factors of certain diseases are and how the disease progresses over time. Randomized clinical trials divide a group of patients into two random cohorts, an experimental group that receives treatment and a control group that does not, and compare the groups to evaluate the safety and efficacy of the treatment. The results from these more traditional tools in medical research have uncertainty built into them in a few different ways, which are important for researchers and doctors to understand and present clearly. Given the variability found in almost all research, one cannot usually determine that if X is true, it will definitely lead to Y. Instead, most medical recommendations are presented as assessments of risk over a certain period of time, such as ‘a 5% risk of developing a certain type of cancer in the next five years.’ In addition, the techniques or models used to analyze data introduce uncertainty of their own, making our hypothetical 5% in reality a 5% plus or minus some additional percentage. This uncertainty is not a reason to throw out medical findings in their entirety. As biostatistics professor Peter Kraft states well in the related video, “uncertainty is a quantification of certainty.” However, this uncertainty does indicate the skill and care that is needed to properly understand and present these research findings to promote the best patient outcomes.
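The “5% plus or minus some percentage” idea can be sketched numerically. The example below computes a simple normal-approximation (Wald) confidence interval around an observed risk; the cohort size and event count are hypothetical, and real studies often use more refined interval methods than this one.

```python
# Illustrative only: a Wald confidence interval around an observed risk,
# turning a point estimate like "5%" into "5% plus or minus a margin".
import math

def risk_with_margin(events, n, z=1.96):
    """Return (risk, margin): the observed risk and the half-width of an
    approximate 95% confidence interval for the true underlying risk."""
    p = events / n
    margin = z * math.sqrt(p * (1 - p) / n)  # standard error scaled by z
    return p, margin

# Hypothetical cohort: 50 events observed among 1,000 participants.
risk, margin = risk_with_margin(events=50, n=1000)
# risk = 0.05, margin is roughly 0.0135 -> "5% plus or minus about 1.4%"
```

Note how the margin shrinks as the cohort grows: the same 5% observed in a study of 10,000 people carries a much narrower interval, which is part of why study size matters when presenting findings.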
Effectively communicating risk and uncertainty is especially important given the gravity of decisions made in medical fields. Different challenges emerge when presenting findings to different audiences – what a doctor needs to understand about medical research may differ from what a patient or research collaborator needs.
Understanding and communicating uncertainty is especially relevant for some of the newer machine learning techniques used in medical research today. As patient data is increasingly collected digitally, machine learning techniques are beginning to be employed to search for patterns, iterate to find optimal solutions, and inform risk and treatment assessments. However, there are risks in employing a machine learning model that cannot be interpreted or understood completely, especially if it makes predictions based on complex correlations without a clear causal process behind them. Both doctors and researchers working in this space have raised the question of how to use these new tools most responsibly. One promising application of machine learning in healthcare is as an “early warning system” that identifies patients who are likely to experience negative effects after surgery or treatment, well before the negative episode takes place. Such systems have also been proposed for improving risk assessment for certain elective surgeries, steering away patients who are more at risk of being harmed by the surgery. In addition, insights from machine learning are beginning to be validated through more traditional means, such as randomized clinical trials, in order to provide an additional layer of evaluation.
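To illustrate the “early warning system” idea in its simplest form, the sketch below uses a transparent rule-based score over vital signs rather than a learned model; production systems typically learn their patterns from data, which is exactly where the interpretability concerns above arise. Every threshold, patient, and function name here is hypothetical and not clinically validated.

```python
# Hypothetical rule-based early warning score over vital signs.
# Thresholds are invented for illustration, not clinically validated.

def warning_score(heart_rate, resp_rate, temp_c):
    """Assign points for each vital sign outside an assumed normal range."""
    score = 0
    if heart_rate > 110 or heart_rate < 50:
        score += 2
    if resp_rate > 24:
        score += 2
    if temp_c > 38.5 or temp_c < 35.0:
        score += 1
    return score

def flag_for_review(vitals, threshold=3):
    """Return IDs of patients whose score meets the alert threshold."""
    return [pid for pid, (hr, rr, t) in vitals.items()
            if warning_score(hr, rr, t) >= threshold]

patients = {"A": (120, 26, 38.9),   # elevated heart rate, breathing, fever
            "B": (72, 16, 36.8)}    # unremarkable vitals
flagged = flag_for_review(patients)  # ["A"]
```

A rule-based score like this is fully interpretable but crude; a learned model can capture subtler patterns at the cost of the transparency that clinicians often need before acting on an alert.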
Personal Genomics and Individual Agency
While insights from traditional medical research are usually communicated to patients through doctors, there has been a growing trend in recent years of individuals who want more insight into and agency over their own health data. Movements like this are made possible through advances in the technologies used to collect this data, from wearables that monitor health in real time to ever cheaper and more efficient genetic sequencing.
Initiatives such as the Personal Genome Project are particularly interesting cases, as they encourage people to ‘open-source’ their genetic information for the benefit of scientific research, in return for a better understanding of their own genome and what it reveals about their health. While the field of genetics has grown considerably since Mendel’s experiments with heritability and pea plants, there is still a fair amount of uncertainty in interpreting what someone’s genome reveals about their health. Genetic predisposition often involves multiple genes and environmental factors; as Prof. George Church of Harvard Medical School succinctly puts it, “genes are not destiny.” If understood well, genetic predisposition can motivate people to take preventative action, change their environments, and make informed decisions about themselves and their families. However, to do this well, one must understand the uncertainties associated with genomics and the risks of providing individuals life-altering health information. Given the complexity of the human genome and the siloed, biased nature of available genetic data, there is a risk that our understanding of genetic predisposition only applies to certain groups of people in certain cases. There is also the risk of patients taking drastic preventative action based on genetic testing, potentially to the detriment of their health. Programs like the Personal Genome Project require participants to pass a test on the uncertainties and risks associated with genomic data, and to consider whether they are comfortable with what they could learn. As health data and predictive insights become increasingly easy for individuals to access, it is up to the cross-functional teams that build these systems to present predictions in ways that help people understand what their data says about them and make good choices about their health.
In each of the fields covered here, multiple predictive methods have been shown to provide value in research, clinical, and personal settings. While each method has its strengths, uncertainties, and risks, all can contribute effectively to our understanding of healthcare and health policy if wielded responsibly. A fitting final prediction might be that the most successful outcomes will arise from collaborative efforts using multiple predictive methods, creating a more holistic picture of human health.