
Using observational data for personalized medicine when clinical trial evidence is limited

  • Boris Gershman
    Correspondence
    Reprint requests: Boris Gershman, M.D., Rhode Island Hospital and The Miriam Hospital, Warren Alpert Medical School of Brown University; 195 Collyer Street, Suite 201; Providence, Rhode Island 02904.
    Affiliations
    Division of Urology, Rhode Island Hospital and the Miriam Hospital, Providence, Rhode Island

    Warren Alpert Medical School, Providence, Rhode Island
  • David P. Guo
    Affiliations
    Division of Urology, Rhode Island Hospital and the Miriam Hospital, Providence, Rhode Island

    Warren Alpert Medical School, Providence, Rhode Island
  • Issa J. Dahabreh
    Affiliations
    Center for Evidence Synthesis in Health, School of Public Health, Providence, Rhode Island

    Department of Health Services, Policy and Practice, School of Public Health, Providence, Rhode Island

    Department of Epidemiology, School of Public Health, Brown University, Providence, Rhode Island
      Randomized clinical trials are considered the preferred approach for comparing the effects of treatments, yet data from high-quality clinical trials are often unavailable and many clinical decisions are made on the basis of evidence from observational studies. Using clinical examples about the management of infertility, we discuss how we can use observational data from large and information-rich health-care databases combined with modern epidemiological and statistical methods to learn about the effects of interventions when clinical trial evidence is unavailable or not applicable to the clinically relevant target population. When trial evidence is unavailable, we can conduct observational analyses emulating the hypothetical pragmatic target trials that would address the clinical questions of interest. When trial evidence is available but not applicable to the clinically relevant target population, we can transport inferences from trial participants to the target population using the trial data and a sample of observational data from the target population. Clinical trial emulations and transportability analyses can be coupled with methods for examining heterogeneity of treatment effects, providing a path toward personalized medicine.


      Discuss: You can discuss this article with its authors and other readers at https://www.fertstertdialog.com/users/16110-fertility-and-sterility/posts/31843-25888.

      Background

      Randomized clinical trials are considered the preferred approach for comparing the effects of treatments because randomization renders the compared groups similar (in expectation) with respect to both measured and unmeasured (including unknown) pretreatment covariates and justifies the use of straightforward statistical methods to estimate treatment effects (Fisher RA, The design of experiments). Clinical trials are prospectively planned experimental studies; thus, besides randomization, they have many other features that enhance validity, such as concurrent control groups, standardized outcome definitions and follow-up procedures, and measures to limit missing data and loss to follow-up. For these reasons, traditional “evidence hierarchies” identify clinical trials or meta-analyses of clinical trials as level I evidence, the highest possible (Sackett DL, Rules of evidence and clinical recommendations on the use of antithrombotic agents; Frieden TR, Evidence for health decision making: beyond randomized, controlled trials; Irving M et al., A critical review of grading systems: implications for public health policy).
      Despite the recognition that well-conducted clinical trials can support valid causal inference, physicians often have to make clinical recommendations with no or limited evidence from clinical trials (Dahabreh IJ, Kent DM, Can the learning health care system be educated with observational data?; Dahabreh IJ, Hayward R, Kent DM, Using group data to treat individuals: understanding heterogeneous treatment effects in the age of precision medicine and patient-centred evidence). Clinical trials are often infeasible because of logistical, cost, or ethical considerations (Elliott D et al., Understanding and improving recruitment to randomised controlled trials: qualitative research approaches; McCulloch P et al., Randomised trials in surgery: problems and possible solutions; Cook JA, The challenges faced in the design, conduct and analysis of surgical randomised controlled trials; Kao LS, Aaron BC, Dellinger EP, Trials and tribulations: current challenges in conducting clinical trials). And when conducted, clinical trials sometimes suffer from serious methodological shortcomings, such as selection bias (e.g., informative dropout and loss to follow-up) and missing data, or have sample sizes and follow-up durations that are inadequate for assessing comparative effectiveness for clinically important outcomes (Frieden TR; Dahabreh IJ, Kent DM; Dahabreh IJ, Hayward R, Kent DM; Bothwell LE et al., Assessing the gold standard—lessons from the history of RCTs). Even in high-quality clinical trials, trial participants are often selected on the basis of characteristics that modify the treatment effect. When that is the case, estimates of population-averaged treatment effects from trial participants do not directly apply to the patient populations seen in clinical practice.
      In this article, using two clinical cases, we discuss how we can use observational data from large and information-rich health-care databases combined with modern epidemiological and statistical methods to draw inferences about the effects of treatments when clinical trial evidence is unavailable or not applicable to clinically relevant target populations. When clinical trial evidence is unavailable, we can conduct observational analyses emulating a hypothetical pragmatic target trial that would address the clinical question of interest. When clinical trial evidence is available but not applicable to the target population, we can transport inferences from trial participants to the target population using the trial data and a sample of observational data from the target population. Both trial emulation and transportability analyses can be combined with methods for examining the heterogeneity of treatment effects to personalize care.

      Using observational data to emulate target trials

       Clinical Case 1

      A 32-year-old man with primary infertility presents with his wife of the same age. He is diagnosed with nonobstructive azoospermia, and their physician recommends surgical sperm retrieval with in vitro fertilization (IVF) and intracytoplasmic sperm injection (ICSI). The couple asks whether fresh or cryopreserved sperm would increase the chance of a clinical pregnancy. What evidence should the physician rely on to counsel the couple?

       Methodological Considerations for Clinical Case 1

      For the couple in this case, no clinical trials have compared IVF-ICSI with fresh versus cryopreserved surgically retrieved sperm; a recent systematic review on this question identified 11 observational studies but no randomized trials (Ohlander S et al., Impact of fresh versus cryopreserved testicular sperm upon intracytoplasmic sperm injection pregnancy outcomes in men with azoospermia due to spermatogenic dysfunction: a meta-analysis). In the absence of clinical trial evidence, well-conducted observational studies are often the best source of evidence (Frieden TR; Dahabreh IJ, Kent DM; Bothwell LE et al.).
      Observational studies take advantage of clinical practice variation to assess the effects of treatments that are not assigned by the investigators. Because treatment assignment is not randomized, observational studies are susceptible to confounding by shared causes of the treatment and the outcome. For instance, physicians may be more likely to offer cryopreservation to men with a higher probability of having viable sperm, and the availability of treatment may vary by geographic location or socioeconomic status, which may also affect fertility rates. In addition, observational studies are susceptible to selection bias (like any follow-up study, including clinical trials), measurement error bias, and (when considering per-protocol effects) time-varying confounding, that is, confounding that arises when the exposure is time-varying and a covariate measured after baseline is an independent predictor of both subsequent treatment and the outcome, within strata determined by baseline covariates and prior treatment (Robins JM, Hernán MA, Estimation of the causal effects of time-varying exposures; Hernán MA, Hernandez-Diaz S, Robins JM, Randomized trials analyzed as observational studies; Toh S, Hernán MA, Causal inference from longitudinal studies with baseline randomization). Because design choices that can mitigate selection and measurement error biases are often impossible to implement in observational studies (especially when using routinely collected data), observational analyses may be more susceptible to these biases than clinical trials. Thus, causal inference from observational studies is often more speculative than inference based on well-conducted clinical trials, and the conduct of observational studies needs great care.
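To make the confounding problem concrete, here is a minimal simulation (our illustration, not from the article; all variable names and probabilities are hypothetical): treatment assignment depends on a measured covariate, so the crude comparison is biased, while standardizing the stratum-specific risks over the covariate distribution recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Simulated data: L is a measured confounder (e.g., an indicator of better
# prognosis), A is treatment (1 vs. 0), Y is a binary outcome.
L = rng.binomial(1, 0.5, n)
A = rng.binomial(1, np.where(L == 1, 0.8, 0.2))    # treatment depends on L
Y = rng.binomial(1, 0.1 + 0.3 * L + 0.1 * A)       # true risk difference = 0.10

def risk(mask):
    return Y[mask].mean()

# Crude (confounded) risk difference
rd_crude = risk(A == 1) - risk(A == 0)

# Standardized risk difference: average the stratum-specific differences
# over the marginal distribution of L
rd_std = sum(
    (risk((A == 1) & (L == l)) - risk((A == 0) & (L == l))) * (L == l).mean()
    for l in (0, 1)
)

print(round(rd_crude, 3), round(rd_std, 3))  # crude is far from 0.10; standardized is close
```

In this toy example the crude difference is severely inflated, whereas the standardized estimate approximates the true risk difference of 0.10; with real data, of course, adjustment only removes confounding by covariates that are measured.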
      When designing an observational study, it is useful to consider a hypothetical target trial that would address the same clinical question (Hernán MA, Hernandez-Diaz S, Robins JM, Randomized trials analyzed as observational studies; Hernán MA et al., Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease; Hernán MA, Robins JM, Using big data to emulate a target trial when a randomized trial is not available). The process begins by specifying the protocol of this target trial: eligibility criteria, treatment strategies, assignment procedures, follow-up duration, outcomes, causal contrasts (i.e., targets of inference, such as the intention-to-treat effect), and analysis plan (Hernán MA, Robins JM). The protocol is used to guide the conduct of the observational study in an iterative process: refinements to the clinical question and practicalities related to the data suggest modifications of the protocol, while keeping the target trial in view ensures that the data used for the emulation contain adequate information and are processed in a way that can allow a causal interpretation of the final analysis. In the context of using routinely collected clinical (e.g., electronic medical records) or administrative (e.g., insurance claims) data (Hernán MA, Robins JM), the target trial framework provides a way to impose structure on messy “big data” and “real-world evidence.” Given that data collection occurs as part of regular care encounters primarily for nonresearch purposes, the target trials that can be emulated with routinely collected data are necessarily highly pragmatic ones. For example, the target trials would define interventions fairly broadly, use administrative data to ascertain outcomes, and forego blinding (Ford I, Norrie J, Pragmatic trials; Rosenthal GE, The role of pragmatic clinical trials in the evolution of learning health systems).
      The intuition that an observational study comparing treatments should be viewed as an attempt to emulate a target trial is shared across different fields that have to rely on observational analyses to compare treatments, including medicine, epidemiology, and the social sciences (Concato J, Shah N, Horwitz RI, Randomized, controlled trials, observational studies, and the hierarchy of research designs; Benson K, Hartz AJ, A comparison of observational studies and randomized, controlled trials; Anglemyer A, Horvath HT, Bero L, Healthcare outcomes assessed with observational study designs compared with those assessed in randomized trials; Dahabreh IJ et al., Do observational studies using propensity score methods agree with randomized trials? A systematic comparison of studies on acute coronary syndromes; Hemkens LG, Contopoulos-Ioannidis DG, Ioannidis JP, Agreement of treatment effects for mortality from routinely collected data and subsequent randomized trials: meta-epidemiological survey; Kitsios GD et al., Can we trust observational studies using propensity scores in the critical care literature? A systematic comparison with randomized clinical trials; Kunz R, Oxman AD, The unpredictability paradox: review of empirical comparisons of randomised and non-randomised clinical trials; Lonjon G et al., Comparison of treatment effect estimates from prospective nonrandomized studies with propensity score analysis and randomized controlled trials of surgical procedures; Fraker T, Maynard R, The adequacy of comparison group designs for evaluations of employment-related programs; LaLonde R, Evaluating the econometric evaluations of training programs with experimental data; Glazerman S, Levy DM, Myers D, Nonexperimental versus experimental estimates of earnings impacts; Cook TD, Shadish WR, Wong VC, Three conditions under which experiments and observational studies produce comparable causal estimates: new findings from within-study comparisons; Michalopoulos C, Bloom H, Hill C, Can propensity-score methods match the findings from a random assignment evaluation of mandatory welfare-to-work programs?). This intuition has motivated various “benchmarking” attempts comparing estimates from observational studies against matched clinical trials (see the same references). In medicine, these comparisons have shown that good agreement between observational studies and clinical trials is possible, but fairly large disagreements do occur, even if they are rarely statistically significant (Dahabreh IJ, Kent DM, Can the learning health care system be educated with observational data?). Most comparisons have relied on matching already completed observational and randomized studies conducted independently in different patient populations, using the incomplete information available in the published literature (as opposed to patient-level data and study protocols), without harmonizing the methods for baseline confounding control (in the observational studies) or addressing selection and measurement error bias (in either design). Observational studies designed to explicitly emulate target trials, combined with better data and state-of-the-science methods to address methodological shortcomings (in both clinical trials and observational analyses), should lead to better agreement.
      The target trial framework encourages clear thinking about the goals of the observational analysis and ensures that the methods employed can fit those goals. Thus, the framework confers many practical benefits, some of which we have collected in Table 1. Practical experience with observational analyses explicitly designed to emulate target trials is relatively limited. However, initial results from diverse fields of application are promising (Danaei G et al., Electronic medical records can be used to emulate target trials of sustained treatment strategies; Emilsson L et al., Examining bias in studies of statin treatment and survival in patients with cancer; Garcia-Albeniz X, Hsu J, Bretthauer M, Hernán MA, Effectiveness of screening colonoscopy to prevent colorectal cancer among medicare beneficiaries aged 70 to 79 years: a prospective observational study; Zhang Y, Young JG, Thamer M, Hernán MA, Comparing the effectiveness of dynamic treatment strategies using electronic health records: an application of the parametric g-formula to anemia management strategies). Much of the value of the target trial framework derives from discussions among clinical experts and methodologists regarding the target trial that would address the clinical question and from careful consideration of the compromises that are necessary when using observational data to emulate that trial.
      Table 1. Potential benefits of the target trial framework when designing an observational study to compare medical interventions.

      Potential benefit: Set selection criteria that identify patients for whom treatment recommendations vary and exclude patients for whom confounding is intractable (cannot be addressed by statistical adjustment).
      Examples: Include patients for whom there exists uncertainty regarding the comparative effectiveness of available treatments. Exclude patients for whom clinical policies preclude the use of one or more of the treatments of interest or for whom treatment may be driven by unmeasured confounding variables (e.g., restrict comparisons to younger patients to reduce confounding by frailty).

      Potential benefit: Ensure treatments are well-defined and causal contrasts are meaningful for decision-making. Only when the causal contrasts are clearly specified can we select appropriate methods for estimating them.
      Examples: Shift focus away from contrasts that do not have a causal interpretation as treatment effects because the exposures cannot be manipulated even in principle (i.e., exposures that are impossible to study in a trial). Clarify whether intention-to-treat effects (i.e., effects of being assigned treatment), per-protocol effects (i.e., effects of being assigned and complying with a certain treatment regime), or both are of interest.

      Potential benefit: Encourage “design thinking” about treatments, covariates, and outcomes with a view to reducing confounding, selection, and measurement error bias.
      Examples: Identify important confounding variables and use the observational data creatively to identify proxies that can reduce confounding. One example is the use of the frequency of cholesterol measurements (in health insurance claims) and history of lipid-lowering medication dispensing as proxies for dyslipidemia (when lipid measurements are unavailable). Another example is the use of durable medical equipment claims as proxies for frailty.

      Potential benefit: Appropriately define “time zero” to protect investigators from “self-inflicted biases” (Hernán MA et al., Specifying a target trial prevents immortal time bias and other self-inflicted injuries in observational analyses), such as immortal-time bias (i.e., relating to a period of follow-up time during which the outcome cannot occur for a treatment group due to the definition of that treatment) and other forms of selection bias.
      Examples: In most analyses estimating treatment effects, it is best to examine incident users (i.e., initiators) of the treatments being compared. Such “new-user designs” are the natural choice when emulating target trials that would enroll patients with no exposure to the study drugs in the past or after a prerandomization washout period. New-user designs avoid comparisons of incident users of a newly introduced treatment against prevalent users of more established treatments. Such comparisons can have severe selection bias because prevalent users are selected to tolerate and respond well to the established treatments.

      Potential benefit: Plan statistical analyses that are similar to those in a randomized trial, except for the need to adjust for baseline confounding.
      Examples: Examine both absolute and relative treatment effects in observational analyses, as is done in virtually all clinical trials. Address informative censoring in observational studies, as would be done in the target trial. In the presence of nonadherence, control for time-varying confounding when estimating per-protocol effects, as would be done in the target trial.
      Returning to the couple in Clinical Case 1, because there is no evidence from clinical trials to inform clinical recommendations (Ohlander S et al., Impact of fresh versus cryopreserved testicular sperm upon intracytoplasmic sperm injection pregnancy outcomes in men with azoospermia due to spermatogenic dysfunction: a meta-analysis), we can consider a hypothetical target trial comparing fresh versus cryopreserved surgically retrieved sperm on the clinical pregnancy rate after IVF-ICSI, as outlined in the mini-protocol in Table 2. Such a trial can be emulated using data from electronic medical records or health insurance claims. The treatment effect estimates from the observational analysis should approximate those that would be produced by a randomized trial conducted in the population from which the observational data are derived.
      Table 2. Mini-protocol for a hypothetical target trial of fresh versus cryopreserved surgically retrieved sperm to be emulated using observational data.

      Eligibility criteria: Adult men aged 18–39 years with primary infertility and nonobstructive azoospermia on semen analysis, with female partner aged 18–39 years with normal fertility evaluation.
      Treatment strategies: IVF-ICSI using fresh versus frozen surgically retrieved sperm.
      Assignment procedures: Unblinded random assignment to treatments.
      Follow-up period: Starts at randomization; ends at the occurrence of the outcome event, 6 months after randomization, loss to follow-up, or death (whichever is earlier).
      Outcomes: Clinical pregnancy; fertilization; implantation.
      Causal contrasts: Intention-to-treat effect comparing point (non-time-varying) treatments.
      Analysis plan: Analyze patients “as randomized”; estimate the difference and ratio of the cumulative incidence proportions comparing the treatment groups and their confidence intervals; test the null hypothesis of no average effect of treatments on the outcome. Adjust for censoring if follow-up is incomplete.
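As a sketch of the analysis plan above, the difference and ratio of the cumulative incidence proportions, with Wald-type 95% confidence intervals, could be computed as follows (the counts are invented for illustration and do not come from any study; censoring adjustment is omitted):

```python
import math

# Hypothetical counts: clinical pregnancies / couples in each arm
e1, n1 = 52, 120   # fresh sperm arm (illustrative numbers)
e0, n0 = 45, 118   # cryopreserved sperm arm (illustrative numbers)

p1, p0 = e1 / n1, e0 / n0

# Risk (cumulative incidence) difference with 95% Wald confidence interval
rd = p1 - p0
se_rd = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
rd_ci = (rd - 1.96 * se_rd, rd + 1.96 * se_rd)

# Risk ratio with 95% confidence interval computed on the log scale
rr = p1 / p0
se_log_rr = math.sqrt((1 - p1) / e1 + (1 - p0) / e0)
rr_ci = (math.exp(math.log(rr) - 1.96 * se_log_rr),
         math.exp(math.log(rr) + 1.96 * se_log_rr))

print(f"RD = {rd:.3f} (95% CI {rd_ci[0]:.3f} to {rd_ci[1]:.3f})")
print(f"RR = {rr:.2f} (95% CI {rr_ci[0]:.2f} to {rr_ci[1]:.2f})")
```

Reporting both the absolute (difference) and relative (ratio) contrasts mirrors what virtually all clinical trials do; in the observational emulation, the same quantities would be computed after adjustment for baseline confounding.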

      Transporting the results of clinical trials to new target populations

       Clinical Case 2

      A 32-year-old woman with polycystic ovary syndrome and infertility presents with her male partner of the same age. His semen analysis is normal. The physician recommends ovulation induction therapy, and the woman asks whether clomiphene citrate or letrozole would improve her chances of getting pregnant. What evidence should the physician rely on to counsel the couple?

       Methodological Considerations for Clinical Case 2

      Generalizing the results of clinical trials involves judgments on whether regularities observed in the trial data are likely to remain stable under different conditions, including changes in the population, treatments, and outcomes under study (Rothman KJ, Gallacher JE, Hatch EE, Why representativeness should be avoided). Such broad “scientific generalizations” cannot, in many practical cases, be fully addressed using formal methods, despite recent theoretical advances (Pearl J, Bareinboim E, External validity: from do-calculus to transportability across populations; Pearl J, Bareinboim E, Transportability of causal and statistical relations: a formal approach, IEEE 11th International Conference on Data Mining Workshops, 2011). Nevertheless, large databases of routinely collected observational data and individual patient data from completed clinical trials provide an opportunity to formally treat aspects of the generalization problem related to differences between populations seen in clinical practice and clinical trial participants in important characteristics that are predictors of the outcome and modifiers of the treatment effect. Even after restricting attention to patients who would meet the trial eligibility criteria, trial participants are often younger, have fewer comorbidities, and have less severe disease than eligible nonparticipants.
      Differences in the distribution of effect modifiers mean that population-averaged treatment effects (i.e., averaged over all baseline covariates in the study population from which the study sample was drawn) from clinical trials cannot be transported (applied) directly to clinically relevant target populations. In other words, the treatment effect seen in a clinical trial may not reflect the treatment effect comparing the same treatments in the clinically relevant target population.
      Large databases of routinely collected observational data and increasingly available individual patient data from completed clinical trials, combined with novel statistical methods (Cole SR, Stuart EA, Generalizing evidence from randomized clinical trials to target populations: the ACTG 320 trial; Hartman E, Grieve R, Ramsahai R, Sekhon JS, From SATE to PATT: combining experimental with observational studies to estimate population treatment effects; O’Muircheartaigh C, Hedges LV, Generalizing from unrepresentative experiments: a stratified propensity score approach; Dahabreh IJ, Robertson S, Stuart EA, Hernán MA, Extending inferences from randomized participants to all eligible individuals using trials nested within cohort studies, arXiv:1709.04589; Bang H, Robins JM, Doubly robust estimation in missing data and causal inference models), can be used to transport inferences from trial participants to the target population. The methods require individual participant data on baseline covariates, treatments, and outcomes from a completed clinical trial, and observational baseline covariate data (but not treatment or outcome data) from a sample of the target population. To estimate the treatment effect in the target population, the data are used to build models of the outcome among trial participants and models of the probability of trial participation (Dahabreh IJ, Robertson S, Stuart EA, Hernán MA; Bang H, Robins JM).
      The details underlying the methods are beyond the scope of this article, but the intuition behind them is easy to convey. Outcome models estimated in the trial are used to extrapolate relationships between treatments, covariates, and outcomes from trial participants to nonparticipants by applying the estimated models to the baseline covariates in the sample from the target population. Models for the probability of trial participation allow us to treat the trial participants as a sample from the target population, where the sampling probabilities depend on baseline covariates and have to be estimated using the data. Methods that combine both kinds of models aim to increase efficiency and gain robustness to possible model misspecification (
      • Bang H.
      • Robins J.M.
      Doubly robust estimation in missing data and causal inference models.
      ,
      • Zhang Z.
      • Nie L.
      • Soon G.
      • Hu Z.
      New methods for treatment effect calibration, with applications to non-inferiority trials.
      ). All methods can be viewed as strategies for principled extrapolation, in the sense that they properly account for the statistical uncertainty involved in extrapolating inferences beyond the population of trial participants. They can be extended to conduct sensitivity analyses to examine the impact of violations of the underlying causal assumptions.
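To make the weighting intuition concrete, the participation-model strategy can be sketched in a few lines of Python on simulated data. Everything below is hypothetical (the covariate, sample sizes, and coefficients are invented for illustration): trial participants are re-weighted by the inverse odds of their estimated probability of trial participation, so that the weighted trial sample resembles the target population.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Simulated data (all names and values are hypothetical) ---
# Trial sample: covariate x, randomized treatment a, binary outcome y.
n_trial, n_target = 2000, 5000
x_trial = rng.normal(1.0, 1.0, n_trial)      # trial enrollees skew high on x
a = rng.integers(0, 2, n_trial)
# The true treatment effect grows with x, so the trial-average effect will
# not match the target population, whose x distribution is centered lower.
p_y = 1 / (1 + np.exp(-(-1.0 + a * (0.5 + 0.5 * x_trial))))
y = rng.binomial(1, p_y)
x_target = rng.normal(0.0, 1.0, n_target)    # covariates-only target sample

def fit_logit(X, s, iters=25):
    # Plain Newton-Raphson logistic regression (numpy only).
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ b))
        b += np.linalg.solve((X.T * (p * (1 - p))) @ X, X.T @ (s - p))
    return b

# 1) Model the probability of trial participation given covariates,
#    using the stacked trial + target samples.
X_all = np.column_stack([np.ones(n_trial + n_target),
                         np.concatenate([x_trial, x_target])])
s = np.concatenate([np.ones(n_trial), np.zeros(n_target)])
beta = fit_logit(X_all, s)
p_part = 1 / (1 + np.exp(-(X_all[:n_trial] @ beta)))

# 2) Weight each trial participant by the inverse odds of participation,
#    re-weighting the trial sample to resemble the target population.
w = (1 - p_part) / p_part

# 3) A weighted difference in means estimates the effect in the target.
mu1 = np.sum(w * a * y) / np.sum(w * a)
mu0 = np.sum(w * (1 - a) * y) / np.sum(w * (1 - a))
naive = y[a == 1].mean() - y[a == 0].mean()
print(f"trial-only effect:  {naive:.3f}")
print(f"transported effect: {mu1 - mu0:.3f}")
```

Because in this simulation the effect grows with x and trial enrollees skew high on x, the transported estimate is smaller than the trial-only estimate. In a real analysis the participation model would use many covariates, and a doubly robust estimator would combine it with an outcome model fitted among trial participants.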
When examining what evidence to draw upon to advise the infertile woman with polycystic ovary syndrome in Clinical Case 2, the physician first needs to determine whether the available evidence applies to the patients in his or her practice. A clinical trial comparing the effect of clomiphene citrate versus letrozole on pregnancy rates among women with infertility related to polycystic ovary syndrome is available (
      • Legro R.S.
      • Brzyski R.G.
      • Diamond M.P.
      • Coutifaris C.
      • Schlaff W.D.
      • Casson P.
      • et al.
      Letrozole versus clomiphene for infertility in the polycystic ovary syndrome.
      ), but its results may not be applicable to the target population of the physician's practice when the trial's participants differ from the target population in terms of age, body mass index, hormone levels, or a series of other factors likely to modify the treatment effect. Transportability analyses can improve the relevance of the clinical trial results to the physician's practice and can be used to transport treatment effects from the trial to the entire target population (e.g., to tailor clinical practice guidelines or make formulary decisions) or to certain patient subgroups. Ultimately, however, the physician must tailor treatment recommendations to the specific couple in Clinical Case 2 by considering the large number of characteristics over which treatment effects vary.

      Heterogeneity of treatment effects and personalized medicine

      Biological knowledge and clinical intuition suggest that no two individuals respond to treatment the same way. Yet most studies, including virtually all clinical trials, estimate population-averaged treatment effects (i.e., averaged over all the baseline covariates in the study population from which trial participants were drawn). Using estimates of population-averaged effects to make clinical recommendations for individuals is challenging in the presence of heterogeneity, especially if a treatment that is beneficial for one identifiable subgroup of patients is harmful for another (a situation termed “qualitative effect modification”).
In an attempt to individualize treatment, investigators often examine whether treatment effects vary over identifiable patient characteristics using “one-variable-at-a-time” subgroup analyses or regression analyses with treatment-covariate product terms (“statistical interactions”). This approach faces a fundamental difficulty: the list of candidate effect modifiers is very long, and the number of possible combinations of effect-modifier values can quickly surpass the total sample size of any clinical trial or observational study (e.g., considering just 20 binary covariates jointly defines more than 1 million possible subgroups). Furthermore, one-variable-at-a-time subgroup analyses have low statistical power (because most studies are powered to detect main treatment effects, not between-subgroup differences) and, unless proper precautions are taken, produce a large number of false-positive results.
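The combinatorial explosion is easy to verify: with k binary baseline covariates there are 2^k distinct covariate patterns, each a candidate subgroup.

```python
# Number of distinct covariate patterns (candidate subgroups)
# defined by k binary baseline covariates is 2**k.
for k in (5, 10, 20):
    print(f"{k} binary covariates -> {2**k:,} possible subgroups")
# 20 binary covariates already define 1,048,576 patterns,
# far exceeding the enrollment of any clinical trial.
```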
      An approach that can partly mitigate the difficulties stemming from the large number of effect modifiers is to combine multiple baseline characteristics into a single summary variable and then examine whether treatment effects vary over this variable. Two kinds of summary variables are increasingly popular: effect scores (predicted treatment effects conditional on baseline covariates) and baseline risk scores (predicted outcomes in the absence of treatment conditional on baseline covariates) (
      • Dahabreh I.J.
      • Trikalinos T.A.
      • Kent D.M.
      • Schmid C.H.
      Heterogeneity of treatment effects.
      ,
      • Tian L.
      • Zhao X.
      Statistical methods for personalized medicine.
      ). Both approaches rely on regression models of the outcome under each of the compared treatments (for effect scores), or the “baseline” or “control” treatment (for baseline risk scores) (
      • Hayward R.A.
      • Kent D.M.
      • Vijan S.
      • Hofer T.P.
      Multivariable risk prediction can greatly enhance the statistical power of clinical trial subgroup analysis.
      ,
      • Kent D.M.
      • Rothwell P.M.
      • Ioannidis J.P.
      • Altman D.G.
      • Hayward R.A.
      Assessing and reporting heterogeneity in treatment effects in clinical trials: a proposal.
      ,
      • Kent D.M.
      • Nelson J.
      • Dahabreh I.J.
      • Rothwell P.M.
      • Altman D.G.
      • Hayward R.A.
      Risk and treatment effect heterogeneity: re-analysis of individual participant data from 32 large clinical trials.
      ).
Again, the details of implementing the methods are beyond the scope of this article, but the intuition is easy to communicate: when relying on an effect score, the physician uses an estimate of the treatment effect among patients with characteristics similar to those of the patient being advised to identify the most effective treatment. When relying on a baseline risk score, the physician recommends a treatment on the basis of the predicted outcome in the absence of treatment. This approach is motivated by a desire to help the most vulnerable patients and the intuition that the patients at highest risk for the outcome in the absence of treatment are the ones with the largest potential for benefit under the treatment (e.g., for a binary outcome, subgroups of patients with near-zero risk cannot experience large treatment benefits).
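As a schematic illustration on simulated data (the covariates and coefficients below are invented), an effect score and a baseline risk score can be constructed by fitting a separate outcome model in each trial arm and comparing the arm-specific predictions:

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Simulated trial data; covariates and coefficients are hypothetical ---
n = 4000
age = rng.normal(30, 5, n)
bmi = rng.normal(27, 4, n)
a = rng.integers(0, 2, n)                    # randomized treatment
# Outcome risk depends on covariates, and the treatment effect varies with BMI.
lin = -2 + 0.03 * (age - 30) + 0.05 * (bmi - 27) + a * (0.8 - 0.1 * (bmi - 27))
y = rng.binomial(1, 1 / (1 + np.exp(-lin)))

X = np.column_stack([np.ones(n), age, bmi])

def fit_logit(X, y, iters=25):
    # Plain Newton-Raphson logistic regression (numpy only).
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ b))
        b += np.linalg.solve((X.T * (p * (1 - p))) @ X, X.T @ (y - p))
    return b

# Fit a separate outcome model in each trial arm.
b1 = fit_logit(X[a == 1], y[a == 1])
b0 = fit_logit(X[a == 0], y[a == 0])

expit = lambda z: 1 / (1 + np.exp(-z))
# Effect score: predicted risk difference given each patient's covariates.
effect_score = expit(X @ b1) - expit(X @ b0)
# Baseline risk score: predicted risk under the control treatment.
risk_score = expit(X @ b0)

# Observed risk differences across effect-score quartiles.
edges = np.quantile(effect_score, [0.25, 0.5, 0.75])
stratum = np.digitize(effect_score, edges)
for q in range(4):
    m = stratum == q
    rd = y[m & (a == 1)].mean() - y[m & (a == 0)].mean()
    print(f"effect-score quartile {q + 1}: observed risk difference = {rd:.3f}")
```

In this simulation, patients in the top effect-score quartile show a markedly larger observed benefit than those in the bottom quartile, which is the pattern the summary-score approach is designed to reveal without testing 20 covariates one at a time.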
      The ideas of emulating target trials using observational data and transporting findings from completed trials to target populations can also be combined with approaches for examining the heterogeneity of treatment effects. For instance, large observational data sets can produce precise estimates of the variation of treatment effects over patient characteristics in target trial emulations (
      • Hernán M.A.
      • Hernandez-Diaz S.
      • Robins J.M.
      Randomized trials analyzed as observational studies.
      ,
      • Hernán M.A.
      • Alonso A.
      • Logan R.
      • Grodstein F.
      • Michels K.B.
      • Willett W.C.
      • et al.
      Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease.
      ,
      • Hernán M.A.
      • Robins J.M.
      Using big data to emulate a target trial when a randomized trial is not available.
      ).
      A core premise underlying personalized medicine is that the treatment effect varies among individuals to such an extent that the optimal treatment from the set of treatments under consideration is not the same for all patients. Personalized treatment recommendations are most useful when some patient or disease characteristics are qualitative effect modifiers, such that the optimal treatment strategy (regime) is one that uses these characteristics to assign patients to the treatment under which they will experience the most benefit (as opposed to a uniform treatment strategy). A substantial amount of work in recent years has examined methods for estimating optimal regimes using observational data (
      • Orellana L.
      • Rotnitzky A.
      • Robins J.M.
      Dynamic regime marginal structural mean models for estimation of optimal dynamic treatment regimes, Part I: main content.
      ,
      • Robins J.
      • Orellana L.
      • Rotnitzky A.
      Estimation and extrapolation of optimal treatment and testing strategies.
      ,
      • Zhang B.
      • Tsiatis A.A.
      • Laber E.B.
      • Davidian M.
      A robust method for estimating optimal treatment regimes.
      ); observational studies designed to emulate target trials are an ideal setting for using these methods; extensions to the transportability setting have also been proposed (
      • Ogburn E.L.
      Comment.
). In both trial emulations and transportability analyses, the optimal regime may depend on conventional clinicopathological characteristics, but the incorporation of biomarker data, including -omic information, substantially expands the set of candidate effect modifiers and has stimulated recent statistical research.
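A minimal sketch of regime estimation on simulated data (all variables hypothetical; assuming, as in a well-designed trial emulation, that treatment is effectively randomized given measured covariates): fit an outcome model under each treatment and assign each patient the treatment with the better predicted outcome. The qualitative effect modifier here flips which treatment is best.

```python
import numpy as np

rng = np.random.default_rng(2)

# --- Hypothetical data with a qualitative effect modifier ---
n = 5000
x = rng.normal(0, 1, n)                  # e.g., a standardized biomarker
a = rng.integers(0, 2, n)                # treatment actually received
# Treatment 1 is better when x > 0; treatment 0 is better when x < 0.
p = 1 / (1 + np.exp(-(0.2 + np.where(a == 1, x, -x))))
y = rng.binomial(1, p)                   # 1 = good outcome

def fit_logit(X, y, iters=25):
    # Plain Newton-Raphson logistic regression (numpy only).
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        pr = 1 / (1 + np.exp(-X @ b))
        b += np.linalg.solve((X.T * (pr * (1 - pr))) @ X, X.T @ (y - pr))
    return b

X = np.column_stack([np.ones(n), x])
b1 = fit_logit(X[a == 1], y[a == 1])     # outcome model under treatment 1
b0 = fit_logit(X[a == 0], y[a == 0])     # outcome model under treatment 0

expit = lambda z: 1 / (1 + np.exp(-z))
pred1, pred0 = expit(X @ b1), expit(X @ b0)

# Estimated optimal regime: give each patient the treatment with the
# higher predicted probability of a good outcome.
regime = (pred1 > pred0).astype(int)

# Model-based value of each strategy (mean predicted outcome if followed).
value_all_0 = pred0.mean()
value_all_1 = pred1.mean()
value_regime = np.where(regime == 1, pred1, pred0).mean()
print(f"treat everyone with 0: {value_all_0:.3f}")
print(f"treat everyone with 1: {value_all_1:.3f}")
print(f"tailored regime:       {value_regime:.3f}")
```

Because the effect modification is qualitative, the tailored regime outperforms both uniform strategies; with no qualitative modification, one uniform strategy would be (near-)optimal and tailoring would add little.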
In the case of the infertile woman with polycystic ovary syndrome, a nuanced application of evidence from clinical trials would consider her specific baseline characteristics, such as age, race/ethnicity, body mass index, severity of polycystic ovary syndrome, and levels of sex hormones, to determine whether clomiphene citrate or letrozole would be the better treatment. Observational data could be leveraged to inform such personalized treatment decisions, either by emulating a target trial or by transporting the results from the completed trial to a new target population. Finally, if biomarker data were available, they would represent yet another dimension of covariates over which treatment recommendations could be personalized.

      Conclusion

Many clinical decisions have to be made with limited randomized evidence. Large observational health-care databases, coupled with novel approaches such as target trial emulation and transportability analyses, can help address knowledge gaps when trial evidence is limited. Combining these approaches with novel methods for examining the heterogeneity of treatment effects provides a path toward the ultimate goal of delivering personalized, evidence-based care.

      References

        • Fisher R.A.
        The design of experiments.
        Oliver & Boyd, Edinburgh1937
        • Sackett D.L.
        Rules of evidence and clinical recommendations on the use of antithrombotic agents.
        Chest. 1989; 95: 2s-4s
        • Frieden T.R.
        Evidence for health decision making: beyond randomized, controlled trials.
        N Engl J Med. 2017; 377: 465-475
        • Irving M.
        • Eramudugolla R.
        • Cherbuin N.
        • Anstey K.J.
        A critical review of grading systems: implications for public health policy.
        Eval Health Prof. 2016; ([Epub ahead of print])
        • Dahabreh I.J.
        • Kent D.M.
        Can the learning health care system be educated with observational data?.
        JAMA. 2014; 312: 129-130
        • Dahabreh I.J.
        • Hayward R.
        • Kent D.M.
        Using group data to treat individuals: understanding heterogeneous treatment effects in the age of precision medicine and patient-centred evidence.
        Int J Epidemiol. 2016; 45: 2184-2193
        • Elliott D.
        • Husbands S.
        • Hamdy F.C.
        • Holmberg L.
        • Donovan J.L.
        Understanding and improving recruitment to randomised controlled trials: qualitative research approaches.
        Eur Urol. 2017; 72: 789-798
        • McCulloch P.
        • Taylor I.
        • Sasako M.
        • Lovett B.
        • Griffin D.
        Randomised trials in surgery: problems and possible solutions.
        BMJ. 2002; 324: 1448-1451
        • Cook J.A.
        The challenges faced in the design, conduct and analysis of surgical randomised controlled trials.
        Trials. 2009; 10: 9
        • Kao L.S.
        • Aaron B.C.
        • Dellinger E.P.
        Trials and tribulations: current challenges in conducting clinical trials.
        Arch Surg. 2003; 138: 59-62
        • Bothwell L.E.
        • Greene J.A.
        • Podolsky S.H.
        • Jones D.S.
        Assessing the gold standard—lessons from the history of RCTs.
        N Engl J Med. 2016; 374: 2175-2181
        • Ohlander S.
        • Hotaling J.
        • Kirshenbaum E.
        • Niederberger C.
        • Eisenberg M.L.
        Impact of fresh versus cryopreserved testicular sperm upon intracytoplasmic sperm injection pregnancy outcomes in men with azoospermia due to spermatogenic dysfunction: a meta-analysis.
        Fertil Steril. 2014; 101: 344-349
        • Robins J.M.
        • Hernán M.A.
        Chapter 23: estimation of the causal effects of time-varying exposures.
        in: Fitzmaurice G. Davidian M. Verbeke G. Longitudinal data analysis. Chapman and Hall/CRC Press, New York2009: 553-599
        • Hernán M.A.
        • Hernandez-Diaz S.
        • Robins J.M.
        Randomized trials analyzed as observational studies.
        Ann Intern Med. 2013; 159: 560-562
        • Toh S.
        • Hernán M.A.
        Causal inference from longitudinal studies with baseline randomization.
        Int J Biostat. 2008; 4: 22
        • Hernán M.A.
        • Alonso A.
        • Logan R.
        • Grodstein F.
        • Michels K.B.
        • Willett W.C.
        • et al.
        Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease.
        Epidemiology. 2008; 19: 766-779
        • Hernán M.A.
        • Robins J.M.
        Using big data to emulate a target trial when a randomized trial is not available.
        Am J Epidemiol. 2016; 183: 758-764
        • Ford I.
        • Norrie J.
        Pragmatic trials.
        N Engl J Med. 2016; 375: 454-463
        • Rosenthal G.E.
        The role of pragmatic clinical trials in the evolution of learning health systems.
        Trans Am Clin Climatol Assoc. 2014; 125: 204-216
        • Concato J.
        • Shah N.
        • Horwitz R.I.
        Randomized, controlled trials, observational studies, and the hierarchy of research designs.
        N Engl J Med. 2000; 342: 1887-1892
        • Benson K.
        • Hartz A.J.
        A comparison of observational studies and randomized, controlled trials.
        N Engl J Med. 2000; 342: 1878-1886
        • Anglemyer A.
        • Horvath H.T.
        • Bero L.
        Healthcare outcomes assessed with observational study designs compared with those assessed in randomized trials.
        Cochrane Database Syst Rev. 2014; 4: MR000034
        • Dahabreh I.J.
        • Sheldrick R.C.
        • Paulus J.K.
        • Chung M.
        • Varvarigou V.
        • Jafri H.
        • et al.
        Do observational studies using propensity score methods agree with randomized trials? A systematic comparison of studies on acute coronary syndromes.
        Eur Heart J. 2012; 33: 1893-1901
        • Hemkens L.G.
        • Contopoulos-Ioannidis D.G.
        • Ioannidis J.P.
        Agreement of treatment effects for mortality from routinely collected data and subsequent randomized trials: meta-epidemiological survey.
        BMJ. 2016; 352: i493
        • Kitsios G.D.
        • Dahabreh I.J.
        • Callahan S.
        • Paulus J.K.
        • Campagna A.C.
        • Dargin J.M.
        Can we trust observational studies using propensity scores in the critical care literature? A systematic comparison with randomized clinical trials.
        Crit Care Med. 2015; 43: 1870-1879
        • Kunz R.
        • Oxman A.D.
        The unpredictability paradox: review of empirical comparisons of randomised and non-randomised clinical trials.
        BMJ. 1998; 317: 1185-1190
        • Lonjon G.
        • Boutron I.
        • Trinquart L.
        • Ahmad N.
        • Aim F.
        • Nizard R.
        • et al.
        Comparison of treatment effect estimates from prospective nonrandomized studies with propensity score analysis and randomized controlled trials of surgical procedures.
        Ann Surg. 2014; 259: 18-25
        • Fraker T.
        • Maynard R.
        The adequacy of comparison group designs for evaluations of employment-related programs.
        J Hum Resour. 1987; 22: 194-227
        • LaLonde R.
        Evaluating the econometric evaluations of training programs with experimental data.
        Am Econ Rev. 1986; 76: 604-620
        • Glazerman S.
        • Levy D.M.
        • Myers D.
        Nonexperimental versus experimental estimates of earnings impacts.
        Ann Am Acad Pol Soc Sci. 2003; 589: 63-93
        • Cook T.D.
        • Shadish W.R.
        • Wong V.C.
        Three conditions under which experiments and observational studies produce comparable causal estimates: new findings from within-study comparisons.
        J Policy Anal Manage. 2008; 27: 724-750
        • Michalopoulos C.
        • Bloom H.
        • Hill C.
        Can propensity-score methods match the findings from a random assignment evaluation of mandatory welfare-to-work programs?.
        Rev Econ Stat. 2004; 86: 156-179
        • Hernán M.A.
        • Sauer B.C.
        • Hernandez-Diaz S.
        • Platt R.
        • Shrier I.
        Specifying a target trial prevents immortal time bias and other self-inflicted injuries in observational analyses.
        J Clin Epidemiol. 2016; 79: 70-75
        • Danaei G.
        • Garcia Rodriguez L.A.
        • Cantero O.F.
        • Logan R.W.
        • Hernán M.A.
        Electronic medical records can be used to emulate target trials of sustained treatment strategies.
        J Clin Epidemiol. 2018; 96: 12-22
        • Emilsson L.
        • Garcia-Albeniz X.
        • Logan R.W.
        • Caniglia E.C.
        • Kalager M.
        • Hernán M.A.
        Examining bias in studies of statin treatment and survival in patients with cancer.
        JAMA Oncol. 2018; 4: 63-70
        • Garcia-Albeniz X.
        • Hsu J.
        • Bretthauer M.
        • Hernán M.A.
        Effectiveness of screening colonoscopy to prevent colorectal cancer among medicare beneficiaries aged 70 to 79 years: a prospective observational study.
        Ann Intern Med. 2017; 166: 18-26
        • Zhang Y.
        • Young J.G.
        • Thamer M.
        • Hernán M.A.
        Comparing the effectiveness of dynamic treatment strategies using electronic health records: an application of the parametric g-formula to anemia management strategies.
        Health Serv Res. 2017; ([Epub ahead of print])
        • Pearl J.
        • Bareinboim E.
        External validity: from do-calculus to transportability across populations.
        Stat Sci. 2014; 29: 579-595
      1. Pearl J, Bareinboim E. Transportability of causal and statistical relations: a formal approach. In: IEEE 11th International Conference on Data Mining Workshops, December 11, 2011; Vancouver, Canada, 540–547. Available at: https://ieeexplore.ieee.org/document/6137426/. Accessed February 28, 2018.

        • Rothman K.J.
        • Gallacher J.E.
        • Hatch E.E.
        Why representativeness should be avoided.
        Int J Epidemiol. 2013; 42: 1012-1014
        • Cole S.R.
        • Stuart E.A.
        Generalizing evidence from randomized clinical trials to target populations: the ACTG 320 trial.
        Am J Epidemiol. 2010; 172: 107-115
        • Hartman E.
        • Grieve R.
        • Ramsahai R.
        • Sekhon J.S.
        From SATE to PATT: combining experimental with observational studies to estimate population treatment effects.
        J R Stat Soc Ser A Stat Soc. 2015;
        • O’Muircheartaigh C.
        • Hedges L.V.
        Generalizing from unrepresentative experiments: a stratified propensity score approach.
        J R Stat Soc Ser C Appl Stat. 2014; 63: 195-210
      2. Dahabreh I, Robertson S, Stuart E, Hernán MA. Extending inferences from randomized participants to all eligible individuals using trials nested within cohort studies. Available at: https://arxiv.org/abs/1709.04589; 2017. Accessed February 28, 2018.

        • Bang H.
        • Robins J.M.
        Doubly robust estimation in missing data and causal inference models.
        Biometrics. 2005; 61: 962-973
        • Zhang Z.
        • Nie L.
        • Soon G.
        • Hu Z.
        New methods for treatment effect calibration, with applications to non-inferiority trials.
        Biometrics. 2016; 72: 20-29
        • Legro R.S.
        • Brzyski R.G.
        • Diamond M.P.
        • Coutifaris C.
        • Schlaff W.D.
        • Casson P.
        • et al.
        Letrozole versus clomiphene for infertility in the polycystic ovary syndrome.
        N Engl J Med. 2014; 371: 119-129
        • Dahabreh I.J.
        • Trikalinos T.A.
        • Kent D.M.
        • Schmid C.H.
        Heterogeneity of treatment effects.
        in: Morton S.C. Gatsonis C. Methods in comparative effectiveness research. Chapman and Hall/CRC, New York2017: 257-263
        • Tian L.
        • Zhao X.
        Statistical methods for personalized medicine.
        in: Lu Y. Fang J. Tian L. Jin H. Advanced medical statistics. 2nd ed. World Scientific, Singapore2015: 79-102
        • Hayward R.A.
        • Kent D.M.
        • Vijan S.
        • Hofer T.P.
        Multivariable risk prediction can greatly enhance the statistical power of clinical trial subgroup analysis.
        BMC Med Res Methodol. 2006; 6: 18
        • Kent D.M.
        • Rothwell P.M.
        • Ioannidis J.P.
        • Altman D.G.
        • Hayward R.A.
        Assessing and reporting heterogeneity in treatment effects in clinical trials: a proposal.
        Trials. 2010; 11: 85
        • Kent D.M.
        • Nelson J.
        • Dahabreh I.J.
        • Rothwell P.M.
        • Altman D.G.
        • Hayward R.A.
        Risk and treatment effect heterogeneity: re-analysis of individual participant data from 32 large clinical trials.
        Int J Epidemiol. 2016; 45: 2075-2088
        • Orellana L.
        • Rotnitzky A.
        • Robins J.M.
        Dynamic regime marginal structural mean models for estimation of optimal dynamic treatment regimes, Part I: main content.
        Int J Biostat. 2010; 6 (article 8)
        • Robins J.
        • Orellana L.
        • Rotnitzky A.
        Estimation and extrapolation of optimal treatment and testing strategies.
        Stat Med. 2008; 27: 4678-4721
        • Zhang B.
        • Tsiatis A.A.
        • Laber E.B.
        • Davidian M.
        A robust method for estimating optimal treatment regimes.
        Biometrics. 2012; 68: 1010-1018
        • Ogburn E.L.
        Comment.
        J Am Stat Assoc. 2016; 111: 1534-1537