The advocates of big data sometimes claim that, given enough data, tedious epidemiological concerns about data that are missing not at random, and about other causes of bias or confounding, will become irrelevant, so that randomised trials will become unnecessary. They go on to argue that by analysing routine data and correlating outcomes with therapies, they will be able to conclude reliably whether any given drug is responsible for patient benefits or side effects. They might also point out, as did FDA authors in a recent New England Journal of Medicine article, that using “Real World Evidence” (RWE) based on routine data could make studies faster, cheaper and more pragmatic, with better generalisability to routine clinical practice.
Sadly, however, many big data advocates are unaware of serious issues such as confounding by indication, in which the (usually unconscious) prejudice of clinicians leads them to prescribe new, expensive or risky drugs preferentially to the patients with the best, or the worst, prognosis. While a technique called propensity scoring can sometimes reduce the size of this bias, it will rarely abolish it, because a propensity score can only balance the confounders that were actually recorded. The presence of unrecorded confounders (such as a subtle difference in performance status between cancer patients who receive a new drug and those who do not) is therefore often enough to explain the measured differences in outcome. So, on those grounds, RCTs appear to retain their status as the gold standard design for answering causal questions.
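To make the propensity-score point concrete, here is a minimal Python simulation under invented assumptions: a hypothetical new drug with no true effect on mortality, one recorded binary confounder `x` and one unrecorded binary confounder `u` (standing in for performance status). All probabilities are made up for illustration. The propensity score is estimated as the treated proportion in each stratum and then used for inverse probability weighting, which is only one of several ways of using a propensity score.

```python
import numpy as np


def ipw_difference(t, y, ps):
    """Inverse-probability-weighted difference in mean outcome (treated minus untreated)."""
    w1 = t / ps
    w0 = (1 - t) / (1 - ps)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)


def stratum_propensity(t, strata):
    """P(treated | stratum), estimated as the treated proportion in each stratum."""
    ps = np.empty(len(t))
    for s in np.unique(strata):
        ps[strata == s] = t[strata == s].mean()
    return ps


def simulate(n=200_000, seed=1):
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, n)  # recorded confounder (hypothetical, e.g. stage)
    u = rng.binomial(1, 0.5, n)  # unrecorded confounder (e.g. performance status)
    # Confounding by indication: fitter patients (x=1 or u=1) get the new drug more often.
    t = rng.binomial(1, 0.2 + 0.3 * x + 0.3 * u)
    # Fitter patients die less often; the drug itself has NO effect (true difference = 0).
    y = rng.binomial(1, 0.4 - 0.1 * x - 0.2 * u)
    return {
        "naive": y[t == 1].mean() - y[t == 0].mean(),
        "ps_recorded_only": ipw_difference(t, y, stratum_propensity(t, x)),
        # Only possible in a simulation: routine data would not contain u.
        "ps_all_confounders": ipw_difference(t, y, stratum_propensity(t, 2 * x + u)),
    }


for name, value in simulate().items():
    print(f"{name:>20}: {value:+.3f}")
```

Working through the expected values by hand, the naive difference should come out near -0.09 (an apparent benefit from a useless drug), weighting on the recorded confounder alone should shrink it only to about -0.066, and only the score that also uses `u` should bring it close to zero.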
However, some new econometric methods for estimating causal effects from routine data are emerging which could occasionally help make sense of RWE. For example, instrumental variable (IV) methods, of which Mendelian randomisation is a well-known example, look for an “instrument” that largely determines whether a treatment is given. One example is estimating the benefit of bone marrow transplant (BMT) in children with acute myeloid leukaemia (AML): fortunately this is a very rare condition, but its rarity makes a randomised trial infeasible. However, it has been argued that one can simply compare mortality in this condition between children with and without a living sibling, since nearly all children with this leukaemia and a living sibling will get a BMT, while those without a sibling are much less likely to have a successful transplant. Since having a living sibling is unrelated to whether a child develops AML or how severe it is, this is almost as good as an RCT for determining whether BMT is effective. In more typical cases, however, the major challenge in designing an IV study is to identify an instrument that fulfils three essential criteria: it strongly influences whether the therapy is given; it affects the outcome only via the therapy; and it is not associated with other factors that cause the outcome (for example through shared causes or reverse causation).
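The logic of the sibling comparison can be sketched with the Wald (ratio) IV estimator: the difference in mortality between children with and without a sibling, divided by the difference in their transplant rates. The Python below is a hypothetical simulation, not data from the studies cited: the sibling and frailty rates, the transplant probabilities and a true BMT effect of -0.15 on mortality risk are all assumptions. The unrecorded `frail` variable plays the role of a confounder that makes the naive transplanted-versus-not comparison misleading, while being independent of the instrument.

```python
import numpy as np


def wald_iv_estimate(z, t, y):
    """Wald (ratio) IV estimator for a binary instrument z: the difference in
    outcome by instrument divided by the difference in treatment uptake."""
    outcome_gap = y[z == 1].mean() - y[z == 0].mean()
    uptake_gap = t[z == 1].mean() - t[z == 0].mean()
    return outcome_gap / uptake_gap


def simulate_bmt(n=200_000, seed=3):
    rng = np.random.default_rng(seed)
    sibling = rng.binomial(1, 0.7, n)  # instrument: has a living sibling (hypothetical rate)
    frail = rng.binomial(1, 0.3, n)  # unrecorded confounder, independent of sibling
    # Transplant is driven mainly by having a sibling, but frail children get it less often.
    p_bmt = np.where(sibling == 1, 0.9, 0.2) - 0.15 * frail
    bmt = rng.binomial(1, p_bmt)
    # Assumed truth: BMT cuts mortality risk by 0.15; frailty raises it by 0.3.
    died = rng.binomial(1, 0.5 - 0.15 * bmt + 0.3 * frail)
    naive = died[bmt == 1].mean() - died[bmt == 0].mean()
    return naive, wald_iv_estimate(sibling, bmt, died)


naive, iv = simulate_bmt()
print(f"naive transplanted vs not: {naive:+.3f}")
print(f"IV (sibling) estimate:     {iv:+.3f}")
```

Here the naive comparison should overstate the benefit (about -0.19), while the IV estimate should land close to the assumed -0.15. If the treatment effect varied between patients, the same ratio would estimate the effect only among children whose transplant status is actually changed by having a sibling, which is one reason IV estimates need careful interpretation.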
One kind of IV design which may be applicable more often is the regression discontinuity (RD) design. Here, the instrument comes from a continuous score or lab test result that is linked to a relevant practice guideline: above the threshold value of this allocation variable the therapy is usually given, as the guideline recommends, while below the threshold it is usually omitted. With cholesterol, for example, many clinicians use a threshold of 5 mmol/L as the prompt for prescribing a statin. If you can find a continuous variable like this that determines therapy allocation, you can then move to the next stage of the RD design: compare the outcomes in patients whose value was just below the threshold (a cholesterol of 4.90 to 4.99 mmol/L, for example) with those whose value was just above it (5.01 to 5.10 mmol/L, for example). Apart from their lab result, these patients should be very similar, and the result itself is subject to minute-to-minute physiological variation and to random measurement error, so if you measured it again next week many of them might well fall on the other side of the threshold. This closely resembles the random allocation mechanism used in RCTs, so comparing outcomes between these two groups can give us a useful estimate of the drug's effect size; because the threshold only usually determines treatment, the difference in outcomes has to be scaled up by dividing it by the difference in the proportion of patients treated on each side. However, you will need a large cohort to provide enough patients near the threshold to obtain narrow confidence intervals, and will probably also need to correct for changes in underlying risk related to slight differences in the allocation variable.
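A minimal sketch of this "fuzzy" RD calculation in Python, under invented assumptions: cholesterol is uniform between 3 and 7 mmol/L, 80% of patients at or above 5 mmol/L get a statin versus 20% below, and a statin lowers event risk by 0.05. Straight lines fitted separately on each side of the threshold, using only patients within a hypothetical 0.5 mmol/L bandwidth, handle the smooth change in underlying risk; the jump in outcome at the threshold is then divided by the jump in statin uptake.

```python
import numpy as np


def local_linear_jump(score, values, threshold, bandwidth):
    """Jump in the mean of `values` at `threshold`, using separate straight-line
    fits to the points within `bandwidth` on each side (uniform kernel)."""
    centred = score - threshold
    below = (centred < 0) & (centred >= -bandwidth)
    above = (centred >= 0) & (centred <= bandwidth)
    # np.polyfit(..., 1) returns [slope, intercept]; the intercept is the
    # fitted value at the threshold itself (centred score = 0).
    left = np.polyfit(centred[below], values[below], 1)[-1]
    right = np.polyfit(centred[above], values[above], 1)[-1]
    return right - left


def simulate_statin_rd(n=400_000, seed=2, threshold=5.0, bandwidth=0.5):
    rng = np.random.default_rng(seed)
    cholesterol = rng.uniform(3.0, 7.0, n)  # mmol/L (hypothetical distribution)
    # Fuzzy allocation: the guideline threshold changes uptake from 20% to 80%.
    statin = rng.binomial(1, np.where(cholesterol >= threshold, 0.8, 0.2))
    # Underlying risk rises smoothly with cholesterol; a statin cuts it by 0.05.
    p_event = 0.12 + 0.03 * (cholesterol - threshold) - 0.05 * statin
    event = rng.binomial(1, p_event)

    first_stage = local_linear_jump(cholesterol, statin, threshold, bandwidth)
    outcome_jump = local_linear_jump(cholesterol, event, threshold, bandwidth)
    # Fuzzy RD (Wald-type) estimate: outcome jump scaled by the uptake jump.
    return outcome_jump / first_stage, first_stage


effect, first_stage = simulate_statin_rd()
print(f"jump in statin uptake: {first_stage:.3f}")
print(f"estimated effect on event risk: {effect:+.3f}")
```

Published RD analyses usually add refinements such as a triangular kernel and a data-driven bandwidth; this version keeps only the core idea, and the helper names are illustrative rather than taken from any RD package.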
This all sounds very promising, so in collaboration with the Universities of Edinburgh and Lausanne we recently carried out a feasibility assessment of using a regression discontinuity design (RDD) in a cohort of 45,000 Scottish women with breast cancer, to estimate the effectiveness of chemotherapy in the subset aged over 70. Women in this age group have previously been excluded from RCTs, and an RCT that tried to examine this question in that age group recruited only 6 participants. The opportunity for RDD arises here because, since 2008, most UK oncologists have used the NHS Predict score to calculate the probability that a woman with breast cancer will benefit from chemotherapy, and then used that to inform their chemotherapy decision. So in theory, if we compare outcomes in women with NHS Predict scores just below and just above the NICE chemotherapy treatment threshold, we should be able to estimate reliably the effectiveness of chemotherapy in this “untriallable” group. However, this is where real life steps in: when our colleagues plotted the probability of a woman receiving chemotherapy against her NHS Predict score, there was no discontinuity at the thresholds, and a substantial proportion of women received chemotherapy even at a 1% probability of benefit. So, in this promising and important scenario, RDD was unfortunately not feasible.
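The feasibility check described above, plotting treatment uptake against the score and looking for a step at the threshold, can be approximated as follows. This Python sketch is illustrative only: the score range, the threshold of 3.0 and both uptake patterns are hypothetical, not the actual NHS Predict or NICE values. It bins the score, computes the proportion treated in each bin, and compares the step across the threshold with the typical step between other neighbouring bins; this is a rough heuristic, not a formal test.

```python
import numpy as np


def binned_uptake(score, treated, edges):
    """Proportion treated in each score bin: a text stand-in for an uptake plot."""
    idx = np.digitize(score, edges) - 1  # bin k covers [edges[k], edges[k+1])
    return np.array([treated[idx == k].mean() for k in range(len(edges) - 1)])


def discontinuity_ratio(score, treated, threshold, edges):
    """Heuristic: |uptake step across the threshold| divided by the median
    |step| between all other pairs of neighbouring bins."""
    uptake = binned_uptake(score, treated, edges)
    steps = np.diff(uptake)  # steps[j] = uptake[j + 1] - uptake[j]
    k = int(np.argmin(np.abs(edges - threshold)))  # threshold must be a bin edge
    at_threshold = steps[k - 1]  # from the bin just below to the bin just above
    elsewhere = np.delete(steps, k - 1)
    return abs(at_threshold) / np.median(np.abs(elsewhere)), uptake


def simulate_uptake(n=40_000, seed=4, threshold=3.0):
    rng = np.random.default_rng(seed)
    edges = np.linspace(0.0, 10.0, 21)  # 20 bins of width 0.5
    score = rng.uniform(0.0, 10.0, n)  # hypothetical "% benefit" score
    # Scenario A: clinicians follow the threshold, as an RD design needs.
    follows = rng.binomial(1, np.where(score >= threshold, 0.75, 0.15))
    # Scenario B: uptake just rises smoothly with the score, with no step.
    smooth = rng.binomial(1, 0.10 + 0.06 * score)
    ratio_a, _ = discontinuity_ratio(score, follows, threshold, edges)
    ratio_b, uptake_b = discontinuity_ratio(score, smooth, threshold, edges)
    return ratio_a, ratio_b, uptake_b


ratio_a, ratio_b, uptake_b = simulate_uptake()
print(f"scenario A (threshold followed): step ratio {ratio_a:.1f}")
print(f"scenario B (smooth uptake):      step ratio {ratio_b:.1f}")
print("scenario B uptake by bin:", np.round(uptake_b, 2))
```

In scenario A the step at the threshold should dwarf the others; in scenario B, where uptake simply rises with the score and some patients are treated even at the lowest scores, it should look no different from its neighbours, which is the kind of pattern that makes RDD infeasible.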
So, to summarise our conclusions about using real-world data (RWD): simple propensity scoring is usually possible but often unconvincing, while newer techniques such as IV and RDD may be more convincing but are often infeasible. This leads us back to where we started, with the RCT. Interestingly, the RCT can be framed as a robust and widely applicable variant of IV, which uses a random number as the instrument to allocate patients to the therapy: by design, a random number strongly influences which treatment is given, cannot affect the outcome by any other route, and is independent of every confounder, recorded or not, apart from chance imbalances. So, the next time a big data advocate asks why you don’t use their methods, you can say that you already do, but that your variant of the IV design is more robust and widely applicable than theirs.
Jeremy Wyatt, Professor of Digital Healthcare
Director, The Wessex Institute
1. Sherman RE, Anderson SA, Dal Pan GJ, Gray GW, Gross T, Hunter NL, LaVange L, Marinac-Dabic D, Marks PW, Robb MA, Shuren J, Temple R, Woodcock J, Yue LQ, Califf RM. Real-World Evidence – What Is It and What Can It Tell Us? N Engl J Med. 2016 Dec 8;375(23):2293-2297.
2. McMurry TL, Hu Y, Blackstone EH, Kozower BD. Propensity scores: Methods, considerations, and applications in the Journal of Thoracic and Cardiovascular Surgery. J Thorac Cardiovasc Surg. 2015 Jul;150(1):14-9. doi: 10.1016/j.jtcvs.2015.03.057. Epub 2015 Apr 2.
3. Davey Smith G. Capitalizing on Mendelian randomization to assess the effects of treatments. J R Soc Med. 2007;100(9):432-435. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1963388/
4. Streeter AJ, Lin NX, Crathorne L, Haasova M, Hyde C, Melzer D, Henley WE. Adjusting for unmeasured confounding in nonrandomized longitudinal studies: a methodological review. J Clin Epidemiol. 2017 Jul;87:23-34. doi: 10.1016/j.jclinepi.2017.04.022. Epub 2017 Apr 28.
5. Venkataramani AS, Bor J, Jena AB. Regression discontinuity designs in healthcare research. BMJ. 2016 Mar 14;352:i1216. doi: 10.1136/bmj.i1216.
6. Gray E, Marti J, Brewster DH, Wyatt JC, Piaguet-Rossel R, Hall PS. Feasibility and results of four real-world evidence methods for estimating the effectiveness of adjuvant chemotherapy in early stage breast cancer. Submitted to J Clin Epidemiol, September 2018.