by Scott Aberegg, MD, MPH
It is a rare occasion that one article allows me to review so many aspects of the epistemology of medical evidence, but alas Schortgen et al afforded me that opportunity in the May 15th issue of AJRCCM.
The issues raised by this article are so numerous that I shall make subsections for each one. The authors of this RCT sought to determine the effect of external cooling of febrile septic patients on vasopressor requirements and mortality. Their conclusion was that “fever control using external cooling was safe and decreased vasopressor requirements and early mortality in septic shock.” Let’s explore the article and the issues it raises and see if this conclusion seems justified and how this study fits into current ICU practice.
PRIOR PROBABILITY, BIOLOGICAL PLAUSIBILITY, and BIOLOGICAL PRECEDENTS
These are related but distinct issues that are best considered both before a study is planned, and before its report is read. A clinical trial is in essence a diagnostic test of a hypothesis, and like a diagnostic test, its influence on what we already know depends not only on the characteristics of the test (sensitivity and specificity in a diagnostic test; alpha and power in the case of a clinical trial) but also on the strength of our prior beliefs. To quote Sagan [again], “extraordinary claims require extraordinary evidence.” I like analogies of extremes: no trial result is sufficient to convince the skeptical observer that orange juice reduces mortality in sepsis by 30%; and no evidence, however cogently presented, is sufficient to convince him that the sun will not rise tomorrow. So when we read the title of this or any other study, we should pause to ask: What is my prior belief that external cooling will reduce mortality in septic shock? That it will reduce vasopressor requirements?
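The diagnostic-test analogy can be made concrete with Bayes' rule. The sketch below is only an illustration: it assumes an unbiased trial, treats alpha as the false positive rate and power as the true positive rate, and uses prior values picked purely for demonstration (none of them come from the Schortgen article). The function name is hypothetical.

```python
def posterior_probability_true(prior, power=0.80, alpha=0.05):
    """Probability that the hypothesis is true given a 'positive' trial (P < alpha).

    Treats the trial like a diagnostic test: power plays the role of
    sensitivity, and alpha is the false positive rate (1 - specificity).
    Assumes no bias, which is optimistic for any real trial.
    """
    true_positive = power * prior
    false_positive = alpha * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


# Illustrative priors only: a long shot, a modest bet, and a coin flip.
for prior in (0.01, 0.10, 0.50):
    posterior = posterior_probability_true(prior)
    print(f"prior={prior:.2f} -> posterior={posterior:.2f}")
```

Under these assumptions, a prior of 0.10 yields a posterior of 0.64 after one positive trial, while a prior of 0.01 yields only about 0.14. The same P value means very different things depending on where you started.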
To support a prior probability that a therapy will be effective, we can look beyond our general beliefs on the topic to two other kinds of evidence. First, what other studies have been done on this topic that may inform our prior? Second, what is our general expectation of success in this line of inquiry based on similar lines of inquiry in this field? To answer the first question, we can usually refer to the article introduction to see what prior studies are referenced by the authors in support of their hypothesis. In this case, references 13-19 are provided as support. The most informative of these is the Bernard et al study of ibuprofen in sepsis which showed that ibuprofen reduced fever but not mortality (ibuprofen mortality 37%, placebo 40%; n=455, P=0.26). The study was designed with a power of 80% to detect a reduction in mortality from 30% to 19.5% by enrolling 525 patients. In order to show a statistically significant mortality reduction from 40% to 37% with 80% power, 4200 patients would have to be enrolled in EACH GROUP. Obviously such a study is financially unfeasible, and indeed finding a 3% benefit would not be useful to many clinicians (see Aberegg et al The Influence of Treatment Effect Size on Willingness to Adopt a Therapy). The other references provided by Schortgen et al do little to inform the prior, in my opinion, but I defer to interested readers to investigate for themselves.
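For readers who want to check the sample size arithmetic, here is a rough sketch using the standard normal-approximation formula for comparing two proportions. It assumes a two-sided alpha of 0.05 and equal group sizes. I do not know which formula or software the Bernard investigators used, so treat the function below (a hypothetical helper) as a ballpark check, not a reconstruction of their calculation.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate patients needed per group to compare two proportions.

    Normal approximation, two-sided alpha, equal allocation, and no
    continuity correction (all of these are assumptions of this sketch).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    p_bar = (p_control + p_treated) / 2
    # Variance term under the null (pooled) and under the alternative.
    null_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * sqrt(
        p_control * (1 - p_control) + p_treated * (1 - p_treated)
    )
    return ceil((null_term + alt_term) ** 2 / (p_control - p_treated) ** 2)


print(n_per_group(0.30, 0.195))  # the effect the trial was designed to detect
print(n_per_group(0.40, 0.37))   # the effect the trial actually observed
```

With these assumptions the first call gives about 264 per group, in line with the 525 planned patients, and the second gives roughly 4,100 per group, the same ballpark as the 4,200 quoted above. A continuity correction, which some calculators apply, nudges the figure upward.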
To address the second question about our expectation of success in this field of inquiry, we have to take a broader view of research in critical care medicine, where precious few therapies reduce mortality (see Aberegg et al: Delta Inflation: A Bias in the Design of Trials in Critical Care Medicine). Indeed, many therapies that were initially shown to reduce mortality in critical illness failed to do so on subsequent investigation, such as intensive insulin therapy, steroids for septic shock, and drotrecogin-alfa for severe sepsis, to name just a few prominent examples. This is in contrast to a field such as cardiology, where the odds of success are much higher (think of the effects of anti-platelet agents, cholesterol-lowering therapies, and PCI on combined endpoints). Suffice it to say that a bet against a therapy for a critical illness where the primary endpoint is mortality reduction is a good bet, especially taking the long view (analogous to the long view in the stock market: 10 years in the future rather than 10 months).
Finally, I wish to emphasize that it is my view that “biological plausibility” should take a back seat to prior probability as it has been described above. Biological plausibility refers to whether an association (such as external cooling reducing mortality or smoking causing lung cancer) has conceptual support from basic biology and biological reasoning. While I think this is important, I also think it is far too easy to rationalize any phenomenon in terms of biological reasoning. Interested readers should investigate the Provenge (Sipuleucel-T) saga. But before you do, answer this question: what is the probability that, in a patient with prostate cancer, you can extract peripheral blood dendritic cells via leukapheresis, incubate them with a fusion protein containing prostatic acid phosphatase and GM-CSF, then reinfuse the cells into the patient and reduce prostate cancer mortality?
Rather than biological plausibility, I propose we focus on BIOLOGICAL PRECEDENT. That is, is there a precedent for a therapy that works in the way that this therapy is proposed to work? In the case of external cooling I would propose we ask: “Is there a precedent for believing that manipulation of core body temperature in an adult patient with a[ny] disease will lead to a reduction in mortality?” When you read about administration of folate, fish oil, omega-3 fatty acids, vitamin C, vitamin E, vitamin D, calcium, indeed any vitamin, mineral, or micronutrient for ANY disease process, ask yourself: “Is there any precedent for such a substance having a consistent and meaningful outcome improvement in a patient population without deficiency?” Taking a broader view and focusing on biological precedent rather than biological plausibility may save us from distracting rationalizations based on incompletely understood biology regarding therapies that have appeals for non-scientific reasons (e.g., people like to reduce fever and to take [and give] vitamins and minerals – it just feels like the right thing to do – and myriad rationalizations can be quickly mobilized to support the “right thing”).
PRIMARY OUTCOME SELECTION AND INTERPRETATION
The primary outcome for the study was the number of patients with a 50% decrease in the baseline vasopressor dose after 48 hours. This outcome was chosen appropriately on the basis of a pilot study (that I don’t have access to). Secondary outcomes were numerous but included the 50% reduction at 2, 12, 24, and 36 hours; and mortality at day 14, ICU discharge, and hospital discharge. Now here’s a crux of the study design that becomes a crux of the study interpretation.
The essence of a randomized trial is to demonstrate, beyond a reasonable doubt, that the effect being sought is a real effect, not one due to chance or bias. That is why we randomize; why we dismiss results where P>0.05; why we test the null hypothesis. But there is an inherent tension, a conflict here. Investigators want rigorous science, but they also want positive results. This conflict of interest SPEAKS VOLUMES about what the investigators thought was their best shot for a positive result. If these investigators thought that vasopressor dose at 24 hours, or resolution of shock, or mortality at day xyz was most likely to be positive, you can be darned sure that they would have selected that as the primary endpoint. They want good science, but they want positive results. So, I am completely unmoved by after-the-fact hemming and hawing about how “we selected the wrong endpoint, we should have selected abc rather than xyz.” They know the preliminary data; they have the best familiarity with the problem being addressed; they have control over the experimental design. The best shot is the one they chose.
And this is important because if you and I are in Vegas and I bet you I’m gonna roll snake eyes on the next roll (probability is 1/36, or about 0.028), you’re not going to count it if I roll it several rolls later. Same thing with primary endpoints and secondary endpoints. If you can predict the future, do it on the first try. This is not palm reading, this is science and the pursuit of the truth.
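To put numbers on the dice analogy, the sketch below enumerates the 36 outcomes of two fair dice and then computes the chance of hitting snake eyes at least once if I am allowed to keep rolling. The variable and function names are mine, and the ten-roll example is an arbitrary illustration, not a count of the secondary endpoints in the study.

```python
from fractions import Fraction
from itertools import product

# Enumerate every outcome of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))
snake_eyes_count = sum(1 for roll in outcomes if roll == (1, 1))
p_snake_eyes = Fraction(snake_eyes_count, len(outcomes))


def p_at_least_once(p_single, attempts):
    """Chance of at least one success in `attempts` independent tries."""
    return 1 - (1 - p_single) ** attempts


print(p_snake_eyes, float(p_snake_eyes))          # one called shot
print(float(p_at_least_once(p_snake_eyes, 10)))   # ten tries at the same bet
```

A single called shot succeeds 1 time in 36; given ten rolls, the chance of seeing snake eyes at least once climbs to a little under 25%. That is the difference between a prespecified primary endpoint and a positive result found somewhere among the secondary ones.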
The same reasoning applies to subgroup analyses, an issue that received a fair amount of attention in the PLATO trial of ticagrelor versus clopidogrel in patients with acute coronary syndromes. There was an interaction between the treatment effect and North American versus other centers. Is there something going on here? No explanations are obvious and this was one of 33 variables tested for interaction in this study. To know if the result is true, we have to know the prior probability. The prior probability of this being a true result is near zero. How do I know? Because if there were any hint that this result would come about, any prior information to suggest it, the authors would not have included North American patients in this study because they would have diluted the effect, or they would have at least made the comparison more prominent among the 33 subgroup analyses. So, always ask: “If this is an important and previously known subgroup, why did you not focus on it prospectively?” If the NETT investigators had known beforehand that upper lobe predominance and low baseline exercise capacity were the two key subgroups, wouldn’t they have focused their efforts on those subgroups so that the overall trial would have been “positive”?
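It helps to see how often chance alone produces a "significant" interaction when 33 variables are tested. The sketch below assumes each test is run at a threshold of 0.05 and that the tests are independent; both are simplifying assumptions of mine, not details taken from the PLATO report, and the function names are hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive among independent null tests."""
    return 1 - (1 - alpha) ** n_tests


def expected_false_positives(n_tests, alpha=0.05):
    """Average number of false positives if every null hypothesis is true."""
    return n_tests * alpha


n_subgroups = 33  # number of variables tested for interaction
print(f"P(at least one false positive) = {familywise_error_rate(n_subgroups):.2f}")
print(f"Expected false positives       = {expected_false_positives(n_subgroups):.2f}")
```

Under these assumptions there is roughly an 82% chance of at least one spurious interaction, and on average 1.65 of them. One unexplained subgroup finding out of 33 is about what chance would deliver.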
What is the relevance of a 50% reduction in vasopressor requirements at 48 hours anyway? Who really cares? Does the patient care? S/he’s either on pressors in the ICU, or s/he’s not. S/he’s either got a central line, or s/he doesn’t. S/he’s either still sick, or s/he’s not. For my own part, I would actually prefer more pressors to a cooling blanket, for a variety of reasons. In any case, it doesn’t seem like the result would have been useful EVEN IF the primary endpoint had been positive, because you’re just exchanging some of one cumbersome treatment for some of another one. Sounds like the proverbial “sixes” to me. And it reminds me of a study of tPA in submassive pulmonary embolism where tPA now saves you from tPA later – in this case, cooling for 48 hours saves you from several milligrams of Levophed during that 48 hours. Great.
THE CONVERGING KAPLAN-MEIER CURVE AND OTHER DETAILS
Another example of cherry-picking among secondary endpoints is found in the presentation of the K-M curve for 14-day mortality. Table 2 shows that there is a steady diminution of the 14-day mortality difference as you move to ICU and hospital discharge. Thus the question of relevance is again raised – if you die by discharge, does it matter if you were alive at day x, y, or z? Or does it mean that your death was just “prolonged” in the ICU before you died?
Note that spurious results are possible because of lack of blinding in the study, as well as the baseline differences in pressor drugs and doses (and the difficulty with comparisons between pressors), the latter of which is particularly problematic given that pressor dose reduction was the primary endpoint.
SIR WILLIAM OSLER AND THE SCOURGE OF FEVER
The article begins with a quote from Osler, opining that fever is more terrible than famine and war. I think this is a poignant quote, and its inclusion reminds me of the main reason that, taking the “long view”, I am willing to wager (at substantial odds) that external cooling is of no benefit in this or most other diseases.
I take this view because I see fever as one of several epiphenomena of the disease we call sepsis, which keeps the company of other epiphenomena such as diaphoresis, leukocytosis, electrolyte disturbances, myalgias, delirium, etc. And I reason that if the sepsis is treated specifically with antibiotics and measures that support failing organs (such as oxygen and fluids), the epiphenomena will resolve as the underlying process resolves. If I come home from vacation and find my tomato plants brown, wilted, and droopy from lack of water during the summer drought, I don’t busy myself propping up the branches and spraying the leaves with green chlorophyll solution – I water the plant, and I wait on the epiphenomena to improve as the underlying problem is corrected by the natural biological mechanisms of the plant.
And this, I wager, is why Osler so feared fever – because without antibiotics, he could not water his dying plants and was forced to look on in horrified impotence as the primary disease with all its epiphenomena raged on and ravaged his patients’ bodies unchecked.
Scott Aberegg, MD, MPH, blogs at the Medical Evidence Blog.