Lucassen et al. sharpened their pencils and tried to combine 52 studies (n=55,268) in a meta-analysis examining how well "gestalt" (subjective impression) or clinical decision rules (the Wells, Geneva, or revised Geneva scores) diagnose acute pulmonary embolism.
The punchline (and their unstated but implied conclusion) is that we just can't safely use these clinical decision rules for any patient above low probability for a pulmonary embolism. There are too many limitations to conclude anything more than that from these trials or this analysis:
- The studies had widely varying prevalence of pulmonary embolism in their subject populations. This strongly affected the performance of both gestalt and clinical decision rules as diagnostic tests. But you don't know the prevalence of PE in your population when you see the patient, or whether the prevalence at your institution is anywhere near what's observed in any of the studies you read. The fact that observed prevalences vary across these studies implies, a priori, that a standard rule cannot successfully be applied across institutions.
- Many of these studies had follow-up on the order of 3 months. If a PE hadn't occurred or become apparent by that time, the person was considered healthy. The authors justify this by comparing their observed 3-month rates of pulmonary embolism against historically reported 3-month rates. But that's just not good enough, given the stakes here. A small pulmonary embolus could easily evade detection by a clinical decision rule and be followed by a deadly one 6 months later; that's the natural history of the condition. You should have at least 1-2 years of follow-up before pronouncing people who were never imaged "PE-free", and none of the studies are described as having follow-up that long.
- Two-thirds of the studies used algorithms that resulted in some of the patients being imaged, while others were not (they were pronounced PE-free after some months of follow-up) -- but they were all analyzed together. As with the point above, you simply don't know how many PEs were missed in the non-imaged group. You can't lump together people deemed negative by follow-up and people who had negative CT-angiograms, a highly sensitive imaging study. (Or, you can, but I won't believe it.)
- One-third of the studies did not report any imaging test results that were indeterminate, only those that were high or low probability.
- The studies were so heterogeneous that the paper has this statement to explain its methods: "We used a bivariate model for diagnostic meta-analysis to obtain summary estimates of sensitivity and specificity. The bivariate approach simultaneously models pairs of logit-transformed sensitivity and specificity from studies, thereby incorporating any correlation that might exist between sensitivity and specificity. It also uses a random-effects approach for both sensitivity and specificity, which allows for heterogeneity beyond chance due to clinical and methodological differences between studies. Data from the different rules were analyzed in a single model to obtain summary estimates of sensitivity and specificity for each rule, and we subsequently tested whether these summary values significantly differed from each other. Our model was fitted to allow the between-study variation in sensitivity and specificity to differ across rules." Can someone bring me an aspirin, and a statistics degree?
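To put rough numbers on the prevalence point in the first bullet, here is a minimal sketch of how the negative predictive value of one and the same rule shifts as prevalence changes. The sensitivity and specificity below are hypothetical placeholders chosen for illustration, not figures from the meta-analysis, and the function name is my own.

```python
def negative_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a patient with a negative result truly has no PE (Bayes' rule)."""
    true_negatives = specificity * (1.0 - prevalence)
    false_negatives = (1.0 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)


# Hypothetical test characteristics, NOT taken from Lucassen et al.
SENSITIVITY = 0.85
SPECIFICITY = 0.50

# Same rule, three hypothetical institutions with different PE prevalence.
for prevalence in (0.05, 0.15, 0.30):
    npv = negative_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    missed_per_1000 = round((1.0 - npv) * 1000)
    print(f"prevalence {prevalence:.0%}: NPV {npv:.3f}, "
          f"about {missed_per_1000} missed PEs per 1000 negative results")
```

The rule itself never changes in this sketch; only the population does. That is the reason a "negative" that looks reassuring at one institution may not be reassuring at another.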
There's only one way the authors could make the rules sufficiently safe for ruling out PE: combining a "negative" result on gestalt or clinical decision rule with a negative D-dimer. The problem, as always, is the rarity of truly negative D-dimer tests in acute-on-chronically ill people who've made their way to a hospital.
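Why does stacking a negative D-dimer on top of a negative rule help so much? A rough sketch using negative likelihood ratios, under two loud assumptions: the two tests are treated as conditionally independent (which may not hold in real patients), and every number below is a hypothetical placeholder rather than a value from the paper.

```python
def to_odds(probability):
    return probability / (1.0 - probability)


def to_probability(odds):
    return odds / (1.0 + odds)


def negative_likelihood_ratio(sensitivity, specificity):
    """How much a negative result scales the odds of disease."""
    return (1.0 - sensitivity) / specificity


def post_test_probability(pretest_probability, *likelihood_ratios):
    """Apply each likelihood ratio in turn (assumes the tests are independent)."""
    odds = to_odds(pretest_probability)
    for ratio in likelihood_ratios:
        odds *= ratio
    return to_probability(odds)


# Hypothetical inputs for illustration only.
PRETEST = 0.15                                     # assumed PE prevalence
LR_RULE = negative_likelihood_ratio(0.85, 0.50)    # assumed decision-rule accuracy
LR_DDIMER = negative_likelihood_ratio(0.95, 0.40)  # assumed D-dimer accuracy

after_rule = post_test_probability(PRETEST, LR_RULE)
after_both = post_test_probability(PRETEST, LR_RULE, LR_DDIMER)
print(f"after negative rule alone:      {after_rule:.1%}")
print(f"after negative rule + D-dimer:  {after_both:.1%}")
```

The arithmetic only rescues you when the D-dimer actually comes back negative, which is exactly the catch described above.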
Which takes us back to where we started: Lost. Confused. Unsure what to do for this sort-of-sick-looking, kind-of-fast-breathing person seeking our counsel and care. Secretly grateful and relieved that the ED physicians are ordering so many "excessive" CT-angiograms. Until there is a PIOPED-ish randomized trial that compares clinical decision rules to a convincing gold standard (i.e., an algorithm that includes imaging on every patient), with follow-up of 12-24 months and/or ultrasound and D-dimer at 6 or 12 months, the plan for evaluating patients at intermediate and higher probability for PE is, to me, simple:
Image them! Image them all!!
(that could be ultrasound of the legs, though)
Lucassen W et al. Clinical Decision Rules for Excluding Pulmonary Embolism: A Meta-analysis. Ann Intern Med 2011;155(7):448-460.