The new Best Hospitals 2013-14 rankings, published online July 16, drew even more than the usual annual scrutiny. A July 25 opinion piece in the Wall Street Journal was especially sharp. In it, Ezekiel Emanuel, chairman of the Department of Medical Ethics and Health Policy at the University of Pennsylvania, questioned the validity of the rankings, and in a follow-up commentary on MSNBC's Morning Joe on July 29 he called them "pretty worthless."
We obviously disagree.
More to the point, many of the criticisms leveled by Emanuel, and by many others with strong feelings about the rankings, rest on faulty assumptions. Whether those assumptions are willful or inadvertent doesn't matter. We need to do a better job of communicating the purpose of the rankings and the blueprint that defines them. We make the broad outlines of the rankings methodology public in a detailed FAQ, which in turn links to a 130-page methodology report. But perhaps we are still burying the lede, as journalists say, by sending consumers and others to lengthy explanations when short, clear messages would serve them better.
Here, then, are the major points raised in the Wall Street Journal and echoed elsewhere, other frequently voiced concerns, and our responses.
The methodology is flawed.
Response: Yes, it is. No conceivable methodology is immune from limitations of design and available data. We try to remedy weaknesses in ours by soliciting candid input from the hospital community, using the best metrics and data available to us, and making changes when there's a strong case for doing so. The 2013-14 methodology report devotes more than four pages to changes we have made in just the past six years.
Each of those changes illustrates a reality: even as we strive to improve, every change has limitations, and the result will be inherently imperfect. One change in the latest rankings is a case in point. We adopted a new version of a federal programming tool used by those who analyze hospitals' performance on certain patient safety measures. The update makes it possible to exclude patients who already had conditions on admission that could count against a hospital's patient-safety scores. Someone recovering from pneumonia, for example, would be at risk for respiratory problems after surgery, one of the six types of events we tabulate. Removing these present-on-admission (POA) cases kept hospitals from being penalized unfairly, and many hospitals undoubtedly benefited from the change with higher overall scores. But not all hospitals are diligent about identifying POA cases and coding them appropriately, and the playing field for this measure will not be level until they are. Lagging hospitals will not only derive no benefit from the change but will also be penalized relative to hospitals that do a good job of taking these patients out of the Best Hospitals equation.
Does this represent a flaw in the methodology? We don't think so. No measure is perfect, and compliance with any measure will never reach 100 percent. Moreover, the change itself is likely to prompt some hospitals to pay more attention to POA identification and coding, improving the quality of their data for uses beyond the rankings.
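A toy calculation shows why uneven POA coding tilts the playing field. The Python sketch below is not the federal tool or the Best Hospitals computation: the `Discharge` record, its fields and the patient counts are hypothetical, and it ignores any risk adjustment. It gives two hospitals the same ten surgical patients, two of whom arrived with pneumonia; only the diligent hospital codes those two as POA, so only it gets to drop them from the calculation.

```python
from dataclasses import dataclass


@dataclass
class Discharge:
    """Hypothetical surgical discharge record (field names are illustrative)."""
    had_event: bool   # e.g. respiratory failure after surgery
    coded_poa: bool   # hospital flagged the condition as present on admission


def event_rate(discharges: list[Discharge]) -> float:
    """Adverse-event rate after dropping POA-coded cases (no risk adjustment)."""
    eligible = [d for d in discharges if not d.coded_poa]
    if not eligible:
        return 0.0
    return sum(d.had_event for d in eligible) / len(eligible)


def same_ten_patients(codes_poa: bool) -> list[Discharge]:
    """Ten identical patients at either hospital; only the POA coding differs."""
    arrived_with_pneumonia = [Discharge(True, codes_poa) for _ in range(2)]
    hospital_acquired = [Discharge(True, False)]
    uneventful = [Discharge(False, False) for _ in range(7)]
    return arrived_with_pneumonia + hospital_acquired + uneventful


diligent = event_rate(same_ten_patients(codes_poa=True))
lagging = event_rate(same_ten_patients(codes_poa=False))
print(diligent, lagging)  # 0.125 0.3
```

With identical patients, the hospital that codes its POA cases shows an event rate of 12.5 percent, while the one that doesn't shows 30 percent.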
Best Hospitals doesn't identify the best hospitals.
Response: The rankings do not and cannot designate hospitals that are the best for every patient. A one-size-fits-all hospital does not exist. Patients vary enormously in the medical problems they present, the procedures they need and the risks they bring to the table. Emanuel and other critics ignore the often-stated mission of the rankings: to provide a tool for the relatively small percentage of patients who, first, need a much higher level of medical care than most community hospitals can provide and, second, have time to look around for the best source. We want to speak to those with rare or hard-to-treat cancers, those who are very old or physically ailing and those with multiple conditions for whom care is never "routine."
It's basically a ranking of teaching hospitals.
Response: Even a modest-sized community hospital with no academic affiliation can provide outstanding quality in cardiac care or another service. One of the ground rules laid down in 1990, when Best Hospitals was created, was that the methodology should be inclusive, not exclusive. Accordingly, almost any hospital of some size has the potential to be recognized, and teaching status is not one of the measures used to score hospitals. Our starting pool comprises all 4,806 U.S. hospitals, other than government and institutional facilities, such as prison units, for which data are unavailable. From there, a hospital had to meet at least one of four criteria to move on to consideration in the 12 data-driven specialties: teaching status, affiliation with a medical school, at least 200 beds, or at least 100 beds plus availability of four or more specific types of medical technology. This year 2,262 hospitals, or 47 percent of the initial number, met the test.
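To make the eligibility screen concrete, here is a minimal Python sketch of the four-way test described above. It is not U.S. News or RTI code: the `Hospital` record, its field names and the sample hospitals are hypothetical, the unnamed technology types are reduced to a simple count, and "100 beds plus" is read as at least 100 beds.

```python
from dataclasses import dataclass


@dataclass
class Hospital:
    """Hypothetical hospital record; fields mirror the four published criteria."""
    name: str
    teaching: bool
    med_school_affiliation: bool
    beds: int
    key_technologies: int  # how many of the specified technology types it offers


def eligible(h: Hospital) -> bool:
    """A hospital advances if it meets at least one of the four criteria."""
    return (
        h.teaching
        or h.med_school_affiliation
        or h.beds >= 200
        or (h.beds >= 100 and h.key_technologies >= 4)
    )


hospitals = [
    Hospital("Community A", False, False, 150, 5),  # passes on beds plus technology
    Hospital("Community B", False, False, 150, 3),  # meets none of the criteria
    Hospital("Teaching C", True, False, 80, 0),     # passes on teaching status alone
]
print([h.name for h in hospitals if eligible(h)])  # ['Community A', 'Teaching C']

# Share of the starting pool that advanced this year, per the figures above.
print(f"{2262 / 4806:.0%}")  # 47%
```

Because the criteria are joined by "or," a nonteaching community hospital needs only enough beds and technology to advance, which is the inclusiveness the ground rules call for.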
The top of the rankings in the 16 ranked specialties is dominated by major academic medical centers (AMCs). That's to be expected when the goal is to identify the hospitals that do best with the most difficult patients. AMCs are where community hospitals send those patients and where doctors research and practice the latest techniques and incorporate the latest findings.
Hospitals can game the system to boost their scores.
Response: Emanuel cites "perverse incentives" to cheat but offers no evidence that hospitals have done so. Over the 24-year lifespan of the rankings, only one overt attempt to manipulate the methodology has surfaced. For many years we counted only full-time nurses in computing nurse-to-patient ratios; one year a hospital reported that it had zero part-timers. The computer flagged the entry, we made inquiries, and the hospital's nursing figures were reverted to the prior year's, when it had reported numerous part-time nurses.
We know that some hospitals try to influence physicians who might be contacted for our annual survey of specialists, which asks them which hospitals they consider the best in their area of expertise for the most challenging patients. Hospitals launch internal and external email campaigns, send out snail mail and offer handouts at medical meetings to raise their institutions' profiles. Does it work? We suspect it has limited or no impact, given the size of the specialist universe (and doctors' resistance to being told how to think).
U.S. News shouldn't be measuring reputation at all. It's just a popularity contest. Most physicians don't know much about the care delivered at local hospitals, let alone at those across the country, so surveyed physicians tend to rely on rankings.
Response: The physician survey is a proxy for the process of care within a hospital — the quality of caregivers' decisions and their execution at every point from admissions to diagnostics to treatment choices to medication management. Do doctors adhere to accepted guidelines? What protects patients from medication errors? Are handoffs accomplished with adequate communication and speed? Until good process data for thousands of hospitals are made available, the choice is to rely either on data that are selective, incomplete, noisy or error-prone (sometimes all four) or on a proxy. We believe that responsible specialists plug into extensive networks of other specialists in seeking the best care for the most challenging patients wherever it might be located. The late Bernadine Healy, a cardiologist and director of the National Institutes of Health before coming to U.S. News as health editor, used to call the physician survey a form of peer review.
The physician survey sample is too small and not generalizable.
Response: We survey 200 physicians in each of 16 specialties and average their responses over three years. The response rate for the latest three years was between 34.0 percent and 39.5 percent across all specialties. That is a gratifying return for a survey of busy doctors and higher than the "around 30 percent" stated in the Wall Street Journal. The physician survey has never claimed to be generalizable to all physicians. It is a survey of specialists, who are generally much more informed about the quality of care provided at specific hospitals across the country than a non-specialist would be.
Is the sample too small? RTI International, our Best Hospitals contractor, informed us late last year, when U.S. News looked into increasing the sample size, that doing so would not be expected to have a large effect on the rankings. It might reduce year-to-year fluctuations for hospitals that receive few nominations, but such hospitals already have a decent chance of ranking based on other factors. In the new urology rankings, for example, 21 of the 50 nationally ranked hospitals had reputational scores of 2.5 percent or less. What put them in the rankings was hard data. Cedars-Sinai was No. 10 with a reputational score of 2.3 percent because of outstanding performance in mortality and patient safety.
Instead of surveying doctors, use other data.
Response: So far the only process measures that stand up, in our view, are the safety measures mentioned above. They are modified versions of a subset of the Patient Safety Indicators developed by the federal Agency for Healthcare Research and Quality. Together they account for 5 percent of a hospital's score. Harm to patients is both an outcome and a failure of process, so when the safety measures were added in 2009, the weight of the reputational survey was reduced.
What about the data posted by the federal government on its Hospital Compare website, such as hospital performance with heart attack and pneumonia patients?
Response: The purpose of Hospital Compare is entirely different from that of Best Hospitals. Most of the process-related data posted on Hospital Compare reflect care that matters to the majority of patients, not to the relatively small percentage with extremely complex conditions who need sophisticated care. Consider two Hospital Compare measures: whether possible heart attack patients received aspirin within 24 hours of admission and whether surgical patients had urinary catheters removed within two days. In both cases the bar is too broad and too low to help anyone looking for the best place for complex therapy. Using Hospital Compare to narrow the list of possible hospitals for an upcoming surgery or other high-risk treatment will not be terribly helpful, because these measures look at general quality rather than at a specific domain of care.
Technology gets too much weight in scoring.
Response: The third element that goes into Best Hospitals scores, besides process and outcomes, is structure: how well a hospital is equipped and staffed to provide key services. The structural measures include advanced technology, the number of nurses assigned to direct patient care, special certifications and accreditations, patient volume, and translators and other special services.
In scoring, 30 percent of the weight is given to structural measures. The Wall Street Journal article states that technology is "the main focus" of this group of measures. In fact, its overall share of hospital scores is 5 percent or less in all but two specialties, where it is marginally higher. Two nursing measures carry twice as much weight in most specialties.
Reasonable people can disagree about giving credit to hospitals for particular kinds of technology. After a lengthy investigation that included discussions with practitioners holding a range of opinions at various hospitals, we included robotic surgery in five specialties. Emanuel singles it out as an example of an expensive technology that raises a hospital's score in urology but hasn't been proven in reliable, randomized clinical trials. Given our target audience of high-acuity patients, however, we see part of our obligation as identifying technologies and other care that make a strong case for benefiting such individuals. Such features are more likely to be found at centers that treat more of these patients.
Where is readmission? Why isn't it included?
Response: We've wanted to include readmissions data for some time. The percentage of patients readmitted to a hospital within a month or so after discharge is potentially a very good performance yardstick. It indicates not only the quality of care received but also the adequacy of discharge instructions and follow-up. But the readmissions data on the Hospital Compare site are generic: they are based on all-cause readmissions rather than on the challenging conditions and procedures we assess in Best Hospitals, so they do not fit well into the rankings.
Why aren't data on hospital-acquired conditions included?
Response: Because many of them, such as data on infections, are incomplete and unreliable. We hope that will change. There's debate as to whether newly available federal data on pressure ulcers (bed sores) are sufficiently robust. If and when the data improve, we will certainly incorporate them into our methodology.
Here and generally, we wish, as Emanuel says he does, that more data were available. Each year, as we have done since the inception of the Best Hospitals rankings in 1990, we incorporate data that serve the purpose for which Best Hospitals is intended — guiding patients with exceptional needs to the best sources of exceptional care. We will continue to do so.
In fact, we are working to expand the reach and usefulness of Best Hospitals. Our regional rankings now list nearly 600 hospitals in most large metro areas and elsewhere that come close to meeting national ranking standards. As discussed in this space last fall, we are also assembling a methodology for guiding patients with more everyday needs. Our goal is to evaluate how well every hospital in the U.S. performs a comprehensive list of such routine procedures as knee replacements and heart bypass surgery. It won't be a perfect methodology, but it will be strong and useful. Remember the old days, when your doctor pointed you to a hospital and said, "Go there"?