With COVID-19 coronavirus testing results continuing to dominate the news, many Americans are getting a crash course in the limits of statistical analysis, whether they wanted it or not.
For example, a study by Stanford University researchers recently made a large splash in the news because it found that "between 48,000 and 81,000 people in Santa Clara County alone may already have been infected by the coronavirus by early April — that's 50 to 85 times more than the number of official cases at that date."
Those numbers are alarming because through Saturday, 18 April 2020, 28,963 people in the entire state of California had officially tested positive for infection by the SARS-CoV-2 coronavirus. If, as even the lower end of the Stanford estimate suggests, nearly double that many people had already been infected in Santa Clara County alone, that result would directly affect how medical resources are allocated within the state.
For example, with such a high number of previous infections compared to the number of deaths attributed to COVID-19, it would be evidence that the coronavirus is far less deadly than previous analyses indicated, which could lead public health officials to dial back their efforts to limit the rate of growth of new infections.
But if the newer, headline-grabbing analysis is incorrect, dialing back would be the wrong action to take. If the true incidence of COVID-19 infections is really much lower, then the same death toll implies a deadlier disease, and more serious action would be needed to mitigate the potentially fatal infection.
That leads to the question of how the Stanford researchers collected their data and did their analysis. Here's an excerpt from their preprint paper:
On 4/3-4/4, 2020, we tested county residents for antibodies to SARS-CoV-2 using a lateral flow immunoassay. Participants were recruited using Facebook ads targeting a representative sample of the county by demographic and geographic characteristics. We report the prevalence of antibodies to SARS-CoV-2 in a sample of 3,330 people, adjusting for zip code, sex, and race/ethnicity. We also adjust for test performance characteristics using 3 different estimates: (i) the test manufacturer's data, (ii) a sample of 37 positive and 30 negative controls tested at Stanford, and (iii) a combination of both. Results: The unadjusted prevalence of antibodies to SARS-CoV-2 in Santa Clara County was 1.5% (exact binomial 95CI 1.11-1.97%), and the population-weighted prevalence was 2.81% (95CI 2.24-3.37%).
We're going to focus on the unadjusted infection rate of 1.5%, since that's the figure they directly determined from their 3,330 solicited-via-Facebook-ad sample of Santa Clara County's 1.9 million population, in which 50 tested positive for having SARS-CoV-2 antibodies.
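As a quick arithmetic check, the unadjusted prevalence follows directly from those two figures:

```python
positives = 50    # positive antibody tests in the sample
sample = 3330     # total people tested
print(f"unadjusted prevalence: {positives / sample:.2%}")  # unadjusted prevalence: 1.50%
```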
That's the researchers' reported rate of COVID-19 infections in Santa Clara County, but what's the true rate?
Estimating that value requires knowing the test's rates of both false positives (people who aren't really infected but whose test results indicate they are) and false negatives (people who really are infected but whose test results give them an 'all-clear'). Alex Tabarrok explains how to do the math to estimate the true infection rate if you know those figures:
The authors assume a false positive rate of just .005 and a false negative rate of ~.8. Thus, if you test 1000 individuals ~5 will show up as having antibodies when they actually don’t and x*.8 will show up as having antibodies when they actually do and since (5+x*.8)/1000=.015 then x=12.5 so the true rate is 12.5/1000=1.25%, thus the reported rate is pretty close to the true rate. (The authors then inflate their numbers up for population weighting which I am ignoring). On the other hand, suppose that the false positive rate is .015 which is still very low and not implausible then we can easily have ~15/1000=1.5% showing up as having antibodies to COVID when none of them in fact do, i.e. all of the result could be due to test error.
In other words, when the event is rare the potential error in the test can easily dominate the results of the test.
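Tabarrok's back-of-the-envelope math can be sketched directly. Here's a minimal example in Python using his figures: a 1,000-person sample, a 0.5% false positive rate, and an observed positive rate of 1.5%. Note that the "~.8" in his arithmetic functions as the test's sensitivity, the share of true infections the test actually catches. The function name and structure below are ours, not the study's:

```python
def true_positive_count(sample_size, observed_rate, fp_rate, sensitivity):
    """Solve (fp_rate * N + sensitivity * x) / N = observed_rate for x,
    the number of truly infected people in a sample of size N."""
    observed = observed_rate * sample_size    # raw positive test results
    spurious = fp_rate * sample_size          # expected false positives
    x = (observed - spurious) / sensitivity   # truly infected people
    return max(x, 0.0)                        # a count can't go negative

# Scenario 1: the authors' assumed 0.5% false positive rate
x = true_positive_count(1000, 0.015, 0.005, 0.8)
print(f"{x:.1f} infected, true rate {x / 1000:.2%}")  # 12.5 infected, true rate 1.25%

# Scenario 2: a still-plausible 1.5% false positive rate
x = true_positive_count(1000, 0.015, 0.015, 0.8)
print(f"{x:.1f} infected, true rate {x / 1000:.2%}")  # 0.0 infected, true rate 0.00%
```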
To illustrate that point, we've built the following tool to do that math, where we've standardized the various rates to be expressed as percentages of the sampled population, which we've also standardized at 1,000 individuals.
Considering Alex's two scenarios, the tool confirms a true infection rate of 1.25% for the default values, while increasing the false positive rate from 0.5% to 1.5% returns a true incidence rate of 0.00%, which indicates that all the reported positives could indeed be false positive test results within the sampled population.
The ACSH's Chuck Dinerstein reviewed the study and discusses how the rate of false positives among the raw sample would affect the statistical validity of the report's population-adjusted findings:
The test for antibodies, seropositivity, is new, and the sensitivity and specificity of the test are not well calibrated. The researchers tested the test against known positive and negative serum (based on a more accurate nucleic acid amplification test, the swab in the nose, or serum collected before the COVID-19 outbreak) among their patients, and made use of the sensitivity and specificity studies done by the manufacturer on similar clinically confirmed COVID-19 patients and controls. While the specificity (the rate of false positives) was high and very similar for both test conditions, the false negatives were much higher in the local serum as compared to the data from the manufacturer. The researchers used blended false positives and negatives in their calculations.
- Fewer false positives lowered the calculated incidence of COVID-19 to 2.49%.
- Higher false positives increased the calculation to 4.16%.
- Their blended value gave the 50-fold increase being reported.
In that thoughtful "peer review," Dinerstein points out that the false positive rate may be higher than the authors assumed; it depends on how it is calculated. As that false positive rate increases, given their sample size, the conclusion may become lost in statistical uncertainty. In other words, of the 50 positive tests, anywhere from 16 to as many as 40 might be false positive results.
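To see where a 16-to-40 range like that can come from, consider the expected number of false positives across the full 3,330-person sample at different false positive rates. The 0.5% and 1.2% rates below are illustrative values we chose to bracket the quoted range, not figures taken from the review itself:

```python
SAMPLE_SIZE = 3330   # people tested in the Stanford study
POSITIVES = 50       # raw positive antibody results

def expected_false_positives(fp_rate, sample_size=SAMPLE_SIZE):
    """Expected number of spurious positives at a given false positive rate."""
    return fp_rate * sample_size

# Illustrative false positive rates bracketing the quoted 16-to-40 range
for fp_rate in (0.005, 0.012):
    fp = expected_false_positives(fp_rate)
    print(f"fp rate {fp_rate:.1%}: ~{fp:.0f} of {POSITIVES} positives could be spurious")
```

At a 1.2% false positive rate, four out of every five reported positives could be test error, which is why the study's conclusion is so sensitive to how that rate is calibrated.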
Before we go any further in this discussion, we should note the Stanford team's results are being torn apart by statisticians, including Luis Pedro Coelho and Andrew Gelman, the latter of whom is demanding the study's authors provide an apology for wasting everybody's time with what he describes in detail as avoidable statistical errors. Gelman also has this to say about Stanford University's credibility:
The authors of this article put in a lot of work because they are concerned about public health and want to contribute to useful decision making. The study got attention and credibility in part because of the reputation of Stanford. Fair enough: Stanford's a great institution. Amazing things are done at Stanford. But Stanford has also paid a small price for publicizing this work, because people will remember that "the Stanford study" was hyped but it had issues. So there is a cost here. The next study out of Stanford will have a little less of that credibility bank to borrow from. If I were a Stanford professor, I'd be kind of annoyed. So I think the authors of the study owe an apology not just to us, but to Stanford.
Before we close, take a closer look at the image we featured at the top of this article, particularly the solution to the algebra problem that appears on the whiteboard. This image was snapped from Stanford University's promotional materials for its graduate school of education. It's not just rushed studies coming out of the university that Stanford professors have to be annoyed about these days.