Benford’s Law and the Trustworthiness of COVID-19 Case Counts

Benford's Law Leading Digit Distribution

Can you trust the numbers the U.S. government reports daily for the number of confirmed COVID-19 cases? Can you trust China's or Italy's figures? How about the case counts reported by Russia or other nations?

2020 has been a bad year for many people around the world, mainly because of the coronavirus pandemic and many governments' responses to it, which have made COVID-19 almost as much a political condition as a viral infection. One factor that makes it a political condition is the apparent motive political leaders have to justify their pandemic policies, which raises the question of whether they are honestly reporting the number of cases their nations are experiencing.

Telling whether they are or not is where Benford's Law might be used. Benford's Law describes the frequency with which each leading digit appears in sets of data where exponential growth is observed, as shown in the chart above: a leading 1 is expected about 30% of the time, while a leading 9 is expected less than 5% of the time. The pattern that Benford's Law predicts for data showing exponential growth over time is strong enough that significant deviations from it can be taken as evidence that non-natural forces, such as fraud or manipulation for political purposes, are at play.
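To make the leading-digit frequencies described above concrete, here is a minimal Python sketch. It uses the standard statement of Benford's Law, P(d) = log10(1 + 1/d), and compares it with the leading digits of a made-up series that grows 10% per day. The function names and the growth series are our own illustrative assumptions, not anything taken from the studies discussed in this post.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(n):
    """First significant digit of a positive integer count."""
    return int(str(int(n))[0])

def first_digit_shares(counts):
    """Observed share of each leading digit, ignoring zero counts."""
    digits = [leading_digit(c) for c in counts if int(c) > 0]
    tally = Counter(digits)
    return {d: tally.get(d, 0) / len(digits) for d in range(1, 10)}

# Hypothetical series (an assumption for illustration only):
# cases start at 1 and grow 10% per day for 200 days.
cases = [round(1.10 ** day) for day in range(200)]

observed = first_digit_shares(cases)
expected = benford_expected()
for d in range(1, 10):
    print(d, f"observed={observed[d]:.3f}", f"expected={expected[d]:.3f}")
```

Running it prints the observed and expected share for each digit side by side. Real case counts are noisier than this idealized series, so the match would typically be looser.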

Economists Christoffer Koch and Ken Okamura wondered whether the case counts being reported by China, Italy and the United States were trustworthy and turned to Benford's Law to find out. We won't keep you in suspense: they found that the growth of each nation's daily COVID-19 case counts before it imposed 'lockdown' restrictions was consistent with the expectations of Benford's Law, leading them to reject the possibility that the data had been manipulated to benefit the interests of political leaders. Here's the chart illustrating their findings from their recently published report:

Koch, Okamura: Figure 2. First Digit Distribution Pre-Lockdown number of confirmed cases in Chinese Provinces, U.S. States and Italian Regions

But that's only three countries. Are there any nations whose leaders have significantly manipulated their data?

A preprint study by Anran Wei and Andre Eccel Vellwock also found no evidence of manipulation in COVID-19 case data by China, Italy and the U.S., and extended the list of countries with trustworthy data to include Brazil, India, Peru, South Africa, Colombia, Mexico, Spain, Argentina, Chile, France, Saudi Arabia, and the United Kingdom. However, when they evaluated COVID-19 case data for Russia, they found cause for concern:

Results suggest high possibility of data manipulations for Russia's data. Figure 1e illustrates the lack of Benfordness for the total confirmed cases. The pattern resembles a random distribution: if we calculate the RMSE related to a constant probability of 1/9 for all first digits, it shows that the RMSE is 20.5%, a value lower than the one related to the Benford distribution (49.2%).
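The comparison in that passage can be sketched in a few lines of Python: compute the root-mean-square error (RMSE) of a set of observed leading-digit shares against both the Benford distribution and a flat 1/9 distribution, then see which is smaller. This is only an illustration under our own assumptions. The digit shares below are made up and are not Russia's figures, and the plain RMSE used here may not match whatever normalization the authors used to arrive at their percentages.

```python
import math

def rmse(observed, expected):
    """Root-mean-square error between two digit-share mappings (digits 1-9)."""
    squared = sum((observed[d] - expected[d]) ** 2 for d in range(1, 10))
    return math.sqrt(squared / 9)

benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
uniform = {d: 1 / 9 for d in range(1, 10)}

# Made-up shares for illustration only; these are NOT any country's reported data.
flat_looking = {1: 0.13, 2: 0.12, 3: 0.11, 4: 0.11, 5: 0.11,
                6: 0.11, 7: 0.10, 8: 0.11, 9: 0.10}

rmse_benford = rmse(flat_looking, benford)
rmse_uniform = rmse(flat_looking, uniform)
closer_to = "uniform" if rmse_uniform < rmse_benford else "Benford"

print(f"RMSE vs Benford: {rmse_benford:.4f}")
print(f"RMSE vs uniform: {rmse_uniform:.4f}")
print("Closer to:", closer_to)
```

A data set whose leading digits sit closer to the flat 1/9 line than to the Benford curve is the kind of pattern the authors flag as suspicious.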

Wei and Vellwock also found issues with Russia's COVID-19 data for daily reported cases and deaths. Here is their chart summarizing the results for total confirmed COVID-19 cases for each of the nations whose data they reviewed:

Wei and Vellwock. Figure 1. Total confirmed cases for (a) the whole world and (b-q) selected countries. The black curve refers to Benford's Law probability.

They also found issues with Iran's daily confirmed cases and deaths, but not enough to confirm that the nation's figures have been manipulated.

References

Koch, Christoffer and Okamura, Ken. Benford's Law and COVID-19 Reporting. Economics Letters. Volume 196, November 2020, 109573. DOI: 10.1016/j.econlet.2020.109573.

Wei, Anran and Vellwock, Andre Eccel. Is the COVID-19 data reliable? A statistical analysis with Benford's Law. [Preprint PDF Document]. September 2020. DOI: 10.13140/RG.2.2.31321.75365.