Category Archives: quality

Of Storks, Statistics, and Cigarettes

More or Less presenter Tim Harford talks about deliberately misleading statistical analysis in the following Numberphile video. If you have nine minutes, you'll find out why it's essential to maintain a healthy skepticism of both claims and counterclaims based on statistical analysis.

If you're anywhere but the U.S. or Canada today, Tim's newest book, How to Make the World Add Up, is now available for sale. If you're in the U.S. or Canada, you'll have to wait until February 2021 for a copy, which will carry a different title: The Data Detective: Ten Easy Rules to Make Sense of Statistics. It can be pre-ordered at Amazon today.

Benford’s Law and the Trustworthiness of COVID-19 Case Counts

Benford's Law Leading Digit Distribution

Can you trust the numbers the U.S. government reports daily for the number of confirmed COVID-19 cases? Can you trust China's or Italy's figures? How about the case counts reported by Russia or other nations?

2020 has been a bad year for many people around the world, mainly because of the coronavirus pandemic and many governments' responses to it, which have made COVID-19 almost as much a political condition as a viral infection. Among the factors making it a political condition are the apparent motives of political leaders to justify their policies in responding to the pandemic, which raise questions about whether they are honestly reporting the number of cases their nations are experiencing.

Benford's Law offers one way to tell whether they are. Benford's Law describes the frequency with which leading digits appear in sets of data where exponential growth is observed, as shown in the chart above. The pattern it predicts for data showing exponential growth over time is strong enough that significant deviations from it can be taken as evidence that non-natural forces, such as fraud or manipulation for political purposes, are at play.
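
To make the idea concrete, here's a minimal sketch in Python of how a first-digit check works. The growth rate and starting value in the sample series are our own illustrative choices, not data from any of the studies discussed below:

```python
import math
from collections import Counter

def benford_expected(d):
    """Expected frequency of leading digit d (1 through 9) under Benford's Law."""
    return math.log10(1 + 1 / d)

def leading_digit_freqs(values):
    """Observed leading-digit frequencies for a collection of positive numbers."""
    digits = [int(str(v).lstrip('0.')[0]) for v in values if v > 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}

# An exponentially growing series, like cumulative case counts,
# should closely track the Benford distribution.
series = [round(100 * 1.18 ** t) for t in range(120)]
observed = leading_digit_freqs(series)
for d in range(1, 10):
    print(d, f"Benford: {benford_expected(d):.3f}", f"Observed: {observed[d]:.3f}")
```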

Economists Christoffer Koch and Ken Okamura wondered whether the numbers of cases being reported by China, Italy, and the United States were trustworthy and turned to Benford's Law to find out. We won't keep you in suspense: they found that the growth of each nation's daily COVID-19 case counts prior to the imposition of 'lockdown' restrictions was consistent with the expectations of Benford's Law, leading them to reject the possibility that the data had been manipulated to benefit the interests of political leaders. Here's the chart illustrating the findings from their recently published report:

Koch, Okamura: Figure 2. First Digit Distribution Pre-Lockdown number of confirmed cases in Chinese Provinces, U.S. States and Italian Regions

But that's only three countries. Are there any nations whose leaders have significantly manipulated their data?

A preprint study by Anran Wei and Andre Eccel Vellwock also found no evidence of manipulation in the COVID-19 case data of China, Italy, and the U.S., and it extends the list of countries with trustworthy data to include Brazil, India, Peru, South Africa, Colombia, Mexico, Spain, Argentina, Chile, France, Saudi Arabia, and the United Kingdom. However, when they evaluated COVID-19 case data for Russia, they found cause for concern:

Results suggest high possibility of data manipulations for Russia's data. Figure 1e illustrates the lack of Benfordness for the total confirmed cases. The pattern resembles a random distribution: if we calculate the RMSE related to a constant probability of 1/9 for all first digits, it shows that the RMSE is 20.5%, a value lower than the one related to the Benford distribution (49.2%).
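
Wei and Vellwock's check is straightforward to reproduce in principle: compute the root-mean-square error (RMSE) of the observed first-digit frequencies against both the Benford distribution and a flat 1/9 benchmark, and flag series that sit closer to the flat benchmark. Here's a minimal sketch, with made-up observed frequencies standing in for the actual reported data:

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]
UNIFORM = [1 / 9] * 9

def rmse(observed, expected):
    """Root-mean-square error between observed and expected digit frequencies."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, expected)) / len(observed))

# Hypothetical first-digit frequencies for a suspiciously flat series.
observed = [0.14, 0.12, 0.11, 0.11, 0.11, 0.10, 0.11, 0.10, 0.10]

# A lower RMSE against the uniform benchmark than against Benford's curve
# is the pattern the authors flag as a possible sign of manipulation.
print(f"RMSE vs Benford: {rmse(observed, BENFORD):.4f}")
print(f"RMSE vs uniform: {rmse(observed, UNIFORM):.4f}")
```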

Wei and Vellwock also find issues with Russia's COVID-19 data for daily reported cases and deaths. Here is their chart summarizing the results for total confirmed COVID-19 cases for each of the nations whose data they reviewed:

Wei and Vellwock. Figure 1. Total confirmed cases for (a) the whole world and (b-q) selected countries. The black curve refers to Benford's Law probability.

They also found issues with Iran's daily confirmed cases and deaths, but not enough to conclude that the nation's figures have been manipulated.


References

Koch, Christoffer and Okamura, Ken. Benford's Law and COVID-19 Reporting. Economics Letters. Volume 196, November 2020, 109573. DOI: 10.1016/j.econlet.2020.109573.

Wei, Anran and Vellwock, Andre Eccel. Is the COVID-19 data reliable? A statistical analysis with Benford's Law. [Preprint PDF Document]. September 2020. DOI: 10.13140/RG.2.2.31321.75365.

How Do False Test Outcomes Affect Estimates Of The True Incidence Of COVID-19?

Curses, FOILED again!

With COVID-19 coronavirus testing results continuing to dominate the news, many Americans are getting a crash course in the limits of statistical analysis, whether they wanted it or not.

For example, a study by Stanford University researchers recently made a large splash in the news because it found that "between 48,000 and 81,000 people in Santa Clara County alone may already have been infected by the coronavirus by early April — that's 50 to 85 times more than the number of official cases at that date."

Those numbers are alarming because through Saturday, 18 April 2020, 28,963 people in the entire state of California had officially tested positive for infection by the SARS-CoV-2 coronavirus. If nearly double that many people had already been infected in Santa Clara County alone, which is what the lower-end estimate of the Stanford study implies, that result would directly affect how medical resources are allocated within the state.

For example, such a high number of prior infections relative to the number of deaths attributed to COVID-19 would be evidence the coronavirus is far less deadly than previous analysis indicated, which could lead public health officials to dial back their efforts to limit the rate of growth of new infections.

But if the newer, headline-grabbing analysis is incorrect, that would be a very wrong action to take. If the true incidence of COVID-19 infections is really much lower, then the same death toll implies a deadlier virus, and more serious action would be needed to mitigate the potentially fatal infection.

That leads to the question of how the Stanford researchers collected their data and did their analysis. Here's an excerpt from their preprint paper:

On 4/3-4/4, 2020, we tested county residents for antibodies to SARS-CoV-2 using a lateral flow immunoassay. Participants were recruited using Facebook ads targeting a representative sample of the county by demographic and geographic characteristics. We report the prevalence of antibodies to SARS-CoV-2 in a sample of 3,330 people, adjusting for zip code, sex, and race/ethnicity. We also adjust for test performance characteristics using 3 different estimates: (i) the test manufacturer's data, (ii) a sample of 37 positive and 30 negative controls tested at Stanford, and (iii) a combination of both.

Results: The unadjusted prevalence of antibodies to SARS-CoV-2 in Santa Clara County was 1.5% (exact binomial 95CI 1.11-1.97%), and the population-weighted prevalence was 2.81% (95CI 2.24-3.37%).

We're going to focus on the unadjusted infection rate of 1.5%, since that's the figure they directly determined from their 3,330 solicited-via-Facebook-ad sample of Santa Clara County's 1.9 million population, in which 50 tested positive for having SARS-CoV-2 antibodies.

That's the researchers' reported rate of COVID-19 infections in Santa Clara County, but what's the true rate?

Estimating that value requires knowing the testing's rates of both false positives (people who aren't really infected but whose test results indicate they are) and false negatives (people who really are infected, but whose test results gave them an 'all-clear'). Alex Tabarrok explains how to do the math to estimate the true infection rate if you know those figures:

The authors assume a false positive rate of just .005 and a false negative rate of ~.8. Thus, if you test 1000 individuals ~5 will show up as having antibodies when they actually don’t and x*.8 will show up as having antibodies when they actually do and since (5+x*.8)/1000=.015 then x=12.5 so the true rate is 12.5/1000=1.25%, thus the reported rate is pretty close to the true rate. (The authors then inflate their numbers up for population weighting which I am ignoring). On the other hand, suppose that the false positive rate is .015 which is still very low and not implausible then we can easily have ~15/1000=1.5% showing up as having antibodies to COVID when none of them in fact do, i.e. all of the result could be due to test error.

In other words, when the event is rare the potential error in the test can easily dominate the results of the test.
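
Note that the ".8" in Tabarrok's arithmetic functions as the test's sensitivity (the share of truly infected people the test actually catches) rather than as a false negative rate. With that reading, the math reduces to a one-line formula. Here's a minimal sketch in Python; the function name and the clamping at zero are our own choices:

```python
def true_incidence(reported_rate, false_positive_rate, sensitivity):
    """Back out the true incidence rate from a raw positive-test rate.

    Assumes the reported rate is the false positive rate (applied to the
    whole sample, a good approximation when the condition is rare) plus
    the sensitivity times the true rate:
        reported = false_positive_rate + sensitivity * true_rate
    """
    estimate = (reported_rate - false_positive_rate) / sensitivity
    return max(estimate, 0.0)  # a negative estimate means test error can explain everything

# Tabarrok's two scenarios for the Santa Clara sample:
print(true_incidence(0.015, 0.005, 0.8))  # 0.0125 -> a true rate of 1.25%
print(true_incidence(0.015, 0.015, 0.8))  # 0.0    -> all positives could be false
```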

To illustrate that point, we've built the following tool to do that math, where we've standardized the various rates to be expressed as percentages of the sampled population, which we've also standardized at 1,000 individuals. If you're accessing this article on a site that republishes our RSS news feed, please click through to our site to access a working version of the tool.

Statistical Test Data
Input Data | Values
Rate of Incidence From Sampled Population
Rate of False Test Positives
Rate of False Test Negatives

True Rate of Incidence
Calculated Results | Values
Estimated True Rate of Incidence

Considering Alex's two scenarios, the tool confirms the result of a true infection rate of 1.25% for the default values, while increasing the false positive rate from 0.5% to 1.5% returns a true incidence rate of 0.00%, which indicates that all of the reported positives could indeed be the result of false positive test results within the sampled population.

The ACSH's Chuck Dinerstein reviewed the study and discusses how the rate of false positives among the raw sample would affect the statistical validity of the report's population-adjusted findings:

The test for antibodies, seropositivity, is new, and the sensitivity and specificity of the test are not well calibrated. The researchers tested the test against known positive and negative serum (based on a more accurate nucleic acid amplification test – the swab in the nose – or collected before the COVID-19 outbreak) among their patients and made use of the sensitivity and specificity studies done by the manufacturer on similar clinically confirmed COVID-19 patients and controls. While the specificity, false positives, was high and very similar for both test conditions, the sensitivity, the false negatives, were much higher in the local serum as compared to the data from the manufacturer. The researchers used blended false positives and negatives in their calculations.

  • Fewer false positives lowered the calculated incidence of COVID-19 to 2.49%.
  • Higher false positives increased the calculation to 4.16%.
  • Their blended value gave the 50-fold increase being reported.

In that thoughtful "peer review," Dinerstein points out that the false-positive rate may be higher, depending on how it is calculated. As that false positive rate increases, given the study's sample size, the conclusion may become lost in statistical uncertainty. In other words, of the 50 positive tests, anywhere from 16 to as many as 40 might be false-positive results.

Before we go any further in this discussion, we should note the Stanford team's results are being torn apart by statisticians, including Luis Pedro Coelho and Andrew Gelman, the latter of whom is demanding the study's authors provide an apology for wasting everybody's time with what he describes in detail as avoidable statistical errors. Gelman also has this to say about Stanford University's credibility:

The authors of this article put in a lot of work because they are concerned about public health and want to contribute to useful decision making. The study got attention and credibility in part because of the reputation of Stanford. Fair enough: Stanford's a great institution. Amazing things are done at Stanford. But Stanford has also paid a small price for publicizing this work, because people will remember that "the Stanford study" was hyped but it had issues. So there is a cost here. The next study out of Stanford will have a little less of that credibility bank to borrow from. If I were a Stanford professor, I'd be kind of annoyed. So I think the authors of the study owe an apology not just to us, but to Stanford.

Before we close, take a closer look at the image we featured at the top of this article, particularly the solution to the algebra problem that appears on the whiteboard. This image was snapped from Stanford University's promotional materials for its graduate school of education. It seems rushed studies aren't the only things coming out of the university these days for Stanford professors to be annoyed about.



Hauser’s Law, Updated for 2019

Back in 2009, we wrote about Hauser's Law, which at the time, we described as "one of the stranger phenomena in economic data". The law itself was proposed by W. Kurt Hauser in 1993, who observed:

No matter what the tax rates have been, in postwar America tax revenues have remained at about 19.5% of GDP.

In 2009, we found that the U.S. government's total tax collections averaged 17.8% of GDP in the years from 1946 through 2008, with a standard deviation of 1.2% of GDP. Hauser's Law had held up to scrutiny in principle, although the average was less than what Hauser originally documented in 1993 because the nation's historic GDP had been revised higher during the intervening years.

We're revisiting the question now because some members of the new Democrat party-led majority in the House of Representatives have proposed increasing the nation's top marginal income tax rate to 70%, nearly doubling today's 37% top federal income tax rate levied upon individuals. Since their stated purpose in raising income tax rates is to provide additional revenue to the U.S. Treasury to fund their "Green New Deal", if Hauser's Law continues to hold, they can expect their dreams of dramatically higher tax revenues to fund their political initiatives to be crushed on the rocks of reality.

Meanwhile, the U.S. Bureau of Economic Analysis completed a comprehensive revision to historic GDP figures in 2013, which significantly altered (increased) past estimates of the size of the nation's Gross Domestic Product.

The following chart shows what we found when we updated our analysis of Hauser's Law in action for the years from 1946 through 2018, where we're using preliminary estimates for the just-completed year's tax collections and GDP to make it as current as possible.

Hauser's Law in Action, 1946 - 2018

From 1946 through 2018, the top marginal income tax rate has ranged from a high of 92% (1952-1953) to a low of 28% (1988-1990). Most recently, in 2018, it decreased from 39.6% to 37% with the passage of the Tax Cuts and Jobs Act of 2017.

Despite all those changes, we find that the U.S. government's total tax collections have averaged 16.8% of GDP, with a standard deviation of 1.2% of GDP. Applying long-established techniques from the field of statistical process control, that gives us an expected range of 13.2% to 20.4% of GDP within which we should expect to see 99.7% of all the observations for tax collections as a percent share of GDP.
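
For the record, the control-limit arithmetic is just the mean plus or minus three standard deviations. A quick sketch using the figures above:

```python
# Three-sigma control limits for total tax collections as a share of GDP,
# using the 1946-2018 mean and standard deviation reported above.
mean, sigma = 16.8, 1.2  # percent of GDP

lower, upper = mean - 3 * sigma, mean + 3 * sigma
print(f"Expect 99.7% of observations between {lower:.1f}% and {upper:.1f}% of GDP")
# -> between 13.2% and 20.4% of GDP
```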

And that's exactly what we do see. The next chart zooms in on the total tax collections as a percent share of GDP data from the first chart, and adds the data for individual income tax collections as a percent share of GDP below it.

Total and Individual Income Tax Collections as Percent of GDP, 1946 - 2018

What we find is that the federal government's tax collections, from both personal income taxes and all sources of tax revenue, are remarkably stable over time as a percentage share of annual GDP, regardless of the level to which marginal personal income tax rates have been set. The biggest deviations we see from the mean tend to be associated with severe recessions, when tax collections have tended to decline somewhat more than the nation's GDP during periods of economic distress.

We also confirm that the variation in total and personal income tax receipts over time is well described by a normal distribution. We calculate that personal income tax collections as a percentage share of GDP from 1946 through 2018 have a mean of 7.6%, with a standard deviation of 0.8%.

For both levels of tax collections, if Hauser's Law holds, we would then expect any given year's tax collections as a percent of GDP to fall within one standard deviation of the mean 68% of the time, within two standard deviations 95% of the time, and within three standard deviations 99.7% of the time. And that is pretty close to what we observe with the reported data from 1946 through 2018.
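
Checking that claim against the data amounts to counting how many annual observations fall within one, two, and three standard deviations of the mean. A minimal sketch, where the values list would be filled with the annual tax-collections-to-GDP percentages from the OMB historical tables cited below (the numbers shown here are placeholders):

```python
def sigma_coverage(values):
    """Share of observations within 1, 2, and 3 standard deviations of the mean."""
    n = len(values)
    mean = sum(values) / n
    sigma = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return {k: sum(abs(v - mean) <= k * sigma for v in values) / n for k in (1, 2, 3)}

# Placeholder values; substitute the 1946-2018 annual percentages of GDP.
values = [16.1, 17.3, 15.9, 16.8, 18.2, 17.0, 16.5, 15.4, 17.9, 16.6]
for k, share in sigma_coverage(values).items():
    print(f"Within {k} sigma: {share:.0%} (normal distribution predicts "
          f"{[0.68, 0.95, 0.997][k - 1]:.1%})")
```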

As for high tax revenue aspirations, we can find only three periods in which tax collections rose more than one standard deviation above the mean level:

  1. In 1968, the Democratic U.S. Congress and President Lyndon Johnson passed a 10% income surtax that took effect in mid-year, which suddenly raised the top tax rate from 70% to 77% (increasing the amount collected from top income earners by 10%). Coupled with a spike in inflation, for which personal income taxes were not adjusted to compensate, this tax hike led to outsize income tax collections in that year.
  2. The sustained high inflation of 1978 (7.62%), 1979 (11.22%), 1980 (13.58%) and 1981 (10.35%) led to higher tax collections through bracket creep, as income tax brackets in the U.S. were not adjusted for inflation until 1985 as part of President Ronald Reagan's first term Economic Recovery Tax Act.
  3. Beginning in April 1997, the Dot Com Stock Market Bubble minted a large number of new millionaires as investors swarmed to participate in Internet and "tech" company initial public offerings or private capital ventures, which in turn, inflated personal income tax collections. Unfortunately, like the vaporware produced by many of the companies that sprang up to exploit the investor buying frenzy, the illusion of prosperity could not be sustained and tax collections crashed with the incomes of the Internet titans in the bursting of the bubble, leading to the recession that followed.

Now, what about those other taxes? Zubin Jelveh looked at the data back in 2008 and found that as corporate income taxes have declined over time, social insurance taxes (the payroll taxes collected to support Social Security and Medicare) have increased to sustain the margin between personal income tax receipts and total tax receipts. This makes sense given the matching taxes paid by employers to these programs, as these taxes have offset a good portion of corporate income taxes as a source of tax revenue from U.S. businesses. We also note that federal excise taxes have risen from 1946 through the present, which has also contributed to filling the gap and keeping the overall level of tax receipts as a percentage share of GDP stable over time.

Looking at the preliminary data for 2018, which saw both personal and corporate income tax rates decline with the passage of the Tax Cuts and Jobs Act of 2017, we see that total tax receipts as a percent of GDP dipped below the mean, but still fell within one standard deviation of it, just as in over two-thirds of previous years. Tax receipts from individual income taxes, however, rose slightly despite the income tax cuts that took effect in 2018, staying above the mean while still falling within one standard deviation of it.

Hauser's Law appears to have held up surprisingly well over time.

Will it continue? Only time will tell, but given what we've observed, it would take more than simple changes in marginal income tax rates to boost the U.S. government's tax revenues above the historical range that characterizes the strange phenomenon that is Hauser's law.

References

Bradford Tax Institute. History of Federal Income Tax Rates: 1913-2019. [Online Text]. Accessed 13 January 2019.

Tax Foundation. Federal Individual Income Tax Rates History. [PDF Document]. 17 October 2013.

U.S. Department of the Treasury. September 2018 Monthly Treasury Statement. [PDF Document]. 17 October 2018.

U.S. Bureau of Economic Analysis. National Income and Product Accounts Tables. Table 1.1.5. Gross Domestic Product. [Online Database]. Last Revised: 21 December 2018. Accessed: 14 January 2019.

U.S. White House Office of Management and Budget. Budget of the United States Government. Historical Tables. Table 1.1. Summary of Receipts, Outlays, and Surpluses or Deficits (-): 1789-2023. Table 2.1. Receipts by Source: 1934-2023. [PDF Document]. 12 February 2018.
