16/6/17: Replicating Scientific Research: Ugly Truth

Continuing with the theme on 'What I've been reading lately?', here is a smashing paper on 'accuracy' of empirical economic studies.

The paper, authored by Hou, Kewei and Xue, Chen and Zhang, Lu, and titled "Replicating Anomalies" (most recent version is from June 12, 2017, but it is also available in an earlier version via NBER) effectively blows the whistle on what is going on in empirical research in economics and finance. Per the authors, the vast literature that detects financial markets anomalies (or deviations away from the efficient markets hypothesis / economic rationality) "is infested with widespread p-hacking".

What's p-hacking? Well, it's a shady practice whereby researchers manipulate (by selective inclusion or exclusion) sample criteria (which data points to exclude from estimation) and test procedures (including model specifications and selective reporting of favourable test results), until insignificant results become significant. In other words, under p-hacking, researchers attempt to superficially maximise model and explanatory variables significance, or, put differently, they attempt to achieve results that confirm their intuition or biases.
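To see why this matters, here is a minimal simulation sketch. It is my own illustration, not anything from the paper. It assumes each 'specification' is an independent test on pure noise (real specifications run on the same data are correlated, so this overstates the effect), and the names simulate and t_stat, the sample sizes and the 20 specifications are all hypothetical choices. The honest researcher reports the first test; the p-hacker reports the best of 20.

```python
import math
import random
import statistics

# Two-sided 5% critical value of Student's t with 29 degrees of freedom.
T_CRIT = 2.045


def t_stat(sample):
    """t-statistic for the null hypothesis that the true mean is zero."""
    n = len(sample)
    return statistics.mean(sample) / (statistics.stdev(sample) / math.sqrt(n))


def simulate(n_studies=1000, n_specs=20, n_obs=30, seed=0):
    """Return (honest, p-hacked) false-positive rates on pure noise."""
    rng = random.Random(seed)
    honest_hits = 0
    hacked_hits = 0
    for _ in range(n_studies):
        # Every specification is tested on noise: there is no true effect.
        t_values = [
            abs(t_stat([rng.gauss(0.0, 1.0) for _ in range(n_obs)]))
            for _ in range(n_specs)
        ]
        honest_hits += t_values[0] > T_CRIT    # report the first test only
        hacked_hits += max(t_values) > T_CRIT  # report the best of all tests
    return honest_hits / n_studies, hacked_hits / n_studies


honest_rate, hacked_rate = simulate()
print(f"honest false-positive rate:   {honest_rate:.2f}")
print(f"p-hacked false-positive rate: {hacked_rate:.2f}")
```

Under these assumptions the honest rate should sit near the nominal 5%, while the chance that at least one of 20 independent tests clears the bar is about 1 - 0.95^20, or roughly 64%.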

What are anomalies? Anomalies are departures in the markets (e.g. in share prices) from the predictions generated by the models consistent with rational expectations and the efficient markets hypothesis. In other words, anomalies occur when market efficiency fails.

There are scores of anomalies detected in the academic literature, prompting many researchers to advocate abandonment (in all its forms, weak and strong) of the idea that markets are efficient.

Hou, Xue and Zhang take these anomalies to the test. They compile "a large data library with 447 anomalies". The authors then control for a key problem with data across many studies: microcaps. Microcaps - or small capitalization firms - are numerous in the markets (accounting for roughly 60% of all stocks), but represent only 3% of total market capitalization. This is true for key markets, such as NYSE, Amex and NASDAQ. Yet, as the authors note, evidence shows that microcaps "not only have the highest equal-weighted returns, but also the largest cross-sectional standard deviations in returns and anomaly variables among microcaps, small stocks, and big stocks." In other words, these are a higher-risk, higher-return class of securities. Yet, despite this, "many studies overweight microcaps with equal-weighted returns, and often together with NYSE-Amex-NASDAQ breakpoints, in portfolio sorts." Worse, many (hundreds) of studies use a 1970s regression technique that actually assigns more weight to microcaps. In simple terms, microcaps are the most common outlier and despite this they are given either the same weight in analysis as non-outliers or their weight is actually elevated relative to normal assets, despite the fact that microcaps have little meaning in driving the actual markets (their weight in the total market is just about 3% in total).
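The weighting problem is easy to see with a toy calculation. The sketch below is my own illustration with made-up numbers, not data from the paper: a hypothetical market of 10 stocks in which 6 microcaps hold 3% of total capitalization and, by assumption, earn 2% a month against 0.5% for the larger stocks.

```python
# Hypothetical market: (market cap in arbitrary units, monthly return in %).
# All numbers are invented for illustration only.
microcaps = [(0.5, 2.0)] * 6      # 60% of stocks, 3 units of market cap
big_stocks = [(24.25, 0.5)] * 4   # 40% of stocks, 97 units of market cap
stocks = microcaps + big_stocks


def equal_weighted_return(stocks):
    """Every stock counts the same, regardless of its size."""
    return sum(ret for _, ret in stocks) / len(stocks)


def value_weighted_return(stocks):
    """Each stock counts in proportion to its market capitalization."""
    total_cap = sum(cap for cap, _ in stocks)
    return sum(cap * ret for cap, ret in stocks) / total_cap


total_cap = sum(cap for cap, _ in stocks)
microcap_count_share = len(microcaps) / len(stocks)
microcap_cap_share = sum(cap for cap, _ in microcaps) / total_cap

ew = equal_weighted_return(stocks)
vw = value_weighted_return(stocks)

print(f"microcaps: {microcap_count_share:.0%} of stocks, "
      f"{microcap_cap_share:.0%} of market cap")
print(f"equal-weighted return: {ew:.3f}% per month")
print(f"value-weighted return: {vw:.3f}% per month")
```

With these invented numbers, equal weighting gives 1.4% a month, while value weighting gives 0.545%: the equal-weighted figure is driven almost entirely by stocks that make up 3% of the market.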

So the study corrects for these problems and finds that, once microcaps are accounted for, a grand total of 286 anomalies (64% of all anomalies studied), and under a stricter statistical significance test 380 (or 85% of all anomalies), "including 95 out of 102 liquidity variables (93%) are insignificant at the 5% level." In other words, the original studies' claims that these anomalies were significant enough to warrant rejection of market efficiency were not true when one recognizes one basic and simple problem with the data. Worse, per the authors, "even for the 161 significant anomalies, their magnitudes are often much lower than originally reported. Among the 161, the q-factor model leaves 115 alphas insignificant (150 with t < 3)."

This is pretty damning for those of us who believe, based on empirical results published over the years, that markets are bounded-efficient, and it is outright savaging for those who claim that markets are perfectly inefficient. But, this tendency of researchers to silverplate statistics is hardly new.

Hou, Xue and Zhang provide a nice summary of research on p-hacking and non-replicability of statistical results across a range of fields. It is worth reading, because it significantly dents one's confidence in the quality of peer review and the quality of scientific research.

As the authors note, "in economics, Leamer (1983) exposes the fragility of empirical results to small specification changes, and proposes to “take the con out of econometrics” by reporting extensive sensitivity analysis to show how key results vary with perturbations in regression specification and in functional form." The latter call was never implemented in the research community.

"In an influential study, Dewald, Thursby, and Anderson (1986) attempt to replicate empirical results published at Journal of Money, Credit, and Banking [a top-tier journal], and find that inadvertent errors are so commonplace that the original results often cannot be reproduced."

"McCullough and Vinod (2003) report that nonlinear maximization routines from different software packages often produce very different estimates, and many articles published at American Economic Review [highest rated journal in economics] fail to test their solutions across different software packages."

"Chang and Li (2015) report a success rate of less than 50% from replicating 67 published papers from 13 economics journals, and Camerer et al. (2016) show a success rate of 61% from replicating 18 studies in experimental economics."

"Collecting more than 50,000 tests published in American Economic Review, Journal of Political Economy, and Quarterly Journal of Economics, [three top rated journals in economics] Brodeur, L´e, Sangnier, and Zylberberg (2016) document a troubling two-humped pattern of test statistics. The pattern features a first hump with high p-values, a sizeable under-representation of p-values just above 5%, and a second hump with p-values slightly below 5%. The evidence indicates p-hacking that authors search for specifications that deliver just-significant results and ignore those that give just-insignificant results to make their work more publishable."

If you think this phenomenon is encountered only in economics and finance, think again. Here are some findings from other 'hard science' disciplines where, you know, lab coats do not lie.

"...replication failures have been widely documented across scientific disciplines in the past decade. Fanelli (2010) reports that “positive” results increase down the hierarchy of sciences, with hard sciences such as space science and physics at the top and soft sciences such as psychology, economics, and business at the bottom. In oncology, Prinz, Schlange, and Asadullah (2011) report that scientists at Bayer fail to reproduce two thirds of 67 published studies. Begley and Ellis (2012) report that scientists at Amgen attempt to replicate 53 landmark studies in cancer research, but reproduce the original results in only six. Freedman, Cockburn, and Simcoe (2015) estimate the economic costs of irreproducible preclinical studies amount to about 28 billion dollars in the U.S. alone. In psychology, Open Science Collaboration (2015), which consists of about 270 researchers, conducts replications of 100 studies published in top three academic journals, and reports a success rate of only 36%."

Let's get down to real farce: everyone in sciences knows the above: "Baker (2016) reports that 80% of the respondents in a survey of 1,576 scientists conducted by Nature believe that there exists a reproducibility crisis in the published scientific literature. The surveyed scientists cover diverse fields such as chemistry, biology, physics and engineering, medicine, earth sciences, and others. More than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than 50% have failed to reproduce their own experiments. Selective reporting, pressure to publish, and poor use of statistics are three leading causes."

Yeah, you get the idea: you need years of research, testing, re-testing and, more often than not, you get results that are not significant or only weakly significant. Which means that after years of research you end up with an unpublishable paper (no journal would welcome a paper without significant results, even though absence of evidence is as important in science as evidence of presence), no tenure, no job, no pension, no prospect of a career. So what do you do then? Ah, well... p-hack the shit out of data until the editor is happy and the referees are satisfied.

Which, for you, the reader, should mean the following: when we say that 'scientific research established fact A' based on reputable journals publishing high quality peer reviewed papers on the subject, know that around half of the findings claimed in these papers, on average, most likely cannot be replicated or verified. And then remember, it takes one or two scientists to turn the world around from believing (based on scientific consensus at the time) that the Earth is flat and is the centre of the Universe, to believing in the world as we know it to be today.

Full link to the paper: Charles A. Dice Center Working Paper No. 2017-10; Fisher College of Business Working Paper No. 2017-03-010. Available at SSRN:

7/6/17: Markets, Investors Exuberance and Fundamentals

Latest data from FactSet on S&P500 core metrics is an interesting read. Here are a couple of charts that caught my attention:

Look first at the last 6 months' worth of EPS data through estimated 2Q 2017 (based on 99% of companies reporting). The trend continues: EPS is declining, while prices are rising. On a longer time scale, EPS have been virtually flat in 2014-2016, but are forecast to rise nicely in 2017 and 2018. Whatever the forecast might be for 2018, the 2017 increase would do little to generate a meaningful reversion in the EPS-to-price trend.

However, the good news is, expectations on rising EPS are driven by rising sales for 2017, and to a lesser extent in 2018. This would be (if materialised) an improvement on the 2014-2016 core drivers, including shares repurchases (chart below).

Next, consider P/E ratios:

As the chart above indicates, P/E ratios are expected to continue rising in the next 12 months. In other words, the markets are going to get more expensive, relative to underlying earnings. Worse, on a 5-year average basis, all sectors, excluding Financials, are at above x14. Hardly a comfort zone for 'go long' investors. The overvalued nature of the market is clearly confirmed by both forward and trailing P/E ratios over the last 10 years:

Forward expectations are now literally a run-away train, relative to the past 10 years record (chart above), while trailing (lagged) P/Es are dangerously close to crisis-triggering levels of exuberance (chart below).

In summary, thus, latest data (through end-of-May) shows continued buildup of risks in the equity markets. At what point the dam will crack is not something I can attempt to answer, but the lake of investors' expectations is now breaching the top, and the spillways aren't doing the trick on abating them.

3/9/16: Fintech, Banking and Dinosaurs with Wings

Here is an interesting study from McKinsey on fintech's role in facilitating banking sector adjustments to technological evolution and changes in consumer demand for banking services:

The key here is that fintech is viewed by McKinsey as a core driver for changes in risk management. And the banks' responses to the fintech challenge are telling. Per McKinsey: “More recently, banks have begun to capture efficiency gains in the SME and commercial-banking segments by digitizing key steps of credit processes, such as the automation of credit decision engines.”

The potential for rewards from innovation is substantial: “The automation of credit processes and the digitization of the key steps in the credit value chain can yield cost savings of up to 50 percent. The benefits of digitizing credit risk go well beyond even these improvements. Digitization can also protect bank revenue, potentially reducing leakage by 5 to 10 percent.”

McKinsey reference one example of improved efficiencies: “…by putting in place real-time credit decision making in the front line, banks reduce the risk of losing creditworthy clients to competitors as a result of slow approval processes.”

Blockchain technology offers several pathways to delivering significant gains for banks in the area of risk management:

  • It is a real-time transactions tracking mechanism which can be integrated into live systems of data analytics to reduce lags and costs in risk management;
  • It is also the most secure form of data transmission to-date;
  • It offers greater ability to automate individual loans portfolios on the basis of each client (irrespective of the client size); and 
  • It provides potentially seamless integration of various sub-segments of lending portfolios, including loans originated in unsecured peer-to-peer lending venues and loans originated by the banks.

Note the impact matrix above.

Blockchain solutions, such as for example AID:Tech platform for payments facilitation, can offer tangible benefits across all three pillars of digital credit risk management process for a bank:

  • Meeting customer demand for real-time decisions? Check. Self-service demand? Check. Integration with third parties’ platforms? Check. Dynamic risk-adjusted pricing and limits? Check
  • Reduced cost of risk mitigation? Yes, especially in line with real-time analytics engines and monitoring efficiency
  • Reduced operational costs? The entire reason for blockchain is lower transactions costs

What the above matrix is missing is the bullet point of radical innovation, such as, for example, offering not just better solutions, but cardinally new solutions. Example of this: predictive or forecast-based financing (see my earlier post on this

A recent McKinsey report ( attempted to map the same path for insurance industry, but utterly failed in respect of seeing the insurance model evolution forward beyond traditional insurance structuring (again, for example, FBF is not even mentioned in the report, nor does the report devote any attention to the blockchain capacity to facilitate predictive analytics-based insurance models). Tellingly, the same points are again missed in this month’s McKinsey report on digital innovation in insurance sector:

This might be due to the fact that McKinsey database is skewed to just 350 larger (by now legacy) blockchain platforms with little anchoring to current and future innovators in the space. In a world where technology evolves with the speed of blockchain disruption, one can’t be faulted for falling behind the curve by simply referencing already established offers.

Which brings us to the point: what should we really expect from fintech innovation taken beyond simply tinkering on the margins of big legacy providers?

As those of you who follow my work know, I recently wrote about fintech disruption in the banking sector for the International Banker (see The role of fintech in providing back-office solutions in banking services is something that is undoubtedly worth exploring. However, it is also a dimension of innovation where banks are well-positioned to accept and absorb change. The real challenge lies within the areas of core financial services competition presented (for now only marginally) by fintech. Once, however, the marginal innovation gains speed and breadth, traditional banking models will be severely stretched and the opening for fintech challengers in the sector will expand dramatically. The reason for this is simple: you can’t successfully transform a centuries-old business model to accommodate revolutionary change. You might bolt onto it a few bells and whistles of new processes and new solutions. But that is hardly a herald of innovation.

At some point in evolution, dinosaurs with wings die out, and birds fly.

13/6/16: Twin Tech Challenge to Traditional Banks

My article for the International Banker looking at the fintech and cybercrime disruption threats to traditional banking models is out.

The long-term fallout from the 2008 global financial crisis created several deep fractures in traditional-banking models. Most of the sectoral attention today has focused on weak operating profits and balance-sheet performance, especially the risks arising from the negative-rates environment and the collapse in yields on traditional assets, such as highly rated sovereign and corporate debt. Second-tier concerns in boardrooms and amidst C-level executives relate to the continuously evolving regulatory and supervisory pressures and rising associated costs. Finally, the anemic dynamics of the global economic recovery are also seen as a key risk to traditional banks’ profitability.

However, from the longer-term perspective, the real risks to the universal banks’ well-established business model come from an entirely distinct direction: the digital-disruption channels that simultaneously put pressure on big banks’ core earnings lines and create ample opportunities for undermining the banking sector’s key unique selling proposition—that is, security of customer funds, data and transactions, and by corollary, enhancing customer loyalty. These channels are FinTech innovations—including rising data intensity of products on offer and technological threats, such as rising risks to cybersecurity. This two-pronged challenge is not unique to the banking sector, but its disruptive potential is a challenge that today’s traditional banking institutions are neither equipped to address nor fully enabled to grasp.

Read more here: Gurdgiev, Constantin, Is the Rise of Financial Digital Disruptors Knocking Traditional Banks Off the Track? (June 13, 2016). International Banker, June 2016. Available at SSRN: