Category Archives: Big Data

16/10/18: Data analytics. It really is messier than you thought


An interesting study (H/T to @stephenkinsella) highlights a problem with the empirical determinism that underpins our (human) evolving trust in 'Big Data' and 'analytics': statistics is far from deterministic when it comes to social, business, finance and similar data.

Here is the problem: researchers put together 29 independent teams, with 61 analysts in total. They gave these teams the same data set on football referees' decisions to give red cards to players. They asked the teams to test the same hypothesis, namely that football "referees are more likely to give red cards to dark-skin-toned players than to light-skin-toned players".

Because the teams used a variety of analytic models, the estimates produced a range of answers: the effect of a player's skin color on red card issuance ran from 0.89 at the lower end of the range to 2.93 at the higher end (in odds-ratio units, where 1 means no effect). The median effect was 1.31. Per the authors, "twenty teams (69%) found a statistically significant positive effect [meaning that they found skin color having an effect on referees' decisions], and 9 teams (31%) did not observe a significant relationship" [meaning no significant effect of the players' skin color was found].

To eliminate the possibility that analysts’ prior beliefs could have influenced their findings, the researchers controlled for such beliefs. In the end, prior beliefs did not explain these differences in findings. Worse, "peer ratings of the quality of the analyses also did not account for the variability." Put differently, the vast difference in the results cannot be explained by quality of analysis or priors.

The authors conclude that even absent biases and personal prejudices of the researchers, "significant variation in the results of analyses of complex data may be difficult to avoid... Crowdsourcing data analysis, a strategy in which numerous research teams are recruited to simultaneously investigate the same research question, makes transparent how defensible, yet subjective, analytic choices influence research results."
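To make the point about "defensible, yet subjective, analytic choices" concrete, here is a minimal sketch of how a single choice (whether or not to adjust for a covariate) can move an estimate on one and the same data set. Everything in it is an assumption for illustration: the counts are synthetic and are not the study's data, player position is a hypothetical covariate, and the odds ratio and the Mantel-Haenszel adjustment are simply two standard estimators, not the methods any particular team used.

```python
from typing import Dict, Tuple

# One 2x2 table per stratum, as counts:
# (dark_red_card, dark_no_card, light_red_card, light_no_card)
Table = Tuple[int, int, int, int]

# SYNTHETIC counts, invented for illustration only (not the study's data).
# "Position" is a hypothetical covariate an analyst might or might not include.
SYNTHETIC: Dict[str, Table] = {
    "defender": (30, 270, 20, 180),
    "forward": (5, 195, 10, 390),
}


def odds_ratio(table: Table) -> float:
    """Odds ratio of a single 2x2 table: (a * d) / (b * c)."""
    a, b, c, d = table
    return (a * d) / (b * c)


def pooled(tables: Dict[str, Table]) -> Table:
    """Collapse all strata into one table, i.e. ignore the covariate."""
    return tuple(sum(column) for column in zip(*tables.values()))


def mantel_haenszel(tables: Dict[str, Table]) -> float:
    """Mantel-Haenszel odds ratio, i.e. adjust for the covariate."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables.values())
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables.values())
    return num / den


# Analyst A ignores position; analyst B adjusts for it.
crude = odds_ratio(pooled(SYNTHETIC))
adjusted = mantel_haenszel(SYNTHETIC)

print(f"crude odds ratio:             {crude:.2f}")
print(f"position-adjusted odds ratio: {adjusted:.2f}")
```

On these invented counts the unadjusted analysis gives an odds ratio of about 1.43, while the adjusted one gives 1.00, because within each position the odds are identical for both groups. Neither analyst has made an error; they have simply answered slightly different questions with the same data.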

Good luck putting much trust into social data analytics.

Full paper is available here: http://journals.sagepub.com/doi/pdf/10.1177/2515245917747646.

30/10/15: ‘Internet Natives’: Power of Value Creation + Power of Value Destruction


A very interesting Credit Suisse survey covers some 1,000 people at the tail end of the millennial generation (age 16-25) across the U.S., Brazil, Singapore and Switzerland. It offers some surprising insights.

Take a look at the following summary:



The results are seriously strange. Around 48% of all respondents use the internet for payment transactions, but only 19% on average use it for obtaining financial advice. In other words, convenience drives transactional use, but not demand for analytics.

Meanwhile, on average just 20% use the internet for earning money or working. Which makes you wonder what jobs (if any) the respondents hold if only 1 in 5 use the internet to do them. Furthermore, look at the percentages of respondents who use the internet for job searches compared to earning money or working. Once again, something fishy.

Internet use for political and social engagement heavily exceeds use for personal relations, and this is true for all countries surveyed. This bears no relationship to the younger generation's voting participation in the real world, but does match their responses on whether or not they use the internet for voting.


While responses to the previous set of questions suggest that internet-based social (political / civic) engagement is more prevalent amongst the young respondents than personal engagement, the respondents take the opposite view of the internet's benefits: they see it as delivering personal benefits more than social ones.



This is especially true in the U.S. and Switzerland, where the gap between those who think the internet is a positive personal platform and those who think it is a positive social platform is 12-13 percentage points.

Confusing? Maybe not. 'Web naturals' that we all are, we are simultaneously experiencing two aspects of internet-enabled life:

  • Too much information and clutter; and
  • Significant value to the power of engagement.


What this means to me is that social and interactive platforms have to stop inventing new channels to push through to us - information users - commercialised crap and start letting us take charge of content once again. To do this, the successful platforms of the future will need the following:
1) Own brand capital that is clean from being pure advertisement pushers;
2) More creative and empowering deployment of user-generated content; and
3) Ability to re-focus their business strategies on margin delivery.

Otherwise, they will end up cannibalising themselves and destroying our - users’ - value.

4/10/15: Data is not the end of it all, it’s just one tool…


Recently, I spoke at a very interesting Predict conference, covering the issues of philosophy and macro-implications of data analytics in our economy and society. I posted slides from my presentations earlier here.

Here is a quick interview recorded by the Silicon Republic covering some of the themes discussed at the conference: https://www.siliconrepublic.com/video/data-is-not-the-end-of-it-all-its-just-one-tool-dr-constantin-gurdgiev.


17/9/15: Predict Conference: Data Analytics in the Age of Higher Complexity


This week I spoke at the Predict Conference on the future of data analytics and predictive models. Here are my slides from the presentation:

Key takeaways:

  • Analytics are being shaped by dramatic changes in demand (consumer side of data supply); a changing environment of macroeconomic and microeconomic uncertainty (risk complexity and dynamics); and technological innovation (on the supply side, via the enablement that new technology delivers to the field of analytics, especially in qualitative and small data areas; on the demand side, via the increased speed and uncertainty that new technologies generate).
  • On the demand side: consumer behaviour is complex and understanding even the 'simpler truths' requires more than simple data insight; consumer demand is now being shaped by the growing gap between consumer typologies and the behavioural environment.
  • On the micro uncertainty side, consumers and other economic agents are operating in an environment of exponentially increasing volatility, including income uncertainty, timing variability (lumpiness) of income streams and decisions, a highly uncertain environment concerning life cycle incomes and wealth, etc. This implies the growing importance of non-Gaussian distributions in statistical analysis of consumer behaviour and, simultaneously, an increasing need for qualitative and small data analytics.
  • On the macro uncertainty side, interactions between domestic financial, fiscal, economic and monetary systems are growing more complex, and systems interdependencies imply growing fragility. Beyond this, international systems are now tightly connected to domestic systems, and the generation and propagation of systemic shocks is no longer contained within national / regional or even super-regional borders. Macro uncertainty is now directly influencing micro uncertainty and is shaping consumer behaviour in the long run.
  • Technology, which is traditionally viewed as the enabler of robust systems responses to risks and uncertainty, is now acting to generate greater uncertainty and to increase the propagation of shocks through economic systems (speed and complexity).
  • The majority of mechanisms for crisis resolution deployed in recent years have contributed to increasing systems fragility by enhancing over-confidence bias through excessive reliance on systems consolidation, centralisation and technocratic responses that decrease the systems distribution necessary to address the 'unknown unknowns' nature of systemic uncertainty. Excessive reliance on Big Data within business analytics (and policy formation) is reducing our visibility of smaller risks and creates a false perception of safety in centralised regulatory and supervisory regimes.
  • Instead, fragility-reducing solutions require greater reliance on highly distributed and dispersed systems of regulation, linked to strong supervision, to simultaneously allow a greater rate of risk / response discovery and control the downside of such discovery processes. Big Data has to be complemented by more robust and extensive deployment of the 'craft' of small data analytics and interpretation. Small events and low amplitude signals cannot be ignored in the world of highly interconnected systems.
  • Overall, predictive data analytics will have to evolve toward enabling a shift in our behavioural systems from simple nudging toward behavioural enablement (via automation of routine decisions: e.g. compliance with medical procedures) and behavioural activation (actively responsive behavioural systems that help modify human responses).