AI: A statistics package in the hands of monkeys

There are two ways to approach an empirical question.

1. Formulate a hypothesis based on some physical, structural or behavioural model. Test the hypothesis using a relevant dataset. Interpret the results relative to the model's predictions.

2. Dump a bunch of data into a bucket, stir, and implement whatever correlations appear.

I have been intrigued by the veil of mystique with which AI distinguishes itself from statistics. One journalist poses the question aptly: “If machine learning is a subsidiary of statistics, how could someone with virtually no background in stats develop a deep understanding of cutting-edge Machine Learning concepts?” [Joe Davison, ‘No, AI is not just glorified statistics’, Medium, 28 June 2018]. The journalist argues that AI is fundamentally different to statistics, and that understanding the statistical algorithms is therefore not necessary. I disagree. I question whether a ‘deep understanding’ of these cutting-edge concepts was ever attained.

Let me explain the dangers of a statistics package in the hands of monkeys. Recently we were visited by an AI-person asking us to invest millions in his stock-picking model. Stock picking has fascinated statisticians for centuries, so there is an established literature of accepted practice that researchers use to demonstrate their success or otherwise. Measures such as information ratios, Sharpe ratios, Sortino ratios and benchmark-relative returns are standard. Yet this AI-person was out and about seeking investors without any of this supporting evidence. When quizzed about how he chose his investment universe and the subset of securities that made up his portfolio, his response was that the ‘AI chose it’. On top of this, where he did calculate some measures of success, the calculations were wrong: he quoted an information ratio of 27 without a benchmark, even though the information ratio is by definition the return relative to a benchmark divided by the tracking error against that benchmark. Financial datasets are notoriously unstable, with outlying observations driving inference and spurious correlations galore. Ceding control of your dataset to an AI algorithm is hardly comforting…

…to me. However, this seems to be standard AI practice: the bigger the data bucket and the more crunch time needed, the better the story, it seems. AI practice is to specify, say, a linear relation between some X-variables and a Y-variable, calculate the coefficients and just use them. It is a fact, however, that the optimal (least-squares) coefficient estimates from a multiple regression are functions of the variances and covariances in the data. With a single X-variable, the coefficient is simply the covariance of X with Y divided by the variance of X; with several X-variables, the vector of covariances with Y is multiplied by the inverse of the covariance matrix of the X-variables. It is also a fact that these coefficients will be exactly the same irrespective of whether the user identifies as an AI data scientist or a statistical researcher. Clearly, where the end result is the same and the method of estimating the relations is the same, the same thing is being generated. So who are the monkeys here?
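The claim that the coefficients do not care who computes them is easy to check. The sketch below uses only the Python standard library, and the data, true coefficients and function names are all hypothetical. It fits the same two-variable linear relation twice: once by the statistician's closed form (beta = inverse of Cov(X) times Cov(X, Y), intercept = mean of Y minus beta times mean of X), and once in a 'training' style, by gradient descent on mean squared error. Under these assumptions the two sets of estimates should agree to numerical precision.

```python
import random
import statistics


def closed_form_ols(x1, x2, y):
    """Statistician's route: least-squares coefficients from (co)variances.

    beta = inverse(Cov(X)) @ Cov(X, Y); intercept = mean(y) - beta . mean(X).
    """
    n = len(y)
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)

    def cov(u, mu, v, mv):
        # Sample covariance; the n - 1 divisor cancels out of beta anyway.
        return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)

    s11, s22, s12 = cov(x1, m1, x1, m1), cov(x2, m2, x2, m2), cov(x1, m1, x2, m2)
    c1, c2 = cov(x1, m1, y, my), cov(x2, m2, y, my)
    # Invert the 2x2 covariance matrix of X by hand and apply it to Cov(X, Y).
    det = s11 * s22 - s12 * s12
    b1 = (s22 * c1 - s12 * c2) / det
    b2 = (s11 * c2 - s12 * c1) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def gradient_descent_ols(x1, x2, y, lr=0.1, steps=2000):
    """'Training' route: minimise mean squared error by plain gradient descent."""
    n = len(y)
    a = b1 = b2 = 0.0
    for _ in range(steps):
        ga = g1 = g2 = 0.0
        for u, v, t in zip(x1, x2, y):
            err = a + b1 * u + b2 * v - t
            ga += err
            g1 += err * u
            g2 += err * v
        # The gradient of (1/n) * sum(err**2) is (2/n) * sum(err * feature).
        a -= lr * 2 * ga / n
        b1 -= lr * 2 * g1 / n
        b2 -= lr * 2 * g2 / n
    return a, b1, b2


# Hypothetical data: y depends linearly on two correlated X-variables plus noise.
random.seed(0)
x1 = [random.gauss(0, 1) for _ in range(200)]
x2 = [0.5 * u + random.gauss(0, 1) for u in x1]
y = [1.0 + 2.0 * u - 0.5 * v + random.gauss(0, 0.3) for u, v in zip(x1, x2)]

ols = closed_form_ols(x1, x2, y)
gd = gradient_descent_ols(x1, x2, y)
print("closed form:     ", [round(c, 4) for c in ols])
print("gradient descent:", [round(c, 4) for c in gd])
```

Gradient descent stands in here for iterative 'training' in general; with a sensible learning rate and enough steps it converges to the same least-squares minimum, so the answer does not depend on which route is taken.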

It seems to me that the AI field fails to distinguish itself from the other users of the mathematical techniques on which its model-building relies. The monkeyness shows up as a lack of thinking about the datasets and the problems they are trying to solve. One of the first things you are told is to ‘look at your data’, but most people go straight to calculating means and variances when they come into possession of a new dataset. The AI-types just stick the data in a bucket and stir! They don't want to know where the data came from, how it is calculated or what it represents. They just want results (means, variances and covariances) that may or may not repeat themselves.
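A minimal illustration of why ‘look at your data’ matters, using made-up numbers: five observations with exactly zero correlation, plus a single extreme observation. The helper names pearson and leave_one_out are hypothetical; leave_one_out recomputes the correlation with each point dropped in turn, a cheap check for whether one observation is driving the result.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def leave_one_out(xs, ys):
    """Correlation recomputed with each observation dropped in turn."""
    return [pearson(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]) for i in range(len(xs))]


# Hypothetical data: five points with no linear relation at all...
x_clean = [1, 2, 3, 4, 5]
y_clean = [2, 4, 3, 4, 2]
# ...plus one extreme observation.
x_all = x_clean + [50]
y_all = y_clean + [50]

loo = leave_one_out(x_all, y_all)
print("correlation without the outlier:", pearson(x_clean, y_clean))
print("correlation with the outlier:   ", round(pearson(x_all, y_all), 3))
print("leave-one-out range:", round(min(loo), 3), "to", round(max(loo), 3))
```

With the outlier included the correlation is close to 1; dropping that one point returns it to zero. A mean, variance or correlation on its own will not reveal that kind of fragility.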

The monkey who visited us seeking capital (described above) is a classic case of someone destined to repeat the mistakes statisticians made several centuries before him, without realising how naive he is. He will patch up his model when it fails to deliver actual investment results, and arbitrarily impose constraints that improve historical performance while improving nothing in the future. He will lose money for investors and not know why.
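The trap of tuning until the history looks good can be simulated in a few lines. This sketch assumes a hypothetical bucket of 1,000 candidate 'signals' that are pure random noise by construction, and random 'returns'; the function name mine_signals and all parameter values are illustrative. It keeps the signal most correlated with returns in the first half of the sample, then checks that same signal on the second half, which played no part in the selection.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def mine_signals(n_signals=1000, n_periods=120, seed=1):
    """Search a bucket of pure-noise 'signals' for the best in-sample fit.

    Returns (best in-sample correlation, that signal's out-of-sample correlation).
    """
    rng = random.Random(seed)
    half = n_periods // 2
    returns = [rng.gauss(0, 1) for _ in range(n_periods)]
    signals = [[rng.gauss(0, 1) for _ in range(n_periods)] for _ in range(n_signals)]
    # In-sample: keep whichever signal best "explains" the first half of the returns.
    in_sample = [pearson(s[:half], returns[:half]) for s in signals]
    best = max(range(n_signals), key=in_sample.__getitem__)
    # Out-of-sample: the chosen signal on the second half, unseen during selection.
    out_of_sample = pearson(signals[best][half:], returns[half:])
    return in_sample[best], out_of_sample


best_in, oos = mine_signals()
print(f"best in-sample correlation: {best_in:.2f}")
print(f"same signal out of sample:  {oos:.2f}")
```

Because every signal is noise, the true correlation is zero: the impressive in-sample figure is selection bias, and the out-of-sample figure will typically sit near zero. Changing the seed shows the same pattern.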

It does seem odd that AI researchers have embraced the finance industry without bothering to learn from the mistakes of those who went before them. Do they genuinely believe that they are the first to apply mathematical techniques to these datasets? A monkey with a statistics package is a dangerous combination.
