DeepSeek benefits from some Human Intelligence

I have been critical of purely data-driven modelling in the past.  DeepSeek's shock challenge to the Generative AI heavyweights is a slap in the face for purely data-driven solutions.  The 'dump a bunch of data into a bucket and stir' mentality sums up the brute-force, but stupid, approach to Generative AI Large Language Models that has led these companies to conclude that all you need to make a model work is more computing power.  It took a handful of statisticians to curtail that belief.

So how did a hedge fund torpedo the Generative AI industry?  Some hedge funds use quantitative investing techniques.  Quantitative investing approaches summarise large data sets with screening models.  These screens take a big problem (create an equity portfolio that beats the market) and reduce it to a handful of common factors.  These factors may carry some long-term or short-term return that can be exploited.  Some common screens are Growth versus Value stocks, Small-cap versus Large-cap, sector membership and so forth, which conditionally or unconditionally enable a manager to beat a static benchmark.  But the key is to reduce a big problem down to a smaller one without losing too much information.  There are a number of techniques that reduce the dimensionality of the problem to something more manageable: pre-screening the dataset to eliminate obviously bad candidate investments, aggregating data to smooth cross-sectional variation, using linear approximations to test for incremental improvements, specifying near-optimal starting conditions from a coarse grid search before starting a finer grid search, and many other tricks, as in the sketch below.
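To make the coarse-then-fine grid search idea concrete, here is a minimal sketch in Python.  Everything in it is hypothetical: the simulated returns, the two-factor value/momentum screen and the Sharpe-ratio objective are stand-ins for illustration, not any fund's actual model.

```python
import numpy as np

# Hypothetical example: choose factor weights for a simple two-factor stock
# screen via a coarse grid search followed by a fine grid search around the
# coarse optimum.  All data here is simulated; real screens are far richer.
rng = np.random.default_rng(0)
n_months, n_stocks = 120, 500
returns = rng.normal(0.005, 0.05, size=(n_months, n_stocks))
value_score = rng.normal(size=n_stocks)
momentum_score = rng.normal(size=n_stocks)

def portfolio_sharpe(w_value, w_momentum, top_n=50):
    """Blend the two factor scores, hold the top_n stocks equally weighted,
    and return the annualised Sharpe ratio of that portfolio."""
    score = w_value * value_score + w_momentum * momentum_score
    held = np.argsort(score)[-top_n:]
    port = returns[:, held].mean(axis=1)
    return np.sqrt(12) * port.mean() / port.std()

def grid_search(grid_v, grid_m):
    """Evaluate every weight pair on the grid and keep the best one."""
    best_s, best_w = -np.inf, None
    for wv in grid_v:
        for wm in grid_m:
            s = portfolio_sharpe(wv, wm)
            if s > best_s:
                best_s, best_w = s, (wv, wm)
    return best_w, best_s

# Coarse pass: 5 x 5 = 25 evaluations across the whole weight range.
coarse = np.linspace(-1.0, 1.0, 5)
(wv, wm), _ = grid_search(coarse, coarse)

# Fine pass: another 5 x 5 evaluations, confined to a narrow band around the
# coarse optimum.  Two passes cost 50 evaluations, versus 289 for a single
# 17 x 17 grid at the same final resolution over the full range.
(wv, wm), sharpe = grid_search(np.linspace(wv - 0.25, wv + 0.25, 5),
                               np.linspace(wm - 0.25, wm + 0.25, 5))
print(f"chosen weights: value={wv:.2f}, momentum={wm:.2f}, Sharpe={sharpe:.2f}")
```

The saving is the point: the problem is shrunk to a cheap search first, and the expensive, high-resolution work is spent only where the coarse answer says it matters.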

Large Language Models have many billions of parameters, so even a small reduction in the number of parameters that must be estimated, or in the data used to estimate them, can save an enormous amount of computation.  DeepSeek seems to have applied this kind of dimensionality-reduction thinking to its training pipeline before estimation.  The efficiency loss was minimal relative to the time and cost saving.  The brute-force computational solutions favoured by the big-data community were trounced by a bit of human intelligence.
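The general principle can be shown with a toy sketch: compress the inputs before fitting, and estimate far fewer parameters for almost the same fit.  This is only an analogy, not DeepSeek's actual method; the data below is simulated and the PCA-plus-least-squares pipeline is an assumption chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "training data": 10,000 samples with 1,000 raw features, where the
# target really depends on only a few underlying directions of variation.
n, d, k = 10_000, 1_000, 20
latent = rng.normal(size=(n, k))
X = latent @ rng.normal(size=(k, d)) + 0.1 * rng.normal(size=(n, d))
y = latent[:, 0] - 2.0 * latent[:, 1] + 0.1 * rng.normal(size=n)

# Full model: least squares on all 1,000 raw features -> 1,000 parameters.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Reduced model: project onto the top-k principal components first,
# then fit -> only k = 20 parameters to estimate.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T                      # compressed representation
beta_small, *_ = np.linalg.lstsq(Z, y, rcond=None)

def r2(pred, target):
    """Fraction of the target's variance explained by the predictions."""
    return 1 - np.sum((target - pred) ** 2) / np.sum((target - target.mean()) ** 2)

print("parameters:", beta_full.size, "vs", beta_small.size)
print("in-sample R^2:", round(r2(X @ beta_full, y), 3),
      "vs", round(r2(Z @ beta_small, y), 3))
```

In this toy setting the 20-parameter model recovers essentially the same fit as the 1,000-parameter one: a small, deliberate reduction in dimensionality, a large reduction in estimation work.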

The quantitative dimensionality-reduction techniques were honed out of necessity rather than design.  When quantitative investing first became popular 40 years ago, computers were neither as fast nor as accessible as they are today.  Investors had to think before feeding their punchcards into the reader, since doing so could save days of processing time.  One 'run' might have taken 12 hours to complete back then; the equivalent calculation refreshes continuously in an Excel spreadsheet on a PC today.

The lesson from DeepSeek is not about the race to win the Super AI crown; it is about the instructions that a smart human should give to a dumb computer.