There has been a lot of commentary in the financial press about “fat tails,” that is, the idea that disasters are more common than a normal, bell-curve view of probability would lead one to expect. Indeed, one might even say there is now some consensus on this once heterodox, Mandelbrotian view of probability distributions. The real debates in the investment world (and we will see in a moment how intense they can get) arise over the significance of that fact. Perhaps the opposed points of view in that debate can reconcile on this point, and then they can all get busy working on their algorithms.
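To put a rough number on the gap, here is a minimal sketch (the 5-sigma threshold and the Student-t stand-in for a fat-tailed market are illustrative assumptions, not anyone’s calibrated model) comparing how likely a large daily loss is under each view:

```python
from scipy import stats

# How likely is a 5-sigma daily loss under a normal distribution
# versus a fat-tailed Student-t (df=3) rescaled to unit variance?
threshold = -5.0
df = 3

p_normal = stats.norm.cdf(threshold)

# A Student-t with df degrees of freedom has variance df / (df - 2);
# rescale so both distributions have unit variance before comparing.
std_t = (df / (df - 2)) ** 0.5  # standard deviation of the raw t
p_fat = stats.t.cdf(threshold * std_t, df)

print(f"bell curve: {p_normal:.2e}")  # ~2.9e-07: once in ~14,000 years of trading days
print(f"fat tails:  {p_fat:.2e}")     # ~1.6e-03: thousands of times more likely
```

Same mean, same variance; only the tails differ, and the disaster goes from a theoretical curiosity to something a career-length investor should expect to see.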
What is the debate? To begin, it is possible to infer that a lot of investors are implicitly betting on those bell curves and their narrow tails, and that the appropriate response is to bet against them, that is, to bet on the fat tails. But it is also possible to infer that the wise use of diversification and leverage can protect against the fat tails, especially if one allows for at least a medium-term time horizon.
There was a heated exchange on this subject on Twitter not long ago between Nassim Taleb and Cliff Asness. It ended in acrimony. Taleb, on May 22, tweeted, “One should resist making psychiatric claims, except for situations where 100% self-evident.” On the basis of self-evidence, he attributed “narcissistic rage” to Asness.
Asness replied, “Being called a lunatic by Nassim Nicholas Taleb … is a bizarre feeling. It’s like being attacked by Harvey Weinstein for not respecting women enough.”
Beyond Reality Television
That’s amusing in a reality-television sort of way, but … what were they actually arguing about? Fat tails? But then … is Asness a defender of the narrow-tailed bell curve? Not at all. Asness has written: “Investment returns are generally not perfectly ‘normally’ distributed. To some degree almost all investment returns seem to be somewhat ‘fat-tailed.’” He also contends, as one would expect, that AQR portfolios are robust against such fat-tail events as should be expected, through diversification and (what amounts to much the same thing) a judicious use of leverage.
Asness believes, though, that funds specifically oriented toward harvesting the tail risk, like Taleb’s, are a bad idea. Taleb, naturally, thinks they are a good idea, that Asness’ math is all wrong, and that diversification is overrated.
The best comment on this argument comes from RCM Alternatives: which side is right depends on the particular investor to whom the appeal is made. Some investors will want more “structural and real-time” protection from nasty sell-offs, and will be, rationally, attracted by the fat-tail funds’ ability to deliver “right now” in such an event. Others will be willing to work with a longer horizon, and they will, also rationally, be happy with “long-term positive expectancy strategies like trend following.”
Now this brings us around to a fascinating issue about machine learning. If betting on the fat tails through out-of-the-money options is a good strategy—meaning a strategy marketable to rational people and institutions—then it seems natural to ask, can an algorithm do it? Is it valuable to filter out human judgment at the trading level here?
Good News for the Coders
As it happens, Taleb has a paper that addresses this issue, forthcoming in the International Journal of Forecasting. It has some good news for ambitious coders. Taleb writes that his key function, g(x) [where g is a non-linear payoff function that “resembles financial derivatives term sheets”], “maps to various machine learning functions that produce exhaustive non-linearities.” His contention, specifically, is that many (human) misjudgments take place because humans focus on a variable (x) when they ought to focus instead on the function of that variable, g(x).
The point of portfolio management, then, is not to make good forecasts but to change g for the better. Algorithms can be relied upon to keep their eyes on the ball here.
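A toy simulation makes the x-versus-g(x) distinction concrete. Everything below is a hypothetical sketch: the Student-t draw stands in for a fat-tailed underlying, and the strike of 4 is chosen only to make the payoff resemble a far out-of-the-money option, not to reproduce any of Taleb’s actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# x: a symmetric, fat-tailed underlying return (Student-t, df=3).
# Its mean is ~0, so a forecaster focused on x sees "nothing happening."
x = rng.standard_t(df=3, size=1_000_000)

# g(x): a convex payoff resembling a far out-of-the-money call.
# The strike of 4 "sigmas" is an illustrative assumption.
strike = 4.0
g = np.maximum(x - strike, 0.0)

# Forecasting x well tells you little about g(x): the expectation of
# the payoff is driven entirely by rare, large moves in the tail.
print(f"mean of x:          {x.mean():+.4f}")
print(f"mean of g(x):       {g.mean():.4f}")
print(f"paths where g pays: {(g > 0).mean():.4%}")
```

The symmetric variable averages out to roughly nothing, while the payoff’s value lives entirely in the tail, which is the asymmetry Taleb says human forecasters miss and a machine handed g directly does not.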
More intriguing, yet oddly confined to a footnote, is Taleb’s point in this connection that “machine learning of cross-entropy to gauge the difference between the distribution of a variable and the forecast” would be an effective way of capturing the nonlinearities of g().
Cross-entropy measures the mismatch between two probability distributions: given a true distribution p and a forecast distribution q, it is the average information cost of encoding outcomes drawn from p using a code optimized for q, and it is minimized only when the forecast matches reality. It is widely used in machine learning as a loss function. The gist of this is that machines can learn over time to engage in information-theory judo, to make use of the loss of information to organize information and to capture those alpha-promising nonlinearities.
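As a minimal sketch of how that loss function works (the three-state market and all the probabilities below are invented for illustration):

```python
import numpy as np

def cross_entropy(p, q):
    """Cross-entropy H(p, q) = -sum(p * log(q)), in nats.

    p is the realized (true) distribution over outcomes; q is the
    forecast. The loss blows up as q assigns vanishing probability
    to outcomes that actually occur, which is exactly how it punishes
    thin-tailed forecasts of a fat-tailed reality.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q))

# Hypothetical three-state market: calm, choppy, crash.
realized = [0.80, 0.15, 0.05]       # crashes really do happen
thin_tailed = [0.85, 0.149, 0.001]  # forecast that all but rules them out
fat_tailed = [0.80, 0.15, 0.05]     # forecast that matches reality

print(f"thin-tailed forecast: {cross_entropy(realized, thin_tailed):.3f}")  # ~0.76
print(f"fat-tailed forecast:  {cross_entropy(realized, fat_tailed):.3f}")   # ~0.61, the minimum
```

A learner that minimizes this loss is pushed, gradient step by gradient step, toward forecasts that respect the tails, with no human override required.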