Any discussion of the application of deep neural networks to predicting financial markets needs to navigate a sociological minefield. Quants, machine learning researchers and financial econometricians hold competing perspectives, forged from a number of factors including the juxtaposition of different academic disciplines and of theory versus practice.

In our forthcoming Journal of Algorithmic Finance article, we show how to use deep learning to predict directional movements of CME futures prices at five-minute intervals. Deep neural networks, developed primarily for image and speech recognition and now heralded in applications such as AlphaGo and self-driving cars, require a large number of features to justify the use of multiple hidden layers.

[Source: Dixon, Matthew Francis, Klabjan, Diego and Bang, Jin Hoon, Classification-Based Financial Markets Prediction Using Deep Neural Networks (July 18, 2016). Forthcoming in the Journal of Algorithmic Finance]

**Re-opening Pandora’s Box**

The question we set out to address is: how can deep neural networks be applied to the prediction of futures markets? The more fundamental question of the economic value of machine learning is both complex and contentious, with many prominent algorithmic finance researchers concluding that trading costs outweigh any methodological benefits. Yet it is well known in the capital markets industry that a dominant use case for machine learning is alpha generation.

There exists a spectrum of opinion best characterized by its poles. At one end, we have computer scientists advancing the notion that we set a machine to process some input and leave it to find hidden patterns which yield trading profits: the proverbial hammer looking for a nail. At the other end, we have financial prediction experts asserting that the utility of deep learning, considered in isolation, is somewhat limited; which events to predict, in conjunction with the choice of input data, is by far the most skillful aspect of successfully applying machine learning. The methodology is secondary.

Finance academics are on the fence. Much of their skepticism of machine learning derives from inadequate assessment of over-fitting by studies in the 1990s. But since then, we’ve witnessed exponential growth in the amount of data available to train models, orders-of-magnitude improvement in micro-processor capability, not to mention advances in techniques for avoiding and measuring over-fitting, such as k-fold cross validation, dropout (random node blackout) and bias-variance trade-off analysis. Yet antiquated dynamic models from financial econometrics remain the standard. Surely it’s time to open Pandora’s box?
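To make the over-fitting point concrete, here is a minimal sketch of k-fold cross validation; the function name and fold scheme are our own illustration, not taken from the paper:

```python
import numpy as np

def kfold_indices(n_samples, k):
    """Yield (train_idx, test_idx) index pairs for k-fold cross validation."""
    folds = np.array_split(np.arange(n_samples), k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# Every sample lands in exactly one test fold across the k splits.
counts = np.zeros(100, dtype=int)
for train_idx, test_idx in kfold_indices(100, 5):
    counts[test_idx] += 1
assert (counts == 1).all()
```

Note that for time-ordered financial data, the walk-forward windows used later in this article are preferable to shuffled folds, since they never train on data from the future.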

Finding alpha is essentially a question of time series analysis applied to very noisy datasets. While there are arguably similar non-financial applications where Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) have been successful, the question of how to define the recurrence for financial market prediction is non-trivial. Our approach encodes the memory in the features, avoiding the need to define recurrence explicitly. However, it remains an open question which approach is better.

We shall break off one piece of this puzzle here and discuss the applicability of deep learning in particular. Like most academic studies, we were limited by the amount of data we could access, and this defined the scope of our studies. We begin with ‘longer’ time-frame predictions, at 5 minutes, and then advance to using order book pressure to predict near-term movements.

**Market Wide Prediction**

One efficient approach to generating a sufficient number of features from historical prices is simply to lag prices and calculate rolling moving averages, varying the lag and window size for each feature. This may result in a few hundred features, but that isn’t sufficient for deep neural networks.
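The lag-and-rolling-average construction can be sketched as follows; the function name, alignment convention and parameter choices are our own illustration of the idea, not the paper’s exact feature set:

```python
import numpy as np

def lag_ma_features(prices, lags, windows):
    """Feature matrix from lagged prices and rolling moving averages.

    prices : 1-D array of prices at a fixed interval (e.g. 5 minutes).
    lags   : list of lag offsets, in intervals.
    windows: list of rolling-window sizes for the moving averages.
    Row t uses only information available at time t (no look-ahead).
    """
    n = len(prices)
    start = max(max(lags), max(windows))  # drop rows lacking full history
    cols = []
    for l in lags:
        cols.append(prices[start - l:n - l])                    # lagged price
    for w in windows:
        ma = np.convolve(prices, np.ones(w) / w, mode="valid")  # rolling mean
        cols.append(ma[start - w + 1:])   # align window ending at time t
    return np.column_stack(cols)
```

With, say, a few dozen lags and a dozen window sizes per instrument, this yields the “few hundred features” mentioned above.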

By predicting multiple instruments with one model, we are able to scale the number of features to nearly ten thousand. Not only is this sufficient for deep neural networks, but it provides the added benefit of capturing market-wide effects not necessarily apparent from fitting a model to a single instrument.
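The scaling works by stacking each instrument’s feature block side by side into one market-wide design matrix. A minimal sketch, with hypothetical shapes chosen only to illustrate the order of magnitude:

```python
import numpy as np

# Hypothetical: 43 instruments, each contributing ~230 lag/MA features
# per time step (these counts are illustrative, not the paper's exact numbers).
n_steps, n_instruments, feats_per_inst = 1000, 43, 230

rng = np.random.default_rng(0)
per_inst = [rng.standard_normal((n_steps, feats_per_inst))
            for _ in range(n_instruments)]

# One row per time step; every instrument's features visible to the model.
X = np.hstack(per_inst)
print(X.shape)  # (1000, 9890) -- close to ten thousand features
```

Because every row contains all instruments’ features, the network can pick up cross-market effects that a single-instrument model never sees.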

The graph below shows the distribution of classification accuracy for each of 43 CME-listed front-month futures contracts over ten out-of-sample walk-forward optimization windows. We used a three-state classifier to represent up, flat and down movements. The red line in each bar represents the average result over the backtesting horizon. In most cases, the classifier outperforms white noise (33%), but the variability is significant. The results also require market knowledge to interpret their significance. For example, a market with little liquidity may yield higher classification accuracy by virtue of smaller price dispersion.

[Source: Dixon, Matthew Francis, Klabjan, Diego and Bang, Jin Hoon, Classification-Based Financial Markets Prediction Using Deep Neural Networks (July 18, 2016). Forthcoming in the Journal of Algorithmic Finance]
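A three-state target of the kind described above can be sketched as a thresholded forward return; the threshold and function name are our own illustration, and the paper’s exact labeling scheme differs in its details:

```python
import numpy as np

def three_state_labels(prices, threshold):
    """Label each interval's return as up (+1), flat (0) or down (-1).

    A return is 'flat' when its magnitude is below `threshold`
    (a tuning choice, often tied to the instrument's tick size).
    """
    rets = np.diff(prices) / prices[:-1]
    labels = np.zeros(len(rets), dtype=int)
    labels[rets > threshold] = 1
    labels[rets < -threshold] = -1
    return labels
```

With three balanced classes, a white-noise classifier is right about a third of the time, which is the 33% baseline quoted above.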

**Machine Learning is just Matrix Linear Algebra**

Using off-the-shelf tools such as TensorFlow and Theano is convenient but shifts the burden of configurability to the user. We prefer to maintain our own implementation, designed specifically for prediction across multiple instruments. Targeting the Intel Xeon Phi, we have designed and implemented a C++ version of a deep neural network which uses Intel MKL libraries to efficiently map matrix computations, the computational bottleneck, to multi- and many-core Intel Xeon architectures. Training a deep neural network with 7 layers over approximately 10 years of 43 CME futures prices at 5 minute intervals can take days on a single processor. Our implementation runs 11.4x faster on the Knights Corner generation of the Intel Xeon Phi co-processor. Further details of this implementation are described in our 8th Workshop on High Performance Computational Finance paper.
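To see why dense matrix multiplies dominate, here is the forward pass written as a chain of GEMMs. This is a NumPy sketch for exposition only; our actual implementation is C++ on MKL, and the layer sizes below are hypothetical:

```python
import numpy as np

def forward(X, weights, biases):
    """Feed-forward pass: each hidden layer is one dense matrix multiply
    (the computational bottleneck) followed by a ReLU non-linearity."""
    a = X
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(a @ W + b, 0.0)        # GEMM + ReLU
    return a @ weights[-1] + biases[-1]       # output layer (logits)

# Toy 7-layer network narrowing from ~10k input features to 3 classes
# (up / flat / down). Layer widths here are illustrative.
rng = np.random.default_rng(0)
sizes = [9890, 512, 256, 128, 64, 32, 16, 3]
weights = [rng.standard_normal((m, n)) * 0.01
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
out = forward(rng.standard_normal((8, 9890)), weights, biases)
print(out.shape)  # (8, 3)
```

Since each training step repeats these multiplies over millions of rows, mapping the GEMMs onto many-core hardware is where virtually all of the speed-up comes from.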

**Using Market-Microstructure is Key**

The above study concluded that *consistent* high prediction accuracy from price history alone is elusive. We therefore turned to the order book to extract current and lagged features, and are able to generate a sufficient number of features for a single instrument to warrant using deep learning. Prediction time frames from book pressure and aggressor information are very short, and low-latency implementations are required. At 1-2 millisecond lead times, we are able to obtain substantially higher classification accuracy than from price history alone. The naive approach to labeling, detailed in the paper, does not lead to trading profits. However, market knowledge can be applied to construct more elaborate labeling approaches which do yield trading profits.
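A common book-pressure feature is the signed imbalance between resting bid and ask volume; the definition below is one standard formulation, offered as an illustration rather than the paper’s exact feature:

```python
import numpy as np

def book_pressure(bid_sizes, ask_sizes):
    """Order book imbalance in [-1, 1]: positive when bids dominate.

    bid_sizes, ask_sizes: resting volume at the top book levels.
    """
    b, a = bid_sizes.sum(), ask_sizes.sum()
    return (b - a) / (b + a)

# Bids outweigh asks -> positive pressure, hinting at upward movement.
print(book_pressure(np.array([500.0, 300.0]), np.array([200.0, 100.0])))
```

Current and lagged values of features like this, across several book levels, supply the per-instrument feature count needed for deep learning at these horizons.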

**Summary**

Successful application of deep learning to predicting price movements in any market requires integration of market knowledge with machine learning expertise. Deep learning needs a large number of features, and the quant researcher needs to be innovative in their construction. Prediction from level I data almost certainly requires a market-wide prediction approach. With level II data, the key discriminatory factor is how the microstructure memory is encoded. Labeling approaches which yield profitable trading signals also need to be carefully designed. We’ve shown that deep learning leads to high accuracy in predicting extreme price movements over very short time-frames.