
You've been asked to model historic FTSE data with less than a 2% error. You've been given an hour!

  • armstrongWebb
  • Mar 20, 2021
  • 4 min read

Updated: Mar 25, 2021

OK, it's a pretty unlikely scenario. But it raises an important question: how does one model a complex data pattern that depends on a large number of separate variables - 100 in this instance? And model it quickly!


Read on...


The FTSE 100 index - historical data from the last 300 days



The above chart ends on Friday March 19, 2021 (day 296).


As you can see, its shape doesn't lend itself to modelling with a simple linear regression. Clearly, a mathematical model would be non-linear and dependent on a multitude of factors - for example, interest and exchange rates, inflation, and the performance of the underlying companies and their weighted contributions to the index. In other words, it would require some form of multivariate model.


You could mimic the FTSE 100 by knowing the relevant prices and weights of the underlying stocks. But, just to make it more challenging, let's assume that stock data is unavailable - the only data available to you is 5 years of historical FTSE 100 index data. This actually simplifies things, because we now depend on only one variable, the trend of the FTSE 100 itself - hence, we are looking at a univariate model.


Forget the underlying drivers, just deal with the data


Given the focus of these posts, you won't be surprised to know that we will look to AI to help deliver a solution! In a later blog post, we will revisit this problem using Facebook's trend forecasting tool, FB Prophet.


But before we look at how AI can crack the problem, let's think about how we could approach it knowing only historical FTSE 100 index data. Taking a commonsense approach...


Imagine that yesterday's FTSE 100 opening index was 6500 and today's is 6504. What is tomorrow's likely to be? Well, 6508 doesn't sound unreasonable, based on the difference between today's and yesterday's index. So far, so reasonable.


However, what if I then told you that two days ago the opening value was 6520? Would that alter your view of tomorrow's opening index? Ahh, you might then decide that some form of average change is more appropriate, and it might be a reduction, say, back down to 6500.
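To make that concrete, here is a tiny sketch of the 'average change' estimate in Python. The figures are the illustrative values above, not real FTSE data.

```python
# Project tomorrow's open from the average day-on-day change over a short window.
# Values are the illustrative ones used in the text, not real FTSE 100 data.
recent_opens = [6520, 6500, 6504]   # two days ago, yesterday, today

changes = [b - a for a, b in zip(recent_opens, recent_opens[1:])]
avg_change = sum(changes) / len(changes)        # (-20 + 4) / 2 = -8

tomorrow_estimate = recent_opens[-1] + avg_change
print(f"Estimated opening index tomorrow: {tomorrow_estimate:.0f}")   # ~6496
```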


What we are doing is using earlier data to influence what we believe today's index will be. How far back should we go to determine the impact on today's index value? That depends on the volatility of the data; going back too far may not reveal any useful information... but it might, if, for example, the data is cyclical.


How far forward should we go with the model? Estimating one day's opening index is likely to result in a lower overall error than, say, estimating ten days ahead, simply because each day's estimate carries an error with it. These errors compound unless we have actual opening indices to compare against and check that the propagating error isn't growing too large.
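A quick sketch of that compounding effect, using a hypothetical 'average change' forecaster rather than the ML model described later: each estimate is fed back in as if it were real data, so its error propagates into every subsequent estimate.

```python
# Hypothetical forecaster: project the next value from the average day-on-day change.
def naive_predict(window):
    changes = [b - a for a, b in zip(window, window[1:])]
    return window[-1] + sum(changes) / len(changes)

# Forecast several days ahead by feeding each estimate back in as if it were real data.
def roll_forward(history, steps):
    window = list(history)
    for _ in range(steps):
        estimate = naive_predict(window)
        window.append(estimate)     # the estimate - and its error - feeds the next step
    return window[-steps:]

print(roll_forward([6520, 6500, 6504], 10))   # ten days out: the errors accumulate
```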


To summarise, we can estimate the next day's opening index by inferring from previous days' actual opening indices. Of course, if we are analysing historical data, we know the actual opening index for each historic day. So we can compare our estimate with the actual (historic) index value and calculate the error. If the error is unacceptable (i.e. more than the 2% mean target), we need to tweak our way of estimating the next (historic) day's index and, hopefully, it will improve.
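The 2% target here is a mean percentage error across all the estimates. As a rough sketch, with made-up numbers and assuming we measure mean absolute percentage error:

```python
import numpy as np

def mean_percentage_error(actual, predicted):
    """Mean absolute percentage error between actual and estimated opening indices."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs((actual - predicted) / actual)) * 100

# Made-up historic opens and estimates, purely for illustration
actual_opens    = [6500, 6504, 6496, 6510]
estimated_opens = [6490, 6512, 6500, 6495]

print(f"Mean error: {mean_percentage_error(actual_opens, estimated_opens):.2f}% (target: under 2%)")
```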


And that, with some oversimplification, is how we use AI (Machine Learning) to model it. We don't define any equations; we define the model and let it learn for itself. But first, there is an issue we need to address: 'sequencing'...



Enter AI - machine learning: RNN and LSTM - to ensure it remembers the past...


A 'typical' Neural Network (NN) will forecast an outcome based on a set of scenarios, but it has no sense of the relationship between those scenarios.


For example, if presented with the phrase 'opened a new ____', a simple NN might predict the missing word as 'gate'. Which sounds reasonable.


However, if that phrase were part of the following two sentences:

'Mary trained as a chef. Today she opened a new ____'


- 'cafe' is likely to be far more appropriate.


However, this requires an understanding of 'sequence' - and the relationship between two sets of data, or two sentences in this example. And this is precisely what we need to model the FTSE 100 index data. Enter the Recurrent Neural Network, or RNN.


RNNs are more complex than standard NNs, so I won't explain them here. An RNN bases its prediction not only on the current scenario but also on the previous scenario - this enables us to use Machine Learning to predict the current FTSE 100 index based on past data.

One of the downsides of RNNs is that if the scenarios (data) are far apart, the relationship tends to break down. Enter the greatly improved LSTM, or Long Short-Term Memory, originally developed in 1997. This is even more complex, so I definitely won't be explaining it!



The Machine Learning Model


I put together (and debugged!) the model, with multiple LSTM layers, in about 45 minutes.
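For reference, here is a minimal sketch of what a stacked-LSTM model of this kind might look like in Keras. The layer sizes and dropout are my own illustrative assumptions - the post doesn't show the actual architecture - although this particular stack happens to come out at just over 71,000 trainable parameters, in line with the figure quoted below.

```python
# A minimal sketch of a stacked-LSTM regressor (layer sizes are assumptions).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

LOOK_BACK = 60   # days of history fed in per prediction (see below)

model = Sequential([
    LSTM(50, return_sequences=True, input_shape=(LOOK_BACK, 1)),
    Dropout(0.2),
    LSTM(50, return_sequences=True),
    Dropout(0.2),
    LSTM(50, return_sequences=True),
    Dropout(0.2),
    LSTM(50),
    Dropout(0.2),
    Dense(1),    # next day's (scaled) opening index
])
model.compile(optimizer="adam", loss="mean_squared_error")
model.summary()  # prints the number of tunable parameters (~71,000 for this stack)
```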


Referring back to our earlier commonsense approach, the model 'looks back' at the previous 60 days' indices to predict the 61st day's index. There is nothing magical about using the previous 60 days, but it is far enough back to smooth out sudden changes and provide information about a possible trend. Using different values may have produced better - or worse - model outcomes.
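In code, that 'look back' amounts to slicing the series into overlapping 60-day windows, each paired with the following day's value. A sketch, assuming the index values have already been scaled (e.g. with a MinMax scaler) into an array called scaled_opens:

```python
import numpy as np

LOOK_BACK = 60

def make_windows(series, look_back=LOOK_BACK):
    """Split a 1-D series into (samples, look_back, 1) inputs and next-day targets."""
    X, y = [], []
    for i in range(look_back, len(series)):
        X.append(series[i - look_back:i])   # the previous 60 days
        y.append(series[i])                 # the 61st day
    X = np.array(X).reshape(-1, look_back, 1)   # the shape an LSTM layer expects
    return X, np.array(y)

# e.g. X_train, y_train = make_windows(scaled_opens)
```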


I ran the model on Google's cloud-based Colab system. This enabled me to use a graphics processing unit (GPU), which carries out parallel processing - speeding up the model's iterations by a factor of 5. It took about 20 seconds to run in total - across 5 years' data, iterating 300 times and with more than 71,000 tunable parameters available to tweak during each iteration!
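Training itself is then a single fit call. Continuing the sketches above (so model, make_windows and scaled_opens are assumed to exist already; the batch size is another assumption on my part):

```python
import tensorflow as tf

# On Colab, select Runtime > Change runtime type > GPU; Keras then uses it automatically.
print("GPU devices:", tf.config.list_physical_devices("GPU"))

X_train, y_train = make_windows(scaled_opens)

history = model.fit(
    X_train, y_train,
    epochs=300,       # the '300 iterations' referred to above
    batch_size=32,    # assumed - the post doesn't state a batch size
    verbose=0,
)
```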



The results


I ran the model twice: first with 100 iterations and then with 300. As you can see, the 300 iterations produced an improvement in the mean error (down from 2.3% to 1.7%) and a large reduction in the maximum error (from 38% down to 17%). Unsurprisingly, the greatest error occurred where there was the largest change in the value of the index.


100 Iterations...



300 Iterations...


As you can see, it now provides a pretty good match to the historic data. And it has a mean error of less than 2%. Job done!


Could I use this model to make a ton of money?


Very unlikely. Remember, this is modelling historic data. As soon as the predictions move into the future, the errors compound.



 
 
 



