Results Summary

This page summarizes the general conclusions of the project and lists possible future improvements to the model. More specific results can be found under the Results Specifics tab.

Results Summary

For the market as a whole, ZeroR, which always predicts the majority class, returned approximately 50% accuracy in each model. This implies that, without any prior knowledge of a stock, it is about equally likely to go up or down. This idea is popular in modern portfolio theory, and our results are consistent with it.
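For reference, the ZeroR baseline can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the project; the function name zero_r_accuracy and the sample labels are hypothetical.

```python
from collections import Counter


def zero_r_accuracy(train_labels, test_labels):
    """Predict the most common training label for every test instance.

    Returns the majority label and the accuracy it achieves on the test labels.
    """
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for label in test_labels if label == majority_label)
    return majority_label, correct / len(test_labels)


# Hypothetical up/down labels, for illustration only.
train = ["up", "down", "up", "down", "up"]
test = ["up", "down", "down", "up"]

majority, accuracy = zero_r_accuracy(train, test)
print(majority, accuracy)  # majority is "up"; accuracy is 0.5
```

An accuracy near 50% from this baseline simply means the two classes are roughly balanced, so any model has to beat a coin flip to be useful.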

Among the model-specific results, the most significant came from the Weekly Year + data set, which includes each stock's price and volume data and its sector movement, as well as attributes derived from that data, such as the stock's volatility and beta. On this data set, logistic regression returned 67% accuracy, which is substantial for modeling something as unpredictable as stocks using purely technical analysis. With proper hedging, portfolios can be constructed around this level of accuracy.
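Derived attributes such as volatility and beta can be illustrated with a short Python sketch. It assumes the common textbook definitions: volatility as the sample standard deviation of periodic returns, and beta as the covariance of stock and market returns divided by the variance of market returns. The exact formulas used to build the Weekly Year + data set are not stated on this page, and the function names and prices below are hypothetical.

```python
from statistics import mean, stdev


def simple_returns(prices):
    """Convert a price series into period-over-period simple returns."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def volatility(returns):
    """Sample standard deviation of the returns."""
    return stdev(returns)


def beta(stock_returns, market_returns):
    """Sample covariance of stock and market returns over market variance."""
    stock_mean = mean(stock_returns)
    market_mean = mean(market_returns)
    n = len(market_returns)
    covariance = sum(
        (s - stock_mean) * (m - market_mean)
        for s, m in zip(stock_returns, market_returns)
    ) / (n - 1)
    market_variance = sum((m - market_mean) ** 2 for m in market_returns) / (n - 1)
    return covariance / market_variance


# Hypothetical weekly closing prices, for illustration only.
stock_prices = [50.0, 52.0, 51.0, 53.5, 54.0]
market_prices = [100.0, 101.0, 100.5, 102.0, 102.5]

stock_returns = simple_returns(stock_prices)
market_returns = simple_returns(market_prices)
print("volatility:", volatility(stock_returns))
print("beta:", beta(stock_returns, market_returns))
```

A beta above 1 under this definition means the stock tends to move more than the market in the same direction.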

However, while our models outperformed ZeroR by a non-trivial margin every time, for the most part they showed characteristics of overfitting despite being subjected to 10-fold cross validation. Essentially, we could consistently build models that performed well on the data we had, but the weights the models assigned to the attributes, and the implications of those weights, were contradictory depending on the data set used, as can be seen in more detail in Results Specifics. One result supported momentum trading: if a stock has returned more than 5% over the past 6 months, it will continue to go up over the following 2 months, with 56% accuracy. Another result stated that if a stock has had good general health relative to the market, the share will fall over the next two months, with 67% accuracy (most likely because of investors selling after a stock high). While we acknowledge that the latter result has a much higher accuracy and is therefore more likely, there is still an apparent contradiction. Profits can be made using predictions from our model, but the model did not establish a clear strategy as the source of those profits.
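The 10-fold cross validation mentioned above can be sketched as follows. This is a minimal illustration that uses contiguous folds with no shuffling or stratification, which the original evaluation may well have applied. The helper names and the toy majority-class model are hypothetical stand-ins for the project's actual classifiers.

```python
from collections import Counter


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validated_accuracy(features, labels, fit, predict, k=10):
    """Train on k-1 folds, test on the held-out fold, and pool the results."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        correct += sum(predict(model, features[i]) == labels[i] for i in test_idx)
    return correct / len(labels)


# Toy stand-in model: always predict the majority training label.
def fit_majority(train_features, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, feature_row):
    return model


# Hypothetical data: 20 instances with alternating labels.
features = [[i] for i in range(20)]
labels = ["up", "down"] * 10
cv_accuracy = cross_validated_accuracy(features, labels, fit_majority, predict_majority)
print(cv_accuracy)
```

Cross validation of this kind estimates accuracy on held-out rows from the same data set, so it does not by itself guarantee that the learned weights will agree across different data sets.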

This can be attributed to weaknesses in the model or to overfitting; however, the more likely cause is that the Efficient Market Hypothesis is true. Under this hypothesis, all publicly available information, including a stock's price and volume data and information such as its sector, has already been priced into its current price, and therefore its future direction is unpredictable from that data alone.

Future Improvements

There is definitely room for improvement in this model. Possible improvements are listed below:

Run the same data through the classifier multiple times, but change the target attribute each time to see how previous pricing data can predict price movement in different timeframes, e.g. predict how the stock will do the next day vs how the stock will do over the next month.
Use a neural network classifier. For our project, we shied away from neural networks because they were a lot slower and had lower accuracy, which was off-putting. However, due to their ability to do well with linearly inseparable data and because of their presence in current literature, there is significant