Application of Machine Learning to Construct Investment Portfolios

Speaker: Dr. Eugene Pinsky, Associate Professor of the Practice
Moderated by: Kia Teymourian, Assistant Professor of Computer Science
January 28, 2021

Abstract: One of the primary applications of machine learning methods is to uncover patterns from historical relationships and trends in the data. In unsupervised learning, we apply these methods and let the system itself find structure in the data. The most widely used algorithm in such learning is k-means clustering, which looks for patterns in data and partitions the data points into k groups (“clusters”). The key to using such a method is the existence of a similarity (distance) measure, so that the resulting clusters contain “close” points. With clustering, we discover patterns in data and classify data according to cluster membership. This provides a way to describe even large data sets in much simpler terms.
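As a rough illustration of the partitioning step described above, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The function name `kmeans`, the choice of Euclidean distance, the seeding of centroids from the first k points, and the sample points are all assumptions made for this example; the talk does not specify an implementation.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm with Euclidean distance.

    Assumption for this sketch: the first k points seed the centroids.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical two-dimensional data with two visible groups.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (0.2, 0.1), (4.9, 5.2)]
labels, centroids = kmeans(points, k=2)
```

Points that end up with the same label are "close" under the chosen distance, so the whole data set can be summarized by k centroids plus one label per point.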

Such clustering methods have been successfully used in a variety of areas. In many applications, data points have many dimensions and evolve over time. It is of interest, therefore, not only to identify the clusters but also to examine how the data patterns evolve in time. To address this problem, we divide time into periods, add the time period as an extra attribute of each data point, and apply k-means clustering to the resulting data. With this extension, data points that show the same pattern within a time period share the same cluster membership, whereas the temporal evolution of these patterns can be described by paths in the appropriate (cluster, time) space. As a result, the time behavior of points in a large data set can be described in a much simpler way by specifying the trajectory of clusters that these points occupy in successive time periods. To measure the difference between patterns over time, we propose to use the Hamming distance of their trajectories. This measure is easy to compute and visualize. It has a simple analytical interpretation as the duration of time during which data points exhibit different patterns (different clusters).
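The trajectory comparison described above can be sketched in a few lines of Python. The function name `hamming_distance` and the sample trajectories are hypothetical; the sketch assumes each trajectory lists one cluster label per time period and that the labels come from a single clustering, so that equal labels mean the same pattern.

```python
def hamming_distance(path_a, path_b):
    """Count the time periods in which two cluster trajectories differ."""
    if len(path_a) != len(path_b):
        raise ValueError("trajectories must cover the same time periods")
    return sum(a != b for a, b in zip(path_a, path_b))


# Hypothetical trajectories: one cluster label per period, five periods.
trajectory_x = [2, 2, 0, 1, 1]
trajectory_y = [2, 0, 0, 1, 3]

# The trajectories differ in the second and fifth periods.
distance = hamming_distance(trajectory_x, trajectory_y)
```

Here `distance` is 2, meaning the two data points exhibited different patterns in two of the five periods.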

We illustrate this approach by considering an important practical problem in the financial industry: choosing mutual funds to construct diversified investment portfolios. The traditional approach to constructing such portfolios involves computing the correlations between the component funds. However, these correlations change over time, especially during significant market events. This makes it difficult to construct such portfolios in practice. We use historical daily data on thousands of mutual funds over a multi-year period. We then apply k-means clustering to the vectors of daily returns for each fund and year. This allows us to represent the annual correlation patterns of funds by clusters and to describe changing fund correlations by cluster paths. We use the Hamming distance of cluster trajectories as a measure of portfolio diversification. It has a simple intuitive interpretation as the average time during which funds exhibit different return patterns. We present some examples and compare the portfolios computed by the traditional approach and by clustering.
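One possible reading of this diversification measure is the mean pairwise Hamming distance between the yearly cluster trajectories of the funds in a portfolio. The sketch below follows that reading; the function name `diversification`, the fund names, and the cluster labels are invented for illustration and are not taken from the talk.

```python
from itertools import combinations


def diversification(trajectories):
    """Mean pairwise Hamming distance (in years) across a portfolio.

    `trajectories` maps a fund name to its list of yearly cluster labels.
    Averaging over all fund pairs is an assumption of this sketch.
    """
    pairs = list(combinations(trajectories.values(), 2))
    if not pairs:
        return 0.0
    total = sum(sum(a != b for a, b in zip(p, q)) for p, q in pairs)
    return total / len(pairs)


# Hypothetical funds with one cluster label per year over four years.
paths = {
    "FUND_A": [0, 0, 1, 1],
    "FUND_B": [0, 0, 1, 2],
    "FUND_C": [3, 2, 0, 1],
}

# Score every two-fund portfolio and keep the most diversified one.
best_pair = max(
    combinations(paths, 2),
    key=lambda pair: diversification({name: paths[name] for name in pair}),
)
```

In this made-up data, FUND_A and FUND_B share a cluster in three of four years and score 1.0, while FUND_B and FUND_C never share a cluster and score 4.0, so `best_pair` is the latter pair.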

The proposed method can be used in many applications where we need to analyze and visualize the evolution of large multi-dimensional data patterns over time.
