The Road to Financial Freedom Starts Here!

Tuesday, April 17, 2012

Making Forecasts with Linear Regression, Part 1

PERSONAL FINANCE 101

The tricky thing about investing is that we don't know what's going to happen in the future--if we did then we'd all be millionaires (is this actually true? bonus points to anyone who can tell me why it is not). This is why forecasting is one of the most critical--although, unfortunately, often also the most mishandled--aspects of business and financial management. For example, almost all of an enterprise's processes--budgeting, strategic planning, operations, marketing, financial management--heavily rely on a well-thought out sales forecast. Also, when people talk about stock price movements we almost always hear the phrase "earnings projections" mentioned in the same breath because essentially, the price or worth of a stock or any asset is just a measure of how much cash flows it can generate in the future.

There are perhaps a million ways to forecast something, ranging from the mundane (e.g., applying the latest growth rate to this year's numbers) to the esoteric (e.g., autoregressive integrated moving average models). What I'll discuss in this post falls somewhere in between: forecasting with linear regression is simple enough that it can be easily performed with spreadsheet programs and powerful enough that it can be pretty reliable given the right data set. Unfortunately, linear regression is not enough for what most of you are actually itching to do: forecast stock prices. If you really want to do that, you might want to learn the "esoteric" method that I mentioned above.

Throughout this post, to be fair to those who have managed to exorcise Microsoft out of their lives, I will be using Google Spreadsheets.

Step 0. What are you trying to forecast?

I will try to use linear regression to answer a very relevant and timely question: when will Investor Juan reach 50,000 monthly pageviews?

Step 1. Take a look at the data--literally!

The basic premise of linear regression forecasting is that the data follow a trend that somewhat resembles a straight line and that there's reason to believe that this linear trend will persist in the foreseeable future. This premise implies two requirements. First, that there is enough data to be able to clearly distinguish a trend. In this example, I will use Investor Juan's pageviews in the past 20 months.

Second, the data should exhibit a clear linear trend, even if the trend is not perfectly linear (as is always the case). Regression analysis, in general, evaluates the possible linear relationship between two variables--an independent variable and a dependent variable; in using linear regression for forecasting, we just use "time" as the independent variable. Therefore, before we process the numbers, we need to first plot the data (using a "scatter" plot or graph) to "see" whether two variables have a somewhat linear relationship.

For example, from the graph below, it's easy to see that the two variables have a positive linear relationship; the plot shows that in general, high grip strength is associated with high arm strength, and vice versa (notice that I just said "associated with" and not "caused by"; that's an important limitation of regression analysis--it can only show correlation but not causation).


In this second example, the relationship is not so clear. While we can still use regression analysis to derive a line that best describes the relationship, the result will not be meaningful.


So would our data pass this inspection "test"? If you use create a scatter plot using the pageview data I provided above, you'll see that monthly pageviews--our dependent variable of interest--seems to increase in a linear fashion over time, making linear regression an appropriate tool for forecasting.


In Part 2, I will continue the procedure by identifying and defining the linear trend using Google Spreadsheets functions and using the results to forecast.