Curve Fitting

Linear Regression Using MATLAB®

Introduction

In this exercise, we will briefly present the use of MATLAB® to perform linear regression, or fitting a straight line to a set of data. The form of the equation of a straight line is:

\[y = mx + b \tag{1}\]

In equation (1), x is the abscissa and y is the ordinate; m is the slope of the line, and b is the y-intercept.

In this exercise, we will assume a basic familiarity with the MATLAB desktop and workspace. Some background information about MATLAB is provided at the link in the related materials section above.

Use of MATLAB to fit a straight line to a set of data will be presented in the context of an example. We will assume that we have measured a set of five data points as follows:

(1.3, 2.1), (2.4, 5.2), (3.1, 6.4), (4.2, 7.7), (5.6, 11.2)

The data above are presented as pairs of \((x, y)\) points. The steps for using these data to create a straight-line curve fit in MATLAB are presented below.

Define Variables

We first need to save our data in the workspace. To do this, we will create two variables, x and y. The variable x will contain the “x” values in the above data pairs, and the variable y will contain the “y” values in the above data pairs. To create our variables, we type the following at the command prompt (see footnote 1):

  • x = [1.3, 2.4, 3.1, 4.2, 5.6]

  • y = [2.1, 5.2, 6.4, 7.7, 11.2]
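As a small sketch, the two vectors can also be entered with trailing semicolons (which suppress MATLAB's echo of the values) and checked before fitting. The semicolons, the variable same_length, and the printed message are additions for illustration, not part of the original steps.

```matlab
% Enter the five measured (x, y) pairs as row vectors.
x = [1.3, 2.4, 3.1, 4.2, 5.6];   % trailing semicolon suppresses the echo
y = [2.1, 5.2, 6.4, 7.7, 11.2];

% Illustrative sanity check: each x value needs exactly one matching y value.
same_length = (numel(x) == numel(y));
fprintf('x has %d points, y has %d points\n', numel(x), numel(y));
```

If the two counts differ, the later plotting and fitting commands will report an error, so this is a cheap check to make first.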

Plot the Data Itself

The MATLAB function to plot a set of data is plot. In its simplest form, the plot function accepts two variables containing the data to be plotted. The variables should be given in the order x, y: the data in the variable x go on the x-axis and the data in the variable y go on the y-axis. A string providing some formatting options can follow these variables. We will start out by just plotting our data as circles on the graph; circles are denoted by a lower-case “o”. To create the plot, type the following at the command prompt:

  • plot(x,y,'o')

This should cause a figure window to open, and the plot to be displayed. My result is shown below:

Create a Straight-line Curve Fit to the Data

MATLAB's polyfit function performs least-squares curve fitting: it fits a polynomial of any specified order to a set of data. The general syntax for the function is:

  • p = polyfit(x,y,n)

Here, x and y are vectors containing the data to be fit, and n is the order of the polynomial to be fit to the data (a straight line is a first-order polynomial, so we will always set n = 1). The function returns a vector containing the coefficients of the polynomial that provides a least-squares fit to the data. For n = 1, a two-element vector is returned: the first element is the slope of the line (m in equation (1)) and the second element is the y-intercept of the line (b in equation (1)).
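For reference, "least-squares" means that the returned coefficients minimize the sum of the squared vertical distances between the data and the fitted curve. Writing the data points as \((x_i, y_i)\) for \(i = 1, \ldots, N\) (the index \(i\) and the symbols \(N\) and \(S\) are notation introduced here, not taken from the original), the quantity minimized for n = 1 is:

\[S(m,b) = \sum_{i=1}^{N} \left( y_i - (m x_i + b) \right)^2\]

The minimum occurs where both partial derivatives vanish:

\[\frac{\partial S}{\partial m} = -2\sum_{i=1}^{N} x_i \left( y_i - m x_i - b \right) = 0, \qquad \frac{\partial S}{\partial b} = -2\sum_{i=1}^{N} \left( y_i - m x_i - b \right) = 0\]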

To create a best fit straight line for our data, type the following at the command prompt:

  • p = polyfit(x,y,1)

MATLAB should respond with:

  • p = 1.9984 -0.1145

So, for our equation, m = 1.9984 and b is -0.1145. Thus, the equation for our line is:

  • y = 1.9984x - 0.1145
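As a cross-check on this result, the slope and intercept of a least-squares line can also be computed from the standard closed-form expressions: the slope is the sum of the products of the x and y deviations from their means divided by the sum of the squared x deviations, and the intercept is mean(y) minus the slope times mean(x). The sketch below compares these with polyfit; the names m_check and b_check are chosen here for illustration.

```matlab
x = [1.3, 2.4, 3.1, 4.2, 5.6];
y = [2.1, 5.2, 6.4, 7.7, 11.2];

% First-order least-squares fit: p(1) is the slope, p(2) the y-intercept.
p = polyfit(x, y, 1);

% Closed-form least-squares estimates for a straight line.
dx = x - mean(x);                      % x deviations from the mean
dy = y - mean(y);                      % y deviations from the mean
m_check = sum(dx .* dy) / sum(dx.^2);  % slope
b_check = mean(y) - m_check * mean(x); % y-intercept

fprintf('polyfit:     m = %.4f, b = %.4f\n', p(1), p(2));
fprintf('closed form: m = %.4f, b = %.4f\n', m_check, b_check);
```

Both printed lines should agree with the values m = 1.9984 and b = -0.1145 found above.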

Plot the Curve Fit Over the Data

We will create another variable in the workspace, called y_fit, which contains the values of the best-fit line at the x data points we acquired. To do this, type the following at the command prompt:

  • y_fit = p(1)*x+p(2)

In the above, p(1) is the first element of the variable p, which is the slope of the line, and p(2) is the second element of p, which is the y-intercept.

The result of the above command should be:

  • y_fit = 2.4833 4.6815 6.0804 8.2786 11.0763
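The same values can be obtained with MATLAB's polyval function, which evaluates a polynomial given its coefficient vector. The sketch below confirms that the two approaches agree and also computes the residuals (data minus fit); the names y_fit2, residuals, and sse are illustrative additions.

```matlab
x = [1.3, 2.4, 3.1, 4.2, 5.6];
y = [2.1, 5.2, 6.4, 7.7, 11.2];
p = polyfit(x, y, 1);

y_fit  = p(1)*x + p(2);      % explicit slope-intercept form used in the text
y_fit2 = polyval(p, x);      % same line evaluated with polyval

residuals = y - y_fit;       % vertical distance from each point to the line
sse = sum(residuals.^2);     % sum of squared residuals, minimized by the fit

disp(max(abs(y_fit - y_fit2)));   % largest difference between the two forms
disp(residuals);
disp(sse);
```

For a least-squares line with an intercept term, the residuals sum to zero up to round-off, which is a quick way to spot a mistyped coefficient.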

Now, to plot the curve fit, we will add a line to our previous plot. For the new line to be added to the plot rather than replace the previous figure, we will “hold” the figure. To hold the figure, type the following at the command prompt:

  • hold on

Then, to plot the curve fit, type:

  • plot(x, y_fit)
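The plotting steps can also be collected into one short script, sketched below. The figure, hold off, xlabel, ylabel, and legend calls and the label text are additions for illustration; the original steps use only plot and hold on.

```matlab
x = [1.3, 2.4, 3.1, 4.2, 5.6];
y = [2.1, 5.2, 6.4, 7.7, 11.2];
p = polyfit(x, y, 1);
y_fit = p(1)*x + p(2);

figure;                  % open a fresh figure window
plot(x, y, 'o');         % measured data as circles
hold on;                 % keep the data points while adding the line
plot(x, y_fit, '-');     % fitted line at the measured x values
hold off;                % later plot commands replace the figure again

% Labels and legend are illustrative additions.
xlabel('x');
ylabel('y');
legend('data', 'linear fit', 'Location', 'northwest');
```

Because the fit is a straight line, evaluating it only at the five measured x values is enough to draw it correctly.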

Your resulting plot should look like this:

Calculate a Correlation Coefficient

MATLAB's corrcoef function provides the correlation coefficient of two data sets. Possible syntax for using this function is:

  • r = corrcoef(x,y)

Here, x and y are vectors containing the data. Used this way, the function returns a \(2 \times 2\) matrix of the following form:

  • \(r = \left[ {\begin{array}{*{20}{c}}{{r_{xx}}} &{{r_{xy}}}\\{{r_{yx}}}&{{r_{yy}}}\end{array}} \right]\)

This matrix provides the correlations between all possible pairings of the data provided to the function. \({r_{xx}}\) is the correlation between the x data and itself; likewise, \({r_{yy}}\) is the correlation between the y data and itself. Since data are always perfectly correlated with themselves, \({r_{xx}} = {r_{yy}} = 1\). \({r_{xy}}\) is the correlation between the x data and the y data, and \({r_{yx}}\) is the correlation between the y data and the x data. Because correlation does not depend on the order of the two data sets, \({r_{xy}} = {r_{yx}}\), so either off-diagonal term gives us the correlation coefficient.

For our example, the result of typing r = corrcoef(x,y) is:

  • r =
      1.0000 0.9902
      0.9902 1.0000
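To work with the correlation coefficient as a single number, the off-diagonal entry can be pulled out of the matrix by indexing. The sketch below does this and, as a check, recomputes the same value from the usual definition of the (Pearson) correlation coefficient; the names r_xy and r_manual are illustrative.

```matlab
x = [1.3, 2.4, 3.1, 4.2, 5.6];
y = [2.1, 5.2, 6.4, 7.7, 11.2];

r = corrcoef(x, y);          % 2-by-2 correlation matrix
r_xy = r(1, 2);              % off-diagonal entry: correlation of x with y

% Same quantity computed directly from deviations about the means.
dx = x - mean(x);
dy = y - mean(y);
r_manual = sum(dx .* dy) / sqrt(sum(dx.^2) * sum(dy.^2));

fprintf('r_xy from corrcoef: %.4f\n', r_xy);
fprintf('r_xy from formula:  %.4f\n', r_manual);
```

Both lines should print 0.9902, matching the off-diagonal entries shown above.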

  • 1 The command prompt is >> in the command window. The command window is the big blank space in the center of the MATLAB desktop.
  • Other product and company names mentioned herein are trademarks or trade names of their respective companies. © 2014 Digilent Inc. All rights reserved.