In this exercise, we will briefly present the use of MATLAB® to
perform linear regression, that is, fitting a straight line to a set of data. The
equation of a straight line has the form:

\[y = mx + b \qquad (1)\]

In equation (1), *x* is the abscissa and *y* is the ordinate;
*m* is the slope of the line, and *b* is the y-intercept.

In this exercise, we will assume a basic familiarity with the MATLAB
**desktop** and **workspace**. Some background information about MATLAB is
provided at the link in the related materials section above.

Use of MATLAB to fit a straight line to a set of data will be presented in the context of an example. We will assume that we have measured a set of five data points as follows:

(1.3, 2.1), (2.4, 5.2), (3.1, 6.4), (4.2, 7.7), (5.6, 11.2)

Above, the data are presented as pairs of \((x, y)\) points. The steps for using these data to fit a straight line in MATLAB are presented below.

We first need to save our data in the workspace. To do this, we will create two
variables, *x* and *y*. The variable *x* will contain the
“*x*” values in the above data pairs, and the variable
*y* will contain the “*y*” values in the above data
pairs. To create our variables, we type the following at the command
prompt^{1}:

- x = [1.3, 2.4, 3.1, 4.2, 5.6]
- y = [2.1, 5.2, 6.4, 7.7, 11.2]

The MATLAB function to plot a set of data is plot. In
its simplest incarnation, the plot function accepts
two variables, containing the data to be plotted. The variables should be in the
order *x*, *y*—the data in the *x* variable goes on the
x-axis and the data in the variable *y* goes on the y-axis. A string
providing some formatting options can follow these variables. We will start out
by just plotting our data as circles on the graph; circles are denoted by a
lower-case “o”. To create the plot, type the following at the command
prompt:

- plot(x,y,'o')

This should cause a figure window to open, and the plot to be displayed. My result is shown below:

MATLAB's polyfit function performs least-squares curve fitting. It fits a polynomial of arbitrary order to a set of data. The general syntax for the function is:

- P = polyfit(x,y,n)

where *x* and *y* are vectors containing the data to be fit and
*n* is the order of the polynomial to be fit to the data. A straight line is a
first-order polynomial, so for linear regression we always set n = 1. The
function returns a vector containing the coefficients of the polynomial that
provides a least-squares fit to the data, ordered from the highest power down.
For n = 1, a two-element vector is returned: the first element is the slope of
the line (*m* in equation (1)) and the second element is the y-intercept of the
line (*b* in equation (1)).
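For a first-order fit, the least-squares slope and intercept that polyfit computes also have a well-known closed form. The display below states it, writing *N* for the number of data points and \(\bar{x}\), \(\bar{y}\) for the sample means (our notation, not the original's), and then substitutes the five points from this example, for which \(\bar{x} = 3.32\) and \(\bar{y} = 6.52\). The intermediate sums are hand arithmetic, not MATLAB output.

\[m = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}, \qquad b = \bar{y} - m\,\bar{x}\]

\[m = \frac{21.878}{10.948} \approx 1.9984, \qquad b = 6.52 - \frac{21.878}{10.948}\,(3.32) \approx -0.1145\]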

To create a best fit straight line for our data, type the following at the command prompt:

- P = polyfit(x,y,1)

MATLAB should respond with:

- P = 1.9984   -0.1145

So, for our equation, m = 1.9984 and b = -0.1145. Thus, the equation for our line is:

- y = 1.9984x - 0.1145
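As a cross-check (our own addition, not part of the original exercise), the same coefficients can be obtained without polyfit by solving the overdetermined system \(A\,[m;\ b] = y\) in the least-squares sense with MATLAB's backslash operator. The design matrix *A* and the variable names *c*, *m*, and *b* below are illustrative choices; the data are the five points from this example.

```matlab
% Data from the example (entered as row vectors, as above)
x = [1.3, 2.4, 3.1, 4.2, 5.6];
y = [2.1, 5.2, 6.4, 7.7, 11.2];

% Design matrix: the first column multiplies the slope m,
% the second column (all ones) multiplies the intercept b
A = [x(:), ones(numel(x), 1)];

% For a non-square A, backslash returns the least-squares solution
c = A \ y(:);

m = c(1)   % slope, should match P(1) from polyfit
b = c(2)   % y-intercept, should match P(2) from polyfit
```

Both approaches minimize the same sum of squared residuals, so they should agree to within rounding error; polyfit is simply more convenient when the polynomial order changes.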

We will create another variable in the workspace, called *y_fit*, which
contains the values of the best-fit line at the *x* data points we
acquired. To do this, type the following at the command prompt:

- y_fit = P(1)*x + P(2)

In the above, P(1) is the first element of the variable *P*, which is the
slope of the line, and P(2) is the second element of *P*, which is the
y-intercept.

The result of the above command should be:

- y_fit = 2.4833 4.6815 6.0804 8.2786 11.0763
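MATLAB also provides polyval, which evaluates a polynomial given its coefficient vector in the same descending-power order that polyfit returns. The sketch below is an alternative to typing P(1)*x + P(2) by hand; it is not the method used in this exercise, but for a first-order fit it should give the same *y_fit*.

```matlab
% Data from the example
x = [1.3, 2.4, 3.1, 4.2, 5.6];
y = [2.1, 5.2, 6.4, 7.7, 11.2];

P = polyfit(x, y, 1);      % P(1) = slope, P(2) = y-intercept

% polyval evaluates P(1)*x + P(2) at every element of x
y_fit = polyval(P, x)
```

Because polyval accepts coefficient vectors of any length, the same line keeps working if the polynomial order *n* passed to polyfit is increased.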


Now, to plot the curve fit, we will *add* a line to our previous plot. In
order for the new line to be added to the plot, rather than have it
*replace* the previous figure, we will “hold” the figure. To
hold the figure, type the following at the command prompt:

- hold on

Now, to plot the curve fit, type:

- plot(x, y_fit)

Your resulting plot should look like this:

MATLAB's corrcoef function provides the correlation coefficient of two data sets. One way to call this function is:

- r = corrcoef(x,y)

where *x* and *y* are vectors containing the data. Called this way, the
function returns a \(2 \times 2\) matrix of the following form:

\[r = \begin{bmatrix} r_{xx} & r_{xy} \\ r_{yx} & r_{yy} \end{bmatrix}\]

This matrix provides the correlations between all possible combinations of the data
provided to the function. \(r_{xx}\) is the correlation between the *x*
data and itself. Likewise, \(r_{yy}\) is the correlation between the *y*
data and itself. Since any data set is perfectly correlated with itself,
\(r_{xx} = r_{yy} = 1\) always. \(r_{xy}\) is the correlation between
the *x* data and the *y* data, and \(r_{yx}\) is the correlation
between the *y* data and the *x* data. Correlation is symmetric, so
\(r_{xy} = r_{yx}\), and either term gives us the correlation coefficient.

For our example, the result of typing r = corrcoef(x,y) is:

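\[r = \begin{bmatrix} 1.0000 & 0.9902 \\ 0.9902 & 1.0000 \end{bmatrix}\]

The correlation coefficient for our data is therefore \(r_{xy} = 0.9902\).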
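Usually only the off-diagonal value is needed. The sketch below, our own addition, pulls \(r_{xy}\) out of the matrix returned by corrcoef and, as a check, recomputes it directly from the Pearson correlation formula \(r_{xy} = \sum (x_i-\bar{x})(y_i-\bar{y}) \big/ \sqrt{\sum (x_i-\bar{x})^2 \sum (y_i-\bar{y})^2}\). The variable names *r_xy*, *dx*, *dy*, and *r_manual* are illustrative.

```matlab
% Data from the example
x = [1.3, 2.4, 3.1, 4.2, 5.6];
y = [2.1, 5.2, 6.4, 7.7, 11.2];

r = corrcoef(x, y);
r_xy = r(1, 2)        % row 1, column 2: correlation between x and y

% Pearson's formula evaluated directly from deviations about the means
dx = x - mean(x);
dy = y - mean(y);
r_manual = sum(dx .* dy) / sqrt(sum(dx.^2) * sum(dy.^2))
```

The two values should agree to within floating-point rounding, which confirms that the off-diagonal entry of the corrcoef matrix is the ordinary (Pearson) correlation coefficient.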

^{1}The command prompt is **>>** in the command window. The command window is the big blank space in the center of the MATLAB desktop.