Regression


Regression finds a best-fit line (or curve) that describes the form of the relationship between two variables. For a straight-line fit, this line is called the least squares regression line (LSRL). The fit is given in the form of an equation:
y = mx + b for a straight line regression
y = ax^2 + bx + c for a quadratic regression
y = ab^x + c for an exponential regression

Usually, the regression line is fitted to a set of data. The equation found through the regression process is then used to predict an output (y) value for a proposed value of the input (x).

R (Correlation Coefficient)
Measures the strength and direction of the linear relationship. It does not capture a curved relationship, even a strong one. The closer R is to 0, the worse a straight line describes the data [bad fit]; the closer it is to -1 or 1, the stronger the fit. Below are some examples of correlation coefficients. The graph in the lower right corner, where r = -.99, has the best fit of all 6 graphs: with r so close to -1, the points lie almost exactly on the line. The graph in the upper left corner, on the other hand, has the worst fit: r = 0, so there is no linear fit at all.
Picture_2.png
R^2 is known as the coefficient of determination: the proportion of the variation in the y values that is explained by the least squares regression line. A high R^2 indicates a good linear fit.
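
As a quick check of these definitions, here is a minimal Python sketch that computes R and R^2 directly from sums of squares. It uses the wife/husband ages from Example 1 further down this page; the function name `correlation` is only an illustrative choice, not something a calculator exposes.

```python
import math

# Wife and husband ages from Example 1 on this page.
wife = [22, 32, 50, 25, 33, 27, 45, 47, 30]
husband = [25, 25, 51, 25, 38, 30, 60, 54, 31]


def correlation(xs, ys):
    """Pearson correlation coefficient R of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = correlation(wife, husband)
r_squared = r ** 2
print(round(r, 3), round(r_squared, 3))
```

The two printed values should agree with the R = .921 and R^2 = .849 quoted in the Answers section.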

*Linear

The slope of the least squares regression line is R * (standard deviation of y) / (standard deviation of x), and the line always passes through the point (mean of x, mean of y).
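
The slope rule can be tried out on the Example 1 data from this page. The sketch below assumes sample standard deviations (Python's `statistics.stdev`) and gets the intercept from the fact that the line passes through the point of means; the variable names are illustrative.

```python
import statistics

# Wife and husband ages from Example 1 on this page.
wife = [22, 32, 50, 25, 33, 27, 45, 47, 30]
husband = [25, 25, 51, 25, 38, 30, 60, 54, 31]

mean_x = statistics.mean(wife)
mean_y = statistics.mean(husband)
sd_x = statistics.stdev(wife)      # sample standard deviation of x
sd_y = statistics.stdev(husband)   # sample standard deviation of y

# Correlation coefficient R from the sum of cross-deviations.
n = len(wife)
cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(wife, husband))
r = cross / ((n - 1) * sd_x * sd_y)

# slope = R * sd_y / sd_x, and the line passes through (mean_x, mean_y).
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x

print(round(slope, 3), round(intercept, 3))
```

Rounded to three decimals, this should reproduce the line y = 1.244X - 5.317 from the Answers section.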

From the regression line, you can calculate the residuals. A residual is the observed y value minus the value predicted by the regression line.
A residual plot is a SCATTER PLOT of all the residuals against the x values.
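
To make the calculation concrete, the sketch below computes one residual per point for the Example 1 data, using Residual = O - P (observed minus predicted) as in the Answers section. The line y = 1.244X - 5.317 is copied from those answers; the helper name `predict` is hypothetical.

```python
# Wife and husband ages from Example 1 on this page.
wife = [22, 32, 50, 25, 33, 27, 45, 47, 30]
husband = [25, 25, 51, 25, 38, 30, 60, 54, 31]


def predict(x):
    """Predicted husband age from the LSRL quoted in the Answers section."""
    return 1.244 * x - 5.317


# Residual = observed - predicted, one value per data point.
residuals = [y - predict(x) for x, y in zip(wife, husband)]

for x, res in zip(wife, residuals):
    print(x, round(res, 3))
```

Plotting these residuals against the wife's age gives the residual plot. Their sum is close to zero, differing only because the coefficients are rounded.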

How to find the Regression line
1. On the calculator, enter your data into 2 lists of equal length.
2. Go to the CALC section under STAT and choose 4: LinReg (ax+b), press enter
3. Type your 2 lists, separated by a comma, and then VARS, Y-VARS, Function and then Y1
4. Your R and R^2 will appear, and the least squares regression line is stored in Y1 (what you see under y=)
RegressionLines_0.png
This regression line is a good fit because the points follow a linear relationship between x and y.

Picture_1.png
This scatterplot would not have a good fit to a linear regression line because the points demonstrate a nonlinear relationship.


To determine if the line is a good fit, check:
- R
- Residual plot
If the residual plot shows no systematic pattern, the fit is good.
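
For contrast, here is a sketch of what a systematic pattern looks like. It deliberately fits a straight line to the Example 2 data from this page (time vs. difference in temperature), which is curved, and lists the residuals. The helper `least_squares_line` is an illustrative name.

```python
# Time (minutes) and difference in temperature from Example 2 on this page.
minutes = [10, 20, 30, 40, 50, 60]
temp_diff = [68, 36, 20, 10, 6, 4]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares_line(minutes, temp_diff)

# Residual = observed - predicted for the (poorly fitting) straight line.
residuals = [y - (slope * x + intercept) for x, y in zip(minutes, temp_diff)]
print([round(res, 1) for res in residuals])
```

The residuals start positive, dip negative in the middle, and end positive again. That U shape is a systematic pattern, so a straight line is not a good description of this data.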

Picture_4.png
This residual plot shows the residuals (the vertical distance of each observed point from its predicted point on the regression line).



*Quadratic

If the graph is quadratic or exponential, the data have to be transformed so that they have a good linear fit.

Power functions have x raised to a power, e.g. y = 4x^3.
For a power function, the points (log(x), log(y)) have a strong linear fit.
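
A small sketch of why this works, using the example y = 4x^3: taking logs gives log(y) = log(4) + 3 log(x), which is a straight line in (log(x), log(y)). The x values below are arbitrary illustrative inputs, and `least_squares_line` is a hypothetical helper.

```python
import math

# Points generated from the example power function y = 4x^3.
xs = [1, 2, 3, 4, 5]
ys = [4 * x ** 3 for x in xs]

# Transform both coordinates with log base 10.
log_x = [math.log10(x) for x in xs]
log_y = [math.log10(y) for y in ys]


def least_squares_line(us, vs):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    slope = (sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
             / sum((u - mean_u) ** 2 for u in us))
    return slope, mean_v - slope * mean_u


slope, intercept = least_squares_line(log_x, log_y)

# The slope recovers the power (3); 10**intercept recovers the coefficient (4).
print(round(slope, 6), round(10 ** intercept, 6))
```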

*Exponential

Exponential functions have x in the exponent, e.g. y = 3^x.
For an exponential function, the points (x, log(y)) have a strong linear fit.
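
One way to see this for y = 3^x: each time x goes up by 1, log(y) goes up by the same amount, log(3), which is exactly what a straight line does. The x values below are arbitrary illustrative inputs.

```python
import math

# Points generated from the example exponential function y = 3^x.
xs = [0, 1, 2, 3, 4, 5]
ys = [3 ** x for x in xs]

# Transform only the y coordinate.
log_y = [math.log10(y) for y in ys]

# Change in log(y) between consecutive x values (x increases by 1 each time).
steps = [later - earlier for earlier, later in zip(log_y, log_y[1:])]
print([round(step, 4) for step in steps])
```

Every step equals log(3), about 0.4771, so the points (x, log(y)) lie on a line with that slope.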

How to Find the Equation Using the Calculator
1. On the calculator, enter your data into 2 lists of equal length.
2. Go to the CALC section under STAT and choose 0: ExpReg, press enter
- Use this function on the calculator because if you were to use LinReg, the line would not be a good representation of the data.
3. Type your 2 lists, separated by a comma, and then VARS, Y-VARS, Function and then Y1
4. Reexpress the equation to make it linear using logarithms


Example 1:

wife    husband
22      25
32      25
50      51
25      25
33      38
27      30
45      60
47      54
30      31

1. Find the equation of the Least Squares Regression Line, correlation coefficient, and coefficient of determination.
2. Using the Least Squares Regression Line, what is the predicted age of the husband whose wife is 50? What is the value of the residual?

Example 2:

Time (minutes)    Difference in temp
10                68
20                36
30                20
40                10
50                6
60                4
1. What is the equation of the exponential graph?
2. Reexpress the equation as linear fit using logarithms.
3. Use the equation to predict the difference in temperature after 45 minutes.

Answers:
Example 1
1. LSRL: y = 1.244X - 5.317 (equation that best matches the data), R = .921 (correlation coefficient), R^2 = .849
2. y = (1.244)(50) - 5.317 = 56.883 years old
Residual = Observed - Predicted
Residual = 51 - 56.883
Residual = -5.883

Example 2
1. ExpReg: y = (114.055)(.944)^X
2. ln(y) = -0.0576X + 4.737
3. ln(y) = (-0.0576)(45) + 4.737
ln(y) = 2.145
y = e^2.145
y = 8.542
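
The Example 2 answers can be cross-checked with a short sketch. It assumes the exponential fit is obtained by fitting a least squares line to the points (x, ln(y)) and then undoing the logarithm; that is an assumption about how ExpReg works, not something stated on this page. Variable names are illustrative.

```python
import math

# Time (minutes) and difference in temperature from Example 2.
minutes = [10, 20, 30, 40, 50, 60]
temp_diff = [68, 36, 20, 10, 6, 4]

# Linearize: fit a straight line to (x, ln(y)).
ln_y = [math.log(y) for y in temp_diff]

n = len(minutes)
mean_x = sum(minutes) / n
mean_ln_y = sum(ln_y) / n
slope = (sum((x - mean_x) * (v - mean_ln_y) for x, v in zip(minutes, ln_y))
         / sum((x - mean_x) ** 2 for x in minutes))
intercept = mean_ln_y - slope * mean_x

# Undo the logarithm: ln(y) = intercept + slope*x  means  y = a * b^x.
a = math.exp(intercept)
b = math.exp(slope)

# Predicted difference in temperature after 45 minutes.
prediction = a * b ** 45
print(round(a, 3), round(b, 3), round(prediction, 2))
```

The fitted a and b should match 114.055 and .944. With unrounded coefficients the prediction comes out near 8.46 rather than 8.542; the gap appears to come from rounding b to .944 before taking its logarithm.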

Harder Questions


A researcher uses a regression equation to predict home heating bills (dollar cost), based on home size (square feet). The correlation between predicted bills and home size is 0.70. What is the correct interpretation of this finding?
(A) 70% of the variability in home heating bills can be explained by home size.
(B) 49% of the variability in home heating bills can be explained by home size.
(C) For each added square foot of home size, heating bills increased by 70 cents.
(D) For each added square foot of home size, heating bills increased by 49 cents.

Answer: (B). r = .7, therefore r^2 = .49. r^2 is the proportion of the variation in the y values that the least squares regression line explains.

