Complete Guide to Regression & Correlation | NEB Grade 12 Business Math
Section 1: Introduction to Regression and Correlation
What is Regression?
The literal meaning of the term 'regression' is moving backward or returning to the average value[cite: 534]. Originally developed by Sir Francis Galton to study the heights of fathers and sons, regression analysis is now a mathematical measure of the average relationship between two or more variables[cite: 535, 538]. It is primarily considered as a device used to predict the value of one variable when the value of the other variable is known[cite: 8, 537].
Types of Regression
The analysis which is restricted to studying only two variables at a time is known as a simple linear regression analysis[cite: 10, 539]. Within this, there are always two lines of regression[cite: 555]:
- Regression line of Y on X: Used to estimate the dependent variable Y for any given value of the independent variable X[cite: 556].
- Regression line of X on Y: Used to estimate the dependent variable X for any given value of the independent variable Y[cite: 556].
Essential Formulas to be Used
Here are the fundamental formulas required to solve regression and correlation problems in the NEB curriculum.
1. Regression Equations
Using the deviations from the arithmetic means:
- Regression equation of Y on X: $$Y-\overline{Y}=b_{yx}(X-\overline{X})$$ [cite: 107, 484]
- Regression equation of X on Y: $$X-\overline{X}=b_{xy}(Y-\overline{Y})$$ [cite: 111, 491]
2. Regression Coefficients (Direct Method)
When calculating directly from the given data set:
- Coefficient of Y on X: $$b_{yx}=\frac{n\Sigma xy-\Sigma x\Sigma y}{n\Sigma x^2-(\Sigma x)^2}$$ [cite: 89, 125]
- Coefficient of X on Y: $$b_{xy}=\frac{n\Sigma xy-\Sigma x\Sigma y}{n\Sigma y^2-(\Sigma y)^2}$$ [cite: 117, 126]
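The two direct-method formulas share the same numerator and differ only in the denominator. A minimal Python sketch of them follows; the function name `regression_coefficients` and the sample data are illustrative assumptions, not part of the NEB material.

```python
def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) computed with the direct-method formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    # Both coefficients share the numerator n*Sxy - Sx*Sy.
    numerator = n * sum_xy - sum_x * sum_y
    b_yx = numerator / (n * sum_x2 - sum_x ** 2)
    b_xy = numerator / (n * sum_y2 - sum_y ** 2)
    return b_yx, b_xy


# Hypothetical data chosen so that y = 2x exactly.
b_yx, b_xy = regression_coefficients([1, 2, 3, 4], [2, 4, 6, 8])
print(b_yx, b_xy)  # prints 2.0 0.5
```

Because the made-up points lie exactly on a line, the product of the two coefficients is 1, the largest value it can take.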
3. Correlation Coefficient
The correlation coefficient between two variables is the geometric mean of the two regression coefficients[cite: 141, 716]. Both regression coefficients always have the same sign, and r takes that sign.
- Formula: $$r=\pm\sqrt{b_{yx}\cdot b_{xy}}$$ [cite: 155, 716]
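A square root on its own is never negative, so the sign of r has to be taken from the regression coefficients, as Section 3 does when both are negative. The sketch below shows one way to code that rule; the function name and the error messages are assumptions made for illustration.

```python
import math


def correlation_from_coefficients(b_yx, b_xy):
    """Return r as the signed geometric mean of the two coefficients."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("regression coefficients must share the same sign")
    if product > 1:
        raise ValueError("product exceeds 1; the lines may be swapped")
    magnitude = math.sqrt(product)
    # r carries the common sign of b_yx and b_xy.
    return -magnitude if b_yx < 0 else magnitude


print(round(correlation_from_coefficients(-0.5, -1.5), 2))  # prints -0.87
```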
Section 2: Case Question & Solution (Easy Level)
Case Question: Baby Weight Prediction
The following table gives the normal weight of a baby during the first six months of life[cite: 236]:
| Age in month (x) | 0 | 2 | 3 | 5 | 6 |
|---|---|---|---|---|---|
| Weight (y) | 5 | 7 | 8 | 10 | 12 |
Requirement: Estimate the weight of a baby at the age of 4 months[cite: 238].
💡 Tips & Tricks for Easy Questions
- Identify Variables Correctly: The variable you need to predict is the dependent variable, and the variable you are given to make the prediction is the independent variable. By convention the dependent variable is usually labelled Y and the independent variable X. Here, age is X, and weight is Y[cite: 240].
- Use the Assumed Mean Method: When numbers are simple but you want to avoid large totals, take assumed means (a and b) from the middle of your X and Y values. This creates smaller calculation variables: u = x - a and v = y - b[cite: 122].
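The shift is safe because subtracting a constant from every value does not change the regression coefficient. A short derivation with u = x - a and v = y - b shows that the numerator of the coefficient is unaffected; the denominator works the same way.

$$\begin{aligned} n\Sigma uv &= n\Sigma xy - nb\Sigma x - na\Sigma y + n^2ab \\ \Sigma u\,\Sigma v &= (\Sigma x - na)(\Sigma y - nb) = \Sigma x\Sigma y - nb\Sigma x - na\Sigma y + n^2ab \\ n\Sigma uv - \Sigma u\,\Sigma v &= n\Sigma xy - \Sigma x\Sigma y \end{aligned}$$

So the coefficient computed from u and v equals the one computed from x and y. Only the means need to be shifted back by a and b.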
Step-by-Step Solution
Let x and y represent the age and the weight respectively[cite: 240]. We will use the assumed mean method, taking assumed mean for x as 3 (a = 3) and for y as 8 (b = 8)[cite: 242].
Step 1: Computation Table
Let's calculate the required sums for our formula[cite: 242]:
| x | y | u = x - 3 | v = y - 8 | u² | uv |
|---|---|---|---|---|---|
| 0 | 5 | -3 | -3 | 9 | 9 |
| 2 | 7 | -1 | -1 | 1 | 1 |
| 3 | 8 | 0 | 0 | 0 | 0 |
| 5 | 10 | 2 | 2 | 4 | 4 |
| 6 | 12 | 3 | 4 | 9 | 12 |
| n = 5 | | Σu = 1 | Σv = 2 | Σu² = 23 | Σuv = 26 |
Step 2: Calculate the Means
- Mean of x (x̄) = a + (Σu / n) = 3 + (1 / 5) = 3.2 [cite: 245]
- Mean of y (ȳ) = b + (Σv / n) = 8 + (2 / 5) = 8.4 [cite: 246]
Step 3: Calculate Regression Coefficient (byx)
Formula: byx = (nΣuv - ΣuΣv) / (nΣu² - (Σu)²) [cite: 247]
byx = (5 × 26 - 1 × 2) / (5 × 23 - 1²) = 128 / 114 = 1.12 [cite: 248]
Step 4: Formulate Regression Equation of Y on X
Formula: y - ȳ = byx(x - x̄) [cite: 250]
- y - 8.4 = 1.12(x - 3.2) [cite: 254]
- y - 8.4 = 1.12x - 3.584 [cite: 254]
- y = 1.12x + 4.82 [cite: 255]
Step 5: Estimate the Required Value
To find the weight at 4 months, substitute x = 4 into the equation[cite: 256]:
y = 1.12(4) + 4.82 = 9.3 [cite: 256]
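The whole solution can be cross-checked with a few lines of Python. This is only a verification sketch of the assumed mean method with a = 3 and b = 8; the variable names and the helper `predict_weight` are illustrative assumptions.

```python
ages = [0, 2, 3, 5, 6]
weights = [5, 7, 8, 10, 12]
a, b = 3, 8  # assumed means, as chosen in the solution

u = [x - a for x in ages]
v = [y - b for y in weights]
n = len(ages)

sum_u, sum_v = sum(u), sum(v)
sum_u2 = sum(ui * ui for ui in u)
sum_uv = sum(ui * vi for ui, vi in zip(u, v))

# Shift the assumed means back to get the actual means.
x_bar = a + sum_u / n
y_bar = b + sum_v / n
b_yx = (n * sum_uv - sum_u * sum_v) / (n * sum_u2 - sum_u ** 2)


def predict_weight(age):
    """Estimate weight from the regression line of y on x."""
    return y_bar + b_yx * (age - x_bar)


print(round(b_yx, 2))               # prints 1.12
print(round(predict_weight(4), 2))  # prints 9.3
```

The script keeps the unrounded coefficient 128/114, and the estimate still rounds to 9.3, so rounding byx to 1.12 by hand does not change the answer here.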
Section 3: Case Question & Solution (Medium Level)
Case Question: Intersecting Regression Lines
Two lines of regression are $x+2y=5$ and $2x+3y=8$[cite: 166]. It is given that the variance of the x-series is $\sigma_x^2=12$[cite: 169].
Requirement: Calculate the arithmetic means $\overline{x}$ and $\overline{y}$, the regression coefficients $b_{yx}$ and $b_{xy}$, the correlation coefficient $r$, and the variance of y, $\sigma_y^2$[cite: 170].
💡 Tips & Tricks for Medium Questions
- The Intersection Property: The two regression lines always intersect at the exact point of their arithmetic means $(\overline{x}, \overline{y})$[cite: 175]. Solving the two given equations simultaneously will immediately give you the means!
- Identifying the Lines: You usually have to guess which equation is "y on x" and which is "x on y". Express one as $y = mx + c$ and the other as $x = my + c$. The slopes are your $b_{yx}$ and $b_{xy}$.
- The Ultimate Check: The product of the two regression coefficients must be less than or equal to 1 ($r^2 = b_{yx} \cdot b_{xy} \le 1$)[cite: 142]. If your product is greater than 1, you guessed the lines wrong and need to swap them!
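The guess-and-check idea in these tips can be sketched in Python: try one assignment of the lines, test the product of the slopes, and swap if the test fails. The function name and the (a, b, c) representation of a line a·x + b·y = c are assumptions made for this illustration.

```python
def assign_regression_lines(line1, line2):
    """Each line is (a, b, c) for a*x + b*y = c. Return (b_yx, b_xy)."""

    def slope_y_on_x(line):
        a, b, _ = line
        return -a / b  # from y = -(a/b)x + c/b

    def slope_x_on_y(line):
        a, b, _ = line
        return -b / a  # from x = -(b/a)y + c/a

    # Try both assignments and keep the one whose product is a valid r^2.
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        b_yx = slope_y_on_x(y_on_x)
        b_xy = slope_x_on_y(x_on_y)
        if 0 <= b_yx * b_xy <= 1:
            return b_yx, b_xy
    raise ValueError("neither assignment gives a product between 0 and 1")


print(assign_regression_lines((1, 2, 5), (2, 3, 8)))  # prints (-0.5, -1.5)
```

For the lines in this case question, the other assignment gives slopes of -2/3 and -2, whose product 4/3 is greater than 1, so it is rejected.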
Step-by-Step Solution
Step 1: Calculate the Arithmetic Means
Since the two lines of regression pass through the point $(\overline{x}, \overline{y})$, we can replace $x$ and $y$ with their means[cite: 175]:
(i) $\overline{x} + 2\overline{y} = 5$ [cite: 178]
(ii) $2\overline{x} + 3\overline{y} = 8$ [cite: 179]
Multiplying equation (i) by 2 gives $2\overline{x} + 4\overline{y} = 10$. Subtracting equation (ii) from this new equation yields $\overline{y} = 2$. Substituting $\overline{y} = 2$ back into equation (i) gives $\overline{x} + 4 = 5$, which means $\overline{x} = 1$.
Result: $\overline{x} = 1$, $\overline{y} = 2$[cite: 184].
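The elimination above can be checked by solving the same pair of equations with Cramer's rule, which is a swapped-in technique rather than the method used in the solution. The function name `intersection` is an illustrative assumption; `Fraction` is used to keep the arithmetic exact.

```python
from fractions import Fraction


def intersection(line1, line2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the lines are parallel or identical")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y


x_bar, y_bar = intersection((1, 2, 5), (2, 3, 8))
print(x_bar, y_bar)  # prints 1 2
```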
Step 2: Find the Regression Coefficients
Let's assume the first equation is the regression line of y on x. We must arrange it in the form $y = a + bx$[cite: 69]:
$x + 2y = 5$
$2y = -x + 5$
$y = -\frac{1}{2}x + \frac{5}{2}$ [cite: 181]
Therefore, the slope is the regression coefficient: $b_{yx} = -\frac{1}{2}$[cite: 186].
Now, we assume the second equation is the regression line of x on y. We arrange it in the form $x = a + by$:
$2x + 3y = 8$
$2x = -3y + 8$
$x = -\frac{3}{2}y + 4$ [cite: 182]
Therefore, the slope is the regression coefficient: $b_{xy} = -\frac{3}{2}$[cite: 187].
Step 3: Calculate the Correlation Coefficient ($r$)
The correlation coefficient is the geometric mean of the regression coefficients[cite: 141]. Since both regression coefficients are negative, $r$ must also be negative[cite: 143, 144].
$$r = -\sqrt{b_{yx} \cdot b_{xy}}$$ [cite: 189]
$$r = -\sqrt{(-\frac{1}{2}) \times (-\frac{3}{2})}$$ [cite: 190]
$$r = -\frac{\sqrt{3}}{2} \approx -0.87$$ [cite: 190]
Step 4: Calculate the Variance of Y ($\sigma_y^2$)
We know the formula for the regression coefficient $b_{yx}$ in terms of standard deviations[cite: 93]:
$$b_{yx} = r \frac{\sigma_y}{\sigma_x} = -\frac{1}{2}$$ [cite: 192]
Squaring both sides of the equation to use our given variance ($\sigma_x^2 = 12$)[cite: 169]:
$$r^2 \frac{\sigma_y^2}{\sigma_x^2} = \frac{1}{4}$$ [cite: 193]
Substitute $r^2 = \frac{3}{4}$ and $\sigma_x^2 = 12$ into the equation:
$$(\frac{3}{4}) \times \frac{\sigma_y^2}{12} = \frac{1}{4}$$ [cite: 194]
Solving for $\sigma_y^2$:
$$\frac{3\sigma_y^2}{48} = \frac{1}{4}$$
$$12\sigma_y^2 = 48$$
$$\sigma_y^2 = 4$$ [cite: 194]
Final Answers
- $\overline{x} = 1$, $\overline{y} = 2$
- $b_{yx} = -0.5$, $b_{xy} = -1.5$
- $r \approx -0.87$
- $\sigma_y^2 = 4$
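The variance step can be cross-checked with exact fractions. The sketch below also uses the shortcut $b_{yx}/b_{xy} = \sigma_y^2/\sigma_x^2$, which follows from dividing the two standard-deviation formulas and is not the route taken in the solution; the variable names are illustrative assumptions.

```python
from fractions import Fraction

b_yx = Fraction(-1, 2)
b_xy = Fraction(-3, 2)
var_x = 12  # given variance of the x-series

r_squared = b_yx * b_xy

# Route used in the solution: b_yx^2 = r^2 * var_y / var_x.
var_y = b_yx ** 2 * var_x / r_squared

# Shortcut (an added cross-check): b_yx / b_xy = var_y / var_x.
var_y_shortcut = b_yx / b_xy * var_x

print(r_squared, var_y, var_y_shortcut)  # prints 3/4 4 4
```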
Section 4: Case Question & Solution (Hard Level)
Case Question: Academic Performance Analysis
The marks secured in Economics and Statistics by 10 students are given below:
| Marks in Economics (X) | 25 | 28 | 35 | 32 | 31 | 36 | 29 | 38 | 34 | 32 |
|---|---|---|---|---|---|---|---|---|---|---|
| Marks in Statistics (Y) | 43 | 46 | 49 | 41 | 36 | 32 | 31 | 30 | 33 | 39 |
Requirements:
i) Calculate the correlation coefficient between the marks in Economics and Statistics.
ii) Find the regression equation of Y on X (Statistics on Economics).
iii) Estimate the most likely marks in Statistics when the marks in Economics are 30.
🔥 Advanced Tricks for Long Questions
- The "Actual Mean" Trick: Before creating a massive table with X², Y², and XY (which results in huge numbers like 49 × 49 = 2401), quickly calculate the average (mean) of X and Y. If they are whole numbers, use the Actual Mean Method!
- How it works: Find the deviations x = X - Mean and y = Y - Mean. Your new numbers will be tiny (like -2, 3, 0), making squaring and multiplying incredibly easy and error-free.
- Self-Check: The sums of your deviations (Σx and Σy) MUST equal exactly 0. If they don't, you made an addition error!
Step-by-Step Solution
Step 1: Check the Means
First, let's find the sums of X and Y to check their means:
- ΣX = 25 + 28 + 35 + ... + 32 = 320
- ΣY = 43 + 46 + 49 + ... + 39 = 380
- Mean of X (X̄) = 320 / 10 = 32
- Mean of Y (Ȳ) = 380 / 10 = 38
Since both means are whole numbers, we will use the Actual Mean Method.
Step 2: Computation Table
We calculate deviations: x = X - 32 and y = Y - 38.
| X | Y | x = X - 32 | y = Y - 38 | x² | y² | xy |
|---|---|---|---|---|---|---|
| 25 | 43 | -7 | 5 | 49 | 25 | -35 |
| 28 | 46 | -4 | 8 | 16 | 64 | -32 |
| 35 | 49 | 3 | 11 | 9 | 121 | 33 |
| 32 | 41 | 0 | 3 | 0 | 9 | 0 |
| 31 | 36 | -1 | -2 | 1 | 4 | 2 |
| 36 | 32 | 4 | -6 | 16 | 36 | -24 |
| 29 | 31 | -3 | -7 | 9 | 49 | 21 |
| 38 | 30 | 6 | -8 | 36 | 64 | -48 |
| 34 | 33 | 2 | -5 | 4 | 25 | -10 |
| 32 | 39 | 0 | 1 | 0 | 1 | 0 |
| ΣX = 320 | ΣY = 380 | Σx = 0 | Σy = 0 | Σx² = 140 | Σy² = 398 | Σxy = -93 |
Step 3: Calculate Correlation Coefficient (r)
Using the actual mean formula:
r = Σxy / √(Σx² × Σy²)
r = -93 / √(140 × 398)
r = -93 / √(55720)
r = -93 / 236.05
r = -0.394 (There is a low degree of negative correlation)
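The table totals and the value of r can be verified with a short script. This is a checking sketch of the actual mean method only; the variable names are illustrative assumptions.

```python
import math

economics = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
stats_marks = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]
n = len(economics)

mean_x = sum(economics) / n
mean_y = sum(stats_marks) / n

# Deviations from the actual means.
dx = [x - mean_x for x in economics]
dy = [y - mean_y for y in stats_marks]

sum_x2 = sum(d * d for d in dx)
sum_y2 = sum(d * d for d in dy)
sum_xy = sum(p * q for p, q in zip(dx, dy))

r = sum_xy / math.sqrt(sum_x2 * sum_y2)
print(sum(dx), sum(dy))  # self-check: both sums should be zero
print(round(r, 3))       # prints -0.394
```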
Step 4: Regression Equation of Y on X
First, find the regression coefficient (byx) using the actual mean formula:
byx = Σxy / Σx² = -93 / 140 = -0.664
Now, use the regression equation formula:
Y - Ȳ = byx(X - X̄)
Y - 38 = -0.664(X - 32)
Y - 38 = -0.664X + 21.248
Y = -0.664X + 21.248 + 38
Y = 59.248 - 0.664X
Step 5: Estimate the Required Value
We need to estimate the marks in Statistics (Y) when marks in Economics (X) is 30.
Y = 59.248 - 0.664(30)
Y = 59.248 - 19.92
Y = 39.328
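A brief sketch of the same steps in Python, starting from the table totals, shows how little the early rounding of byx affects the estimate. The helper name `predict_statistics` is an illustrative assumption.

```python
sum_xy, sum_x2 = -93, 140   # totals from the computation table
mean_x, mean_y = 32, 38

b_yx = sum_xy / sum_x2
# Rearranging Y - mean_y = b_yx * (X - mean_x) into Y = intercept + b_yx * X.
intercept = mean_y - b_yx * mean_x


def predict_statistics(economics_mark):
    """Estimate the Statistics mark from the regression line of Y on X."""
    return intercept + b_yx * economics_mark


print(round(b_yx, 3))                    # prints -0.664
print(round(predict_statistics(30), 2))  # prints 39.33
```

With the unrounded coefficient the estimate is about 39.329 instead of 39.328, so both versions give roughly 39.33 marks.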
Section 5: Case Question & Solution (Expert Level)
Case Question: Predicting Marks with Standard Deviation
Given the following data, calculate the mark in Mathematics obtained by a student who has secured 80 marks in English[cite: 259, 260]:
| Subject | Math | English |
|---|---|---|
| Mean mark | 80 [cite: 263] | 64 [cite: 264] |
| Standard Deviation (s.d.) | 3 [cite: 266] | 4 [cite: 267] |
The correlation coefficient between the marks of Mathematics and English is -0.40[cite: 268].
🎓 Pro Tips for Summary Data Questions
- No Table Needed: When you are given means, standard deviations, and the correlation coefficient, do not try to build a data table. You must use the direct formula that links these three statistical measures.
- The Golden Formula: The regression coefficient can be found directly using b = r × (σ_dependent / σ_independent).
- Identify the Target: You are asked to calculate the mark in Math based on English. Therefore, Math is your dependent variable (x) and English is your independent variable (y). You need the regression equation of x on y[cite: 270, 273].
[Image of a linear regression graph showing the line of best fit and data points]
Step-by-Step Solution
Step 1: Assign Variables and List Givens
Let x and y represent the marks in Math and English respectively[cite: 270].
- Mean of x (x̄) = 80 [cite: 271]
- Mean of y (ȳ) = 64 [cite: 271]
- Standard Deviation of x (σx) = 3 [cite: 271]
- Standard Deviation of y (σy) = 4 [cite: 271]
- Correlation Coefficient (r) = -0.40 [cite: 271]
Step 2: Calculate the Regression Coefficient of x on y (bxy)
Using the relationship between correlation and standard deviation:
bxy = r × (σx / σy) [cite: 272]
bxy = -0.40 × (3 / 4) [cite: 272]
bxy = -0.3 [cite: 272]
Step 3: Formulate the Regression Equation of x on y
The standard formula for the regression equation is:
x - x̄ = bxy(y - ȳ) [cite: 273]
Substitute the known values:
x - 80 = -0.3(y - 64) [cite: 274]
x - 80 = -0.3y + 19.2 [cite: 274]
x = -0.3y + 19.2 + 80
x = -0.3y + 99.2 [cite: 275]
Step 4: Estimate the Required Value
We are asked to find the mark in Math (x) when the student has secured 80 marks in English (y = 80)[cite: 259, 260, 276].
Substitute y = 80 into our new equation:
x = -0.3(80) + 99.2 [cite: 277]
x = -24 + 99.2
x = 75.2 [cite: 277]
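For summary-data questions, the four steps collapse into one small function. The sketch below is illustrative, and the function name and parameter names are assumptions; "dep" is the variable being predicted and "indep" is the variable whose value is given.

```python
def predict_from_summary(mean_dep, mean_indep, sd_dep, sd_indep, r, value_indep):
    """Predict the dependent variable from means, s.d.s and r."""
    # Regression coefficient of the dependent on the independent variable.
    b = r * sd_dep / sd_indep
    return mean_dep + b * (value_indep - mean_indep)


# Math is the dependent variable, English the independent one.
math_mark = predict_from_summary(
    mean_dep=80, mean_indep=64, sd_dep=3, sd_indep=4, r=-0.40, value_indep=80
)
print(round(math_mark, 2))  # prints 75.2
```

Swapping the roles of the two subjects in the arguments would give the regression of English on Math instead, so the order matters.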
End of the Comprehensive Guide to NEB Grade 12 Regression and Correlation.