Complete Guide to Regression & Correlation | NEB Grade 12 Business Math

Section 1: Introduction to Regression and Correlation

What is Regression?

The literal meaning of the term 'regression' is moving backward or returning to the average value[cite: 534]. Originally developed by Sir Francis Galton to study the heights of fathers and sons, regression analysis is now a mathematical measure of the average relationship between two or more variables[cite: 535, 538]. It is primarily used to predict the value of one variable when the value of the other is known[cite: 8, 537].

Types of Regression

Analysis restricted to two variables at a time is known as simple linear regression analysis[cite: 10, 539]. In this setting there are always two lines of regression[cite: 555]:

  • Regression line of Y on X: Used to estimate the dependent variable Y for any given value of the independent variable X[cite: 556].
  • Regression line of X on Y: Used to estimate the dependent variable X for any given value of the independent variable Y[cite: 556].

Essential Formulas to be Used

Here are the fundamental formulas required to solve regression and correlation problems in the NEB curriculum.

1. Regression Equations

Using the deviations from the arithmetic means:

  • Regression equation of Y on X: $$Y-\overline{Y}=b_{yx}(X-\overline{X})$$ [cite: 107, 484]
  • Regression equation of X on Y: $$X-\overline{X}=b_{xy}(Y-\overline{Y})$$ [cite: 111, 491]

2. Regression Coefficients (Direct Method)

When calculating directly from the given data set:

  • Coefficient of Y on X: $$b_{yx}=\frac{n\Sigma xy-\Sigma x\Sigma y}{n\Sigma x^2-(\Sigma x)^2}$$ [cite: 89, 125]
  • Coefficient of X on Y: $$b_{xy}=\frac{n\Sigma xy-\Sigma x\Sigma y}{n\Sigma y^2-(\Sigma y)^2}$$ [cite: 117, 126]
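The two direct-method formulas share the same numerator, which a short Python sketch makes visible. The function name and the sample data below are illustrative assumptions, not part of the NEB material.

```python
def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) computed with the direct-method sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    # Both coefficients share this numerator; only the denominator differs.
    numerator = n * sum_xy - sum_x * sum_y
    b_yx = numerator / (n * sum_x2 - sum_x ** 2)
    b_xy = numerator / (n * sum_y2 - sum_y ** 2)
    return b_yx, b_xy


# Hypothetical data, chosen only to keep the arithmetic small.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
b_yx, b_xy = regression_coefficients(hours, scores)
print(b_yx, b_xy)  # 0.6 1.0
```

Because the numerator is shared, the two coefficients can never differ in sign.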

3. Correlation Coefficient

The correlation coefficient between two variables is the geometric mean of the two regression coefficients, taken with their common sign[cite: 141, 716].

  • Formula: $$r=\pm\sqrt{b_{yx}\cdot b_{xy}}$$ [cite: 155, 716]

💡 Quick Tip: Both regression coefficients always have the same sign, and the correlation coefficient takes that sign[cite: 143, 144]. If both are positive, the correlation is positive; if both are negative, it is negative.
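The sign rule can be sketched in Python as follows. The function name and the input values are hypothetical, and the two guard checks are an added assumption about how invalid inputs should be handled.

```python
import math


def correlation_from_coefficients(b_yx, b_xy):
    """Return r as the signed geometric mean of the two regression coefficients."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    if product > 1:
        raise ValueError("product of regression coefficients cannot exceed 1")
    # copysign attaches the sign of b_yx to the (non-negative) square root.
    return math.copysign(math.sqrt(product), b_yx)


# Illustrative values only.
r_positive = correlation_from_coefficients(0.5, 0.5)
r_negative = correlation_from_coefficients(-0.4, -0.9)
print(r_positive, round(r_negative, 2))
```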

Section 2: Case Question & Solution (Easy Level)

Case Question: Baby Weight Prediction

The following table gives the normal weight of a baby during the first six months of life[cite: 236]:

| Age in months (x) | 0 | 2 | 3 | 5 | 6 |
| Weight (y) | 5 | 7 | 8 | 10 | 12 |

Requirement: Estimate the weight of a baby at the age of 4 months[cite: 238].

💡 Tips & Tricks for Easy Questions

  • Identify Variables Correctly: The variable you need to predict is the dependent variable (Y in this question). The variable you are given to make the prediction is the independent variable (X). Here, age is X, and weight is Y[cite: 240].
  • Use the Assumed Mean Method: When numbers are simple but you want to avoid large totals, take an assumed mean (a and b) from the middle of your X and Y values. This creates smaller calculation variables: u = x - a and v = y - b[cite: 122].

Step-by-Step Solution

Let x and y represent the age and the weight respectively[cite: 240]. We will use the assumed mean method, taking assumed mean for x as 3 (a = 3) and for y as 8 (b = 8)[cite: 242].

Step 1: Computation Table

Let's calculate the required sums for our formula[cite: 242]:

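| x | y | u = x - 3 | v = y - 8 | u² | uv |
|---|---|---|---|---|---|
| 0 | 5 | -3 | -3 | 9 | 9 |
| 2 | 7 | -1 | -1 | 1 | 1 |
| 3 | 8 | 0 | 0 | 0 | 0 |
| 5 | 10 | 2 | 2 | 4 | 4 |
| 6 | 12 | 3 | 4 | 9 | 12 |
| n = 5 | | Σu = 1 | Σv = 2 | Σu² = 23 | Σuv = 26 |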

Step 2: Calculate the Means

  • Mean of x (x̄) = a + (Σu / n) = 3 + (1 / 5) = 3.2 [cite: 245]
  • Mean of y (ȳ) = b + (Σv / n) = 8 + (2 / 5) = 8.4 [cite: 246]

Step 3: Calculate Regression Coefficient (byx)

Formula: byx = (nΣuv - ΣuΣv) / (nΣu² - (Σu)²) [cite: 247]

byx = (5 × 26 - 1 × 2) / (5 × 23 - (1)²) = 128 / 114 ≈ 1.12 [cite: 248]
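The assumed mean method works because shifting x and y by constants does not change the regression coefficient. A minimal Python check, using exact fractions, is sketched below; the helper name `slope_y_on_x` is an assumption made for illustration.

```python
from fractions import Fraction


def slope_y_on_x(xs, ys):
    """b_yx from raw sums, kept as an exact fraction."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x ** 2)


ages = [0, 2, 3, 5, 6]
weights = [5, 7, 8, 10, 12]

# Deviations from the assumed means a = 3 and b = 8.
u = [x - 3 for x in ages]
v = [y - 8 for y in weights]

b_raw = slope_y_on_x(ages, weights)
b_shifted = slope_y_on_x(u, v)
print(b_raw, b_shifted)  # 64/57 64/57
```

Both calls reduce to 128/114 = 64/57, so either set of columns may be used to find the coefficient.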

Step 4: Formulate Regression Equation of Y on X

Formula: y - ȳ = byx(x - x̄) [cite: 250]

  • y - 8.4 = 1.12(x - 3.2) [cite: 254]
  • y - 8.4 = 1.12x - 3.584 [cite: 254]
  • y = 1.12x + 4.816 ≈ 1.12x + 4.82 [cite: 255]

Step 5: Estimate the Required Value

To find the weight at 4 months, substitute x = 4 into the equation[cite: 256]:

y = 1.12(4) + 4.82 = 4.48 + 4.82 = 9.30 ≈ 9.3 [cite: 256]

Final Answer: At the age of 4 months, the estimated normal weight of the baby is 9.3 kg[cite: 257].
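To double-check the hand calculation, the whole solution can be reproduced in a few lines of Python. This is a sketch using deviations from the actual means rather than the assumed means; `fit_y_on_x` is a hypothetical helper name. Because it does not round byx to 1.12 before continuing, the intercept comes out slightly different from 4.82, while the estimate still rounds to 9.3.

```python
ages = [0, 2, 3, 5, 6]
weights = [5, 7, 8, 10, 12]


def fit_y_on_x(xs, ys):
    """Return (slope, intercept) of the regression line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_dev_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_dev_x2 = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_dev_xy / sum_dev_x2
    # The line passes through (mean_x, mean_y), which fixes the intercept.
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_y_on_x(ages, weights)
estimate = slope * 4 + intercept  # weight at 4 months
print(round(slope, 2), round(estimate, 1))
```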

Section 3: Case Question & Solution (Medium Level)

Case Question: Intersecting Regression Lines

Two lines of regression are $x+2y=5$ and $2x+3y=8$[cite: 166]. It is given that the variance of the x-series is $\sigma_x^2=12$[cite: 169].

Requirement: Calculate the arithmetic means $\overline{x}$ and $\overline{y}$, the regression coefficients $b_{yx}$ and $b_{xy}$, the correlation coefficient $r$, and the variance of y, $\sigma_y^2$[cite: 170].

💡 Tips & Tricks for Medium Questions

  • The Intersection Property: The two regression lines always intersect at the exact point of their arithmetic means $(\overline{x}, \overline{y})$[cite: 175]. Solving the two given equations simultaneously will immediately give you the means!
  • Identifying the Lines: The question rarely says which equation is "y on x" and which is "x on y", so start with a trial assignment. Solve one equation for $y$ (form $y = a + bx$) and the other for $x$ (form $x = a' + b'y$). The coefficients $b$ and $b'$ are your trial $b_{yx}$ and $b_{xy}$.
  • The Ultimate Check: The product of the two regression coefficients must be less than or equal to 1 ($r^2 = b_{yx} \cdot b_{xy} \le 1$)[cite: 142]. If your product is greater than 1, you guessed the lines wrong and need to swap them!

Step-by-Step Solution

Step 1: Calculate the Arithmetic Means

Since the two lines of regression pass through the point $(\overline{x}, \overline{y})$, we can replace $x$ and $y$ with their means[cite: 175]:

(i) $\overline{x} + 2\overline{y} = 5$ [cite: 178]
(ii) $2\overline{x} + 3\overline{y} = 8$ [cite: 179]

Multiplying equation (i) by 2 gives $2\overline{x} + 4\overline{y} = 10$. Subtracting equation (ii) from this new equation yields $\overline{y} = 2$. Substituting $\overline{y} = 2$ back into equation (i) gives $\overline{x} + 4 = 5$, which means $\overline{x} = 1$.

Result: $\overline{x} = 1$, $\overline{y} = 2$[cite: 184].
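The elimination above can be checked with Cramer's rule. The sketch below assumes each line is written as a·x + b·y = c; the function name is illustrative.

```python
from fractions import Fraction


def intersection(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the lines are parallel or identical")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y


# x + 2y = 5 and 2x + 3y = 8
mean_x, mean_y = intersection(1, 2, 5, 2, 3, 8)
print(mean_x, mean_y)  # 1 2
```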

Step 2: Find the Regression Coefficients

Let's assume the first equation is the regression line of y on x. We must arrange it in the form $y = a + bx$[cite: 69]:

$x + 2y = 5$
$2y = -x + 5$
$y = -\frac{1}{2}x + \frac{5}{2}$ [cite: 181]
Therefore, the slope is the regression coefficient: $b_{yx} = -\frac{1}{2}$[cite: 186].

Now, we assume the second equation is the regression line of x on y. We arrange it in the form $x = a + by$:

$2x + 3y = 8$
$2x = -3y + 8$
$x = -\frac{3}{2}y + 4$ [cite: 182]
Therefore, the slope is the regression coefficient: $b_{xy} = -\frac{3}{2}$[cite: 187].

Check: Let's verify our assumption. $b_{yx} \cdot b_{xy} = (-\frac{1}{2}) \cdot (-\frac{3}{2}) = \frac{3}{4}$. Since $\frac{3}{4} \le 1$, our assumption for the lines was correct!
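The guess-and-check procedure can be automated by trying both assignments and keeping the one whose product lies between 0 and 1. This is a sketch under the assumption that each line is given as a tuple (a, b, c) meaning a·x + b·y = c with non-zero a and b; `identify_lines` is a hypothetical name.

```python
from fractions import Fraction


def identify_lines(line1, line2):
    """Return (b_yx, b_xy) for the assignment that satisfies 0 <= b_yx * b_xy <= 1."""
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        # Solving a*x + b*y = c for y gives slope -a/b; solving for x gives -b/a.
        b_yx = Fraction(-y_on_x[0], y_on_x[1])
        b_xy = Fraction(-x_on_y[1], x_on_y[0])
        if 0 <= b_yx * b_xy <= 1:
            return b_yx, b_xy
    raise ValueError("neither assignment gives a valid pair of coefficients")


b_yx, b_xy = identify_lines((1, 2, 5), (2, 3, 8))
print(b_yx, b_xy)  # -1/2 -3/2
```

Passing the lines in the opposite order returns the same pair, because the first trial then gives a product of 4/3 and is rejected.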

Step 3: Calculate the Correlation Coefficient ($r$)

The correlation coefficient is the geometric mean of the regression coefficients[cite: 141]. Since both regression coefficients are negative, $r$ must also be negative[cite: 143, 144].

$$r = -\sqrt{b_{yx} \cdot b_{xy}}$$ [cite: 189]
$$r = -\sqrt{(-\frac{1}{2}) \times (-\frac{3}{2})}$$ [cite: 190]
$$r = -\frac{\sqrt{3}}{2} \approx -0.87$$ [cite: 190]

Step 4: Calculate the Variance of Y ($\sigma_y^2$)

We know the formula for the regression coefficient $b_{yx}$ in terms of standard deviations[cite: 93]:

$$b_{yx} = r \frac{\sigma_y}{\sigma_x} = -\frac{1}{2}$$ [cite: 192]

Squaring both sides of the equation to use our given variance ($\sigma_x^2 = 12$)[cite: 169]:

$$r^2 \frac{\sigma_y^2}{\sigma_x^2} = \frac{1}{4}$$ [cite: 193]

Substitute $r^2 = \frac{3}{4}$ and $\sigma_x^2 = 12$ into the equation:

$$(\frac{3}{4}) \times \frac{\sigma_y^2}{12} = \frac{1}{4}$$ [cite: 194]

Solving for $\sigma_y^2$:

$$\frac{3\sigma_y^2}{48} = \frac{1}{4}$$
$$12\sigma_y^2 = 48$$
$$\sigma_y^2 = 4$$ [cite: 194]
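As a cross-check on this step, there is a shortcut that skips $r$ altogether. Dividing $b_{yx} = r\frac{\sigma_y}{\sigma_x}$ by $b_{xy} = r\frac{\sigma_x}{\sigma_y}$ cancels $r$ (assuming $r \neq 0$), which gives:

$$\frac{b_{yx}}{b_{xy}} = \frac{\sigma_y^2}{\sigma_x^2} \quad\Rightarrow\quad \sigma_y^2 = \sigma_x^2 \times \frac{b_{yx}}{b_{xy}} = 12 \times \frac{-1/2}{-3/2} = 12 \times \frac{1}{3} = 4$$

The result agrees with the value obtained by squaring.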

Final Answers Summary:
  • $\overline{x} = 1$, $\overline{y} = 2$
  • $b_{yx} = -0.5$, $b_{xy} = -1.5$
  • $r = -0.87$
  • $\sigma_y^2 = 4$

Section 4: Case Question & Solution (Hard Level)

Case Question: Academic Performance Analysis

The marks secured in Economics and Statistics by 10 students are given below:

| Marks in Economics (X) | 25 | 28 | 35 | 32 | 31 | 36 | 29 | 38 | 34 | 32 |
| Marks in Statistics (Y) | 43 | 46 | 49 | 41 | 36 | 32 | 31 | 30 | 33 | 39 |

Requirements:
i) Calculate the correlation coefficient between the marks in Economics and Statistics.
ii) Find the regression equation of Y on X (Statistics on Economics).
iii) Estimate the most likely marks in Statistics when the marks in Economics are 30.

🔥 Advanced Tricks for Long Questions

  • The "Actual Mean" Trick: Before creating a massive table with X², Y², and XY (which results in huge numbers like 49 × 49 = 2401), quickly calculate the average (mean) of X and Y. If they are whole numbers, use the Actual Mean Method!
  • How it works: Find deviations: x = X - Mean and y = Y - Mean. Your new numbers will be tiny (like -2, 3, 0), making squaring and multiplying incredibly easy and error-free.
  • Self-Check: The sum of your deviations (Σx and Σy) MUST equal exactly 0. If they don't, you made an addition error!

Step-by-Step Solution

Step 1: Check the Means

First, let's find the sums of X and Y to check their means:

  • ΣX = 25 + 28 + 35 + ... + 32 = 320
  • ΣY = 43 + 46 + 49 + ... + 39 = 380
  • Mean of X (X̄) = 320 / 10 = 32
  • Mean of Y (Ȳ) = 380 / 10 = 38

Since both means are whole numbers, we will use the Actual Mean Method.

Step 2: Computation Table

We calculate deviations: x = X - 32 and y = Y - 38.

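| X | Y | x = X - 32 | y = Y - 38 | x² | y² | xy |
|---|---|---|---|---|---|---|
| 25 | 43 | -7 | 5 | 49 | 25 | -35 |
| 28 | 46 | -4 | 8 | 16 | 64 | -32 |
| 35 | 49 | 3 | 11 | 9 | 121 | 33 |
| 32 | 41 | 0 | 3 | 0 | 9 | 0 |
| 31 | 36 | -1 | -2 | 1 | 4 | 2 |
| 36 | 32 | 4 | -6 | 16 | 36 | -24 |
| 29 | 31 | -3 | -7 | 9 | 49 | 21 |
| 38 | 30 | 6 | -8 | 36 | 64 | -48 |
| 34 | 33 | 2 | -5 | 4 | 25 | -10 |
| 32 | 39 | 0 | 1 | 0 | 1 | 0 |
| | | Σx = 0 | Σy = 0 | Σx² = 140 | Σy² = 398 | Σxy = -93 |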

Step 3: Calculate Correlation Coefficient (r)

Using the actual mean formula:

r = Σxy / √(Σx² × Σy²)

r = -93 / √(140 × 398)

r = -93 / √(55720)

r = -93 / 236.05

r = -0.394 (There is a low degree of negative correlation)
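The table totals and the value of r can be verified with a short Python sketch of the actual mean formula. The function and variable names are illustrative assumptions.

```python
import math

economics = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
statistics_marks = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]


def pearson_r(xs, ys):
    """Correlation coefficient from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


r = pearson_r(economics, statistics_marks)
print(round(r, 3))
```

The printed value should agree with the hand-computed -0.394.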

Step 4: Regression Equation of Y on X

First, find the regression coefficient (byx) using the actual mean formula:

byx = Σxy / Σx² = -93 / 140 ≈ -0.664

Now, use the regression equation formula:

Y - Ȳ = byx(X - X̄)

Y - 38 = -0.664(X - 32)

Y - 38 = -0.664X + 21.248

Y = -0.664X + 21.248 + 38

Y = 59.248 - 0.664X

Step 5: Estimate the Required Value

We need to estimate the marks in Statistics (Y) when marks in Economics (X) is 30.

Y = 59.248 - 0.664(30)

Y = 59.248 - 19.92

Y = 39.328

Final Conclusion: When a student scores 30 marks in Economics, their most likely marks in Statistics will be approximately 39.33.
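The prediction can also be made directly from the deviation form Y = Ȳ + byx(X - X̄), without first expanding the equation. The sketch below takes the totals from the computation table; `predict_statistics` is a hypothetical helper name.

```python
# Values taken from the computation table and the means found earlier.
mean_x, mean_y = 32, 38
sum_xy, sum_x2 = -93, 140

b_yx = sum_xy / sum_x2  # regression coefficient of Y on X


def predict_statistics(economics_mark):
    """Estimate the Statistics mark from the deviation form of the line."""
    return mean_y + b_yx * (economics_mark - mean_x)


print(round(predict_statistics(30), 2))
```

Keeping byx unrounded gives about 39.329, which rounds to the same 39.33. At X = 32 the function returns the mean 38, as expected for a line through (X̄, Ȳ).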

Section 5: Case Question & Solution (Expert Level)

Case Question: Predicting Marks with Standard Deviation

Given the following data, calculate the mark in Mathematics obtained by a student who has secured 80 marks in English[cite: 259, 260]:

| Subject | Math | English |
|---|---|---|
| Mean mark | 80 [cite: 263] | 64 [cite: 264] |
| Standard deviation (s.d.) | 3 [cite: 266] | 4 [cite: 267] |

The correlation coefficient between the marks of Mathematics and English is -0.40[cite: 268].

🎓 Pro Tips for Summary Data Questions

  • No Table Needed: When you are given means, standard deviations, and the correlation coefficient, do not try to build a data table. You must use the direct formula that links these three statistical measures.
  • The Golden Formula: The regression coefficient can be found directly using: b = r × (σ_dependent / σ_independent).
  • Identify the Target: You are asked to calculate the mark in Math based on English. Therefore, Math is your dependent variable (x) and English is your independent variable (y). You need the regression equation of x on y[cite: 270, 273].

Step-by-Step Solution

Step 1: Assign Variables and List Givens

Let x and y represent the marks in Math and English respectively[cite: 270].

  • Mean of x (x̄) = 80 [cite: 271]
  • Mean of y (ȳ) = 64 [cite: 271]
  • Standard Deviation of x (σx) = 3 [cite: 271]
  • Standard Deviation of y (σy) = 4 [cite: 271]
  • Correlation Coefficient (r) = -0.40 [cite: 271]

Step 2: Calculate the Regression Coefficient of x on y (bxy)

Using the relationship between correlation and standard deviation:

bxy = r × (σx / σy) [cite: 272]

bxy = -0.40 × (3 / 4) [cite: 272]

bxy = -0.3 [cite: 272]

Step 3: Formulate the Regression Equation of x on y

The standard formula for the regression equation is:

x - x̄ = bxy(y - ȳ) [cite: 273]

Substitute the known values:

x - 80 = -0.3(y - 64) [cite: 274]

x - 80 = -0.3y + 19.2 [cite: 274]

x = -0.3y + 19.2 + 80

x = -0.3y + 99.2 [cite: 275]

Step 4: Estimate the Required Value

We are asked to find the mark in Math (x) when the student has secured 80 marks in English (y = 80)[cite: 259, 260, 276].

Substitute y = 80 into our new equation:

x = -0.3(80) + 99.2 [cite: 277]

x = -24 + 99.2

x = 75.2 [cite: 277]

Final Conclusion: The expected mark in Mathematics for a student who scores 80 in English is 75.2[cite: 278].
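Summary-data questions of this kind all follow one pattern, which the Python sketch below captures. The function name and parameter names are assumptions chosen for illustration.

```python
def predict_from_summary(mean_dep, mean_ind, sd_dep, sd_ind, r, given_ind):
    """Estimate the dependent variable from means, standard deviations and r."""
    # b = r * (sd of dependent / sd of independent)
    b = r * sd_dep / sd_ind
    return mean_dep + b * (given_ind - mean_ind)


# Math (dependent): mean 80, s.d. 3. English (independent): mean 64, s.d. 4.
math_mark = predict_from_summary(80, 64, 3, 4, -0.40, 80)
print(round(math_mark, 1))  # 75.2
```

Swapping the roles of the two subjects in the arguments would give the regression of English on Math instead.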

End of the Comprehensive Guide to NEB Grade 12 Regression and Correlation.
