12-2
Self-Check: Scatter diagram, regression equation, predict Y for X = 10, 15, 20
| X | Y | XY | X² | Y² |
|---|---|---|---|---|
| 13 | 6.2 | 80.6 | 169 | 38.44 |
| 16 | 8.6 | 137.6 | 256 | 73.96 |
| 14 | 7.2 | 100.8 | 196 | 51.84 |
| 11 | 4.5 | 49.5 | 121 | 20.25 |
| 17 | 9.0 | 153.0 | 289 | 81.00 |
| 9 | 3.5 | 31.5 | 81 | 12.25 |
| 13 | 6.5 | 84.5 | 169 | 42.25 |
| 17 | 9.3 | 158.1 | 289 | 86.49 |
| 18 | 9.5 | 171.0 | 324 | 90.25 |
| 12 | 5.7 | 68.4 | 144 | 32.49 |
| ΣX=140 | ΣY=70.0 | ΣXY=1035.0 | ΣX²=2038 | ΣY²=529.22 |
X̄ = ΣX/n = 140/10 = 14.0
Ȳ = ΣY/n = 70.0/10 = 7.0
b = (ΣXY − nX̄Ȳ) / (ΣX² − nX̄²)
= (1035.0 − 10×14×7.0) / (2038 − 10×14²)
= (1035 − 980) / (2038 − 1960)
= 55 / 78
= 0.7051
a = Ȳ − b·X̄ = 7.0 − 0.7051×14 = 7.0 − 9.871 = −2.871
Regression equation: Ŷ = −2.871 + 0.7051X
X = 10: Ŷ = −2.871 + 0.7051(10) = −2.871 + 7.051 = 4.18
X = 15: Ŷ = −2.871 + 0.7051(15) = −2.871 + 10.577 = 7.71
X = 20: Ŷ = −2.871 + 0.7051(20) = −2.871 + 14.102 = 11.23
- Always build the full table first: X, Y, XY, X², Y² columns
- X̄ and Ȳ must be computed before b and a
- Check sign: if b is negative, as X rises, Y falls (negative relationship)
- r² = what % of Y variation is explained by X — always interpret this
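The table-then-formulas routine above can be sketched in a few lines of Python (a minimal sketch using the 12-2 data; variable names are mine, not the textbook's):

```python
# Least-squares fit for the 12-2 data using the same hand formulas:
# b = (ΣXY − nX̄Ȳ) / (ΣX² − nX̄²),  a = Ȳ − b·X̄
X = [13, 16, 14, 11, 17, 9, 13, 17, 18, 12]
Y = [6.2, 8.6, 7.2, 4.5, 9.0, 3.5, 6.5, 9.3, 9.5, 5.7]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n        # X̄ = 14, Ȳ = 7.0
sum_xy = sum(x * y for x, y in zip(X, Y))    # ΣXY = 1035.0
sum_x2 = sum(x * x for x in X)               # ΣX² = 2038

b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)  # 55/78 ≈ 0.7051
a = y_bar - b * x_bar                                         # ≈ −2.871

def predict(x):
    return a + b * x

print(f"Y-hat = {a:.3f} + {b:.4f}X")  # Y-hat = -2.872 + 0.7051X
```

Running `predict(10)`, `predict(15)`, `predict(20)` reproduces the three hand predictions.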
12-3
Standard Knitting Co. — Predict overhead when 50 units produced; standard error
| X (Units) | Y (Overhead) | XY | X² |
|---|---|---|---|
| 40 | 191 | 7640 | 1600 |
| 42 | 170 | 7140 | 1764 |
| 53 | 272 | 14416 | 2809 |
| 35 | 155 | 5425 | 1225 |
| 56 | 280 | 15680 | 3136 |
| 39 | 173 | 6747 | 1521 |
| 48 | 234 | 11232 | 2304 |
| 30 | 116 | 3480 | 900 |
| 37 | 153 | 5661 | 1369 |
| 40 | 178 | 7120 | 1600 |
| ΣX=420 | ΣY=1922 | ΣXY=84541 | ΣX²=18228 |
b = (ΣXY − nX̄Ȳ) / (ΣX² − nX̄²)
= (84541 − 10×42×192.2) / (18228 − 10×42²)
= (84541 − 80724) / (18228 − 17640)
= 3817 / 588 = 6.4915
a = 192.2 − 6.4915×42 = 192.2 − 272.643 = −80.4430
Equation: Ŷ = −80.44 + 6.49X
(b) Predict at X = 50:
Ŷ = −80.44 + 6.49(50) = −80.44 + 324.5 = 244.06 overhead units
(c) Standard Error of Estimate:
ΣY² = 191²+170²+272²+155²+280²+173²+234²+116²+153²+178² = 395,024
sₑ = √[(ΣY² − aΣY − bΣXY) / (n−2)]
= √[(395,024 − (−80.443)(1922) − (6.4915)(84,541)) / 8]
= √[(395,024 + 154,611.4 − 548,797.9) / 8]
= √[837.5 / 8] = √104.7 → sₑ ≈ 10.23
(Carry extra decimals in a and b here — the numerator is a small difference of large numbers, so heavy rounding badly distorts sₑ.)
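A quick cross-check in Python (a sketch; it recomputes everything from the raw table, and computes sₑ from residuals, which is algebraically the same as √[(ΣY² − aΣY − bΣXY)/(n−2)] but immune to rounding of a and b):

```python
import math

# Standard Knitting Co. data (X = units produced, Y = overhead)
X = [40, 42, 53, 35, 56, 39, 48, 30, 37, 40]
Y = [191, 170, 272, 155, 280, 173, 234, 116, 153, 178]

n = len(X)
# b via the equivalent "n-form": b = (nΣXY − ΣXΣY) / (nΣX² − (ΣX)²)
b = (n * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)) / \
    (n * sum(x * x for x in X) - sum(X) ** 2)
a = sum(Y) / n - b * sum(X) / n

# Standard error of estimate from residuals
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))
se = math.sqrt(sse / (n - 2))
print(f"b = {b:.4f}, a = {a:.2f}, se = {se:.2f}")  # b = 6.4915, a = -80.44, se = 10.23
```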
Plot scatter, develop equation, predict Y for X = 6, 13.4, 20.5
| X | Y | XY | X² |
|---|---|---|---|
| 2.7 | 16.66 | 44.98 | 7.29 |
| 4.8 | 16.92 | 81.22 | 23.04 |
| 5.6 | 22.3 | 124.88 | 31.36 |
| 19.7 | 71.8 | 1414.46 | 388.09 |
| 19.6 | 80.88 | 1585.25 | 384.16 |
| 21.5 | 81.4 | 1750.1 | 462.25 |
| 18.7 | 77.46 | 1448.5 | 349.69 |
| 14.3 | 48.7 | 696.41 | 204.49 |
| 11.6 | 10.9 | 126.44 | 134.56 |
| 38.4 | 12.3 | 472.32 | 1474.56 |
| 6.8 | 13.8 | 93.84 | 46.24 |
| 71.5 | 81.26 | 5810.09 | 5112.25 |
| ΣX=235.2 | ΣY=534.38 | ΣXY=13648.49 | ΣX²=8483.42 |
(Sums computed from the 12 pairs above — if your textbook's data differ, rebuild the table and recompute.)
For the actual dataset provided in your textbook, apply the standard formulas:
- n = number of data pairs (count carefully from the table)
- For scatter diagram: X on horizontal axis, Y on vertical — plot each (X,Y) point
- Draw the regression line: plot Ŷ at two X values, connect with a straight line
- The regression line ALWAYS passes through the point (X̄, Ȳ)
Scatter, equation, predict Y for X = 5, 6, 7
| X | Y | XY | X² |
|---|---|---|---|
| 16 | −4.4 | −70.4 | 256 |
| 6 | 8.0 | 48.0 | 36 |
| 10 | 2.1 | 21.0 | 100 |
| 5 | 8.7 | 43.5 | 25 |
| 12 | 0.1 | 1.2 | 144 |
| 14 | −2.5 | −35.0 | 196 |
| ΣX=63 | ΣY=12.0 | ΣXY=8.3 | ΣX²=757 |
b = (8.3 − 6×10.5×2.0) / (757 − 6×10.5²)
= (8.3 − 126.0) / (757 − 661.5)
= −117.7 / 95.5 = −1.232
a = 2.0 − (−1.232)(10.5) = 2.0 + 12.936 = 14.936
Equation: Ŷ = 14.936 − 1.232X
Predictions:
X=5: Ŷ = 14.936 − 1.232(5) = 14.936 − 6.16 = 8.78
X=6: Ŷ = 14.936 − 1.232(6) = 14.936 − 7.39 = 7.55
X=7: Ŷ = 14.936 − 1.232(7) = 14.936 − 8.62 = 6.32
Best-fitting line, standard error, 95% prediction interval for X=44
| X | Y | XY | X² | Y² |
|---|---|---|---|---|
| 56 | 45.0 | 2520 | 3136 | 2025.00 |
| 48 | 38.5 | 1848 | 2304 | 1482.25 |
| 42 | 34.5 | 1449 | 1764 | 1190.25 |
| 58 | 46.1 | 2673.8 | 3364 | 2125.21 |
| 40 | 33.3 | 1332 | 1600 | 1108.89 |
| 39 | 32.1 | 1251.9 | 1521 | 1030.41 |
| 50 | 40.4 | 2020 | 2500 | 1632.16 |
| ΣX=333 | ΣY=269.9 | ΣXY=13094.7 | ΣX²=16189 | ΣY²=10594.17 |
b = (ΣXY − nX̄Ȳ) / (ΣX² − nX̄²)
= (13094.7 − 7×47.5714×38.5571) / (16189 − 7×47.5714²)
= (13094.7 − 12839.53) / (16189 − 15841.29)
= 255.17 / 347.71 = 0.733854
a = Ȳ − b·X̄ = 38.5571 − 0.733854×47.5714 = 38.5571 − 34.9105 = 3.6467
Equation: Ŷ = 3.6467 + 0.7339X
(b) Standard Error: sₑ = √[(ΣY² − aΣY − bΣXY) / (n−2)]
= √[(10594.17 − 3.6467×269.9 − 0.733854×13094.7) / 5]
= √[(10594.17 − 984.24 − 9609.59) / 5]
= √[0.34 / 5] = √0.068 → sₑ ≈ 0.26
(Keep all the decimals of a and b in this step — the numerator is a tiny difference of large numbers, so rounding to 2–3 places gives nonsense.)
(c) Prediction at X = 44:
Ŷ = 3.6467 + 0.733854(44) = 3.6467 + 32.2896 = 35.94
95% PI ≈ Ŷ ± 2sₑ = 35.94 ± 2(0.26) = 35.42 to 36.46
- Approximate 95% prediction interval = Ŷ ± 2sₑ (uses ±2 standard errors)
- Exact PI uses t-distribution: Ŷ ± t(sₑ) where df = n−2
- PI is wider than confidence interval — it predicts a single observation, not the mean
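As a check on parts (a)–(c), the whole computation can be redone from the raw rows (a sketch; since the numerator of sₑ is a tiny difference here, computing it from residuals sidesteps the rounding trap):

```python
import math

X = [56, 48, 42, 58, 40, 39, 50]
Y = [45.0, 38.5, 34.5, 46.1, 33.3, 32.1, 40.4]

n = len(X)
b = (n * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)) / \
    (n * sum(x * x for x in X) - sum(X) ** 2)
a = sum(Y) / n - b * sum(X) / n

# standard error of estimate from residuals
se = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y)) / (n - 2))

y_hat = a + b * 44
lo, hi = y_hat - 2 * se, y_hat + 2 * se   # approximate 95% prediction interval
print(f"Y-hat(44) = {y_hat:.2f}, 95% PI ≈ ({lo:.2f}, {hi:.2f})")
```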
Appliance sales vs housing starts — regression equation, slope interpretation, SE
| X (Housing) | Y (Appliances) | XY | X² |
|---|---|---|---|
| 2.0 | 5.0 | 10.0 | 4.0 |
| 2.5 | 5.5 | 13.75 | 6.25 |
| 3.2 | 6.0 | 19.2 | 10.24 |
| 3.6 | 7.0 | 25.2 | 12.96 |
| 3.3 | 7.2 | 23.76 | 10.89 |
| 4.0 | 7.7 | 30.8 | 16.0 |
| 4.2 | 8.4 | 35.28 | 17.64 |
| 4.6 | 9.0 | 41.4 | 21.16 |
| 4.8 | 9.7 | 46.56 | 23.04 |
| 5.0 | 10.0 | 50.0 | 25.0 |
| ΣX=37.2 | ΣY=75.5 | ΣXY=295.95 | ΣX²=147.18 |
b = (295.95 − 10×3.72×7.55) / (147.18 − 10×3.72²)
= (295.95 − 280.86) / (147.18 − 138.38)
= 15.09 / 8.80 = 1.714
a = 7.55 − 1.714×3.72 = 7.55 − 6.376 = 1.174
Ŷ = 1.174 + 1.714X
(b) Slope interpretation: For every additional 1,000 housing starts,
appliance sales increase by 1,714 units (1.714 thousand)
(c) Standard Error: ΣY² = 597.03, so
sₑ = √[(ΣY² − aΣY − bΣXY) / (n−2)] = √[(597.03 − 1.174×75.5 − 1.714×295.95) / 8]
= √[(597.03 − 88.64 − 507.26) / 8] = √[1.13 / 8] ≈ 0.38 thousand units (≈ 0.37 with unrounded a and b)
Victory Motorcycles — supervisor interruptions vs hostility score; predict at X=18
| X | Y | XY | X² |
|---|---|---|---|
| 5 | 58 | 290 | 25 |
| 10 | 41 | 410 | 100 |
| 10 | 45 | 450 | 100 |
| 15 | 27 | 405 | 225 |
| 15 | 26 | 390 | 225 |
| 20 | 12 | 240 | 400 |
| 20 | 16 | 320 | 400 |
| 25 | 3 | 75 | 625 |
| ΣX=120 | ΣY=228 | ΣXY=2580 | ΣX²=2100 |
b = (2580 − 8×15×28.5) / (2100 − 8×15²)
= (2580 − 3420) / (2100 − 1800)
= −840 / 300 = −2.80
a = 28.5 − (−2.80)(15) = 28.5 + 42 = 70.5
Equation: Ŷ = 70.5 − 2.80X
(b) Negative slope: More interruptions → LOWER hostility score
(more interruptions = more assistance = less frustration)
(c) Predict at X = 18:
Ŷ = 70.5 − 2.80(18) = 70.5 − 50.4 = 20.1 (expected hostility score)
12-31 to 12-32
Calculate coefficient of determination (r²) and correlation coefficient (r)
r² = [a·ΣY + b·ΣXY − nȲ²] / [ΣY² − nȲ²]
= [(−2.871)(70) + (0.7051)(1035) − 10(49)] / [529.22 − 490]
= [−200.97 + 729.78 − 490] / 39.22
= 38.81 / 39.22 = r² = 0.990
r = √0.990 = 0.995 (very strong positive correlation)
Interpretation: 99% of variation in Y is explained by X.
- |r| > 0.8 = strong | 0.5–0.8 = moderate | < 0.5 = weak
- Always state the sign: positive r = same direction; negative r = opposite direction
- r² tells you % of Y variation explained: r²=0.90 means 90% explained
- r = 0 does NOT mean no relationship — it means no LINEAR relationship
- Correlation ≠ Causation — always state this in exam answers
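The r/r² result can be verified directly from the 12-2 data (a sketch using the centered-sums form of Pearson's r; the exact values differ in the third decimal from the hand result because a and b were rounded there):

```python
import math

X = [13, 16, 14, 11, 17, 9, 13, 17, 18, 12]
Y = [6.2, 8.6, 7.2, 4.5, 9.0, 3.5, 6.5, 9.3, 9.5, 5.7]
n = len(X)

# Pearson r from centered sums: r = Sxy / √(Sxx·Syy)
sxy = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n  # 55.0
sxx = sum(x * x for x in X) - sum(X) ** 2 / n                 # 78.0
syy = sum(y * y for y in Y) - sum(Y) ** 2 / n                 # 39.22
r = sxy / math.sqrt(sxx * syy)
r2 = r * r
print(f"r = {r:.3f}, r² = {r2:.3f}")  # r = 0.994, r² = 0.989
```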
12-32
Bank of Lincoln (waiting time) & Zippy Cola (ads vs cans purchased)
| X (Ads) | Y (Cans) | XY | X² | Y² |
|---|---|---|---|---|
| 3 | 18 | 54 | 9 | 324 |
| 7 | 4 | 28 | 49 | 16 |
| 4 | 2 | 8 | 16 | 4 |
| 2 | 11 | 22 | 4 | 121 |
| 0 | 9 | 0 | 0 | 81 |
| 4 | 4 | 16 | 16 | 16 |
| 1 | 7 | 7 | 1 | 49 |
| 2 | 3 | 6 | 4 | 9 |
| ΣX=23 | ΣY=58 | ΣXY=141 | ΣX²=99 | ΣY²=620 |
b = (141 − 8×2.875×7.25) / (99 − 8×2.875²)
= (141 − 166.75) / (99 − 66.125)
= −25.75 / 32.875 = −0.7832
a = 7.25 − (−0.7832)(2.875) = 7.25 + 2.252 = 9.502
Equation: Ŷ = 9.502 − 0.783X
r² = [a·ΣY + b·ΣXY − nȲ²] / [ΣY² − nȲ²]
= [(9.502)(58) + (−0.7832)(141) − 8(7.25)²] / [620 − 8(7.25)²]
= [551.12 − 110.43 − 420.5] / [620 − 420.5]
= 20.19 / 199.5 = 0.101
r = −√0.101 ≈ −0.32 (weak negative correlation)
(The negative relationship — more ads seen → fewer cans purchased — seems counterintuitive; it may mean the ads don't drive sales here, or that heavy buyers simply saw fewer ads.)
Interpretation: only about 10% of the variation in cans purchased is explained by ads seen — a weak negative relationship.
13-1
Multiple regression plane for given data; predict Y when X₁=3.0, X₂=2.7
Multiple regression solves for a, b₁, and b₂ simultaneously using the normal equations. The textbook's worked answer to that system is:
a = 20.3916
b₁ = 2.3403
b₂ = −1.3283
Equation: Ŷ = 20.3916 + 2.3403X₁ − 1.3283X₂
(b) Predict when X₁ = 3.0, X₂ = 2.7:
Ŷ = 20.3916 + 2.3403(3.0) − 1.3283(2.7)
= 20.3916 + 7.0209 − 3.5864
= 23.83
- In exams, you'll typically be given the Minitab/computer output — just READ and INTERPRET it
- b₁ = change in Y for 1-unit change in X₁, holding X₂ constant
- b₂ = change in Y for 1-unit change in X₂, holding X₁ constant
- R² = proportion of total variation in Y explained by ALL predictors together
- Adjusted R² = better for comparing models with different numbers of predictors
- F-test tests if the WHOLE model is significant (not individual coefficients)
13-2
Apartment rent — rooms + distance downtown; predict 2-bedroom, 2 miles out
Equations solved simultaneously give:
a = 96.4581
b₁ = 136.4847 (per additional room)
b₂ = −2.4035 (per additional mile from downtown)
Equation: Ŷ = 96.4581 + 136.4847X₁ − 2.4035X₂
(b) 2-bedroom apartment, 2 miles from downtown:
X₁ = 2 (rooms), X₂ = 2 (miles)
Ŷ = 96.4581 + 136.4847(2) − 2.4035(2)
= 96.4581 + 272.9694 − 4.8070
= $365 per month
Interpretation of b₁ = 136.48: Each additional room
adds $136.48/month to rent, holding distance constant.
Interpretation of b₂ = −2.40: Each additional mile from
downtown reduces rent by $2.40/month, holding rooms constant.
Multiple regression plane; predict Y when X₁=10.5, X₂=13.6
Setting up the normal equations requires computing: n, ΣY, ΣX₁, ΣX₂, ΣX₁Y, ΣX₂Y, ΣX₁², ΣX₂², ΣX₁X₂. Then solve the 3×3 system simultaneously.
ΣY = 11.4+16.6+20.5+29.4+7.6+13.8+28.5 = 127.8
ΣX₁ = 4.5+8.7+12.6+19.7+2.9+6.7+17.4 = 72.5
ΣX₂ = 13.2+18.7+19.8+25.4+22.8+17.8+14.6 = 132.3
ΣX₁Y = 1643.60, ΣX₂Y = 2448.58, ΣX₁² = 998.85, ΣX₂² = 2610.97, ΣX₁X₂ = 1411.37
Solving the three normal equations gives (approximately):
a = 8.190, b₁ = 1.322, b₂ = −0.192 → Ŷ = 8.190 + 1.322X₁ − 0.192X₂
Prediction at X₁ = 10.5, X₂ = 13.6:
Ŷ = 8.190 + 1.322(10.5) − 0.192(13.6) = 8.190 + 13.881 − 2.611 ≈ 19.46
- Write 3 equations in form: ΣY = na + b₁ΣX₁ + b₂ΣX₂
- Use elimination or substitution to solve for a, then b₁, then b₂
- In exam, if Minitab output is given — just read off the coefficients directly!
- Practice building the summary table: compute all 9 sums first before writing equations
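The three normal equations can be solved mechanically; here is a sketch with this problem's data, using Cramer's rule on the 3×3 system (pure stdlib, no textbook routine assumed):

```python
# Normal equations for Y = a + b1·X1 + b2·X2:
#   ΣY   = n·a   + b1·ΣX1   + b2·ΣX2
#   ΣX1Y = a·ΣX1 + b1·ΣX1²  + b2·ΣX1X2
#   ΣX2Y = a·ΣX2 + b1·ΣX1X2 + b2·ΣX2²
X1 = [4.5, 8.7, 12.6, 19.7, 2.9, 6.7, 17.4]
X2 = [13.2, 18.7, 19.8, 25.4, 22.8, 17.8, 14.6]
Y  = [11.4, 16.6, 20.5, 29.4, 7.6, 13.8, 28.5]
n = len(Y)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

A = [[n,       sum(X1),     sum(X2)],
     [sum(X1), dot(X1, X1), dot(X1, X2)],
     [sum(X2), dot(X1, X2), dot(X2, X2)]]
c = [sum(Y), dot(X1, Y), dot(X2, Y)]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

D = det3(A)
def solve(col):
    # Cramer's rule: replace one column with the constants, take the det ratio
    m = [row[:] for row in A]
    for i in range(3):
        m[i][col] = c[i]
    return det3(m) / D

a, b1, b2 = solve(0), solve(1), solve(2)
y_hat = a + b1 * 10.5 + b2 * 13.6
print(f"Y-hat = {a:.3f} + {b1:.3f}·X1 + {b2:.3f}·X2; predict {y_hat:.2f}")
```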
13-8 & 13-9
13-8: Predict Y when X₁=28, X₂=10 | 13-9: Predict Y when X₁=−1, X₂=4
For these problems, the procedure is identical to SC 13-1 and 13-2:
- When X₁ = −1, simply substitute: Ŷ = a + b₁(−1) + b₂(4) — just plug in negatives normally
- Check your data for problem 13-9: Y values 6,10,9,14,7,5 with X₁: 1,3,2,−2,3,6 and X₂: 3,−1,4,7,2,−4
- The textbook notes say: compute residual when X₁=2, b₁×2=4 — verify this
13-4
Reading Minitab output — is regression significant at 0.05? (Edith Pratt problem)
df SST = 24, df SSE = 17
df SSR = df SST − df SSE = 24 − 17 = 7
MSR = SSR / df SSR = 872.4 / 7 = 124.63
MSE = SSE / df SSE = 151.2 / 17 = 8.89
F = MSR / MSE = 124.63 / 8.89 = 14.01
F_critical (F at df=7,17 and α=0.05) = 2.61
Since F_calc = 14.01 > F_crit = 2.61 → REJECT H₀
The regression IS significant as a whole. Edith should use the computer output.
- F-test tests the OVERALL model — all predictors together
- df Regression = k (number of predictors); df Error = n − k − 1
- If p-value < 0.05 → model is significant (same as F > F_critical)
- Individual t-tests then tell you WHICH predictors are significant
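The degrees-of-freedom bookkeeping for this F-test, as a sketch (the SSR/SSE values are the ones quoted above; the critical value 2.61 comes from an F table):

```python
# Overall F-test from an ANOVA table: F = MSR / MSE
ss_regression, ss_error = 872.4, 151.2
df_total, df_error = 24, 17          # df SST = n−1, df SSE = n−k−1
df_regression = df_total - df_error  # = k = 7 predictors

msr = ss_regression / df_regression  # ≈ 124.63
mse = ss_error / df_error            # ≈ 8.89
F = msr / mse
print(f"F = {F:.2f} vs F-crit(7, 17, 0.05) = 2.61 -> reject H0: {F > 2.61}")
```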
13-5
Airline — PROMOT, COMP, FREE predict SALES; significance tests; confidence interval
(b) Test H₀: B_FREE = 0 vs H₁: B_FREE < 0 (one-tailed, α=0.05)
t_observed from output = −1.30
p-value for two-tailed = 0.221; one-tailed = 0.221/2 = 0.111
0.111 > 0.05 → CANNOT REJECT H₀
FREE does NOT significantly decrease sales.
(c) Test H₀: B_PROMOT = 28 vs H₁: B_PROMOT ≠ 28 (two-tailed, α=0.10)
t = (b_PROMOT − 28) / s_b = (25.950 − 28) / 4.877 = −2.05/4.877 = −0.420
Critical t(α=0.10, df=11) = ±1.796
|−0.420| < 1.796 → CANNOT REJECT H₀
Not enough evidence that slope has changed from 28 ($28,000).
(d) 90% CI for B_COMP:
b_COMP = −13.238, s_b = 3.686, t(0.10,11) = 1.796
CI = −13.238 ± 1.796(3.686) = −13.238 ± 6.620
= (−19.858, −6.618)
- t-test for individual slope: t = (b − B₀) / s_b
- df = n − k − 1 (n = observations, k = predictors)
- One-tailed p = two-tailed p / 2
- CI for slope: b ± t(α/2) × s_b
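The slope t-statistic from part (c) and the slope CI from part (d) are one-liners (a sketch using the output values quoted above; t(0.05, 11) = 1.796 from a t table):

```python
# t-test for an individual slope: t = (b − B0) / s_b, df = n − k − 1
b_promot, s_b_promot = 25.950, 4.877
t = (b_promot - 28) / s_b_promot          # test H0: B_PROMOT = 28

# 90% CI for B_COMP: b ± t(α/2, df)·s_b
b_comp, s_b_comp = -13.238, 3.686
t_crit = 1.796                            # t(0.05, df = 11)
ci = (b_comp - t_crit * s_b_comp, b_comp + t_crit * s_b_comp)
print(f"t = {t:.3f}; 90% CI for B_COMP = ({ci[0]:.3f}, {ci[1]:.3f})")
```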
15-1
Robin & Stewart table sales 1987-1996 — linear trend; predict 1998
For linear trend, code years as x with the origin at the midpoint of the series. With 10 data points (1987–1996), the midpoint falls between 1991 and 1992, so use half-year units: x = −9, −7, −5, −3, −1, +1, +3, +5, +7, +9 (x = 0 at midpoint 1991.5, one x-unit = half a year). This symmetric coding gives Σx = 0 and Σx² = 330.
ΣY = 956, ΣxY = 1978, Σx² = 330, n = 10
a = Ȳ = ΣY/n = 956/10 = 95.6
b = ΣxY/Σx² = 1978/330 = 5.9939
Trend equation: Ŷ = 95.6 + 5.9939x
(where x = 0 at midpoint 1991.5, x unit = 0.5 year)
(b) Predict 1998:
1998 is 6.5 years from 1991.5 → x = 13 (in half-year units)
Ŷ = 95.6 + 5.9939(13) = 95.6 + 77.92 = 173.5 tables
- Code time x symmetrically: for odd n, x = ..., −2, −1, 0, 1, 2 | for even n, x = ..., −3, −1, 1, 3 (or use half-units)
- Σx always = 0 with symmetric coding — simplifies equations to a = Ȳ, b = ΣxY/Σx²
- Always convert your x code back to actual years when predicting
- For prediction: count how many x-units from origin to target year
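The even-n coding rule and the 15-1 fit can be sketched as follows (using the sums quoted above, since the raw yearly values aren't reproduced here):

```python
# Symmetric time codes for an even number of periods (half-unit spacing),
# then the trend fit a = ΣY/n, b = ΣxY/Σx².
n = 10
codes = [2 * i - (n - 1) for i in range(n)]   # [-9, -7, ..., 7, 9]
assert sum(codes) == 0                        # symmetric coding ⇒ Σx = 0
sum_x2 = sum(c * c for c in codes)            # 330, matching the worked answer

sum_y, sum_xy = 956, 1978                     # given sums from problem 15-1
a = sum_y / n                                 # 95.6
b = sum_xy / sum_x2                           # 1978/330 ≈ 5.994

# 1998 is 6.5 years past the 1991.5 origin → x = 13 half-year units
y_1998 = a + b * 13
print(f"trend: Y-hat = {a} + {b:.4f}x; 1998 forecast ≈ {y_1998:.1f}")
```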
15-2
Faculty PCs at Ohio Uni 1990-1995 — linear AND second-degree equations; predict 1999
n=6, ΣY=7,190, Σx=0, Σx²=70, ΣxY=24,490
LINEAR:
a = 7190/6 = 1,198.33
b = 24490/70 = 349.857
Linear: Ŷ = 1198.33 + 349.857x
SECOND-DEGREE (quadratic): Ŷ = a + bx + cx²
Requires Σx⁴ and solving 3-equation system:
From textbook: a=611.8750, b=349.8571, c=50.2679
Quadratic: Ŷ = 611.875 + 349.857x + 50.268x²
Predict 1999 (x = 13 in same scale):
Linear: Ŷ = 1198.33 + 349.857(13) = 5,746 PCs
Quadratic: Ŷ = 611.875 + 349.857(13) + 50.268(13²)
= 611.875 + 4548.14 + 8495.4 = 13,655 PCs
(d) Neither is ideal — data is accelerating faster than quadratic assumes.
15-3
Western Natural Gas (1991-1995) — % of trend and relative cyclical residual
ΣY = 18+20+21+25+26 = 110
ΣxY = (−2)(18)+(−1)(20)+(0)(21)+(1)(25)+(2)(26) = −36−20+0+25+52 = 21
Σx² = 4+1+0+1+4 = 10
a = ΣY/n = 110/5 = 22.0
b = ΣxY/Σx² = 21/10 = 2.1
Trend: Ŷ = 22.0 + 2.1x
TREND VALUES AND % OF TREND:
1991 (x=−2): Ŷ = 22−4.2 = 17.8 | Y=18 | %trend = 18/17.8×100 = 101.1% | RCR=+1.1
1992 (x=−1): Ŷ = 22−2.1 = 19.9 | Y=20 | %trend = 20/19.9×100 = 100.5% | RCR=+0.5
1993 (x=0): Ŷ = 22.0 | Y=21 | %trend = 21/22.0×100 = 95.5% | RCR=−4.5
1994 (x=1): Ŷ = 24.1 | Y=25 | %trend = 25/24.1×100 = 103.7% | RCR=+3.7
1995 (x=2): Ŷ = 26.2 | Y=26 | %trend = 26/26.2×100 = 99.2% | RCR=−0.8
Largest fluctuation: 1993 (−4.5 below trend)
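The % of trend and RCR columns for the 15-3 series can be generated in one pass (a sketch; variable names are mine):

```python
# Western Natural Gas: % of trend = Y/Ŷ × 100; RCR = (Y − Ŷ)/Ŷ × 100
years = [1991, 1992, 1993, 1994, 1995]
Y = [18, 20, 21, 25, 26]
x = [-2, -1, 0, 1, 2]                 # symmetric codes, origin 1993

n = len(Y)
a = sum(Y) / n                        # 22.0
b = sum(xi * yi for xi, yi in zip(x, Y)) / sum(xi * xi for xi in x)  # 2.1

trend = [a + b * xi for xi in x]
pct_of_trend = [y / t * 100 for y, t in zip(Y, trend)]
rcr = [p - 100 for p in pct_of_trend]
worst = years[max(range(n), key=lambda i: abs(rcr[i]))]
print(f"largest cyclical swing: {worst} (RCR = {rcr[2]:.1f})")
```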
15-4
Village Bank quarterly cash circulation — seasonal index for each quarter
Modified sums (after discarding extremes):
Spring: 172 → Modified mean = 86.0
Summer: 211 → Modified mean = 105.5
Fall: 173 → Modified mean = 86.5
Winter: 257 → Modified mean = 126.0 (approximate from textbook)
Sum of modified means = 86.0 + 105.5 + 86.5 + 126.0 = 404.0
Adjusting factor = 400/404.0 ≈ 0.9901
Final Seasonal Indices:
Spring: 86.0 × 0.9901 ≈ 85.15
Summer: 105.5 × 0.9901 ≈ 104.46
Fall: 86.5 × 0.9901 ≈ 85.64
Winter: 126.0 × 0.9901 ≈ 124.75
Total = 400.0 ✓
Interpretation: Winter index 124.75 → cash in circulation is 24.75% ABOVE
the annual average in winter. Spring is 14.85% BELOW average.
- 4 seasonal indices must ALWAYS sum to 400 (quarterly) or 1200 (monthly)
- Index > 100 → above average season | Index < 100 → below average season
- Deseasonalize: divide actual by (seasonal index/100) to remove seasonal effect
- Modified mean: drop highest and lowest % for each quarter, then average remaining
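The adjusting-factor step can be sketched as follows (modified means taken from the worked answer above):

```python
# Force the four quarterly indices to sum to 400 via the adjusting factor.
modified_means = {"Spring": 86.0, "Summer": 105.5, "Fall": 86.5, "Winter": 126.0}
factor = 400 / sum(modified_means.values())   # 400/404.0 ≈ 0.9901
indices = {q: m * factor for q, m in modified_means.items()}
for q, idx in indices.items():
    print(f"{q}: {idx:.2f}")                  # e.g. Winter ≈ 124.75
```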
Jeff's carpet cleaning business (1986-1996) — trend equation; predict 1997,1998,1999
ΣY = 8.4+11.3+14.7+18.4+19.6+25.7+32.5+40.7+55.4+75.7+94.3 = 396.7
ΣxY = (−5)(8.4)+(−4)(11.3)+(−3)(14.7)+(−2)(18.4)+(−1)(19.6)+0+(1)(32.5)+(2)(40.7)+(3)(55.4)+(4)(75.7)+(5)(94.3) = 866.7
Σx² = 25+16+9+4+1+0+1+4+9+16+25 = 110
a = ΣY/n = 396.7/11 = 36.06
b = ΣxY/Σx² = 866.7/110 = 7.879 per year
Trend: Ŷ = 36.06 + 7.879x (origin 1991, x in years)
Predictions (1997 → x=6, 1998 → x=7, 1999 → x=8 from the 1991 origin):
1997 (x=6): Ŷ = 36.06 + 7.879(6) = 83.33 homes/month
1998 (x=7): Ŷ = 36.06 + 7.879(7) = 91.21
1999 (x=8): Ŷ = 36.06 + 7.879(8) = 99.09
(Note: the series accelerates in later years, so a straight-line trend will tend to underforecast here.)
15-20 & 15-21
Microprocessing revenue + BullsEye stores — % of trend and cyclical analysis
15-20: Given quadratic: Ŷ = 2.119 + 0.375x + 0.020x² (where 1992=0). Calculate % of trend by dividing each actual Y by Ŷ and multiplying by 100. Then compute relative cyclical residual = % of trend − 100.
15-21: BullsEye stores — given linear equation Ŷ = 52.4 + 9.2x (where 1993=0). Apply same procedure for cyclical analysis.
- When both methods show the same year as peak: both measures agree — that year has the biggest cyclical swing
- % of trend and RCR identify the SAME extreme years — just expressed differently
- We CANNOT forecast cyclical variation — only describe it
12
Simple Regression — All Formulas
- Step 1: ALWAYS build the full table with 5 columns (X, Y, XY, X², Y²) and find all sums
- Step 2: Compute X̄ and Ȳ first — all other formulas depend on them
- Step 3: Calculate b, then a — in that order (not the other way)
- Step 4: Write the equation clearly: Ŷ = a + bX
- Step 5: For predictions, simply substitute X and compute Ŷ
- For r and r²: Use the shortcut: r² = [a·ΣY + b·ΣXY − nȲ²] / [ΣY² − nȲ²]
- Memory trick: b = "covariation / variation of X" | a = "adjust to pass through means"
13
Multiple Regression — All Formulas & Output Reading
- If given raw data: Build large table (n rows × 9+ columns), set up 3 normal equations, solve simultaneously
- If given Minitab output: Just READ the regression equation from "Coef" column
- Interpret each b: "For every 1-unit increase in Xᵢ, Y changes by bᵢ, holding all other X constant"
- Multicollinearity: When X₁ and X₂ are correlated → individual t-tests may be non-significant even if F-test is significant
- R² always increases when you add more predictors — use Adjusted R² to compare models
- Dummy variables: For k categories, use k−1 dummy (0/1) variables — never use all k
15
Time Series — All Formulas & Components
- Trend code rule: ODD n → code 0,±1,±2 | EVEN n → code ±1,±3,±5 (half-unit spacing)
- Memory trick for a and b: With symmetric coding, Σx = 0 cancels the cross terms → a = Ȳ (just the mean!) and b = ΣxY/Σx²
- For predictions: Convert target year to x-code, then substitute in Ŷ formula
- Seasonal index > 100: That season is above average (Christmas sales, monsoon agriculture...)
- 4 indices must sum to 400 — if they don't, apply the adjusting factor (400/actual sum)
- We CANNOT forecast: Cyclical and Irregular components — only Trend and Seasonal
- Quadratic vs Linear: If data shows acceleration/deceleration, quadratic fits better (check R² or residuals)
TIPS
Top 15 Exam Hints — "Minimum Study, Maximum Marks"
🔴 MUST KNOW (High Frequency)
- Build regression table with 5 columns
- Formulas for b, a, and Ŷ
- Interpret b: "per unit change in X..."
- r and r² — meaning and range
- Multiple regression: reading Minitab output
- F-test: MSR/MSE and decision rule
- Linear trend coding and prediction
- Seasonal index calculation and interpretation
🟡 SHOULD KNOW (Medium)
- Standard error of estimate (sₑ)
- Prediction intervals (Ŷ ± 2sₑ approx)
- t-test for slope significance
- Confidence interval for slope B
- % of trend and cyclical residuals
- Quadratic trend equation
- Deseasonalizing a time series
- Dummy variable interpretation
- ALWAYS show your calculation table — even if answer is wrong, you get partial marks for correct setup
- ALWAYS interpret results verbally — "b = 0.71 means for every 1 unit increase in X, Y increases by 0.71 units"
- ALWAYS state r interpretation — "r = 0.995 indicates very strong positive linear relationship"
- ALWAYS write the regression equation clearly before making predictions
- Check signs: Negative b = inverse relationship. Note this and explain it intuitively.
- For Minitab questions: Just read the output — don't recalculate. Focus on interpreting p-values and R².
- For time series: Always state what "origin" and "x-unit" represent in your trend equation