Statistics

Statistics: The Logic of Correlation (~20 marks, Paper 2)

Statistics is worth ~20 marks in Paper 2. In Grade 12, the focus shifts from univariate data to bivariate data — the relationship between two variables.


The Big Idea: Relationships Between Variables

If we spend more on advertising, do we sell more products? Statistics gives you three tools to answer this:

  • Scatter Plot: A map of dots that shows if there is a pattern.
  • Correlation ($r$): A number between $-1$ and $1$ that tells us how strong the linear relationship is and whether it is positive or negative.
  • Regression: The line of “best fit” that lets us predict future outcomes.

The Correlation Coefficient ($r$)

| $r$ value | Strength | Direction | What it looks like |
|---|---|---|---|
| $r = 1$ | Perfect | Positive | All points on an upward line |
| $0.8 \leq r < 1$ | Strong | Positive | Points tightly clustered, rising |
| $0.5 \leq r < 0.8$ | Moderate | Positive | Visible upward trend, some scatter |
| $0 < r < 0.5$ | Weak | Positive | Slight upward trend, lots of scatter |
| $r = 0$ | None | None | No linear pattern at all |
| $-1 \leq r < 0$ | Same bands as above, applied to $\lvert r \rvert$ | Negative | As $x$ increases, $y$ decreases |
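To see where $r$ comes from and how the bands in the table are applied, here is a minimal Python sketch. The data set (advertising spend and units sold) is made up for illustration, and the function names `pearson_r` and `describe` are hypothetical. The sketch also assumes that negative values use the same strength bands as positive ones, applied to the size of $r$.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r computed from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def describe(r):
    """Put r into words using the strength bands from the table.

    Assumption: negative r uses the same bands, applied to |r|.
    """
    if r == 0:
        return "no linear correlation"
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size >= 0.8:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"

# Made-up data: advertising spend (x) and units sold (y)
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

r = pearson_r(advertising, sales)
print(round(r, 2), describe(r))  # 0.77 moderate positive correlation
```

In the exam your calculator gives you $r$ directly. The point of the sketch is that the description in words follows mechanically from the value.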

💡 Correlation ≠ Causation: Just because two variables are correlated doesn’t mean one CAUSES the other. Ice cream sales and drowning rates are correlated — but ice cream doesn’t cause drowning. Both are caused by hot weather.


The Least Squares Regression Line

$$\hat{y} = a + bx$$

Where:

  • $b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$ (gradient)
  • $a = \bar{y} - b\bar{x}$ (y-intercept)

⚠️ You don’t calculate $a$ and $b$ by hand in the exam — your calculator does it. But you MUST know the formula exists and understand what $a$ and $b$ mean.

The regression line ALWAYS passes through the point $(\bar{x};\, \bar{y})$.
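As a check on what the calculator is doing, the two formulas can be sketched in a few lines of Python. The data set is made up and the function name `least_squares` is hypothetical. The last lines confirm, for this data set, that the line passes through $(\bar{x};\, \bar{y})$.

```python
def least_squares(xs, ys):
    """Return (a, b) for the line y_hat = a + b*x using the sum formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # gradient
    a = sum_y / n - b * (sum_x / n)                               # y-intercept
    return a, b

# Made-up data set
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
print(round(a, 2), round(b, 2))  # 2.2 0.6

# The line passes through the mean point (x_bar; y_bar)
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
print(round(a + b * x_bar, 2), y_bar)  # 4.0 4.0
```

Here $b = 0.6$ means that $y$ increases by about 0.6 for every 1 unit increase in $x$, and $a = 2.2$ is the predicted value of $y$ when $x = 0$.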


Interpolation vs Extrapolation

| Type | Definition | Reliability |
|---|---|---|
| Interpolation | Predicting WITHIN the range of the data | ✅ Reliable |
| Extrapolation | Predicting OUTSIDE the range of the data | ⚠️ Unreliable: the trend may not continue |
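The difference comes down to a simple range check on $x$. The sketch below assumes a hypothetical regression line $\hat{y} = 2.2 + 0.6x$ fitted to data with $x$-values from 1 to 5. Both the line and the helper name `predict` are illustrative only.

```python
def predict(x, a, b, x_min, x_max):
    """Return the predicted y and whether the prediction is inside the data range."""
    y_hat = a + b * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y_hat, kind

# Hypothetical line and data range (assumptions for illustration)
a, b = 2.2, 0.6
x_min, x_max = 1, 5

y_inside, kind_inside = predict(3, a, b, x_min, x_max)
y_outside, kind_outside = predict(10, a, b, x_min, x_max)

print(round(y_inside, 1), kind_inside)    # 4.0 interpolation
print(round(y_outside, 1), kind_outside)  # 8.2 extrapolation
```

The formula gives an answer in both cases. Only the range check tells you whether to trust it, so always compare the $x$-value in the question with the smallest and largest $x$ in the data.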

🚨 Common Mistakes

  1. Confusing correlation and causation: Strong $r$ does NOT prove one variable causes the other. State that there is “a strong positive/negative correlation” — never say “causes”.
  2. Extrapolating too far: Using the regression line to predict values far beyond the data range is unreliable.
  3. Ignoring an outlier on the scatter plot: One outlier can drastically change $r$ and the regression line. Identify and comment on outliers.
  4. Calculator mode: Make sure your calculator is in STAT (regression) mode with the correct data entered. Double-check by verifying $\bar{x}$ and $\bar{y}$.
  5. Not interpreting $r$ in context: Don’t just state $r = 0.85$. Say “There is a strong positive correlation between $x$ and $y$.”
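To see how much damage a single outlier can do (mistake 3), here is a small Python sketch with made-up data. Five points lie exactly on a line, then one extra point far from the trend is added. The function name `pearson_r` is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r computed from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Made-up data: five points exactly on the line y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r_clean = pearson_r(xs, ys)

# Same data with one outlier added at (6; 0)
r_with_outlier = pearson_r(xs + [6], ys + [0])

print(round(r_clean, 2), round(r_with_outlier, 2))  # 1.0 0.14
```

One point takes the correlation from perfect to weak, which is why you should identify outliers on the scatter plot before you interpret $r$.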

🔗 Related topics:

  • Probability — statistics and probability are complementary branches of data science

📌 Grade 11 foundation: Statistics: Standard Deviation — measures of spread for univariate data

